Regular expressions, also known as regex or regexp, are a difficult subject to tackle. Don’t feel ashamed if you’re not 100% comfortable with writing your own regular expressions yet, as it does take some getting used to. My hope is that by the end of this article, you’ll be one step closer to rocking your own expressions in JavaScript without relying so much on copypasta from Stack Overflow.
The first step to writing a regular expression is to understand how to invoke it. In JavaScript, regular expressions are a standard built-in object. Because of this, we can create a new RegExp object in a few ways:
- Literal, /expression/.test('string to test against')
- new keyword with string argument, new RegExp('expression')
- new keyword with literal, new RegExp(/expression/)
I’ll use a combination of the methods just to show that they essentially perform the same job.
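As a rough sketch of that equivalence, the snippet below builds the same pattern all three ways and tests each against the sample string. The variable names are my own for illustration. It also shows one place where the string form differs: backslashes have to be doubled so they survive string escaping.

```javascript
const text = 'aaron.arney:alligator.io';

// 1. Regular expression literal.
const literal = /arney/i;
// 2. Constructor with a string pattern; flags go in the second argument.
const fromString = new RegExp('arney', 'i');
// 3. Constructor wrapping a literal; the literal's flags carry over.
const fromLiteral = new RegExp(/arney/i);

const results = [literal, fromString, fromLiteral].map((exp) => exp.test(text));
console.log(results);
// expected output: Array(3) [ true, true, true ]

// In a string pattern, a backslash must be written twice.
const dotLiteral = /\./;
const dotString = new RegExp('\\.');
console.log(dotLiteral.source === dotString.source);
// expected output: true
```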
In my example I’m going to be working with a string that contains my first name, last name, and a domain name. In the real world, the example would need much more thought. There are scores of subtleties when it comes to dealing with names, which I won’t address here.
Let’s say I’m building a dashboard and want to display the name of the logged-in user. I have no control over the data that’s returned to me so I have to make do with what I have.
I need to convert aaron.arney:alligator.io into Aaron Arney [Alligator].
Regular expressions fit a lot of logic into a single condensed object. This can and will cause confusion. A good practice is to break down your expression into a form of pseudo-code. This enables us to see what needs to happen and when.
First Last [Domain]
To match a string with a regular expression, all you have to do is pass the literal string as the pattern. The i at the end of the expression is a flag. The i flag in particular stands for case insensitive. That means that our expression will ignore casing on the string.
const unformattedName = 'aaron.arney:alligator.io';
const found = unformattedName.match(/aaron/i);
console.log(found);
// expected output: Array [ "aaron" ]
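To see what the i flag changes, here is a small sketch using a capitalized variant of the sample string. The input and variable names are assumptions of mine, not data from the dashboard.

```javascript
const mixedCase = 'Aaron.Arney:alligator.io';

// Without the i flag, the lowercase pattern does not match the capital "A".
const caseSensitive = mixedCase.match(/aaron/);
console.log(caseSensitive);
// expected output: null

// With the i flag, casing is ignored and the text is returned as it appears.
const caseInsensitive = mixedCase.match(/aaron/i);
console.log(caseInsensitive[0]);
// expected output: "Aaron"
```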
That works well, yet in our case it isn’t a good approach since the name of the user isn’t always going to be “Aaron.” This is where we explore programmatically matching strings.
Let’s focus on matching a first name for the time being. Break the word down into individual characters. What do you see?
The name “Aaron” consists of five alpha characters. Does every first name have only five characters? No, but it is reasonable to assume that first names can range between 1 and 15 characters. To denote a character in the range a-z, we use the character class [a-z].
Now, if we update our expression to use this character class…
const unformattedName = 'aaron.arney:alligator.io';
const found = unformattedName.match(/[a-z]/i);
console.log(found);
// expected output: Array [ "a" ]
Instead of extracting “aaron” from the string, it only returns “a.” This is expected, as a character class on its own matches exactly one character. To repeat the match up to our limit of 15 characters, we use curly brackets. The quantifier {1,15} tells the expression that we want to match the preceding token, our [a-z], between 1 and 15 times.
const unformattedName = 'aaron.arney:alligator.io';
const unformattedNameTwo = 'montgomery.bickerdicke:alligator.io';
const unformattedNameThree = 'a.lila:alligator.io';
const exp = new RegExp(/[a-z]{1,15}/, 'i');
const found = unformattedName.match(exp);
const foundTwo = unformattedNameTwo.match(exp);
const foundThree = unformattedNameThree.match(exp);
console.log(found);
// expected output: Array [ "aaron" ]
console.log(foundTwo);
// expected output: Array [ "montgomery" ]
console.log(foundThree);
// expected output: Array [ "a" ]
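Two edge cases are worth seeing with this quantifier: what happens past the upper bound, and what happens when nothing matches. The inputs below are made up for illustration.

```javascript
const boundedExp = /[a-z]{1,15}/i;

// A run longer than 15 letters is cut off at the upper bound.
const longRun = 'abcdefghijklmnopqrstuvwxyz'.match(boundedExp);
console.log(longRun[0]);
// expected output: "abcdefghijklmno"
console.log(longRun[0].length);
// expected output: 15

// When no letter is present, match() returns null rather than an empty array.
const noLetters = '12345'.match(boundedExp);
console.log(noLetters);
// expected output: null
```

Because of that null, it is safer to check the result before indexing into it.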
Extracting the last name should be as easy as copying and pasting our first expression. You’ll notice that the match still returns the same value instead of both the first and last names.
If you break down the string character by character, you’ll see there is a full stop separating the names. To account for this, we add the full stop to our expression.
We have to be careful here. The . can mean one of two things in an expression.
- . - Match any character except newline
- \. - Match a literal .
Using either version in this context will generate the same result, but that won’t always be the case. Tools like eslint will sometimes mark the escape sequence \ as unnecessary, but I say better safe than sorry!
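Here is a small sketch of a case where the two versions diverge. The hyphenated input is invented for the comparison.

```javascript
const unescaped = /aaron.arney/;
const escaped = /aaron\.arney/;

// Both accept the real separator.
const bothMatchDot = [unescaped.test('aaron.arney'), escaped.test('aaron.arney')];
console.log(bothMatchDot);
// expected output: Array [ true, true ]

// Only the unescaped dot accepts some other character in that position.
const hyphenResults = [unescaped.test('aaron-arney'), escaped.test('aaron-arney')];
console.log(hyphenResults);
// expected output: Array [ true, false ]
```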
const unformattedName = 'aaron.arney:alligator.io';
const exp = new RegExp(/[a-z]{1,15}\.[a-z]{1,15}/, 'i');
const found = unformattedName.match(exp);
console.log(found);
// expected output: Array [ "aaron.arney" ]
Since we prefer to split the string into two items and exclude the full stop from the result, we can now use capturing groups. These are denoted by parentheses () and wrap around the parts of your expression that you want returned. If we wrap them around the first and last name expressions, we’ll get new results.
The syntax for using capture groups is simple: (expression). Since I only want to return my first and last name and not the full stop, I wrap only those two expressions in parentheses.
const unformattedName = 'aaron.arney:alligator.io';
const exp = new RegExp(/([a-z]{1,15})\.([a-z]{1,15})/, 'i');
const found = unformattedName.match(exp);
console.log(found);
// expected output: Array [ "aaron.arney", "aaron", "arney" ]
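Since the groups come back in a fixed order after the full match, one way to pull them out is array destructuring. This is a sketch, and the variable names are my own.

```javascript
const nameExp = /([a-z]{1,15})\.([a-z]{1,15})/i;
const nameMatch = 'aaron.arney:alligator.io'.match(nameExp);

// Index 0 is the full match; the capture groups follow in order.
const [fullMatch, firstName, lastName] = nameMatch;
console.log(fullMatch);
// expected output: "aaron.arney"
console.log(firstName);
// expected output: "aaron"
console.log(lastName);
// expected output: "arney"

// Without the g flag, match() also reports where the match started.
console.log(nameMatch.index);
// expected output: 0
```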
To extract “alligator.io”, we will use the character classes we’ve already used thus far. With some slight modification, of course.
Validating domain names and TLDs is a difficult business. We’re going to pretend the domains that we parse are always between 3 and 25 characters long, and that the TLDs are always between 2 and 10 characters long. If we plug these in, we will get some new output:
const unformattedName = 'aaron.arney:alligator.io';
const exp = new RegExp(/([a-z]{1,15})\.([a-z]{1,15}):([a-z]{3,25}\.[a-z]{2,10})/, 'i');
const found = unformattedName.match(exp);
console.log(found);
// expected output: Array [ "aaron.arney:alligator.io", "aaron", "arney", "alligator.io" ]
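To check that the pretend limits do what we expect, here is a sketch that runs the same expression against two made-up inputs that fall outside the bounds.

```javascript
const fullExp = /([a-z]{1,15})\.([a-z]{1,15}):([a-z]{3,25}\.[a-z]{2,10})/i;

// Domain shorter than three letters: no match.
const shortDomain = 'aaron.arney:ab.io'.match(fullExp);
console.log(shortDomain);
// expected output: null

// TLD shorter than two letters: no match.
const shortTld = 'aaron.arney:alligator.i'.match(fullExp);
console.log(shortTld);
// expected output: null

// Both parts within bounds: the domain lands in group 3.
const validDomain = 'aaron.arney:alligator.io'.match(fullExp)[3];
console.log(validDomain);
// expected output: "alligator.io"
```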
I showed you the “long way” of going about the expression. Now, I’ll show you how you can have a less verbose expression that captures the same text. By using the + quantifier, we can tell our expression to repeat the preceding token one or more times, as many times as it can. It will continue until it hits a dead end, in our case the full stop. This expression also introduces the g flag, which stands for global. It tells the expression that we want to find every match in the string instead of stopping after the first one.
// With the global flag
'aaron.arney:alligator.io'.match(/[a-z]+/ig);
// expected output: Array(4) [ "aaron", "arney", "alligator", "io" ]
// Without the global flag
'aaron.arney:alligator.io'.match(/[a-z]+/i);
// expected output: Array [ "aaron" ]
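One side effect of the g flag is easy to miss: when it is set, match() returns every full match but leaves out the capture groups. The sketch below compares the two, using a grouped pattern of my own choosing.

```javascript
const input = 'aaron.arney:alligator.io';

// Without g: the first match only, followed by its capture groups.
const single = Array.from(input.match(/([a-z]+)\.([a-z]+)/i));
console.log(single);
// expected output: Array(3) [ "aaron.arney", "aaron", "arney" ]

// With g: every full match, but no capture groups.
const all = input.match(/([a-z]+)\.([a-z]+)/gi);
console.log(all);
// expected output: Array [ "aaron.arney", "alligator.io" ]
```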
To format the string, we’ll be using the replace method on the String object. The replace method takes two arguments:
- RegExp | String - A regular expression object or a literal string to search for
- String | function - A replacement string or a function that returns the replacement
const unformattedName = 'aaron.arney:alligator.io';
// The "long" way
const exp = new RegExp(/([a-z]{1,15})\.([a-z]{1,15}):([a-z]{3,25}\.[a-z]{2,10})/, 'i');
unformattedName.replace(exp, '$1 $2 [$3]');
// expected output: "aaron arney [alligator.io]"
// A slightly shorter way
unformattedName.replace(/([a-z]+)\.([a-z]+):([a-z]+\.[a-z]{2,10})/ig, '$1 $2 [$3]');
// expected output: "aaron arney [alligator.io]"
In the above snippet, $1, $2, and $3 are special patterns that get interpreted by the replace method.
- $1 - A reference to the first parenthesized group (index 1 of the match array)
- $2 - A reference to the second parenthesized group (index 2 of the match array)
- $n - And so on, for the nth parenthesized group
To capitalize the words, we can use another regex. Instead of passing a replacement string like we did above, we will pass a function. The function capitalizes the argument supplied and returns it.
Here, I’m introducing a few new parts: anchors, alternation, and a negated character class [^].
- [^abc] - Not a, b, or c
- \b - Word boundary
- ab|cd - Logical “OR”, matches ab or cd
// Capitalize the words
"aaron arney [alligator.io]".replace(/(^\b[a-z])|([^\.]\b[a-z])/g, (char) => char.toUpperCase());
// expected output: "Aaron Arney [Alligator.io]"
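It may help to see what the function actually receives on each call. This sketch records every match in an array I’ve named seen. Note that the second alternative hands over the preceding character along with the letter, which is harmless because upper-casing a space or bracket leaves it unchanged.

```javascript
const seen = [];
const capitalized = 'aaron arney [alligator.io]'.replace(
  /(^\b[a-z])|([^\.]\b[a-z])/g,
  (match) => {
    seen.push(match); // record exactly what each match contained
    return match.toUpperCase();
  }
);

console.log(seen);
// expected output: Array(3) [ "a", " a", "[a" ]
console.log(capitalized);
// expected output: "Aaron Arney [Alligator.io]"
```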
Breaking down this expression into two parts…
- (^\b[a-z]) - Capture the first character of the string. ^ says to match the beginning of the string.
- |([^\.]\b[a-z]) - OR, match the first letter of a new word together with the character before it, as long as that character is not a full stop. This skips the TLD.
This is but a small taste of the power of regular expressions. The example I worked through is improvable, but how?
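One possible improvement, offered as a sketch and not the definitive answer, is to fold both steps into a single helper and leave the TLD out of the captured domain, which gives the Aaron Arney [Alligator] format from the start of the article. The names formatUser and capitalize are hypothetical, and the $ anchor (end of string) is an addition not covered above.

```javascript
// Hypothetical helper: formats "first.last:domain.tld" for display.
function formatUser(raw) {
  // Groups: 1 = first name, 2 = last name, 3 = domain without the TLD.
  // ^ and $ require the whole string to have this shape.
  const exp = /^([a-z]+)\.([a-z]+):([a-z]+)\.[a-z]{2,10}$/i;
  const match = raw.match(exp);
  if (match === null) {
    return null; // input did not have the expected shape
  }
  // Upper-case the first letter of a word and keep the rest as-is.
  const capitalize = (word) => word.replace(/^[a-z]/i, (char) => char.toUpperCase());
  const [, first, last, domain] = match;
  return `${capitalize(first)} ${capitalize(last)} [${capitalize(domain)}]`;
}

console.log(formatUser('aaron.arney:alligator.io'));
// expected output: "Aaron Arney [Alligator]"
console.log(formatUser('not a valid value'));
// expected output: null
```

Returning null for unexpected input is a design choice on my part; a dashboard might prefer to fall back to the raw value instead.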
This is where you take the knowledge you learned and try to answer those questions. Explore the following resources to help you in your journey and experiment!