Regular Expressions
Regular expressions are patterns built from ordinary characters and metacharacters that are used to search for matches in strings.
JavaScript regular expression syntax is based on the syntax of Perl regular expressions.
Creating regular expressions
Regular expressions in JavaScript are represented by RegExp objects. There are two ways to define a regular expression and create a RegExp object:
1. Using the RegExp constructor function and passing it a string containing the pattern:
var pattern = new RegExp('x.z?');
Regular expressions created with the RegExp constructor are compiled at runtime. Use this form when you do not know the pattern in advance.
2. Using the literal syntax, which consists of a pattern enclosed between slashes:
var pattern = /x.z?/;
Regular expressions created with the literal notation are compiled when the script is loaded. Use this form when you know the pattern in advance.
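The following sketch illustrates when the constructor form is useful. The variable word is a hypothetical stand-in for a value that is only known at runtime. It also shows a common pitfall: inside a string, each backslash of the pattern has to be doubled.

```javascript
// `word` stands in for a value known only at runtime (hypothetical).
const word = 'pass';
const dynamicPattern = new RegExp(word + 'ions?');

console.log(dynamicPattern.source);            // passions?
console.log(dynamicPattern.test('passion'));   // true
console.log(dynamicPattern.test('passport'));  // false

// In a string, a backslash must itself be escaped with a backslash.
const fromString = new RegExp('\\d+');
const fromLiteral = /\d+/;
console.log(fromString.source === fromLiteral.source); // true
```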
Setting flags
Regular expressions may use flags that modify their default search behaviour.
For the RegExp constructor, pass the flags as the second argument:
var pattern = new RegExp('x.z?', 'gi');
For the literal notation, add them after the closing slash “/”:
var pattern = /x.z?/gi;
In both cases the flags set a global, case-insensitive search.
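As a small illustration of the effect of these flags, the sketch below compares the same pattern with and without them. The sample string is made up for the example.

```javascript
const caseSensitive = /x.z?/;
const globalInsensitive = /x.z?/gi;

// The flags can be read back from the RegExp object.
console.log(globalInsensitive.flags);       // gi
console.log(globalInsensitive.global);      // true
console.log(globalInsensitive.ignoreCase);  // true

const sample = 'XYZ xyz';
// Without flags: only the first, case-sensitive match.
const firstOnly = sample.match(caseSensitive)[0];
// With gi: every match, regardless of case.
const allMatches = sample.match(globalInsensitive);
console.log(firstOnly);   // xyz
console.log(allMatches);  // [ 'XYZ', 'xyz' ]
```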
Regular expression patterns
Regular expressions consist of:
- Ordinary characters, which represent themselves. For example, the pattern /hello world/ matches the part of the text that is identical to the pattern. All characters except the special characters are ordinary characters.
- Special characters, which are not treated literally but are interpreted in a special way. For example, ‘.’ (dot) matches any character except a newline.
Simple patterns
Simple patterns consist only of ordinary characters and are used to find a direct match. For example, the pattern /xyz/ matches only strings that contain exactly the phrase ‘xyz’, with the same characters in the same order. It matches the strings ‘registration of .xyz domain’ and ‘hash abcxyz123’ because both contain the substring ‘xyz’. It does not match the strings ‘hash abcxy’ and ‘x y z’ because they do not contain the exact substring ‘xyz’.
Complex patterns
Complex patterns use both ordinary and special characters. Special characters are useful when a pattern needs more than direct matching, for example when you need to match one or more ‘x’ characters, or to match a substring in which any character may appear at a given position. The pattern /.xyz/ matches substrings that consist of the phrase ‘xyz’ preceded by any character. In the string ‘abcxyz123’ the pattern matches the substring ‘cxyz’.
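The claims about the simple pattern /xyz/ and the complex pattern /.xyz/ can be checked with a short sketch like this one. The variable names are chosen only for illustration.

```javascript
const simple = /xyz/;
console.log(simple.test('registration of .xyz domain')); // true
console.log(simple.test('hash abcxyz123'));              // true
console.log(simple.test('hash abcxy'));                  // false
console.log(simple.test('x y z'));                       // false

const complex = /.xyz/;
// The dot consumes the character in front of 'xyz'.
const matched = 'abcxyz123'.match(complex)[0];
console.log(matched);               // cxyz
// No character precedes 'xyz', so the dot has nothing to match.
console.log(complex.test('xyz'));   // false
```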
Regular expression syntax
Special characters
\ | Escape character; changes the meaning of the following character. If the backslash precedes a special character, it removes the special meaning, so the character is interpreted literally. For example, the pattern /.xyz/ matches any character followed by the substring ‘xyz’, whereas /\.xyz/ matches the literal substring ‘.xyz’. If the backslash precedes certain ordinary characters, it gives them a special meaning. For example, /w/ matches the character ‘w’, whereas /\w/ matches any word character, that is, an alphanumeric character or the underscore ‘_’. |
. | Matches any single character except a newline character. |
* | Matches the preceding expression 0 or more times. |
+ | Matches the preceding expression 1 or more times. |
? | Matches the preceding expression 0 or 1 time. When it follows a quantifier (*, +, ? or {}), it makes that quantifier non-greedy. It also appears in the syntax of assertions and non-capturing groups. |
^ | Matches the beginning of the string. When it appears as the first character in a character class, it negates that class. |
$ | Matches the end of the string. |
| | The equivalent of a logical OR. |
[ | Beginning of character class definition. |
] | End of character class definition. |
( | Beginning of subexpression. |
) | End of subexpression. |
{ | Beginning of quantifier min/max. |
} | End of quantifier min/max. |
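The two roles of the backslash and the alternation character can be seen in a short sketch. The test strings are invented for the example.

```javascript
const dotAny = /.xyz/;      // dot is special: any character
const dotLiteral = /\.xyz/; // escaped dot: a literal '.'
console.log(dotAny.test('abcxyz'));     // true
console.log(dotLiteral.test('abcxyz')); // false
console.log(dotLiteral.test('my.xyz')); // true

// The backslash gives the ordinary character 'w' a special meaning.
const plainW = 'w_1 +'.match(/w/g);
const wordChars = 'w_1 +'.match(/\w/g);
console.log(plainW);    // [ 'w' ]
console.log(wordChars); // [ 'w', '_', '1' ]

// '|' works as a logical OR between two alternatives.
const pet = 'a dog'.match(/cat|dog/)[0];
console.log(pet); // dog
```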
Character classes
Use a character class to match one character from a specific set of characters. You define a character class by placing one or more characters in square brackets. Unless a quantifier follows it, a character class matches exactly one of the characters it defines.
[…] | Matches one character from the character set |
For example: | |
[abc] | Matches one of ‘a’, ‘b’ or ‘c’ |
[a-z] | Matches any lowercase letter from ‘a’ to ‘z’ |
[a-zA-Z] | Matches any lowercase or uppercase letter from ‘a’ to ‘z’ |
[0-9] | Matches any digit |
[^…] | Matches one character outside the character set |
For example: | |
[^0-9] | Matches any character which is not a digit |
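A minimal sketch of these character classes in use, on a made-up sample string:

```javascript
const label = 'Room 42B, floor 3';

const digits = label.match(/[0-9]/g);
const capitals = label.match(/[A-Z]/g);
// Negated class: everything that is neither a letter nor a digit.
const others = label.match(/[^a-zA-Z0-9]/g);
// With a quantifier the class can match more than one character.
const numbers = label.match(/[0-9]+/g);

console.log(digits);   // [ '4', '2', '3' ]
console.log(capitals); // [ 'R', 'B' ]
console.log(others);   // [ ' ', ',', ' ', ' ' ]
console.log(numbers);  // [ '42', '3' ]
```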
Predefined Character classes
There are a number of predefined character classes, which are convenient shorthands for commonly used character sets.
\d | Matches any digit |
\D | Matches any non-digit character |
\w | Matches any word character (letter, digit or underscore) |
\W | Matches any single non-word character |
\s | Matches any whitespace character |
\S | Matches any non-whitespace character |
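The predefined classes can be tried out as follows. The sample string is an assumption made for the example; note that the underscore counts as a word character.

```javascript
const entry = 'user_01 @ 9am';

const words = entry.match(/\w+/g);
const digitChars = entry.match(/\d/g);
const nonWord = entry.match(/\W/g);
const chunks = entry.match(/\S+/g);

console.log(words);      // [ 'user_01', '9am' ]
console.log(digitChars); // [ '0', '1', '9' ]
console.log(nonWord);    // [ ' ', '@', ' ' ]
console.log(chunks);     // [ 'user_01', '@', '9am' ]
```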
Regular expressions quantifiers
* | Matches the preceding expression 0 or more times, as many times as possible. For example, in the string ‘mx mkx mmkkxx’ /k*x/ matches the substrings ‘x’, ‘kx’, ‘kkx’ and the final ‘x’. In the string “ab #cd# ef #gh#” /#.*#/ matches the substring “#cd# ef #gh#”. |
+ | Matches the preceding expression 1 or more times, as many times as possible. For example, in the string ‘mx mpx mmppxx’ /p+x/ matches the substrings ‘px’ and ‘ppx’. In the string “ab #cd# ef #gh#” /#.+#/ matches the substring “#cd# ef #gh#”. |
? | Matches the preceding expression 0 or 1 time. For example, in the string ‘ccx yxx’ /c?x/ matches the substring ‘cx’ and each of the two following ‘x’ characters. /abc?/ matches ‘abc’ or ‘ab’. |
{m} | Matches the preceding expression exactly m times |
{m,} | Matches the preceding expression at least m times |
{m,n} | Matches the preceding expression between m and n times |
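{0,n} | Matches the preceding expression between 0 and n times. JavaScript does not support the shorthand {,n}; without the u flag it is treated as literal text |
[…] | Matches one of the characters from the set |
[^…] | Matches one character that is not in the set |
X|Y | Matches either X or Y |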
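The quantifier examples from the table can be reproduced with the g flag, which returns every match. This is only a sketch; the last two checks show how JavaScript, without the u flag, reads {,2} as literal text.

```javascript
const star = 'mx mkx mmkkxx'.match(/k*x/g);
const plus = 'mx mpx mmppxx'.match(/p+x/g);
const optional = 'ccx yxx'.match(/c?x/g);
const range = 'a aa aaa aaaa'.match(/a{2,3}/g);

console.log(star);     // [ 'x', 'kx', 'kkx', 'x' ]
console.log(plus);     // [ 'px', 'ppx' ]
console.log(optional); // [ 'cx', 'x', 'x' ]
console.log(range);    // [ 'aa', 'aaa', 'aaa' ]

// {,2} is not a quantifier here: it matches the characters '{,2}'.
console.log(/x{,2}/.test('xx'));    // false
console.log(/x{,2}/.test('x{,2}')); // true
```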
Non-greedy quantifiers
Non-greedy (lazy) quantifiers match as few characters as possible.
*? | Matches the preceding expression 0 or more times, lazy version. For example, in the string “ab #cd# ef #gh#” /#.*?#/ matches the substrings “#cd#” and “#gh#”. |
+? | Matches the preceding expression 1 or more times, lazy version. For example, in the string “ab #cd# ef #gh#” /#.+?#/ matches the substrings “#cd#” and “#gh#”. |
?? | Matches the preceding expression 0 or 1 time, lazy version. For example, in the string ‘abc’ /abc??/ matches only ‘ab’, whereas the greedy /abc?/ matches ‘abc’. |
{m,n}? | Matches the preceding expression between m and n times, lazy version. |
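The difference between greedy and lazy matching is easiest to see side by side. A short sketch using the string from the table:

```javascript
const tagged = 'ab #cd# ef #gh#';

// Greedy: runs from the first '#' to the last '#'.
const greedy = tagged.match(/#.*#/g);
// Lazy: stops at the nearest closing '#'.
const lazy = tagged.match(/#.*?#/g);

console.log(greedy); // [ '#cd# ef #gh#' ]
console.log(lazy);   // [ '#cd#', '#gh#' ]

const greedyOptional = 'abc'.match(/abc?/)[0];
const lazyOptional = 'abc'.match(/abc??/)[0];
console.log(greedyOptional); // abc
console.log(lazyOptional);   // ab
```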
Anchors
^ | Matches the beginning of the string. For example, /^alpha/ matches the substring ‘alpha’ in the string ‘alpha beta’ but not in the string ‘the alpha’. |
$ | Matches the end of the string. For example, /alpha$/ matches the substring ‘alpha’ in the string ‘the alpha’ but not in the string ‘alpha beta’. |
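A brief sketch of both anchors. The last pattern, onlyDigits, is an illustrative addition showing that combining ^ and $ forces the whole string to match.

```javascript
const startsWithAlpha = /^alpha/;
const endsWithAlpha = /alpha$/;

console.log(startsWithAlpha.test('alpha beta')); // true
console.log(startsWithAlpha.test('the alpha'));  // false
console.log(endsWithAlpha.test('the alpha'));    // true
console.log(endsWithAlpha.test('alpha beta'));   // false

// Both anchors together: the entire string must consist of digits.
const onlyDigits = /^[0-9]+$/;
console.log(onlyDigits.test('2024'));  // true
console.log(onlyDigits.test('20a24')); // false
```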
Word boundaries
\b | Matches a word boundary, that is, a position between a word character and a non-word character, or between a word character and the beginning or the end of the string. For example, in the string “square” /\bs/ matches “s” because it is at the beginning of the string, and /e\b/ matches “e” because it is at the end of the string. /\bq/ does not match because “q” is preceded by a word character. |
\B | Matches a position that is not a word boundary. For example, in the string “blue screen” /e\B./ matches the substring “ee”. |
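The word boundary examples can be verified with the sketch below. The final replace() call is an extra illustration of a typical use: matching a whole word only.

```javascript
console.log(/\bs/.test('square')); // true
console.log(/e\b/.test('square')); // true
console.log(/\bq/.test('square')); // false

const inside = 'blue screen'.match(/e\B./)[0];
console.log(inside); // ee

// \b on both sides matches 'cat' only as a separate word.
const replaced = 'cat concat cat'.replace(/\bcat\b/g, 'dog');
console.log(replaced); // dog concat dog
```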
Assertions
(?=regex) | Positive lookahead. Matches the item only if the pattern inside the lookahead can be matched immediately after the item. For example, t(?=s) matches the second “t” in the string “streets”. |
(?<=regex) | Positive lookbehind. Matches the item only if the pattern inside the lookbehind can be matched immediately before the item. For example, (?<=s)t matches the first “t” in the string “streets”. |
(?!regex) | Negative lookahead. Matches the item only if the pattern inside the lookahead cannot be matched immediately after the item. For example, t(?!s) matches the first “t” in the string “streets”. |
(?<!regex) | Negative lookbehind. Matches the item only if the pattern inside the lookbehind cannot be matched immediately before the item. For example, (?<!s)t matches the second “t” in the string “streets”. |
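The four assertions can be compared on the string “streets”, where the first “t” is at index 1 and the second at index 5. This sketch assumes an engine with lookbehind support, which was added to the language in ES2018.

```javascript
const street = 'streets';

const lookahead = street.search(/t(?=s)/);       // 't' followed by 's'
const negLookahead = street.search(/t(?!s)/);    // 't' not followed by 's'
const lookbehind = street.search(/(?<=s)t/);     // 't' preceded by 's'
const negLookbehind = street.search(/(?<!s)t/);  // 't' not preceded by 's'

console.log(lookahead);     // 5
console.log(negLookahead);  // 1
console.log(lookbehind);    // 1
console.log(negLookbehind); // 5

// The asserted text is not part of the match itself.
console.log(street.match(/t(?=s)/)[0]); // t
```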
Groups
(…) | Matches the expression between the parentheses. The matched text is captured into numbered groups that can be reused with numbered backreferences such as \1, \2, \3.
For example, in the string “abc def abc def” /(abc) (def) \1 \2/ matches the first two words “abc” and “def” and remembers them. \1 and \2 refer to the first and the second remembered word. They are used within the pattern itself, where they match the last two words. |
(?:…) | Non-capturing group. It groups the expression without capturing the match, so you cannot use a backreference to it. |
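A short sketch of capturing and non-capturing groups. The variable names are illustrative only.

```javascript
const repeated = /(abc) (def) \1 \2/;
const groups = 'abc def abc def'.match(repeated);
console.log(groups[0]); // abc def abc def
console.log(groups[1]); // abc
console.log(groups[2]); // def

// The backreferences must repeat the captured text in the same order.
console.log(repeated.test('abc def def abc')); // false

// A capturing group adds an element to the result; a non-capturing one does not.
const capturing = 'abcabc'.match(/(abc)+/);
const nonCapturing = 'abcabc'.match(/(?:abc)+/);
console.log(capturing.length);    // 2
console.log(nonCapturing.length); // 1
```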
Regular expressions flags
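g | Global search: matches the pattern repeatedly throughout the string. |
i | Case-insensitive matching. |
m | Multiline search: ^ and $ also match at the beginning and the end of each line. |
u | Unicode: treats the pattern and the string as sequences of Unicode code points. |
y | Sticky search: matches only at the position given by the lastIndex property. |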
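The less familiar flags m, y and u can be illustrated as follows. This is a sketch with made-up strings; the emoji is written as a code point escape so that the example does not depend on the file encoding.

```javascript
// m: ^ also matches after each line break.
const lines = 'one\ntwo';
const withoutM = lines.match(/^\w+/g);
const withM = lines.match(/^\w+/gm);
console.log(withoutM); // [ 'one' ]
console.log(withM);    // [ 'one', 'two' ]

// y: the match must start exactly at lastIndex.
const sticky = /foo/y;
sticky.lastIndex = 3;
const atThree = sticky.test('barfoo');
sticky.lastIndex = 0;
const atZero = sticky.test('barfoo');
console.log(atThree); // true
console.log(atZero);  // false

// u: the dot matches a whole code point, not half of a surrogate pair.
const smiley = '\u{1F600}';
console.log(/^.$/.test(smiley));  // false
console.log(/^.$/u.test(smiley)); // true
```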
Using regular expressions in JavaScript
Regular expressions in JavaScript are used with two object types: RegExp and String.
RegExp methods
The exec() method searches a given string for a match of the regular expression. It takes the string as its argument. If it finds a match, it returns an array; otherwise it returns null. The first element of the array contains the substring that matched the whole pattern. Any subsequent elements contain the substrings that matched the subexpressions in parentheses.
Example:
var myRe = new RegExp('(a.c)d');
var myArray = myRe.exec('eaxcda');
console.log(myArray); // [ "axcd", "axc" ]
The exec() method always performs a single match and returns information about that match. To find all matches, set the global g flag and call exec() repeatedly.
When exec() is called on a regular expression that has the g flag, it sets the lastIndex property of the RegExp object to the position immediately following the matched substring. The next call to exec() on the same RegExp object starts the search at the position given by lastIndex. If no match is found, exec() returns null and resets lastIndex to 0.
var myRe = new RegExp('(a.c)d', 'g');
var myArray = myRe.exec('eaxcdaycdacdaazcdc');
console.log(myArray); // [ "axcd", "axc" ]
console.log(myRe.lastIndex); // 5
myArray = myRe.exec('eaxcdaycdacdaazcdc');
console.log(myArray); // [ "aycd", "ayc" ]
console.log(myRe.lastIndex); // 9
myArray = myRe.exec('eaxcdaycdacdaazcdc');
console.log(myArray); // [ "azcd", "azc" ]
console.log(myRe.lastIndex); // 17
myArray = myRe.exec('eaxcdaycdacdaazcdc');
console.log(myArray); // null
console.log(myRe.lastIndex); // 0
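In practice the repeated calls are usually written as a loop that runs until exec() returns null. A minimal sketch of that idiom, collecting every match in an array named found (a name chosen for this example):

```javascript
const globalRe = /(a.c)d/g;
const input = 'eaxcdaycdacdaazcdc';
const found = [];

let result;
// Each call continues from globalRe.lastIndex; null ends the loop.
while ((result = globalRe.exec(input)) !== null) {
  found.push({
    match: result[0],        // whole matched substring
    group: result[1],        // text captured by (a.c)
    index: result.index,     // position where the match starts
    next: globalRe.lastIndex // position where the next search starts
  });
}

console.log(found.length);        // 3
console.log(found[2].match);      // azcd
console.log(found[2].index);      // 13
console.log(globalRe.lastIndex);  // 0
```

The g flag is essential here: without it lastIndex never advances, so a loop over a matching string would never end.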
The test() method takes a string as its argument and returns true if the regular expression matches the string and false otherwise.
var myRe = /a.cd/;
var result = myRe.test('eaxcda');
console.log(result); // true
String methods handling regular expressions
The String methods that handle regular expressions are search(), match(), replace() and split(). They accept either a string or a RegExp object as the search argument.
The search() method takes a regular expression as its argument and returns the starting position of the first matching substring. If there is no match, it returns -1.
var text = 'Have passions for JavaScript and pass the exam';
var position = text.search(/pass/);
console.log(position); // 5
It works the same way with a string argument:
var position = text.search('pass');
console.log(position); // 5
The match() method takes a regular expression as its argument and returns an array containing the result of the match, or null if there is no match. The first element of the array is the first matching substring. Subsequent elements are the substrings that matched the subexpressions in parentheses.
var text = 'eaxcdaycdacdaazcdc';
var myArray = text.match(/(a.c)d/);
console.log(myArray); // [ "axcd", "axc" ]
If the regular expression has the global g flag set, match() returns an array of all matches, without the substrings captured by the subexpressions.
var text = 'eaxcdaycdacdaazcdc';
var myArray = text.match(/(a.c)d/g);
console.log(myArray); // [ "axcd", "aycd", "azcd" ]
The replace() method takes a regular expression as the first argument and a replacement string as the second argument. It searches the string on which it is called and returns a new string in which the matched substring is replaced with the given replacement string.
var text = 'The car is mine and the bike is yours.';
var result = text.replace(/is/, 'was');
console.log(result); // The car was mine and the bike is yours.
By default, replace() replaces only the first match. If the regular expression has the global g flag set, it replaces all matches in the string with the replacement string.
var text = 'The car is mine and the bike is yours.';
var result = text.replace(/is/g, 'was');
console.log(result); // The car was mine and the bike was yours.
The replace() method allows you to refer, in the replacement string, to the substrings that matched the subexpressions in parentheses. These substrings are numbered and remembered. You can refer to them using the $ character followed by the number of the subexpression. You can use this feature to swap substrings that match a pattern.
var text = 'This is "ordinary phrase". And that is #special sentence#.';
var result = text.replace(/"(.*)"(.*)#(.*)#/, '"$3"$2#$1#');
console.log(result); // This is "special sentence". And that is #ordinary phrase#.
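Two further sketches of the $ references, using invented sample data: swapping a name written as “last, first”, and reordering the parts of every date in a string with the g flag.

```javascript
const fullName = 'Doe, John';
// $1 is the first captured word, $2 the second.
const swapped = fullName.replace(/(\w+), (\w+)/, '$2 $1');
console.log(swapped); // John Doe

const dates = '2024-01-31 and 2025-12-01';
// With g, the groups are renumbered from $1 for every match.
const reordered = dates.replace(/(\d{4})-(\d{2})-(\d{2})/g, '$3.$2.$1');
console.log(reordered); // 31.01.2024 and 01.12.2025

// The original strings are left unchanged.
console.log(fullName); // Doe, John
```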
The split() method breaks the string on which it is called into substrings, using the passed argument as the separator. It returns an array of the resulting substrings.
var carList = 'Opel, Ford, Toyota, Renault, Audi';
var carArray = carList.split(/,\s*/);
console.log(carArray); // [ "Opel", "Ford", "Toyota", "Renault", "Audi" ]