Search by regular expressions
- Last Updated Jul 26, 2024
A regular expression describes one or more strings to match when you search a script. It is written with alphanumeric characters and special characters known as metacharacters, and it serves as a character pattern to compare with the script text being searched.
Regular expressions are constructed much like arithmetic expressions: small expressions are combined by using a variety of metacharacters and operators to create larger expressions.
The components of a regular expression can be individual characters, sets of characters, ranges of characters, choices between characters, or any combination of these.
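As a rough illustration of how components combine, the following sketch uses Python's `re` module as a stand-in for the script search engine, which may differ in detail. The variable names and the word list are made up for this example.

```python
import re

# Small components, each a valid pattern on its own (names are illustrative).
single_char = "b"        # an individual character
char_set = "[aeiou]"     # a set of characters
char_range = "[n-t]"     # a range of characters
choice = "(?:s|d)"       # a choice between characters

# Concatenate the components into one larger expression: b[aeiou][n-t](?:s|d)
pattern = re.compile(single_char + char_set + char_range + choice)

# Hypothetical sample text to search.
words = ["beside", "bonds", "bats", "below", "bend"]
matches = [w for w in words if pattern.search(w)]
print(matches)
```

Each component narrows what can appear at one position, so the combined pattern accepts only words that satisfy all four in sequence.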
Script Regular Expressions
|
Regular Expression |
Purpose |
Example |
|
. |
Match any single character (except a line break) |
s.e matches "ste" in "step" and "sfe" in "transfer" but finds no match in "across". |
|
* |
Match zero or more occurrences of the preceding expression (match as many characters as possible) |
a*r matches "r" in "rack", "ar" in "ark", and "aar" in "aardvark". |
|
.* |
Match any character zero or more times (Wildcard *) |
c.*e matches "cke" in "racket", "comme" in "comment", and "code" in "code". |
|
+ |
Match one or more occurrences of the preceding expression (match as many characters as possible) |
e.+e matches "eede" in "feeder" but not "ee". |
|
.+ |
Match any character one or more times (Wildcard ?) |
e.+e matches "eede" in "feeder" but not "ee". |
|
*? |
Match zero or more occurrences of the preceding expression (match as few characters as possible) |
e.*?e matches "ee" in "feeder" but not "eede". |
|
+? |
Match one or more occurrences of the preceding expression (match as few characters as possible) |
e.+?e matches "ente" and "erprise" in "enterprise", but not the whole word "enterprise". |
|
^ |
Anchor the match string to the beginning of a line or string |
^car matches the word "car" only when it appears at the beginning of a line. |
|
\r?$ |
Anchor the match string to the end of a line |
End\r?$ matches "End" only when it appears at the end of a line. |
|
[abc] |
Match any single character in a set |
b[abc] matches "ba", "bb", and "bc". |
|
[a-f] |
Match any character in a range of characters |
be[n-t] matches "bet" in "between", "ben" in "beneath", and "bes" in "beside", but not "below". |
|
() |
Capture and implicitly number the expression contained within parentheses |
([a-z])X\1 matches "aXa" and "bXb", but not "aXb". "\1" refers to the first expression group "[a-z]". |
|
(?!abc) |
Invalidate a match when the given expression follows (negative lookahead) |
real(?!ity) matches "real" in "realty" and "really" but not in "reality". It also finds the second "real" (but not the first "real") in "realityreal". |
|
[^abc] |
Match any character that is not in a given set of characters |
be[^n-t] matches "bef" in "before", "beh" in "behind", and "bel" in "below", but not "beneath". |
|
| |
Match either the expression before or the one after the symbol. |
(sponge|mud)bath matches "spongebath" and "mudbath." |
|
\ |
Escape the character following the backslash |
\^ matches the literal character "^". |
|
{x}, {x,y} |
Specify the number of occurrences of the preceding character or group |
x(ab){2}x matches "xababx", and x(ab){2,3}x matches "xababx" and "xabababx" but not "xababababx". |
|
\p{X} |
Match text in a Unicode character class, where "X" is the Unicode category name (for example, "Lu" for uppercase letters). |
\p{Lu} matches "T" and "D" in "Thomas Doe". |
|
\b |
Match a word boundary |
\bin matches "in" in "inside" but not "pinto". |
|
\r?\n |
Match a line break (that is, an optional carriage return followed by a new line). |
End\r?\nBegin matches "End" and "Begin" only when "End" is the last string in a line and "Begin" is the first string in the next line. |
|
\w |
Match any alphanumeric character or underscore |
a\wd matches "add" and "a1d" but not "a d". |
|
(?([^\r\n])\s) |
Match any whitespace character. |
Public\sInterface matches the phrase "Public Interface". |
|
\d |
Match any numeric character |
\d matches "3" in "3456", "2" in "23", and "1" in "1". |
|
\uXXXX where XXXX specifies the Unicode character value |
Match a Unicode character |
\u0065 matches the character "e". |
|
\b(_\w+|[\w-[0-9_]]\w*)\b |
Match an identifier |
Matches "type1" but not "&type1" or "#define". |
|
((\".+?\")|('.+?')) |
Match a string inside quotes |
Matches any string inside single or double quotes. |
|
\b0[xX]([0-9a-fA-F]{1,4})\b |
Match a hexadecimal number |
Matches "0xc67f" but not "0xc67fc67f". |
|
\b[0-9]*\.*[0-9]+\b |
Match integers and decimals |
Matches "1.333". |
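Several rows of the table can be checked with a short script. The sketch below again assumes Python's `re` module as a stand-in, so it covers only syntax that Python shares with the table (it leaves out `\p{X}` and character class subtraction, which Python's `re` does not support). The variable names are illustrative.

```python
import re

# Greedy vs. lazy quantifiers on the table's "feeder" example.
greedy = re.search(r"e.*e", "feeder").group()   # as many characters as possible
lazy = re.search(r"e.*?e", "feeder").group()    # as few characters as possible

# Backreference: \1 must repeat whatever group 1 captured.
backref = [s for s in ["aXa", "bXb", "aXb"] if re.fullmatch(r"([a-z])X\1", s)]

# Negative lookahead: "real" only when it is not followed by "ity".
lookahead = [m.start() for m in re.finditer(r"real(?!ity)", "realityreal")]

# Counted repetition: two or three copies of "ab" between the x's.
counted = [s for s in ["xababx", "xabababx", "xababababx"]
           if re.fullmatch(r"x(ab){2,3}x", s)]

# Word boundary: "in" only at the start of a word.
boundary = [w for w in ["inside", "pinto"] if re.search(r"\bin", w)]

print(greedy, lazy, backref, lookahead, counted, boundary)
```

The lookahead result holds the start offset of the only accepted "real", which is the second one in "realityreal".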
Order of precedence
A regular expression is evaluated from left to right and follows an order of precedence.
The following table contains the order of precedence of regular expression operators, from highest to lowest.
|
Operator or operators |
Description |
|
\ |
Escape |
|
(), (?:), (?=), [] |
Parentheses and brackets |
|
*, +, ?, {n}, {n,}, {n,m} |
Quantifiers |
|
^, $, \anymetacharacter |
Anchors and sequences |
|
| |
Alternation |
Characters have higher precedence than the alternation operator, so "m|food" matches "m" or "food" rather than "mood" or "food".
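The effect of precedence can be seen by comparing a pattern with and without grouping. This is a minimal sketch that assumes Python's `re` module follows the same ordering as the table above. The word list and variable names are invented for the example.

```python
import re

words = ["m", "food", "mood", "fd"]

# Alternation binds loosest: "m|food" means "m" or "food".
loose = [w for w in words if re.fullmatch(r"m|food", w)]

# Parentheses limit the alternation to one position: "m" or "f", then "ood".
grouped = [w for w in words if re.fullmatch(r"(m|f)ood", w)]

# Quantifiers bind tighter than adjacent characters: "ab+" repeats only "b".
b_repeated = re.fullmatch(r"ab+", "abbb") is not None
ab_not_repeated = re.fullmatch(r"ab+", "abab") is None

# Grouping first makes the quantifier apply to the whole "ab".
ab_repeated = re.fullmatch(r"(ab)+", "abab") is not None

print(loose, grouped, b_repeated, ab_not_repeated, ab_repeated)
```

When a pattern does not behave as expected, adding explicit parentheses is usually the simplest way to make the intended grouping unambiguous.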