RegEx Cheatsheet
Syntax
|
: or
()
: group
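A quick check of alternation and grouping, sketched with Python's re module. The sample strings are made up for illustration:

```python
import re

# "|" picks one alternative; "( )" groups it and also captures the matched text.
m = re.fullmatch(r"gr(a|e)y", "grey")
print(m.group(0))  # grey  (whole match)
print(m.group(1))  # e     (text captured by the group)

# Without the group, "|" splits the whole pattern: "gra" OR "ey".
print(re.findall(r"gra|ey", "gray grey"))  # ['gra', 'ey']
```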
Characters
.
: any character (dot matches everything except newlines by default)
\w
: word character: alphanumeric character plus _, equivalent to [A-Za-z0-9_]
\W
: non-word character: anything except a letter, digit, or _, equivalent to [^A-Za-z0-9_]
\s
: whitespace
\S
: anything BUT whitespace
\d
: digit, equivalent to [0-9]
\D
: non-digit, equivalent to [^0-9]
[...]
: any one of the characters listed
[^...]
: any character except the ones listed
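The character classes above can be tried out with Python's re module. This is a sketch with a made-up sample string. Note that in Python 3, \w, \d and \s also match non-ASCII letters, digits and spaces unless re.ASCII is passed, so the [A-Za-z0-9_] equivalence holds only for ASCII text like this one:

```python
import re

text = "id_42 = 7.5\tok"  # made-up sample

print(re.findall(r"\w+", text))      # ['id_42', '7', '5', 'ok']
print(re.findall(r"\W", text))       # [' ', '=', ' ', '.', '\t']
print(re.findall(r"\d", text))       # ['4', '2', '7', '5']
print(re.findall(r"\s", text))       # [' ', ' ', '\t']
print(re.findall(r"[aeiou]", text))  # ['i', 'o']
print(re.findall(r"[^\w\s]", text))  # ['=', '.']

# "." skips the newline by default; re.DOTALL lets it match too.
print(re.findall(r".", "a\nb"))             # ['a', 'b']
print(re.findall(r".", "a\nb", re.DOTALL))  # ['a', '\n', 'b']
```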
Anchors
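^
: beginning of the string (or of each line in multiline mode)
$
: end of the string (or of each line in multiline mode)
\b
: word boundary; zero-width like ^ and $, so it matches a position rather than a character
\A
: beginning of the string only, never of an internal line (not available in JavaScript)
\z
: end of the string only, never of an internal line (Python's re module spells it \Z; not available in JavaScript)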
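Here is how the anchors behave in Python, as a small sketch on a made-up two-line string. Python's re module uses \Z for the end-of-string anchor (\z is rejected by older versions), so the sketch uses \Z:

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ only matches at the start of the whole string.
print(re.findall(r"^\w+", text))                 # ['first']
# With re.MULTILINE, ^ and $ also match at each line break.
print(re.findall(r"^\w+", text, re.MULTILINE))   # ['first', 'second']
print(re.findall(r"\w+$", text, re.MULTILINE))   # ['line', 'line']

# \A and \Z ignore MULTILINE: start and end of the string only.
print(re.findall(r"\A\w+", text, re.MULTILINE))  # ['first']
print(re.findall(r"\w+\Z", text, re.MULTILINE))  # ['line']

# \b is zero-width, so the "cat" inside "concatenate" is not matched.
print(re.findall(r"\bcat\b", "cat concatenate cat."))  # ['cat', 'cat']
```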
Repetition Operators
?
: match 0 or 1 times
+
: match at least once
*
: match 0 or more times
{M,N}
: minimum M matches and maximum N matches
{M,}
: match at least M times
{0,N}
: match at most N times
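A sketch of the repetition operators in Python's re module, using made-up inputs:

```python
import re

# "?": the "u" may appear 0 or 1 times.
print([bool(re.fullmatch(r"colou?r", s)) for s in ["color", "colour", "colouur"]])
# [True, True, False]

print(re.findall(r"\d+", "a1 b22 c333"))        # ['1', '22', '333']
print(re.findall(r"ab*", "a ab abbb"))          # ['a', 'ab', 'abbb']
# {2,3} takes at most 3 digits, so "1234" yields "123" and the lone "4" is skipped.
print(re.findall(r"\d{2,3}", "1 12 123 1234"))  # ['12', '123', '123']
print(re.findall(r"\d{3,}", "1 12 123 1234"))   # ['123', '1234']
```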
Greedy vs Lazy
.*
: greedy, match as long as possible
.*?
: lazy, match as short as possible
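The difference shows up clearly on input with several possible end points. Here is a sketch in Python on a made-up HTML-like string:

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs on to the LAST ">" it can reach.
print(re.search(r"<.*>", html).group())   # <b>bold</b> and <i>italic</i>
# Lazy: .*? stops at the FIRST ">" that lets the match succeed.
print(re.search(r"<.*?>", html).group())  # <b>
print(re.findall(r"<.*?>", html))         # ['<b>', '</b>', '<i>', '</i>']
```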
BRE vs ERE vs PCRE
In GNU tools, the only difference between basic and extended regular expressions is in the behavior of a few characters: ?, +, |, parentheses (()), and braces ({}).
- basic regular expressions (BRE): these characters must be escaped (\?, \+, \|, \(, \), \{, \}) to behave as special characters.
- extended regular expressions (ERE): these characters must be escaped to match a literal character.
- Perl Compatible Regular Expressions (PCRE): much more powerful and flexible than BRE and ERE, adding features such as lazy quantifiers (*?), lookahead/lookbehind assertions, and shorthand classes like \d.
Multiple flavors may be supported by the tools:
- sed
  - sed: basic
  - sed -E: extended
- grep
  - grep: basic
  - egrep or grep -E: extended
  - grep -P: PCRE (GNU grep)
JavaScript
str.search
str.match
str.matchAll
str.replace
Example: split country name and country code in strings like "China (CN)"
> s = "China (CN)";
'China (CN)'
> match = s.match(/\((.*?)\)/)
[ '(CN)', 'CN', index: 6, input: 'China (CN)', groups: undefined ]
> match[1]
'CN'
> s.substring(0, match.index).trim()
'China'
Match all:
const content = "a=1\nb=2";   // hypothetical input
const regex = /(\w+)=(\d+)/g;  // matchAll() requires the g flag
const matches = content.matchAll(regex);
for (const match of matches) {
  // match[0] is the matched string
  // match[1] is the first capture, etc.
}
Literal vs. Constructor
- Literal:
re = /.../g
- Constructor:
re = new RegExp("...")
- can use string concat:
re = new RegExp("..." + some_variable + "...")
Local vs. Global
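re = /.../
: str.match(re) returns the FIRST match only: match[0] is the matched text and match[1], match[2], ... are its captures.
re = /.../g
: str.match(re) returns a list of all matches, but NOT their captures.
Both return null when nothing matches.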
match vs. exec
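str.match()
: as stated above.
regex.exec()
: returns the captures plus more detailed info (index, input); with the g flag, call it repeatedly to step through every match.
Example
let match;
while ((match = re.exec(str)) !== null) {
  // re needs the g flag; without it exec() keeps returning the first match and the loop never ends
}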
Python
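match, search and findall:
re.match()
: only matches at the beginning of the string; returns a match object, or None.
re.search()
: locates a match anywhere in the string; returns a match object, or None.
re.findall()
: finds all non-overlapping occurrences; returns a list of strings (the group's text if the pattern has one capture group, tuples if it has several).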
>>> import re
>>> type(re.search("foo", "foobarfoo"))
<class 're.Match'>
>>> type(re.match("foo", "foobarfoo"))
<class 're.Match'>
(Python 3.7+; older versions print <class '_sre.SRE_Match'>.)
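Both calls above succeed because "foo" sits at position 0. The difference only appears when the match starts later, as in this sketch on the same sample string:

```python
import re

s = "foobarfoo"

# re.match only tries position 0, so "bar" is not found there.
print(re.match("bar", s))               # None
# re.search scans the whole string and reports where the match is.
m = re.search("bar", s)
print(m.group(0), m.start(), m.end())   # bar 3 6
# Anchoring with ^ makes re.search behave like re.match.
print(re.search("^bar", s))             # None
```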
re.match()/re.search()
re.match() and re.search() return a match object:
>>> match = re.search("f(.*?),", "foo,faa,fuu,bar")
>>> match.groups()
('oo',)
match.group(0) returns the string snippet that matches the pattern:
>>> match.group(0)
'foo,'
match.group(1), match.group(2), ... return the text captured by each pair of parentheses ():
>>> match.group(1)
'oo'
re.findall()
re.findall() returns a list; extract values using []:
>>> match = re.findall("f(.*?),", "foo,faa,fuu,bar")
>>> match
['oo', 'aa', 'uu']
>>> match[0]
'oo'
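What re.findall() returns depends on the number of capture groups in the pattern. When positions or several groups per match are needed, re.finditer() yields match objects instead. A sketch on the same sample string:

```python
import re

s = "foo,faa,fuu,bar"

# Two groups: one tuple per match.
print(re.findall(r"(f)(.*?),", s))  # [('f', 'oo'), ('f', 'aa'), ('f', 'uu')]
# No groups: the whole matched text.
print(re.findall(r"f.*?,", s))      # ['foo,', 'faa,', 'fuu,']

# re.finditer yields match objects, which keep positions and every group.
for m in re.finditer(r"f(.*?),", s):
    print(m.start(), m.group(0), m.group(1))
# 0 foo, oo
# 4 faa, aa
# 8 fuu, uu
```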
Compiled Patterns
pattern = re.compile(pattern_string)
result = pattern.match(string)
is equivalent to
result = re.match(pattern_string, string)
re.compile() returns a Pattern object:
>>> type(re.compile("pattern"))
<class 're.Pattern'>
(Python 3.7+; older versions print <class '_sre.SRE_Pattern'>.)
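The re module also caches recently used patterns internally, so compiling is mostly about reusing one pattern (with its flags) in several places. Pattern methods also accept a start position, which the module-level functions do not. A sketch; the name word_re and the inputs are illustrative:

```python
import re

# Compile once, optionally with flags, then reuse the pattern object.
word_re = re.compile(r"f(\w+)", re.IGNORECASE)

print(word_re.match("Foo bar").group(1))    # oo
print(word_re.findall("Foo, faa, FUU"))     # ['oo', 'aa', 'UU']

# Pattern.match/Pattern.search take an optional start position.
print(word_re.match("xx foo", 3).group(1))  # oo
print(word_re.search("xx foo", 4))          # None
```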