Regular Expression Syntax Flashcards
Python
\d
Most engines: one digit from 0 to 9
\d
Python 3: one Unicode digit in any script
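A minimal sketch of the difference, assuming Python 3's standard re module; the variable names are illustrative only. It compares the default Unicode-aware \d with the re.ASCII flag and an explicit [0-9] class.

```python
import re

arabic_three = "\u0663"  # ARABIC-INDIC DIGIT THREE, a non-ASCII digit

# Default str patterns in Python 3 are Unicode-aware: \d accepts this digit.
unicode_match = re.fullmatch(r"\d", arabic_three)

# re.ASCII restricts \d (and \w, \s) to their ASCII-only meaning.
ascii_match = re.fullmatch(r"\d", arabic_three, flags=re.ASCII)

# An explicit range never matches digits from other scripts.
explicit_match = re.fullmatch(r"[0-9]", arabic_three)
```

If only 0 to 9 should be accepted, [0-9] or re.ASCII is the safer choice.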
\w
Most engines: “word character”: ASCII letter, digit or underscore
\s
Most engines: “whitespace character”: space, tab, newline, carriage return, vertical tab
\D
One character that is not a digit as defined by your engine’s \d
\W
One character that is not a word character as defined by your engine’s \w
\S
One character that is not a whitespace character as defined by your engine’s \s
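The negated shorthands are often used to strip or split text. A small sketch using Python's re module; the sample strings are made up for illustration.

```python
import re

# \D: remove every character that is not a digit.
digits_only = re.sub(r"\D", "", "(555) 123-4567")

# \S+: runs of non-whitespace characters.
tokens = re.findall(r"\S+", "a  b\tc")

# \W+: split on runs of non-word characters, dropping empty pieces.
words = [w for w in re.split(r"\W+", "hi, there!") if w]
```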
{3}
Exactly three times
{2,4}
Two to four times
{3,}
Three or more times
*
Zero or more times
?
Once or none
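A quick way to check the quantifiers above is to test them against strings of increasing length. This is a sketch with Python's re module; fullmatch is used so that the whole string must fit the pattern.

```python
import re

samples = ["a", "aa", "aaa", "aaaa", "aaaaa"]

exactly_three = [s for s in samples if re.fullmatch(r"a{3}", s)]
two_to_four = [s for s in samples if re.fullmatch(r"a{2,4}", s)]
three_or_more = [s for s in samples if re.fullmatch(r"a{3,}", s)]

# ? allows zero or one occurrence; * allows any number, including zero.
once_or_none = [s for s in ["", "a", "aa"] if re.fullmatch(r"a?", s)]
zero_or_more = [s for s in ["", "a", "aa"] if re.fullmatch(r"a*", s)]
```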
.
Any character except line break
\.
A period (special character: needs to be escaped by a \)
\
Escapes a special character
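The difference between the bare dot and the escaped dot can be sketched as follows, assuming Python's re module. re.escape is a standard-library helper that escapes special characters in a literal string; the sample strings are illustrative.

```python
import re

text = "abc a.c"

# Unescaped dot: any character except a line break.
any_char = re.findall(r"a.c", text)

# Escaped dot: a literal period only.
literal_dot = re.findall(r"a\.c", text)

# re.escape builds a pattern that matches a string literally.
literal = "1+1=2"
escaped_ok = re.fullmatch(re.escape(literal), literal) is not None
unescaped_ok = re.fullmatch(literal, literal) is not None  # + acts as a quantifier
```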
|
Alternation / OR operator
( )
Capturing group
\1
Contents of Group 1
\2
Contents of Group 2
(?: )
Non-capturing group
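A short sketch of capturing groups, backreferences and non-capturing groups in Python's re module. The sample words are made up for illustration.

```python
import re

# \1 repeats whatever Group 1 captured: find a doubled letter.
doubled = re.search(r"(\w)\1", "hello").group()

# \2 and \1 together: a four-letter palindrome such as "abba".
palindrome = re.search(r"(\w)(\w)\2\1", "xabbay").group()

# With a capturing group, findall returns the group's last capture.
captured = re.findall(r"(ab)+", "ababab")

# With a non-capturing group, findall returns the whole match.
not_captured = re.findall(r"(?:ab)+", "ababab")
```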
\t
Tab
\r
Carriage return character
\n
Line feed character
\r\n
Line separator on Windows
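One common use of these escapes is splitting text that may contain either Windows or Unix line endings. A minimal sketch with Python's re module; the sample text is illustrative.

```python
import re

text = "first\r\nsecond\nthird"

# \r?\n accepts both the Windows separator (\r\n) and a bare line feed (\n).
lines = re.split(r"\r?\n", text)

# \t matches a tab character.
has_tab = re.search(r"\t", "a\tb") is not None
```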
+
The + (one or more) is “greedy”
?
Makes quantifiers “lazy”
*
The * (zero or more) is “greedy”
?
Makes quantifiers “lazy”
{2,4}
Two to four times, “greedy”
?
Makes quantifiers “lazy”
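Greedy and lazy matching can be compared side by side. This is a sketch with Python's re module; the HTML-like string is only a convenient test input.

```python
import re

html = "<b>bold</b>"

# Greedy +: grabs as much as possible, from the first < to the last >.
greedy_plus = re.findall(r"<.+>", html)

# Lazy +?: stops at the first > that lets the match succeed.
lazy_plus = re.findall(r"<.+?>", html)

# The same applies to * and to {2,4}.
greedy_star = re.match(r"a*", "aaa").group()
lazy_star = re.match(r"a*?", "aaa").group()
greedy_range = re.match(r"\d{2,4}", "12345").group()
lazy_range = re.match(r"\d{2,4}?", "12345").group()
```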
[abc]
One of the characters in the brackets
-
Range indicator
[x-y]
One of the characters in the range from x to y
[^x]
One character that is not x
[^x-y]
One of the characters not in the range from x to y
[\d\D]
One character that is a digit or a non-digit
[\x41]
Matches the character at hexadecimal position 41 in the ASCII table, i.e. A
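The character class cards above can be checked with a few one-liners. A sketch using Python's re module; the inputs are illustrative.

```python
import re

in_set = re.findall(r"[abc]", "cabbage")         # any of a, b, c
in_range = re.findall(r"[a-c]", "abcdef")        # range a to c
not_x = re.findall(r"[^x]", "xyx")               # anything but x
out_of_range = re.findall(r"[^a-c]", "abcdef")   # anything outside a to c

# [\d\D] matches any character at all, including a line break,
# whereas the plain dot does not match "\n" by default.
class_matches_newline = re.fullmatch(r"[\d\D]", "\n") is not None
dot_matches_newline = re.fullmatch(r".", "\n") is not None

# \x41 is the character at hexadecimal position 41, i.e. "A".
hex_matches_a = re.fullmatch(r"[\x41]", "A") is not None
```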
^
Start of string or start of line, depending on multiline mode. (But inside brackets, as in [^x], it means “not”)
$
End of string or end of line, depending on multiline mode. Many engine-dependent subtleties.
\A
Beginning of string (all major engines except JS)
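How the anchors react to multiline mode can be sketched as follows, assuming Python's re module and its re.MULTILINE flag. The two-line sample string is illustrative.

```python
import re

text = "one\ntwo"

# Without multiline mode, ^ and $ anchor to the whole string.
start_default = re.findall(r"^\w+", text)
end_default = re.findall(r"\w+$", text)

# With multiline mode, they also match at every line start and end.
start_multiline = re.findall(r"^\w+", text, flags=re.MULTILINE)
end_multiline = re.findall(r"\w+$", text, flags=re.MULTILINE)

# \A always means the beginning of the string, whatever the mode.
start_absolute = re.findall(r"\A\w+", text, flags=re.MULTILINE)
```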
(?i)
Case-insensitive mode (except JavaScript)
(?s)
DOTALL mode (except JS and Ruby): the dot (.) also matches line break characters (\r, \n). Also known as “single-line mode” because the dot treats the entire input as a single line
(?m)
Multiline mode (except Ruby and JS): ^ and $ match at the beginning and end of every line
(?x)
Free-spacing mode (except JavaScript). Also known as comment mode or whitespace mode
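A sketch of the four inline modifiers in Python's re module. Each modifier is placed at the very start of the pattern, which is where recent Python versions require global inline flags to appear. The same effects are available through the flags re.IGNORECASE, re.DOTALL, re.MULTILINE and re.VERBOSE. The phone-number pattern is purely illustrative.

```python
import re

# (?i): case-insensitive matching.
ignore_case = re.search(r"(?i)cat", "CAT") is not None

# (?s): the dot also matches line breaks.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"(?s)a.b", "a\nb") is not None

# (?m): ^ and $ match at every line.
line_starts = re.findall(r"(?m)^\w+", "one\ntwo")

# (?x): whitespace in the pattern is ignored and # starts a comment.
phone = re.compile(r"""(?x)
    \d{3}   # three digits
    -       # a literal hyphen
    \d{4}   # four digits
""")
phone_ok = phone.fullmatch("555-1234") is not None
```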
(?P<bar>…)
Python named group, here named bar
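A minimal sketch of a named group in Python's re module, using the name bar from the card above. The patterns and input strings are made up for illustration.

```python
import re

match = re.search(r"(?P<bar>\d+)", "abc 123")

by_name = match.group("bar")    # access by name
by_number = match.group(1)      # a named group is still numbered
as_dict = match.groupdict()     # all named groups as a dict

# (?P=bar) is Python's syntax for a backreference to a named group.
repeated = re.search(r"(?P<bar>\w+) (?P=bar)", "say hello hello").group()
```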
(?=…)
Positive lookahead
(?=\d{10})\d{5}
(?<=…)
Positive lookbehind
(?<=\d)cat
(?!…)
Negative lookahead
(?!theatre)the\w+
(?<!…)
Negative lookbehind
\w{3}(?<!mon)ster
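The four lookaround examples from the cards above can be run as written. A sketch with Python's re module; only the input strings are invented.

```python
import re

# Positive lookahead: match five digits only if ten digits follow the start.
ahead_hit = re.search(r"(?=\d{10})\d{5}", "1234567890").group()
ahead_miss = re.search(r"(?=\d{10})\d{5}", "12345")

# Positive lookbehind: "cat" only when preceded by a digit.
behind_hit = re.search(r"(?<=\d)cat", "1cat").group()
behind_miss = re.search(r"(?<=\d)cat", "acat")

# Negative lookahead: words starting with "the", except "theatre".
not_theatre = re.findall(r"(?!theatre)the\w+", "theatre theme")

# Negative lookbehind: three word characters plus "ster", unless they are "mon".
not_monster = re.findall(r"\w{3}(?<!mon)ster", "monster lobster hamster")
```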
(?(A)X|Y)
If proposition A is true, then match pattern X; otherwise, match pattern Y.
(?(1)foo|bar)
If Group 1 has been set, the engine must match the literal characters foo. If not, it must match the literal characters bar
(?(foo)…|…)
Check whether the Group named foo has been set.
(?(-1)X|Y)
If the nearest capture group to the left of this conditional has been set, match pattern X; otherwise, match pattern Y.
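A sketch of conditionals in Python's re module, which accepts a group number or a group name as the condition. The relative form (?(-1)X|Y) is left out here because, as far as can be told from the re documentation, the standard module does not accept it. The group name open and the sample strings are illustrative.

```python
import re

# (?(1)foo|bar): if Group 1 matched, require foo; otherwise require bar.
by_number = re.compile(r"(a)?(?(1)foo|bar)")
number_results = [bool(by_number.fullmatch(s)) for s in ["afoo", "bar", "abar", "foo"]]

# Named condition: a closing parenthesis is required only if one was opened.
# The "no" branch is omitted, so nothing extra is required when the group is unset.
by_name = re.compile(r"^(?P<open>\()?\d+(?(open)\))$")
name_results = [bool(by_name.search(s)) for s in ["(123)", "123", "(123", "123)"]]
```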
Alternation behavior
An alternation is expected to quit as soon as one of the alternatives matches, not hold out for the longest match.
You can force it to continue by adding a condition after the alternation that can’t be met until the whole token has been consumed. The simplest option would be an anchor ($) or a word boundary (\b).
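The behavior described above can be sketched in Python's re module. The words cat and category are chosen only because one is a prefix of the other.

```python
import re

# The first alternative that matches wins, even though a longer one exists.
first_wins = re.search(r"cat|category", "category").group()

# A word boundary after the alternation cannot be met after "cat" here,
# so the engine backtracks and tries the next alternative.
with_boundary = re.search(r"(?:cat|category)\b", "category").group()

# An end-of-string anchor has the same effect.
with_anchor = re.search(r"(?:cat|category)$", "category").group()
```

Listing the longer alternative first (category|cat) is another common workaround.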