Regular Expressions Flashcards
RegEx matches for
/[wW]oodchuck/
Woodchuck
woodchuck
RegEx matches for
/[abc]/
‘a’, ‘b’, or ‘c’
RegEx matches for
/[1234567890]/
any digit
RegEx matches for
/[A-Z]/
An upper case letter
RegEx matches for
/[a-z]/
A lower case letter
RegEx matches for
/[0-9]/
A single digit
RegEx matches for
/[^A-Z]/
Not an upper case letter
RegEx matches for
/[^Ss]/
Neither ‘S’ nor ‘s’
RegEx matches for
/[^.]/
not a period
RegEx matches for
/[e^]/
either ‘e’ or ‘^’
RegEx matches for
/a^b/
the pattern ‘a^b’
RegEx matches for
/woodchucks?/
woodchuck or woodchucks
RegEx matches for
/colou?r/
color or colour
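A small Python sketch of the optional-character cards above. Python's `re` module is an assumption here (the cards do not name a regex engine), and the sample string is made up for illustration.

```python
import re

# The ? makes the immediately preceding character (here "u") optional.
pattern = re.compile(r"colou?r")

# Illustrative sample text; "colr" lacks the required "o" and is skipped.
found = pattern.findall("color, colour, colr")
print(found)
```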
RegEx matches for
/beg.n/
beg, then any single character, then n (e.g., begin, began, begun)
RegEx matches for
^
start of a line
The three uses of the caret ^
- to match the start of a line
- to indicate a negation inside of square brackets
- just to mean a caret
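The three uses of the caret can be sketched in Python as follows. This assumes Python's `re` module, which the cards do not specify; the variable names and sample strings are illustrative only.

```python
import re

# 1) Anchor: ^ at the start of the pattern matches the start of the line.
starts_with_the = re.search(r"^The", "The woodchuck") is not None

# 2) Negation: ^ as the first character inside [] negates the set.
non_uppercase = re.findall(r"[^A-Z]", "aBc")

# 3) Literal: ^ anywhere else inside [] is just a caret.
e_or_caret = re.findall(r"[e^]", "e^x")

# In Python's re, a caret outside brackets is always an anchor,
# so a literal caret there has to be escaped.
unescaped = re.search(r"a^b", "a^b")
escaped = re.search(r"a\^b", "a^b")
```

Engines differ on the unescaped caret in the middle of a pattern; in Python, at least, `a^b` does not match the text `a^b`, while `a\^b` does.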
RegEx matches for
$
end of line
RegEx matches for
\b
word boundary
RegEx matches for
\B
non-word boundary
Kleene *
“cleany star”
Zero or more occurrences of the immediately previous character or regular expression.
Kleene +
One or more occurrences of the immediately preceding character or regular expression.
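As a rough illustration of the difference between the Kleene star and Kleene plus, here is a short sketch using Python's `re` module (an assumed engine; the sample sentence is invented).

```python
import re

# a* accepts the empty string (zero occurrences); a+ needs at least one "a".
star_matches_empty = re.fullmatch(r"a*", "") is not None
plus_matches_empty = re.fullmatch(r"a+", "") is not None

# [0-9]+ finds runs of one or more digits.
numbers = re.findall(r"[0-9]+", "3 woodchucks chucked 120 logs")
```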
Wildcard expression
.
Matches any single character (except, in most regex engines, a newline)
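A quick sketch of the wildcard, assuming Python's `re` module with default flags. The word list is illustrative.

```python
import re

# The dot stands for exactly one character, whatever it is.
words = re.findall(r"beg.n", "begin began begun beg'n begn")

# With default flags, Python's dot does not match a newline.
across_newline = re.search(r"beg.n", "beg\nn")
```

The last word, `begn`, is not matched because the dot must consume one character between `beg` and `n`.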
Anchors
Special characters that anchor regular expressions to particular places in a string.
The most common anchors are the caret ^ and the dollar sign $.
^ matches the start of a line.
$ matches the end of a line
\b matches a word boundary
\B matches a non-word boundary
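The anchors above could be tried out like this in Python (an assumed engine; the sample text is made up). Positions are reported rather than strings so it is clear which occurrence of "the" each pattern finds.

```python
import re

text = "the other theology"

# \bthe\b: "the" as a whole word only (boundary on both sides).
whole_word = [m.start() for m in re.finditer(r"\bthe\b", text)]

# \Bthe: "the" only where it does NOT begin a word (inside "other").
inside_word = [m.start() for m in re.finditer(r"\Bthe", text)]

# ^ and $ pin the pattern to the start and end of the line.
exact_line = re.search(r"^The dog\.$", "The dog.") is not None
not_exact = re.search(r"^The dog\.$", "The dog. Woof") is not None
```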
Disjunction
the pipe symbol |
/cat|dog/ matches either cat or dog
Precedence
Use parentheses ( and )
Enclosing a pattern in parentheses makes it act like a single character for the purposes of neighbouring operators like the pipe | and the Kleene *
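A minimal sketch of how parentheses limit the scope of the pipe, assuming Python's `re` module. The `guppy`/`guppies` patterns are chosen for illustration.

```python
import re

# Without parentheses, | splits the whole pattern: "guppy" or "ies".
loose = re.compile(r"guppy|ies")

# With parentheses, | applies only to the suffix: "guppy" or "guppies".
grouped = re.compile(r"gupp(y|ies)")

loose_guppies = loose.fullmatch("guppies") is not None
loose_ies = loose.fullmatch("ies") is not None
grouped_guppies = grouped.fullmatch("guppies") is not None
grouped_ies = grouped.fullmatch("ies") is not None
```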
Operator precedence hierarchy
Parentheses ()
Counters * + ? {}
Sequences and anchors the ^my end$
Disjunction |
In that order, from highest precedence to lowest.
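The hierarchy can be checked with a few small patterns. This is a sketch that assumes Python's `re` module; the test strings are illustrative.

```python
import re

# Counters bind tighter than sequences: the* is "th" plus zero or more "e".
the_star_eee = re.fullmatch(r"the*", "theee") is not None
the_star_repeat = re.fullmatch(r"the*", "thethe") is not None

# Parentheses outrank counters: (the)* repeats the whole word.
grouped_repeat = re.fullmatch(r"(the)*", "thethe") is not None

# Sequences bind tighter than disjunction: the|any is "the" or "any",
# not "th" + ("e" or "a") + "ny".
the_or_any = re.fullmatch(r"the|any", "any") is not None
theny = re.fullmatch(r"the|any", "theny") is not None
```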
RegEx matches for
\d
Any digit
RegEx matches for
\D
Any non-digit
RegEx matches for
\w
Any alphanumeric / underscore
RegEx matches for
\W
Any character that is neither alphanumeric nor an underscore
RegEx matches for
\s
Whitespace (space, tab)
RegEx matches for
\S
Non-whitespace
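The shorthand classes above might be exercised like this, assuming Python's `re` module. Note that with Python string patterns these classes are Unicode-aware by default, so they can cover more than the ASCII sets on the cards; the sample text is invented.

```python
import re

text = "Party of 5, room_7\tat 9pm"

digits = re.findall(r"\d", text)        # single digits
word_tokens = re.findall(r"\w+", text)  # runs of letters, digits, underscore
pieces = re.split(r"\s+", text)         # split on runs of whitespace
non_space_runs = re.findall(r"\S+", text)
```

Splitting on `\s+` and collecting `\S+` give the same pieces here, since one is the complement of the other.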
RegEx matches for
*
zero or more occurrences of the previous char or expression
RegEx matches for
+
one or more occurrences of the previous char or expression
RegEx matches for
?
exactly zero or one occurrence of the previous char or expression
RegEx matches for
{n}
n occurrences of the previous char or expression
RegEx matches for
{n,m}
From n to m occurrences of the previous char or expression.
RegEx matches for
{n,}
At least n occurrences of the previous char or expression.
RegEx matches for
{,m}
up to m occurrences of the previous char or expression
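A short sketch of the brace counters, assuming Python's `re` module, which accepts all four forms including `{,m}`. The inputs are illustrative.

```python
import re

# {n}: exactly n occurrences.
exactly_three = re.fullmatch(r"\d{3}", "123") is not None

# {n,m}: from n to m occurrences; five digits is too many for {2,4}.
two_to_four = re.fullmatch(r"\d{2,4}", "12345") is not None

# {n,}: at least n occurrences.
at_least_two = re.fullmatch(r"a{2,}", "aaaaa") is not None

# {,m}: up to m occurrences, including zero.
up_to_two = re.fullmatch(r"a{,2}", "") is not None
```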
newline
\n
tab
\t
RegEx matches for
\*
An asterisk “*”
RegEx matches for
\.
A period “.”
RegEx matches for
\?
A question mark
RegEx matches for
\n
a newline
RegEx matches for
\t
a tab
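To see why the backslash matters, here is a hedged Python sketch (the `re` module is an assumed engine). `re.escape` is a real standard-library helper that escapes special characters in a literal string; the sample strings are made up.

```python
import re

# Escaped dot: only literal periods.
literal_dots = re.findall(r"\.", "a.b.c")

# Unescaped dot: any character at all.
any_chars = re.findall(r".", "a.b")

# Escaped question mark and asterisk match themselves.
question = re.search(r"why\?", "why?") is not None
asterisk = re.findall(r"\*", "2*3*4")

# \t and \n match a tab and a newline.
tab_and_newline = re.findall(r"[\t\n]", "a\tb\nc")

# re.escape builds a pattern that matches a string literally.
literal = "Is 2*3=6?"
escaped_ok = re.fullmatch(re.escape(literal), literal) is not None
```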
Capture group
The use of parentheses to store a pattern in memory is called a capture group.
Every time a capture group is used (i.e., parentheses surround a pattern), the resulting match is stored in a numbered register.
If you match two different sets of parentheses, \2 means whatever matched the second capture group.
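Numbered registers can be illustrated with backreferences. The sketch below assumes Python's `re` module; the sentences are illustrative examples, not part of the cards.

```python
import re

# \1 must repeat whatever the first capture group matched.
pattern = re.compile(r"the (.*)er they were, the \1er they will be")

same = pattern.search("the bigger they were, the bigger they will be")
different = pattern.search("the bigger they were, the faster they will be")

# With two groups, \2 refers to the second and \1 to the first.
mirrored = re.fullmatch(r"(\w+) (\w+) \2 \1", "a b b a")
```

Here `same.group(1)` holds the captured text, while `different` is `None` because the second adjective does not repeat the first.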
non-capturing group
We might want to use parentheses for grouping without capturing the resulting pattern in a register.
In that case, we use a non-capturing group, which is specified by putting the characters ?: after the open paren, in the form (?: pattern )
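The effect on the numbered registers can be seen by comparing the two forms. This is a sketch assuming Python's `re` module, with an illustrative pattern and input.

```python
import re

text = "a few cats"

# Both sets of parentheses capture: two registers.
capturing = re.search(r"(some|a few) (people|cats)", text)

# The first group only groups; the second becomes register 1.
non_capturing = re.search(r"(?:some|a few) (people|cats)", text)

capturing_groups = capturing.groups()
non_capturing_groups = non_capturing.groups()
```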
lookahead assertions
The operator (?= pattern) is true if pattern occurs, but is zero-width, i.e. the match pointer doesn’t advance.
The operator (?! pattern) only returns true if a pattern does not match, but again is zero-width and doesn’t advance the cursor.
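Both lookahead forms can be sketched in Python (an assumed engine; the patterns and inputs are illustrative). The zero-width property shows up in where the match ends.

```python
import re

# Positive lookahead: "wood" only when "chuck" follows; "chuck" is not consumed.
positive = re.search(r"wood(?=chuck)", "woodchuck")
positive_miss = re.search(r"wood(?=chuck)", "woodpile")

# Negative lookahead: a word at the start of a line that is not "Volcano...".
not_volcano = re.compile(r"^(?!Volcano)[A-Za-z]+")
rejected = not_volcano.match("Volcano erupts")
accepted = not_volcano.match("Mountain erupts")
```

`positive.end()` is 4, the position right after `wood`, which shows that the lookahead checked for `chuck` without moving past it.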