Style Flashcards

1
Q

The Match is Just Another Capture Group

A

Basically, you can imagine that there is a set of parentheses around your entire regex. These parentheses are just implied. They capture Group 0, which by convention we call “the match”.
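
As a minimal sketch of this idea, assuming Python's standard re module (the sample text and variable names are made up for illustration), Group 0 returns the same text you would get by wrapping the whole pattern in explicit parentheses:

```python
import re

text = "range 10-20 here"

# Group 0 is the overall match: the implied parentheses around the pattern.
implied = re.search(r"(\d+)-(\d+)", text)
whole = implied.group(0)

# Writing those parentheses explicitly shifts the other groups up by one,
# and Group 1 now holds what Group 0 held before.
explicit = re.search(r"((\d+)-(\d+))", text)
wrapped = explicit.group(1)

print(whole, wrapped)
```

In Python, calling group() with no argument is shorthand for group(0).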

2
Q

Should I Split, or should I Match All?

A

Matching All and Splitting are two sides of the same coin.
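
One way to see the two sides of the coin, sketched with Python's re module on an invented sample string: splitting on the separators and matching everything that is not a separator can yield the same pieces.

```python
import re

text = "apple, banana ,cherry"

# Side one: describe the separators and split on them.
by_split = re.split(r"\s*,\s*", text)

# Side two: describe the pieces themselves and match them all.
by_match = re.findall(r"[^,\s]+", text)

print(by_split)
print(by_match)
```

Which side is easier depends on whether the pieces or the separators are simpler to describe.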

3
Q

To write good regex, say what you mean. Say it clearly.

A

The more specific your expressions, the faster your regex will match; and, often more importantly, the faster your regex will fail when no match is there to be found.

4
Q

Whenever Possible, Anchor.

A

Anchors, such as the caret ^ for the beginning of a line and the dollar sign $ for the end of a line, often provide the clue the engine needs to find a match in the right place.
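
A small illustration, assuming Python's re module with the MULTILINE flag so that ^ and $ apply to each line (the log-like sample text is hypothetical): without the anchors, the pattern also matches in the middle of a line where it was not wanted.

```python
import re

lines = "error: disk full\nno error: all good\nerror: timeout"

# Anchored: only lines that actually start with "error: " qualify.
anchored = re.findall(r"^error: (.*)$", lines, flags=re.MULTILINE)

# Unanchored: "error: " is also found inside the second line.
unanchored = re.findall(r"error: (.*)", lines)

print(anchored)
print(unanchored)
```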

5
Q

When You Know what You Want, Say It. When You Know what You Don’t Want, Say It Too!

A

Be as specific as possible, whether by using a literal B character, a \d digit class or a \b boundary. Another great way to be specific is to say what you don’t want, whether that is a double quote: [^"], a digit: \D, or for the next three letters to be “boo”: (?!boo)[a-z]{3}
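
The three "what you don't want" patterns above can be tried out as follows. This is a sketch using Python's re module; the sample strings and variable names are assumptions chosen for illustration.

```python
import re

# Not a double quote: capture the contents of quoted strings.
quoted = re.findall(r'"([^"]*)"', 'say "hi" and "bye"')

# Not a digit: grab the runs between numbers.
non_digits = re.findall(r"\D+", "ab12cd")

# Three lowercase letters, as long as they are not "boo".
not_boo = re.compile(r"(?!boo)[a-z]{3}")
rejects_boo = not_boo.fullmatch("boo") is None
accepts_bob = not_boo.fullmatch("bob") is not None

print(quoted, non_digits, rejects_boo, accepts_bob)
```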

6
Q

Contrast is Beautiful—Use It.

A

Use consecutive tokens that are mutually exclusive in order to create contrast.

7
Q

Want to Be Lazy? Think Twice.

A

A lazy quantifier causes backtracking at each step (see Lazy Quantifiers Are Expensive). To match text between braces, a negated character class is more efficient than a lazy dot such as {.*?}: {[^}]*}. This is a variation on Use Contrast and When You Know what You Want, Say It.
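
For comparison, here is a sketch in Python's re module that runs a lazy pattern and the negated-class pattern on the same invented input. The lazy form {.*?} is an assumed counterpart chosen for illustration. The braces are escaped for clarity.

```python
import re

text = "a {first} b {second} c"

# Lazy dot: expands one character at a time, checking for "}" at each step.
lazy = re.findall(r"\{.*?\}", text)

# Negated class: says outright that the content is "anything but }".
negated = re.findall(r"\{[^}]*\}", text)

print(lazy)
print(negated)
```

The two are not strictly identical: [^}] also matches newlines, whereas the dot does not unless DOTALL is set.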

8
Q

A Time for Greed, a Time for Laziness.

A

A greedy quantifier may shoot down to the end of the string and then backtrack all the way back, when all you needed was a few nudges forward with a lazy quantifier.

9
Q

On the Edges: Really Need Boundaries or Delimiters? Use Them—or Make Your Own!

A

For instance, using lookarounds, you can make a boundary that checks for changes from lower- to upper-case, which can be useful to split a CamelCase string: (?<=[a-z])(?=[A-Z]) However, do not overuse boundaries, because good contrast often makes them redundant (see Use Contrast).
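
A minimal sketch of the CamelCase split, assuming Python 3.7 or later, where re.split accepts patterns that match the empty string (the input word is made up):

```python
import re

# Zero-width boundary: a lowercase letter behind, an uppercase letter ahead.
camel_boundary = re.compile(r"(?<=[a-z])(?=[A-Z])")

parts = camel_boundary.split("splitCamelCaseString")

print(parts)
```

Because the boundary consumes no characters, no letters are lost in the split.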

10
Q

Don’t Give Up what You Can Possess.

A

Atomic groups (?> … ) and the closely related possessive quantifiers can save you a lot of backtracking.
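
Python's re module only gained native atomic groups and possessive quantifiers in version 3.11, so the sketch below uses a well-known emulation instead: a capturing lookahead followed by a backreference, (?=(\d+))\1. This is a stand-in for (?>\d+), not the native syntax, and the test strings are invented.

```python
import re

# Ordinary greedy \d+ gives back one digit so the trailing \d can match.
gives_back = re.fullmatch(r"\d+\d", "123") is not None

# Emulated atomic group: the lookahead captures all the digits, and once it
# has succeeded the engine does not re-enter it to try a shorter capture.
# \1 then consumes "123", the trailing \d has nothing left, and the match fails.
atomic_like = re.compile(r"(?=(\d+))\1\d")
keeps_all = atomic_like.fullmatch("123") is None

# When nothing needs to be given back, the emulation matches normally.
still_matches = re.fullmatch(r"(?=(\d+))\1x", "123x") is not None

print(gives_back, keeps_all, still_matches)
```

The point of the card is the failing case: the atomic form fails quickly instead of trying every shorter run of digits.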

11
Q

Don’t Match what Splits Easily, and Don’t Split what Matches Nicely.

A

You’ll often find that one way is easy and the other nearly impossible. Therefore, if someone tells you “I want to match all the…” or “I am trying to split by…”, try not to rush down the first alley because they said “split” or “match”: remember the other side of the coin.

12
Q

Design to Fail.

A

Take (?=.*fleas).*. It does a reasonable job of matching lines that contain fleas. In comparison, consider ^(?=.*fleas).*. The only difference is the caret anchor. It doesn’t look like a big deal, but when there are no fleas, the first pattern retries the lookahead at every position in the string, whereas the second stops as soon as the lookahead fails at the start of the string, because the lookahead is anchored there. This pattern is designed for failure, and it is much more efficient: O(N) vs. O(N²) for the first.
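
The sketch below, using Python's re module and two invented sample lines, checks only that the two patterns agree on what matches. It does not measure timing: the difference is in how much work the engine does before giving up on the second line.

```python
import re

unanchored = re.compile(r"(?=.*fleas).*")
anchored = re.compile(r"^(?=.*fleas).*")

hit = "my dog has fleas today"
miss = "my dog is clean"

# Both patterns match the whole line when "fleas" is present...
hit_unanchored = unanchored.search(hit).group(0)
hit_anchored = anchored.search(hit).group(0)

# ...and both fail when it is absent. The unanchored pattern may retry the
# lookahead at each starting position before it can report the failure.
miss_unanchored = unanchored.search(miss)
miss_anchored = anchored.search(miss)

print(hit_unanchored == hit_anchored, miss_unanchored, miss_anchored)
```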

13
Q

Greedy atoms anchor again.

A

✽ “Greedy” reminds you to check if some greedy quantifiers should be made lazy, and vice-versa. It also reminds you of the performance hit of lazy quantifiers (backtracking at each step), and of potential workarounds.
✽ “Atoms” reminds you to check if some parts of the expression should be made atomic (or use a possessive quantifier).
✽ “Anchor” reminds you to check if the expression should be anchored. By extension, it may remind you of boundaries, and whether to add them or remove them.
✽ “Again” reminds you to check if parts of the expression could use the repeating subpattern syntax.

14
Q

✽ A for Anchor
✽ G for Greed
✽ R for Repeat
✽ A for Atomic

A

AGRA
