Chapter 11: Regex Flashcards by Maggie Pitt

Code that works when the input data is in a particular format but breaks if the input deviates from that format. In other words, code that is easily broken.

brittle code

regex + and * characters expand outward to match the LARGEST possible string

greedy matching
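
A small sketch of greedy matching in Python's re module. The sample line is a made-up string, chosen only because it contains more than one colon.

```python
import re

# Hypothetical sample line with two colons in it.
line = "From: Using the : character"

# '.+' is greedy, so the match stretches to the LAST colon it can reach.
greedy = re.findall("^F.+:", line)
print(greedy)  # ['From: Using the :']
```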

A command available in most Unix systems that searches through text files looking for lines that match regular expressions.

grep
Global Regular Expression Print (from the ed editor command g/re/p)

A language for expressing more complex search strings. It may contain special characters that indicate, for example, that a search only matches at the beginning or end of a line, along with many other similar capabilities.

regular expression
(regex)

A special character that matches any character. In regular expressions it’s the period.

wild card

regular expression module

re
import re

regex method that finds the first match of a specified regular expression in text; returns a match object, or None if nothing matches

re.search(pattern, string)

regex that matches beginning of line

'^'
re.search('^From:', line)
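
A minimal sketch of re.search() with and without the ^ anchor. The sample lines are hypothetical, made up for illustration.

```python
import re

# Hypothetical sample lines.
lines = [
    "From: alice@example.com",
    "Subject: hello",
    "X-Reply-From: bob@example.com",
]

# re.search returns a match object (truthy) or None (falsy).
hits = [line for line in lines if re.search("From:", line)]
print(hits)  # ['From: alice@example.com', 'X-Reply-From: bob@example.com']

# Anchoring with ^ keeps only lines that START with 'From:'.
anchored = [line for line in lines if re.search("^From:", line)]
print(anchored)  # ['From: alice@example.com']
```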

regex that matches any character (a wildcard)

. (period/full stop)

re.search('F..m', line) matches From, Flam, F#om, etc.
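
A quick sketch of the wildcard. The candidate strings are made up; each period must consume exactly one character, so the shorter strings do not match.

```python
import re

# Hypothetical candidates to test against the pattern 'F..m'.
candidates = ["From", "Flam", "F#om", "Fm", "Frm"]

matched = [w for w in candidates if re.search("F..m", w)]
print(matched)  # ['From', 'Flam', 'F#om']
```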

regex that applies to the immediately preceding character(s) and indicates to match zero or more times.

*

regex that applies to the immediately preceding character(s) and indicates to match one or more times.

+
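
A short sketch contrasting the zero-or-more quantifier (*) with the one-or-more quantifier (+) in Python's re module. The word list is hypothetical.

```python
import re

# Hypothetical words: zero, one, and three 'b' characters between 'a' and 'c'.
words = ["ac", "abc", "abbbc"]

# 'b*' allows zero or more 'b' characters.
star = [w for w in words if re.search("^ab*c$", w)]
print(star)  # ['ac', 'abc', 'abbbc']

# 'b+' requires at least one 'b'.
plus = [w for w in words if re.search("^ab+c$", w)]
print(plus)  # ['abc', 'abbbc']
```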

regex method that returns a list of all substrings that match a regular expression

re.findall(pattern, string)
In a for loop over lines, each call returns its own list: ['substring1'], then ['substring2']
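
A minimal sketch of re.findall(), assuming a made-up sentence containing two email-like strings. The pattern uses \S (any non-whitespace character) on both sides of the @ sign.

```python
import re

# Hypothetical sample text.
text = "Hello from csev@umich.edu to cwen@iupui.edu about the meeting @2PM"

# At least one non-whitespace character is required on each side of '@',
# so the bare '@2PM' is skipped.
addresses = re.findall(r"\S+@\S+", text)
print(addresses)  # ['csev@umich.edu', 'cwen@iupui.edu']
```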

regex that matches a non-whitespace character

\S

regex format to accept specific characters

'[]'
Set notation

re.findall('[a-zA-Z0-9]', string)
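
A small sketch of set notation with findall(). The input string is made up; the set matches one character at a time, so each letter or digit comes back as its own list item.

```python
import re

# Hypothetical input mixing letters, digits, and punctuation.
text = "a1-B2!"

chars = re.findall("[a-zA-Z0-9]", text)
print(chars)  # ['a', '1', 'B', '2']
```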

regex format to match an actual period

[.] or \.
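
A quick sketch comparing the wildcard period with the two ways of matching a literal period. The sample text is hypothetical.

```python
import re

# Hypothetical text: one real period and one look-alike.
text = "v1.2 and v1x2"

wildcard = re.findall(r"1.2", text)    # '.' matches any character
escaped = re.findall(r"1\.2", text)    # backslash makes the period literal
in_set = re.findall(r"1[.]2", text)    # inside a set the period is literal

print(wildcard)  # ['1.2', '1x2']
print(escaped)   # ['1.2']
print(in_set)    # ['1.2']
```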

When added to a regular expression, they are ignored for the purpose of matching, but allow you to extract a particular subset of the matched string rather than the whole string when using findall().

()

re.findall('substring(part I want)', string)
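
A minimal sketch of extraction with parentheses, assuming a made-up mail header line. The whole pattern still has to match, but findall() returns only the parenthesized part.

```python
import re

# Hypothetical mail header line.
line = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"

# Without parentheses, findall returns the whole match.
whole = re.findall(r"^From \S+@\S+", line)
print(whole)  # ['From stephen.marquard@uct.ac.za']

# With parentheses, only the captured part is returned.
part = re.findall(r"^From (\S+@\S+)", line)
print(part)  # ['stephen.marquard@uct.ac.za']
```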

technique to match a regex special character as a literal character

backslash
\$ matches a literal '$'

regex that anchors to end of line

$

regex that matches a whitespace character

\s

regex that applies to the immediately preceding character(s) and indicates to match zero or more times in "non-greedy mode".

*?

regex that applies to the immediately preceding character(s) and indicates to match one or more times in "non-greedy mode".

+?

regex that applies to the immediately preceding character(s) and indicates to match zero or one time.

?

regex that applies to the immediately preceding regular expression and indicates to match zero or one time in "non-greedy mode".

??

regex that matches a single character as long as that character is in the specified set. In this example, it would match "a", "e", "i", "o", "u", or "-" but no other characters.

[aeiou-] or [-aeiou]

You can specify ranges of characters using the minus sign. This example is a single character that must be a lowercase letter or a digit.

[a-z0-9]

When the first character in the set notation is a caret, it inverts the logic. This example matches a single character that is anything other than an uppercase or lowercase letter.

[^A-Za-z]
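
A short sketch running the three set examples from the cards above against one made-up string, to show what each set picks out.

```python
import re

# Hypothetical input string.
text = "Up-2-date!"

# Literal hyphen placed last in the set, so it is not read as a range.
vowels_or_hyphen = re.findall("[aeiou-]", text)
print(vowels_or_hyphen)  # ['-', '-', 'a', 'e']

# Ranges: any lowercase letter or digit.
lower_or_digit = re.findall("[a-z0-9]", text)
print(lower_or_digit)  # ['p', '2', 'd', 'a', 't', 'e']

# Leading caret inverts the set: anything that is NOT a letter.
non_letters = re.findall("[^A-Za-z]", text)
print(non_letters)  # ['-', '2', '-', '!']
```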

regex that asserts where a word begins or ends. This means that r'\bat\b' matches 'at', 'at.', '(at)', and 'as at ay' but not 'attempt' or 'atlas'.

\b

regex that asserts where a word does NOT begin or end. This means that r'at\B' matches 'athens', 'atom', 'attorney', but not 'at', 'at.', or 'at!'.

\B
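
A small sketch checking the word-boundary examples from these two cards in Python. Raw strings are used so that the backslash sequences reach the regex engine unchanged.

```python
import re

samples = ["at", "at.", "(at)", "as at ay", "attempt", "atlas"]

# \b on both sides: 'at' must stand alone as a word.
whole_word = [s for s in samples if re.search(r"\bat\b", s)]
print(whole_word)  # ['at', 'at.', '(at)', 'as at ay']

# \B after 'at': the word must continue past 'at'.
inside = [s for s in ["athens", "atom", "at", "at!"] if re.search(r"at\B", s)]
print(inside)  # ['athens', 'atom']
```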

regex that matches any decimal digit; equivalent to the set [0-9].

\d

regex that matches any non-digit character; equivalent to the set [^0-9].

\D
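
A quick sketch of the digit and non-digit classes on a made-up string, combined with + to show runs of each.

```python
import re

# Hypothetical input string.
text = "Room 42, floor 3"

digits = re.findall(r"\d", text)       # one digit at a time
numbers = re.findall(r"\d+", text)     # runs of digits
non_digits = re.findall(r"\D+", text)  # runs of everything else

print(digits)      # ['4', '2', '3']
print(numbers)     # ['42', '3']
print(non_digits)  # ['Room ', ', floor ']
```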

In Unix/Linux, command-line program similar to the search() function

grep (Global Regular Expression Print)
$ grep '^From:' mbox.short.txt

Unix/Linux (grep) regex for a non-blank character

[^ ]

regex + and * characters expand outward to match the SMALLEST possible string

non-greedy matching
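
A minimal sketch contrasting greedy and non-greedy matching on the same input. The HTML-like string is made up for illustration.

```python
import re

# Hypothetical input with several angle-bracket tags.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: '.+' runs to the last '>' in the string.
greedy = re.findall("<.+>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']

# Non-greedy: '.+?' stops at the first '>' it can.
lazy = re.findall("<.+?>", html)
print(lazy)  # ['<b>', '</b>', '<i>', '</i>']
```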

Specifies that exactly m copies of the previous regular expression should be matched; fewer matches cause the entire regular expression not to match

{m}

Causes the resulting regular expression to greedily match from m to n repetitions of the preceding regular expression.

{m,n}

Causes the resulting regular expression to non-greedily match from m to n repetitions of the preceding regular expression

{m,n}?
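
A short sketch of the three repetition forms above, run against a made-up string of five 'a' characters.

```python
import re

text = "aaaaa"

# Exactly two copies per match; the fifth 'a' is left over.
exact = re.findall("a{2}", text)
print(exact)  # ['aa', 'aa']

# Greedy range: takes as many as allowed (four), leaving one 'a' unmatched.
greedy = re.findall("a{2,4}", text)
print(greedy)  # ['aaaa']

# Non-greedy range: takes as few as allowed (two) each time.
lazy = re.findall("a{2,4}?", text)
print(lazy)  # ['aa', 'aa']
```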

Creates a regular expression that will match either A or B. This operation is never greedy: once A matches, B is not tested, even if B would produce a longer match.

A|B
cat|dog matches cat or dog
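
A small sketch of alternation in Python, including a case that shows why it is described as never greedy. The input strings are made up.

```python
import re

# Either alternative can match at each position.
animals = re.findall("cat|dog", "cat, dog, catalog")
print(animals)  # ['cat', 'dog', 'cat']

# Alternatives are tried left to right; 'a' succeeds first,
# so the longer 'ab' is never attempted.
first = re.search("a|ab", "ab").group()
print(first)  # a
```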

Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, (.+) \1 matches 'the the' or '55 55', but not 'thethe' (note the space after the group). Likewise, (abc)\1 matches 'abcabc': (abc) is a capturing group, and \1 is a backreference that matches the same text the group captured, so \1 also matches 'abc'.

\number
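
A minimal sketch of backreferences in Python, using the examples from this card. Raw strings keep \1 from being read as a string escape.

```python
import re

# Group 1 captures 'the'; \1 must repeat it after a space.
doubled = re.search(r"(.+) \1", "the the").group()
print(doubled)  # the the

# No space between the copies, so the pattern cannot match.
no_space = re.search(r"(.+) \1", "thethe")
print(no_space)  # None

# \1 repeats whatever (abc) captured.
repeated = re.search(r"(abc)\1", "abcabc").group()
print(repeated)  # abcabc
```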

Chapter 11: Regex Flashcards

(38 cards)