Python - Regular Expressions Flashcards

Question 1

Q

Which of the following best describes what “\w” matches in a Python regular expression?

— 1 —
It matches any one character that is a space, tab, return, or newline.
— 2 —
It matches any one character that is a letter, digit, or underscore.
— 3 —
It matches any one character that is not the newline character.
— 4 —
It matches any one character that is not a lowercase w.

Answer

A

\w

— 2 —
It matches any one character that is a letter, digit, or underscore.

Question 2

Q

Which of the following regular expressions would not match the entire string “123”?

--- 1 ---
[123]
--- 2 ---
\d\d\d
--- 3 ---
[0-9][0-9][0-9]
--- 4 ---
123

Answer

A

string “123”

— 1 —
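[123]

[123] is a character set that matches only a single character (‘1’, ‘2’, or ‘3’), so it cannot match the full three-character string, although a search would still find one character inside it.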
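A small sketch of why the answer depends on matching the whole string. It assumes the question means a full match (checked here with re.fullmatch), whereas re.search accepts a match anywhere. The variable names are illustrative and not part of the original card.

```python
import re

candidates = [r"[123]", r"\d\d\d", r"[0-9][0-9][0-9]", r"123"]

# re.fullmatch succeeds only if the pattern matches the entire string.
full = {p: re.fullmatch(p, "123") is not None for p in candidates}

# re.search succeeds if the pattern matches anywhere in the string.
partial = {p: re.search(p, "123") is not None for p in candidates}

print(full)     # only [123] is False
print(partial)  # every pattern is True
```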

Question 3

Q

In Python, how many backslashes do you need to express 2 backslashes with regex?

Answer

A

In a regular (non-raw) string literal you need 4. The first backslash escapes the second, and the third escapes the fourth, so Python stores two backslashes, which the regex engine then reads as one literal backslash.

\\\\

Regular expressions use the backslash character (‘\’) to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals. In a raw string, r'\\' produces the same two-backslash pattern.
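The two layers of escaping can be checked directly. This is a minimal sketch; the names normal, raw, path, and found are made up for illustration.

```python
import re

normal = "\\\\"  # four backslashes in the source, two in the resulting string
raw = r"\\"      # two backslashes in the source, two in the resulting string

print(len(normal), len(raw), normal == raw)  # 2 2 True

path = "C:\\temp"  # the text C:\temp, which contains one backslash
found = re.search(normal, path).group()
print(len(found))  # 1, the pattern matched a single literal backslash
```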

Question 4

Q

What does the following regex mean?

. (Dot)

Answer

A

Matches any character except a newline.
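A short sketch of the newline exception. The re.DOTALL flag is not mentioned on the card; it is shown here only as the standard way to lift that exception.

```python
import re

text = "a\nb"

# By default the dot skips the newline.
default_hits = re.findall(r".", text)

# With re.DOTALL the dot matches the newline as well.
dotall_hits = re.findall(r".", text, flags=re.DOTALL)

print(default_hits)  # ['a', 'b']
print(dotall_hits)   # ['a', '\n', 'b']
```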

Question 5

Q

What does the following regex mean?

^ (Caret)

Answer

A

Matches the start of the string

Question 6

Q

What does the following regex mean?

$

Answer

A

Matches the end of the string or just before the newline at the end of the string
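A minimal sketch of both anchors, ^ and $, on a line that ends with a newline. The sample text and variable names are illustrative.

```python
import re

line = "hello world\n"

# ^ anchors the match to the start of the string.
starts_with_hello = re.search(r"^hello", line) is not None
starts_with_world = re.search(r"^world", line) is not None

# $ matches at the very end, or just before a trailing newline.
ends_with_world = re.search(r"world$", line) is not None

print(starts_with_hello, starts_with_world, ends_with_world)  # True False True
```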

Question 7

Q

What does the following regex mean?

*

Answer

A

Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible.

ab* will match ‘a’, ‘ab’, or ‘a’ followed by any number of ‘b’s

Question 8

Q

What does the following regex mean?

+

Answer

A

Causes the resulting RE to match 1 or more repetitions of the preceding RE, as many repetitions as are possible.

Question 9

Q

What does the following regex mean?

?

Answer

A

Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.

ab? will match either ‘a’ or ‘ab’.
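The three qualifiers *, +, and ? can be compared side by side. This sketch uses re.fullmatch so that each sample has to match in its entirety; the sample list is an assumption chosen for illustration.

```python
import re

samples = ["a", "ab", "abbb"]

# * allows zero or more 'b's, + requires at least one, ? allows at most one.
star = [s for s in samples if re.fullmatch(r"ab*", s)]
plus = [s for s in samples if re.fullmatch(r"ab+", s)]
optional = [s for s in samples if re.fullmatch(r"ab?", s)]

print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['a', 'ab']
```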

Question 10

Q

What does the following regex mean?

*?, +?, ??

Answer

A

The ‘*’, ‘+’, and ‘?’ qualifiers are all greedy; they match as much text as possible.

Sometimes this behaviour isn’t desired; if the RE <.*> is matched against ‘<a> b <c>’, it will match the entire string, and not just ‘<a>’.

Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched.

Using the RE <.*?> will match only ‘<a>’.

?? is lazy while ? is greedy.

Given (pattern)??, it will first test for empty string, then if the rest of the pattern can’t match, it will test for pattern.

In contrast, (pattern)? will test for pattern first, then it will test for empty string on backtrack.

The difference is in the order of searching:

“toys?2” searches for toys2, then toy2
“toys??2” searches for toy2, then toys2

But for these 2 patterns, the result will be the same regardless of the input string, since the 2 that follows (after s? or s??) must be matched.
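Greedy and lazy matching can be seen with a tag-like pattern, and with s? versus s?? when nothing has to follow. This is a sketch only; the sample strings are assumptions.

```python
import re

text = "<a> b <c>"

greedy = re.search(r"<.*>", text).group()   # runs to the last '>'
lazy = re.search(r"<.*?>", text).group()    # stops at the first '>'
print(greedy)  # <a> b <c>
print(lazy)    # <a>

# With nothing required after the optional 's', the order of attempts
# becomes visible: ? takes the 's', ?? leaves it.
greedy_optional = re.search(r"toys?", "toys").group()
lazy_optional = re.search(r"toys??", "toys").group()
print(greedy_optional, lazy_optional)  # toys toy
```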

Question 11

Q

What does the following regex mean?

{some_number}

{m}

Answer

A

Specifies that exactly m copies of the previous RE should be matched; fewer matches cause the entire RE not to match.

For example, a{6} will match exactly six ‘a’ characters, but not five.

Question 12

Q

What does the following regex mean?

{m,n}

Answer

A

Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as many repetitions as possible.

For example, a{3,5} will match from 3 to 5 ‘a’ characters.

Omitting m specifies a lower bound of zero, and omitting n specifies an infinite upper bound.

As an example, a{4,}b will match ‘aaaab’ or a thousand ‘a’ characters followed by a ‘b’, but not ‘aaab’.

The comma may not be omitted or the modifier would be confused with the previously described form.
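The counted forms {m}, {m,n}, and {m,} can be checked with re.fullmatch. A minimal sketch; the variable names are illustrative.

```python
import re

# {6} requires exactly six repetitions.
exactly_six = re.fullmatch(r"a{6}", "a" * 6) is not None
only_five = re.fullmatch(r"a{6}", "a" * 5) is not None

# {3,5} accepts three, four, or five repetitions.
accepted_lengths = [n for n in range(1, 8) if re.fullmatch(r"a{3,5}", "a" * n)]

# {4,} has no upper bound.
thousand = re.fullmatch(r"a{4,}b", "a" * 1000 + "b") is not None
too_few = re.fullmatch(r"a{4,}b", "aaab") is not None

print(exactly_six, only_five)  # True False
print(accepted_lengths)        # [3, 4, 5]
print(thousand, too_few)       # True False
```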

Question 13

Q

What does the following regex mean?

{m,n}?

Answer

A

Causes the resulting RE to match from m to n repetitions of the preceding RE, attempting to match as FEW repetitions as possible.

This is the non-greedy version of the previous qualifier.

For example, on the 6-character string ‘aaaaaa’, a{3,5} will match 5 ‘a’ characters, while a{3,5}? will only match 3 characters.
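The example above can be reproduced directly. This is a small sketch with illustrative variable names.

```python
import re

text = "aaaaaa"

greedy = re.search(r"a{3,5}", text).group()   # takes as many as allowed
lazy = re.search(r"a{3,5}?", text).group()    # takes as few as allowed

print(len(greedy), len(lazy))  # 5 3
```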

Question 14

Q

What does the following regex mean?

\

Answer

A

Either escapes special characters (permitting you to match characters like ‘*’, ‘?’, and so forth), or signals a special sequence; special sequences are discussed below.

If you’re not using a raw string to express the pattern, remember that Python also uses the backslash as an escape sequence in string literals; if the escape sequence isn’t recognized by Python’s parser, the backslash and subsequent character are included in the resulting string. However, if Python would recognize the resulting sequence, the backslash should be repeated twice. This is complicated and hard to understand, so it’s highly recommended that you use raw strings for all but the simplest expressions.

Question 15

Q

What does the following regex mean?

[ ]

Answer

A

Used to indicate a set of characters. In a set:

Characters can be listed individually, e.g. [amk] will match ‘a’, ‘m’, or ‘k’.
Ranges of characters can be indicated by giving two characters and separating them by a ‘-’, for example [a-z] will match any lowercase ASCII letter, [0-5][0-9] will match all the two-digit numbers from 00 to 59, and [0-9A-Fa-f] will match any hexadecimal digit. If - is escaped (e.g. [a\-z]) or if it’s placed as the first or last character (e.g. [-a] or [a-]), it will match a literal ‘-’.
Special characters lose their special meaning inside sets. For example, [(+*)] will match any of the literal characters ‘(’, ‘+’, ‘*’, or ‘)’.
Character classes such as \w or \S (defined below) are also accepted inside a set, although the characters they match depends on whether ASCII or LOCALE mode is in force.
Characters that are not within a range can be matched by complementing the set. If the first character of the set is ‘^’, all the characters that are not in the set will be matched. For example, [^5] will match any character except ‘5’, and [^^] will match any character except ‘^’. ^ has no special meaning if it’s not the first character in the set.
To match a literal ‘]’ inside a set, precede it with a backslash, or place it at the beginning of the set. For example, both [()[\]{}] and []()[{}] will match a parenthesis, a square bracket, or a brace.
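A few of the set rules above, sketched with re.findall and re.fullmatch. The sample strings are assumptions chosen to keep the output short.

```python
import re

# Individually listed characters.
listed = re.findall(r"[amk]", "mask")

# Two ranges side by side: 00 through 59.
in_range = re.fullmatch(r"[0-5][0-9]", "59") is not None
out_of_range = re.fullmatch(r"[0-5][0-9]", "60") is not None

# A complemented set: everything except '5'.
not_five = re.findall(r"[^5]", "1525")

# Special characters are literal inside a set.
literals = re.findall(r"[(+*)]", "(1+2)*3")

# An escaped hyphen is a literal '-', not a range.
with_hyphen = re.findall(r"[a\-z]", "a-z b")

print(listed)                  # ['m', 'a', 'k']
print(in_range, out_of_range)  # True False
print(not_five)                # ['1', '2']
print(literals)                # ['(', '+', ')', '*']
print(with_hyphen)             # ['a', '-', 'z']
```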

Question 16

Q

What does the following regex mean?

|

Answer

A

A|B, where A and B can be arbitrary REs, creates a regular expression that will match either A or B.

An arbitrary number of REs can be separated by the ‘|’ in this way. This can be used inside groups (see below) as well.

As the target string is scanned, REs separated by ‘|’ are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once A matches, B will not be tested further, even if it would produce a longer overall match. In other words, the ‘|’ operator is never greedy.

To match a literal ‘|’, use \|, or enclose it inside a character class, as in [|].
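The left-to-right behaviour can be seen by swapping the order of two alternatives. A minimal sketch with illustrative names.

```python
import re

# The first alternative that matches wins, even if a later one is longer.
short_first = re.search(r"a|ab", "ab").group()
long_first = re.search(r"ab|a", "ab").group()
print(short_first, long_first)  # a ab

# Two ways to match a literal '|'.
escaped = re.findall(r"\|", "x|y")
in_class = re.findall(r"[|]", "x|y")
print(escaped, in_class)  # ['|'] ['|']
```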

Question 17

Q

What does the following regex mean?

(…)

Answer

A

Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group.

The contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the \number special sequence, described below.

To match the literals ‘(’ or ‘)’, use \( or \), or enclose them inside a character class: [(], [)].
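A short sketch of retrieving group contents and of reusing a group later in the pattern with \1. The sample strings and names are made up for illustration.

```python
import re

# Two groups, retrieved by number after the match.
m = re.search(r"(\w+)@(\w+)", "user@example")
user, host = m.group(1), m.group(2)
print(user, host)  # user example

# \1 must match the same text that group 1 matched.
doubled = re.search(r"(\w+) \1", "the the cat")
print(doubled.group(1))  # the
print(doubled.group())   # the the
```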

Question 18

Q

What does the following regex mean?

(?=…)

Answer

A

Matches if … matches next, but doesn’t consume any of the string.

This is called a lookahead assertion.

For example, Isaac (?=Asimov) will match ‘Isaac ‘ only if it’s followed by ‘Asimov’.

Question 19

Q

What does the following regex mean?

(?!…)

Answer

A

Matches if … doesn’t match next.

This is a negative lookahead assertion.

For example, Isaac (?!Asimov) will match ‘Isaac ‘ only if it’s not followed by ‘Asimov’.
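Both lookahead forms, positive and negative, can be tried on the Isaac examples. This sketch also shows that the lookahead text is not part of the match; the variable names are illustrative.

```python
import re

positive = r"Isaac (?=Asimov)"
negative = r"Isaac (?!Asimov)"

pos_on_asimov = re.search(positive, "Isaac Asimov")
pos_on_newton = re.search(positive, "Isaac Newton")
neg_on_asimov = re.search(negative, "Isaac Asimov")
neg_on_newton = re.search(negative, "Isaac Newton")

# The lookahead checks what follows but does not consume it.
print(repr(pos_on_asimov.group()))  # 'Isaac '
print(pos_on_newton)                # None
print(neg_on_asimov)                # None
print(repr(neg_on_newton.group()))  # 'Isaac '
```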

Question 20

Q

# This program tries to use a regular expression
# to identify all words in the sequence below that
# contain the character sequence 'an'. See if you
# can fix it!

word_sequence = ('apple', 'banana', 'orange')
for word in word_sequence:
    match = re.search(r'an', word)
    if match is not None:
        print(word)

Answer

A

import re

word_sequence = ('apple', 'banana', 'orange')
for word in word_sequence:
    match = re.search(r'an', word)
    if match is not None:
        print(word)

Question 21

Q

# This program tries to use a regular expression
# to identify all words that start with the letter
# 'a'. Instead, it identifies all three words in
# the sequence below as words that contain the letter
# 'a' somewhere. See if you can change it so it only
# finds words that start with 'a'!

import re

word_sequence = ('apple', 'banana', 'orange')
for word in word_sequence:
    match = re.search(r'a.*', word)
    if match is not None:
        print(word)

Answer

A

import re

word_sequence = ('apple', 'banana', 'orange')
for word in word_sequence:
    match = re.search(r'^a.*', word)
    if match is not None:
        print(word)

Question 22

Q

Write a program that outputs only the words in /usr/share/dict/words that start with the letters “ply”. It should output the words in order, each on their own line.

Answer

A

import re

with open('/usr/share/dict/words') as f:
    for word in f:
        match = re.search(r'^ply', word)
        if match is not None:
            # Each line already ends with a newline, so suppress print's own.
            print(word, end='')

Question 23

Q

Write a program that outputs only the words in /usr/share/dict/words that have exactly five characters and start and end with a lowercase “o”. It should output the words in order, each on their own line.

Answer

A

import re

with open('/usr/share/dict/words', 'r') as f:
    for word in f:
        # $ also matches just before the trailing newline of each line.
        match = re.search(r'^o.{3}o$', word)
        if match is not None:
            print(word, end='')

Question 24

Q

What does the following regex mean?

\w

Answer

A

matches upper and lower case letters, digits, and underscore

Question 25

Q

What does the following regex mean?

\s

Answer

A

matches blank space (space, tab, newline, carriage return)

Question 26

Q

What does the following regex mean?

\d

Answer

A

matches any digit 0-9

Question 27

Q

Suppose you have the following regular expression:

^a*p.+l.*$
Which of the words below would that expression match?

apple
aplenty
pal
pals
plenty

Answer

A

apple
pal
pals

Question 28

Q

Which of the following regular expressions matches only the first four of these words?

bowls
books
bogus
bossy
bistro
bison
bakery
obligates
__________

^bo+s

^bo+.*s

^b.[os]+.

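^bos+.

Answer

A

^bo+.*s

It requires a ‘b’ at the start, one or more ‘o’s, and an ‘s’ somewhere later, which holds only for bowls, books, bogus, and bossy.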
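One way to check the candidates mechanically is to run each against the word list. This is a sketch; the names words, candidates, and matches_by_pattern are illustrative.

```python
import re

words = ["bowls", "books", "bogus", "bossy",
         "bistro", "bison", "bakery", "obligates"]
candidates = [r"^bo+s", r"^bo+.*s", r"^b.[os]+.", r"^bos+."]

# For each candidate, collect the words it matches.
matches_by_pattern = {
    pattern: [w for w in words if re.search(pattern, w)]
    for pattern in candidates
}

for pattern, matched in matches_by_pattern.items():
    print(pattern, matched)
```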

Question 29

Q

What does the following regex mean?

\W

Answer

A

matches NOT a word char (NOT a letter, digit, or underscore)
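The shorthand classes \w, \s, \d, and \W can be compared on one sample string. A minimal sketch; the sample text is an assumption.

```python
import re

text = "id_42 ok\t!"

word_chars = re.findall(r"\w", text)  # letters, digits, underscore
whitespace = re.findall(r"\s", text)  # space and tab here
digits = re.findall(r"\d", text)
non_word = re.findall(r"\W", text)    # everything \w does not match

print(word_chars)  # ['i', 'd', '_', '4', '2', 'o', 'k']
print(whitespace)  # [' ', '\t']
print(digits)      # ['4', '2']
print(non_word)    # [' ', '\t', '!']
```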

Question 30

Q

# This program tries to find only six-letter words in
# the list of words declared below. See if you can fix 
# it!

import re

regex = re.compile(r'.{6}$')
words = ('apple', 'banana', 'orange', 'grapefruit')

for word in words:
    if regex.search(word):
        print(word)

Answer

A

import re

regex = re.compile(r'^.{6}$')
words = ('apple', 'banana', 'orange', 'grapefruit')

for word in words:
    if regex.search(word):
        print(word)

Question 31

Q

# This program tries to identify which of the strings
# in a sequence has multiple instances of the letter
# 'e' in a row. It then tries to print just those 
# repetitions of the letter 'e'. See if you can fix
# it!

import re

strings = ('knee', 'meet', 'eeeeeeee', 'set')

# Hint: How do you make the set of instances of the
# letter 'e' a group?
regex = re.compile(r'e{2,}')
for string in strings:
    result = regex.search(string)
    if result is not None:
        print(result.group(1))

Answer

A

import re

strings = ('knee', 'meet', 'eeeeeeee', 'set')

# Hint: How do you make the set of instances of the
# letter 'e' a group?
regex = re.compile(r'(e{2,})')
for string in strings:
    result = regex.search(string)
    if result is not None:
        print(result.group(1))

Question 32

Q

Write a program that outputs only the words in /usr/share/dict/words that start with the letters “ply” and have at most six characters. It should output the words in order, each on their own line.

Answer

A

import re

with open('/usr/share/dict/words', 'r') as f:
    for word in f:
        # "ply" plus at most three more characters gives at most six in total.
        result = re.search(r'^ply.{0,3}$', word)
        if result is not None:
            print(word, end='')

Question 33

Q

Write a program that outputs only the words in /usr/share/dict/words that start with a lowercase u, have at least five instances of a lowercase u, and end in a lowercase s. It should output the words in order, each on their own line.

Answer

A

import re

with open('/usr/share/dict/words', 'r') as f:
    for word in f:
        # .* between the u's allows any number of characters between them.
        result = re.search(r'^u.*u.*u.*u.*u.*s$', word)
        if result is not None:
            print(word, end='')

Question 34

Q

What does the search() method return?

Answer

A

It returns a match object for the first location where the RE matches, or None if the RE matches nowhere in the string.
(finds only the FIRST occurrence of the RE in the string)

You can either do:

1)
pattern = re.compile(r'my_reg_expression')
result = pattern.search(some_string)
if result is not None:
    print(result.group())

2)
result = re.search(r'my_reg_expression', some_string)
if result is not None:
    print(result.group())

Question 35

Q

What does compile() method return?

Answer

A

It returns a compiled version of the regular expression (a pattern object)
(saves processing time when the same RE is reused)

You can do:

pattern = re.compile(r'my_reg_expression')
result = pattern.search(some_string)
if result is not None:
    print(result.group())

Question 36

Q

What does the match() method return?

Answer

A

Determines whether the RE matches at the beginning of the string.

Returns None if no match can be found. If the match succeeds, a match object is returned, containing information about the match: where it starts and ends, the substring it matched, and more.
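The difference between match() and search() shows up when the pattern occurs only in the middle of the string. A minimal sketch with illustrative names.

```python
import re

word = "banana"

at_start = re.match(r"an", word)   # 'an' is not at position 0
anywhere = re.search(r"an", word)  # first 'an' found at position 1

print(at_start)          # None
print(anywhere.group())  # an
print(anywhere.span())   # (1, 3)
```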

Question 37

Q

What does the findall() method return?

Answer

A

Finds all substrings where the RE matches, and returns them as a list.

Question 38

Q

What does the finditer() method return?

Answer

A

Finds all substrings where the RE matches, and returns them as an iterator of match objects.
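A side-by-side sketch of findall() and finditer() on the same text. The variable names are illustrative.

```python
import re

text = "banana"

# findall returns the matched substrings themselves.
found = re.findall(r"an", text)

# finditer yields match objects, so positions are available too.
spans = [m.span() for m in re.finditer(r"an", text)]

print(found)  # ['an', 'an']
print(spans)  # [(1, 3), (3, 5)]
```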

Question 39

Q

Write a simple program to display the number of times the regex has matched:

import re

regex = re.compile(r"your pattern here…*")
…
…

Answer

A

import re

regex = re.compile(r"your pattern here…*")

matches = regex.findall("the contents…")

print(len(matches))

Question 40

Q

What does the following regex do?

\b

Answer

A

Matches the word boundary

Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters.

This means that r'\bfoo\b' matches ‘foo’, ‘foo.’, ‘(foo)’, ‘bar foo baz’ but not ‘foobar’ or ‘foo3’.
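The examples above can be run through the pattern as a quick check. A sketch only; the list name samples is made up.

```python
import re

pattern = re.compile(r"\bfoo\b")
samples = ["foo", "foo.", "(foo)", "bar foo baz", "foobar", "foo3"]

# Keep only the samples where 'foo' stands alone as a word.
hits = [s for s in samples if pattern.search(s)]
print(hits)  # ['foo', 'foo.', '(foo)', 'bar foo baz']
```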

Question 41

Q

Write the regular expression that matches the user input below:

user_word = input('Enter word: ')
pattern = re.compile( ...your answer here... )

Your search must be case-insensitive. (This means if you are looking up the word “apple”, then “apple” and “Apple” both count.)

Your search must not count words that contain your word. (This means if you are looking up the word “apple”, then “applesauce” does not count.)

Note that this also includes possessives! So if you are looking for the word “Jonathan”, then “Jonathan’s” does not count. If you are looking for “Jonathan’s”, on the other hand, “Jonathan’s” should count, but not “Jonathan” or “Jonathans”.

Your search must count instances of a word that have trailing punctuation marks. (This means if you are looking up the word “apple” and it occurs at the end of a sentence as “apple.”, that occurrence should still count.)

Answer

A

r'\b['
+ user_word[0].lower()
+ user_word[0].upper()
+ r']'
+ user_word[1:]
+ r"([^A-Za-z0-9\-']|\s|$)"
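An alternative sketch, not the card's answer: it swaps in re.IGNORECASE, re.escape, and negative lookarounds. The helper name build_word_pattern is hypothetical, and the choice to treat apostrophes and hyphens as part of a word is an assumption made to satisfy the possessive rule.

```python
import re

def build_word_pattern(user_word):
    """Hypothetical helper: match user_word only as a standalone word."""
    # Assumption: letters, digits, underscore, apostrophes, and hyphens
    # all count as "part of a word" on either side.
    word_char = r"[\w'’-]"
    return re.compile(
        r"(?<!" + word_char + r")"   # nothing word-like just before
        + re.escape(user_word)       # the word itself, taken literally
        + r"(?!" + word_char + r")", # nothing word-like just after
        re.IGNORECASE,
    )

apple_hits = build_word_pattern("apple").findall(
    "Apple pie, applesauce, and an apple.")
name_hits = build_word_pattern("Jonathan").findall(
    "Jonathan's hat; Jonathan left.")
possessive_hits = build_word_pattern("Jonathan's").findall(
    "Jonathan's hat, Jonathan, Jonathans")

print(apple_hits)       # ['Apple', 'apple']
print(name_hits)        # ['Jonathan']
print(possessive_hits)  # ["Jonathan's"]
```

Because the lookarounds consume nothing, trailing punctuation such as a full stop is allowed but is not included in the match.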

Question 42

Q

What does the following regex do?

r'(a|b)'

Answer

A

matches the character a or the character b

note that ‘|’ works without parentheses too; the parentheses limit how far the alternation extends, so r'x(a|b)y' matches ‘xay’ or ‘xby’, whereas r'xa|by' matches ‘xa’ or ‘by’