7 Regex Flashcards

1
Q

How can you assign a regex object to a variable?

A

variable = re.compile(r'regexPattern')

2
Q

What is a match object?

A

A match object is returned by a Regex object's search() method when the pattern is found in the string you pass to it. If the pattern is not found, search() returns None.
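
A small sketch of what search() hands back in both cases. The pattern and the sample strings are made up for illustration.

```python
import re

# Hypothetical pattern: three digits, a dash, four digits.
phone_regex = re.compile(r'\d\d\d-\d\d\d\d')

found = phone_regex.search('Call 555-4242 today')  # a match object
missing = phone_regex.search('No number here')     # None, nothing matched

# Check for None before calling methods on the result.
if found is not None:
    print(found.group())
print(missing)
```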

3
Q

Name the 4 steps of Regular Expression Matching

A
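  1. Import the re module
    >>> import re
  2. Create a Regex object
    >>> regObj = re.compile(pattern)
  3. Pass the string to search to the Regex object's search() method
    >>> mo = regObj.search('message')
  4. Call the match object's group() method to get the matched text
    >>> print(mo.group())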
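
The four steps could look like this when run as a script. The phone-number pattern and the message are assumptions chosen for the example.

```python
import re                                                # step 1

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')      # step 2
mo = phone_regex.search('My number is 415-555-4242.')    # step 3
matched_text = mo.group()                                # step 4
print(matched_text)
```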
4
Q

What do the numbers in the group() method's parentheses indicate?

mo.group()
mo.group(0)
mo.group(1)
mo.group(2)

A

If the regex pattern has several groups, e.g.:
r'(\d\d\d)-(\d\d\d-\d\d\d\d)'
the numbers indicate which group's matched text is returned.

mo.group() and mo.group(0) return the entire matched text

mo.group(1) returns the text matched by (\d\d\d)
mo.group(2) returns the text matched by (\d\d\d-\d\d\d\d)
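
A short sketch of the group numbers in action, assuming a sample sentence that contains one phone number.

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242.')

whole = mo.group()         # same result as mo.group(0)
area_code = mo.group(1)    # text matched by the first pair of parentheses
main_number = mo.group(2)  # text matched by the second pair

print(whole, area_code, main_number)
```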

5
Q

What does mo.groups() do?

What can you do with it because of its return data type?

A

Returns a tuple of all groups:
>>> mo.groups()
('415', '555-4242')

You can use the multiple assignment trick:
>>> areaCode, mainNumber = mo.groups()
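
As a runnable sketch (the sample sentence is an assumption), the tuple from groups() can be unpacked directly:

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242.')

all_groups = mo.groups()               # tuple with one entry per group
area_code, main_number = all_groups    # multiple assignment

print(all_groups)
print(area_code, main_number)
```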

6
Q

Special characters:
| (pipe)

What should you keep in mind about it?

A

pipe:
matches one of several expressions, e.g. 'Batman' or 'Batmobile' with
>>> r'Bat(man|mobile)'

When both 'Batman' and 'Batmobile' occur in the searched string, search() returns whichever occurs first
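
A quick sketch of the "first occurrence wins" behaviour, using an invented sentence in which 'Batmobile' appears before 'Batman'.

```python
import re

bat_regex = re.compile(r'Bat(man|mobile)')
mo = bat_regex.search('The Batmobile was parked before Batman arrived.')

full_match = mo.group()   # the alternative that appears first in the text
suffix = mo.group(1)      # only the part matched inside the parentheses

print(full_match, suffix)
```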

7
Q

Special characters:
?
*
+

A

? and * allow optional matching:
? == 0 or 1 time
'Batman' or 'Batwoman'
>>> r'Bat(wo)?man'

* == 0 or more times
'Batman' or 'Batwowowoman'
>>> r'Bat(wo)*man'

No longer optional:
+ == 1 or more times
'Batwoman' or 'Batwowowoman'
>>> r'Bat(wo)+man'
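
One way to compare the three quantifiers side by side is to run each pattern against the same sample words. The dictionary layout below is just a convenience for this sketch.

```python
import re

patterns = {
    '?': re.compile(r'Bat(wo)?man'),   # 'wo' zero or one time
    '*': re.compile(r'Bat(wo)*man'),   # 'wo' zero or more times
    '+': re.compile(r'Bat(wo)+man'),   # 'wo' one or more times
}
samples = ['Batman', 'Batwoman', 'Batwowowoman']

# For every quantifier, record which sample words it matches.
results = {
    quantifier: [regex.search(word) is not None for word in samples]
    for quantifier, regex in patterns.items()
}

for quantifier, matched in results.items():
    print(quantifier, dict(zip(samples, matched)))
```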

8
Q

Special characters:

{}

A

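Match a specific number of repetitions:
r'(Ha){3}' -> HaHaHa

r'(Ha){3,5}' -> HaHaHa | HaHaHaHa | HaHaHaHaHa

Do not put a space after the comma: Python treats {3, 5} as literal text, not as a repetition count.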
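
A brief sketch of both forms, plus what happens when a space sneaks into the braces. The test strings are made up.

```python
import re

exactly_three = re.compile(r'(Ha){3}')
three_to_five = re.compile(r'(Ha){3,5}')   # no space after the comma

three = exactly_three.search('HaHaHaHa').group()
up_to_five = three_to_five.search('HaHaHaHa').group()
too_few = three_to_five.search('HaHa')     # only two repetitions

# With a space, the braces are no longer a quantifier but literal text.
spaced = re.compile(r'(Ha){3, 5}')
spaced_on_laugh = spaced.search('HaHaHa')
spaced_on_literal = spaced.search('Ha{3, 5}')

print(three, up_to_five, too_few, spaced_on_laugh, spaced_on_literal)
```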

9
Q

What are greedy and non-greedy (lazy) matching?

How can you switch to lazy matching?

A

By default, Python does greedy matching: it matches the longest possible string
r'(Ha){3,5}' -> matches HaHaHaHaHa in 'HaHaHaHaHa'

non-greedy: matches the shortest possible string
r'(Ha){3,5}?' -> matches HaHaHa in 'HaHaHaHaHa'

Here the ? means non-greedy (not optional, as it does when it follows a (group))
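
The difference shows up when both versions search the same text. A minimal sketch, assuming five repetitions of 'Ha' as input:

```python
import re

greedy = re.compile(r'(Ha){3,5}')    # takes as many repetitions as allowed
lazy = re.compile(r'(Ha){3,5}?')     # stops at the minimum that still matches

text = 'HaHaHaHaHa'
greedy_match = greedy.search(text).group()
lazy_match = lazy.search(text).group()

print(greedy_match)
print(lazy_match)
```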

10
Q

findall() method

  1. Difference from search()
      1. Return values
A
  1. Finds all matches, not only the first
  2. Does not return a match object, but a list of strings
    >>> ['415-555-9999', '212-555-0000']
  3. Except if there are multiple groups in the Regex object, e.g.:
    >>> re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
    Then the return value is a list of tuples, e.g.:
    >>> [('415', '555', '9999'), ('212', '555', '0000')]
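
A sketch contrasting search() and findall() on one invented string, with and without groups in the pattern:

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
with_groups = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')

first_only = no_groups.search(text).group()   # search() stops at the first match
as_strings = no_groups.findall(text)          # no groups: list of strings
as_tuples = with_groups.findall(text)         # several groups: list of tuples

print(first_only)
print(as_strings)
print(as_tuples)
```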
11
Q

Shorthand character class

\d
\D

\w
\W

\s
\S

A

\d any digit 0-9
\D any character NOT a digit

\w any letter, numeric digit or underscore character
\W any character NOT…

\s any space, tab or newline
\S any character NOT…
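
To see the classes at work, this sketch runs a few of them over one made-up string using re.findall():

```python
import re

text = 'Room 42, 3rd_floor!'

digits = re.findall(r'\d+', text)       # runs of digits
words = re.findall(r'\w+', text)        # runs of letters, digits, underscores
non_words = re.findall(r'\W', text)     # single characters that are not \w
non_spaces = re.findall(r'\S+', text)   # runs of anything except whitespace

print(digits, words, non_words, non_spaces, sep='\n')
```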

12
Q

How can you define your own character class? E.g.
for vowels?
for letters?
for '.+*'?

A

By using [square brackets]

>>> vowelRegex = re.compile(r'[aeiouAEIOU]')

>>> letterRegex = re.compile(r'[a-zA-Z]')

>>> specCharRegex = re.compile(r'[.+*]')
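
A short sketch of two of these classes applied to invented sample strings. Note that '.', '+' and '*' need no backslash inside square brackets.

```python
import re

vowel_regex = re.compile(r'[aeiouAEIOU]')
spec_char_regex = re.compile(r'[.+*]')

vowels = vowel_regex.findall('RoboCop eats baby food.')
specials = spec_char_regex.findall('2+2=4. 3*3=9.')

print(vowels)
print(specials)
```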

13
Q

How can you make a negative character class?

What does it do?

A

By adding ^ right after the opening square bracket

[^aeiouAEIOU]

Matches every character except what's in the brackets
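
As a small sketch (the sample text is an assumption), the negated class also picks up spaces and punctuation, since those are not vowels either:

```python
import re

non_vowel_regex = re.compile(r'[^aeiouAEIOU]')
non_vowels = non_vowel_regex.findall('Baby food.')

print(non_vowels)
```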

14
Q

What do the caret and dollar sign do?

What's the mnemonic to remember which comes first?

A

Caret:
>>> r'^Hello!'
the searched string must start with 'Hello!'

Dollar:
>>> r'Hello!$'
the searched string must end with 'Hello!'

Carrots cost Dollars
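
A sketch of both anchors against two invented sentences, one starting and one ending with 'Hello!':

```python
import re

starts_with_hello = re.compile(r'^Hello!')   # caret goes at the front
ends_with_hello = re.compile(r'Hello!$')     # dollar goes at the end

greeting_first = 'Hello! Nice to see you.'
greeting_last = 'He said Hello!'

start_hit = starts_with_hello.search(greeting_first)
start_miss = starts_with_hello.search(greeting_last)
end_hit = ends_with_hello.search(greeting_last)
end_miss = ends_with_hello.search(greeting_first)

print(start_hit is not None, start_miss is not None)
print(end_hit is not None, end_miss is not None)
```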

15
Q

What does it search for?

>>> re.compile(r'^\d+$')

A

For a string that starts and ends with one or more digits

-> and that has nothing else in between, i.e. the whole string consists of digits
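
A minimal sketch with three made-up inputs: one all-digit string and two that contain something else.

```python
import re

whole_number_regex = re.compile(r'^\d+$')

all_digits = whole_number_regex.search('1234567890')
letters_inside = whole_number_regex.search('12345xyz67890')
space_inside = whole_number_regex.search('12 34567890')

print(all_digits is not None)
print(letters_inside is not None)
print(space_inside is not None)
```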

16
Q

What's the wildcard character?

A

It's the '.' and matches any single character except \n, e.g.:

>>> atRegex = re.compile(r'.at')

Could match: 'Hat', 'Sat', 'fat'…
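
A sketch on an invented sentence. The dot stands for exactly one character, so 'flat' only contributes 'lat', and a newline in front of 'at' does not count.

```python
import re

at_regex = re.compile(r'.at')

found = at_regex.findall('The cat in the hat sat on the flat mat.')
after_newline = at_regex.search('\nat')   # '.' does not match the newline

print(found)
print(after_newline)
```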

17
Q

When could these be useful?
1.
>>> everythRegex = re.compile(r'.*')

2.
>>> everythRegex = re.compile(r'.*?')

A

If you want any combination of 0 or more characters that is:

  1. as long as possible (greedy)
  2. as short as possible (lazy)
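
On its own, the lazy form is happy with an empty match, so the difference is easier to see between delimiters. The angle brackets and the sample text below are assumptions added for this sketch.

```python
import re

# Bare patterns: greedy takes the whole line, lazy takes nothing.
bare_greedy = re.compile(r'.*').search('abc').group()
bare_lazy = re.compile(r'.*?').search('abc').group()

# Between delimiters the two behave visibly differently.
text = '<To serve man> for dinner.>'
greedy_match = re.compile(r'<.*>').search(text).group()
lazy_match = re.compile(r'<.*?>').search(text).group()

print(repr(bare_greedy), repr(bare_lazy))
print(greedy_match)
print(lazy_match)
```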
18
Q

How can case-insensitive matching be done?

A

re.compile(r'pAtTeRn', re.I)

re.I is short for re.IGNORECASE
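
A sketch with one pattern and three invented sentences that spell the word in different cases:

```python
import re

robocop_regex = re.compile(r'robocop', re.I)   # same as re.IGNORECASE

sentences = ['RoboCop is part man', 'ROBOCOP protects', 'why not robocop?']
matches = [robocop_regex.search(s).group() for s in sentences]

print(matches)
```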

19
Q

How can you substitute?

Agent example

A

The sub() method of Regex objects takes two arguments. The first is the string that replaces each match. The second is the string to search.

>>> namesRegex = re.compile(r'Agent \w+')
>>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
'CENSORED gave the secret documents to CENSORED.'
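
The same example as a script, which makes the argument order easy to check: replacement first, text to search second.

```python
import re

names_regex = re.compile(r'Agent \w+')

original = 'Agent Alice gave the secret documents to Agent Bob.'
censored = names_regex.sub('CENSORED', original)   # returns a new string

print(censored)
print(original)   # the input string itself is unchanged
```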

20
Q

What is re.VERBOSE useful for?

What code is required?

A

It improves readability: whitespace and # comments inside the pattern are ignored, so the regex can be spread over several lines:

re.compile(r'''(


…)''', re.VERBOSE)
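
A possible filled-in version, assuming a simple phone-number pattern (the pattern body is not from the card, only the outer structure is):

```python
import re

phone_regex = re.compile(r'''(
    (\d{3})     # area code
    -           # separator
    (\d{3})     # first three digits
    -           # separator
    (\d{4})     # last four digits
)''', re.VERBOSE)

mo = phone_regex.search('Call 415-555-4242 now')
groups = mo.groups()   # outer group first, then the three inner groups

print(groups)
```

Because whitespace is ignored in this mode, a literal space in the pattern would have to be written as `\ ` or `[ ]`.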

21
Q

How can you combine re.IGNORECASE, re.DOTALL and re.VERBOSE?

A

>>> someRegexValue = re.compile('foo', re.IGNORECASE | re.DOTALL | re.VERBOSE)
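
A sketch where all three flags matter at once. The pattern and the two-line text are invented for the example.

```python
import re

combined = re.compile(r'''
    start   # literal text, any case because of IGNORECASE
    .*      # anything, including newlines because of DOTALL
    end     # literal text
''', re.IGNORECASE | re.DOTALL | re.VERBOSE)

text = 'Start of text\nand the End'
with_all_flags = combined.search(text).group()

# Without DOTALL the dot cannot cross the newline, so nothing matches.
without_dotall = re.compile(r'start.*end', re.IGNORECASE).search(text)

print(repr(with_all_flags))
print(without_dotall)
```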