7 Regex Flashcards

Question 1

Q

How can you assign a Regex object to a variable?

Answer

A

variable = re.compile(r'regexPattern')

Question 2

Q

What is a match object?

Answer

A

A Match object is what the Regex object's search() method returns when the pattern is found in the string you pass it. If the pattern is not found, search() returns None.

Question 3

Q

Name the 4 steps of Regular Expression Matching

Answer

A

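1. Import the re module
>>> import re
2. Create a Regex object
>>> regObj = re.compile(pattern)
3. Pass the string to the Regex object's search() method, which returns a Match object
>>> mo = regObj.search('message')
4. Call the Match object's group() method to get the matched text
>>> print(mo.group())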
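
The four steps can be sketched end to end as below. The phone-number pattern and the sample strings are illustrative assumptions, not part of the card.

```python
import re  # step 1: import the module

# step 2: create a Regex object (pattern chosen only for illustration)
phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# step 3: search() returns a Match object, or None when nothing matches
mo = phone_regex.search('My number is 415-555-4242.')
no_match = phone_regex.search('There is no number here.')

# step 4: group() returns the text that was actually matched
if mo is not None:
    print(mo.group())
```

Checking for None before calling group() avoids an AttributeError when the pattern is absent.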

Question 4

Q

What do the numbers in the group() method's parentheses indicate?

mo.group()
mo.group(0)
mo.group(1)
mo.group(2)
…

Answer

A

If the regex pattern has several groups, e.g.:
(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
the numbers indicate which group's matched text is returned.

mo.group() and mo.group(0) return the entire matched text

mo.group(1) returns the text matched by the first group, (\d\d\d)
mo.group(2) returns the text matched by the second group, (\d\d\d-\d\d\d\d)
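
A short sketch of the group numbering, assuming a made-up sample string:

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242.')

whole = mo.group()    # entire match, same as mo.group(0)
area = mo.group(1)    # text matched by the first pair of parentheses
main = mo.group(2)    # text matched by the second pair of parentheses
```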

Question 5

Q

What does mo.groups() do?

What can you do with it because of its return data type?

Answer

A

Returns a tuple of all groups:
>>> mo.groups()
('415', '555-4242')

You can do the multiple assignment trick:
>>> areaCode, mainNumber = mo.groups()
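
As a runnable sketch (the sample string is an assumption), the tuple returned by groups() unpacks directly into variables:

```python
import re

mo = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)').search('Call 415-555-4242 today.')

all_groups = mo.groups()               # a tuple with one item per group
area_code, main_number = all_groups    # multiple assignment (tuple unpacking)
```

The number of variables on the left must equal the number of groups in the pattern, otherwise Python raises a ValueError.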

Question 6

Q

Special characters:
| (pipe)

What should you keep in mind about it?

Answer

A

pipe:
matches one of many expressions, e.g. 'Batman' or 'Batmobile' with
>>> (r'Bat(man|mobile)')

When both 'Batman' and 'Batmobile' occur in the searched string, search() returns whichever occurs first
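
A minimal sketch of the "first occurrence wins" behaviour, using an invented sentence:

```python
import re

bat_regex = re.compile(r'Bat(man|mobile)')

# 'Batmobile' appears before 'Batman' in this string, so it is what search() finds
mo = bat_regex.search('The Batmobile lost a wheel and Batman sighed.')

first = mo.group()     # the full matched text
suffix = mo.group(1)   # only the alternative that matched inside the parentheses
```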

Question 7

Q

Special characters:
?
*
+

Answer

A

They allow optional matching:
? == 0 or 1 time
'Batman' or 'Batwoman'
>>> r'Bat(wo)?man'

* == 0 or more times
'Batman' or 'Batwowowoman'
>>> r'Bat(wo)*man'

No longer optional:
+ == 1 or more times
'Batwoman' or 'Batwowowoman'
>>> r'Bat(wo)+man'
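
The three quantifiers can be compared side by side. This sketch uses fullmatch() (an assumption for clarity, the card itself does not mention it) so that each sample string must match in its entirety.

```python
import re

optional = re.compile(r'Bat(wo)?man')   # 'wo' zero or one time
star = re.compile(r'Bat(wo)*man')       # 'wo' zero or more times
plus = re.compile(r'Bat(wo)+man')       # 'wo' one or more times

samples = ['Batman', 'Batwoman', 'Batwowowoman']

# For each sample: (matches with ?, matches with *, matches with +)
results = {
    s: (bool(optional.fullmatch(s)), bool(star.fullmatch(s)), bool(plus.fullmatch(s)))
    for s in samples
}
```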

Question 8

Q

Special characters:

{}

Answer

A

Match specific repetition counts (no space inside the braces):
r'(Ha){3}' -> HaHaHa

r'(Ha){3,5}' -> HaHaHa | HaHaHaHa | HaHaHaHaHa
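
A small sketch of both forms. It also shows that Python's re module treats braces containing a space as literal text rather than as a repetition count, so the range has to be written as {3,5}.

```python
import re

exactly_three = re.compile(r'(Ha){3}')
three_to_five = re.compile(r'(Ha){3,5}')

ok_three = exactly_three.fullmatch('HaHaHa') is not None
too_few = exactly_three.fullmatch('HaHa') is not None
ok_four = three_to_five.fullmatch('HaHaHaHa') is not None
too_many = three_to_five.fullmatch('HaHaHaHaHaHa') is not None   # six repetitions

# With a space, '{3, 5}' is matched literally, so plain 'HaHaHa' is not found
with_space = re.compile(r'(Ha){3, 5}').search('HaHaHa')
```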

Question 9

Q

What is greedy and non-greedy (lazy) matching?

How can you switch to lazy matching?

Answer

A

By default, Python does greedy matching: it matches the longest possible string
r'(Ha){3,5}' -> matches HaHaHaHaHa

non-greedy: matches the shortest possible string
r'(Ha){3,5}?' -> matches HaHaHa

Here the ? means non-greedy, not optional as it does when it follows a group
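
The difference shows up when both patterns search the same string (the sample text is an assumption):

```python
import re

text = 'HaHaHaHaHa'

greedy = re.compile(r'(Ha){3,5}')    # takes as many repetitions as allowed
lazy = re.compile(r'(Ha){3,5}?')     # stops as soon as the minimum is met

greedy_match = greedy.search(text).group()
lazy_match = lazy.search(text).group()
```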

Question 10

Q

findall() method

How does it differ from search()?
1. Return values

Answer

A

Finds all matches, not only the first
Does not return a Match object, but a list of strings
['415-555-9999', '212-555-0000']
Except if there are multiple groups in the Regex object, e.g.:
>>> re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
Then the return value is a list of tuples, e.g.:
[('415', '555', '9999'), ('212', '555', '0000')]
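
Both return shapes can be reproduced with one sample string (the string itself is an assumption):

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
with_groups = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')

as_strings = no_groups.findall(text)    # list of strings
as_tuples = with_groups.findall(text)   # list of tuples, one item per group
first_only = no_groups.search(text)     # search() stops at the first match
```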

Question 11

Q

Shorthand character class

\d
\D

\w
\W

\s
\S

Answer

A

\d any digit 0-9
\D any character NOT a digit

\w any letter, numeric digit or underscore character
\W any character NOT…

\s any space, tab or newline
\S any character NOT…
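
A brief sketch of the classes in use, with invented sample strings:

```python
import re

digits = re.findall(r'\d', 'a1 b22')               # every single digit
non_digits = re.findall(r'\D', 'a1 b22')           # everything that is not a digit
words = re.findall(r'\w+', 'snake_case is fun!')   # runs of letters, digits, underscores
spaces = re.findall(r'\s', 'a b\tc\n')             # space, tab and newline
counted = re.findall(r'\d+\s\w+', '12 drummers, 11 pipers')
```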

Question 12

Q

How can you define your own character class? E.g.
for vowels?
for letters?
for ‘.+*’

Answer

A

By using [square brackets]

>>> vowelRegex = re.compile(r'[aeiouAEIOU]')

>>> letterRegex = re.compile(r'[a-zA-Z]')

>>> specCharRegex = re.compile(r'[.+*]')
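
A runnable sketch of the three classes; the sample strings are assumptions. Inside square brackets the characters . + * lose their special meaning and need no escaping.

```python
import re

vowel_regex = re.compile(r'[aeiouAEIOU]')
letter_regex = re.compile(r'[a-zA-Z]')
spec_char_regex = re.compile(r'[.+*]')

vowels = vowel_regex.findall('RoboCop eats baby food.')
letters = letter_regex.findall('R2-D2')
specials = spec_char_regex.findall('1+1=2. 2*2=4.')
```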

Question 13

Q

How can you make a negative character class?

What does it do?

Answer

A

By placing a caret (^) right after the opening square bracket

[^aeiouAEIOU]

Matches every character that is not in the brackets
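
A minimal sketch with an invented string. Note that "not a vowel" includes punctuation and spaces, not only consonants.

```python
import re

non_vowel_regex = re.compile(r'[^aeiouAEIOU]')

non_vowels = non_vowel_regex.findall('Hello!')
```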

Question 14

Q

What do the caret and dollar sign do?

What's the mnemonic to remember which comes first?

Answer

A

Caret:
>>> (r'^Hello!')
the string must start with 'Hello!'

Dollar:
>>> (r'Hello!$')
the string must end with …

Carrots cost Dollars

Question 15

Q

What does it search for?

>>> re.compile(r'^\d+$')

Answer

A

For a string that consists of one or more digits from start to end

-> nothing but digits may appear in between
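
A sketch of the anchors at work, with assumed sample strings:

```python
import re

whole_digits = re.compile(r'^\d+$')   # digits only, from start to end
starts_hello = re.compile(r'^Hello!') # 'Hello!' must come first

all_digits = whole_digits.search('1234567890') is not None
letters_inside = whole_digits.search('12345xyz67890') is not None
space_inside = whole_digits.search('12 34') is not None

at_start = starts_hello.search('Hello! World') is not None
not_at_start = starts_hello.search('He said Hello!') is not None
```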

Question 16

Q

What's the wildcard character?

Answer

A

It's the '.' (dot), which matches any single character except a newline (\n), e.g.:

>>> atRegex = re.compile(r'.at')

Could be: ‘Hat’, ‘Sat’, ‘fat’…
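
Because the dot stands for exactly one character, a longer word such as 'flat' only contributes its last three letters. A short sketch with an assumed sentence:

```python
import re

at_regex = re.compile(r'.at')

found = at_regex.findall('The cat in the hat sat on the flat mat.')
```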

Question 17

Q

When could that be useful?
1.
>>> everythRegex = re.compile(r'.*')

2.
>>> everythRegex = re.compile(r'.*?')

Answer

A

If you want any combination of 0 or more characters that is:

1. as long as possible (greedy)
2. as short as possible (lazy)
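
On their own, both patterns match any text, so the difference is easiest to see between delimiters. The angle brackets and the sample string below are assumptions added for illustration.

```python
import re

text = '<To serve man> for dinner.>'

greedy = re.compile(r'<.*>')    # runs to the last '>' in the string
lazy = re.compile(r'<.*?>')     # stops at the first '>' it can

greedy_match = greedy.search(text).group()
lazy_match = lazy.search(text).group()
```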

Question 18

Q

How can Case-Insensitive-Matching be done?

Answer

A

re.compile(r'pAtTeRn', re.I)

re.I is short for re.IGNORECASE
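
A minimal sketch, assuming a made-up pattern and sample strings. The match keeps the casing found in the text, not the casing of the pattern.

```python
import re

robocop = re.compile(r'robocop', re.I)

mixed = robocop.search('RoboCop is part man, part machine.').group()
upper = robocop.search('ROBOCOP protects the innocent.').group()

same_flag = re.I is re.IGNORECASE   # the short and long names refer to the same flag
```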

Question 19

Q

How can you substitute?

Agent example

Answer

A

The sub() method for Regex objects is passed two arguments. The first argument is the string that replaces each match. The second is the string to search for matches. sub() returns a new string with the replacements applied.

>>> namesRegex = re.compile(r'Agent \w+')
>>> namesRegex.sub('CENSORED', 'Agent Alice gave the secret documents to Agent Bob.')
'CENSORED gave the secret documents to CENSORED.'