Chapter 7 – Pattern Matching with Regular Expressions Flashcards

1
Q

Let's say we coded a function to check whether a string is a phone number

A
def isPhoneNumber(text):
    # A number like 415-555-4242 is exactly 12 characters long.
    if len(text) != 12:
        return False
    # Area code: three digits.
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False
    # First hyphen.
    if text[3] != '-':
        return False
    # Next three digits.
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    # Second hyphen.
    if text[7] != '-':
        return False
    # Last four digits.
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True
2
Q

And then we wanted to have it read any string to find phone numbers

A

message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'
for i in range(len(message)):
    # Check every 12-character slice of the message.
    chunk = message[i:i+12]
    if isPhoneNumber(chunk):
        print('Phone number found: ' + chunk)
print('Done')

3
Q

Now let's rewrite that function to search for phone numbers in one step,
using a regular expression

A

\d\d\d-\d\d\d-\d\d\d\d

This is a regular expression (regex for short).
It matches three digits, a hyphen, three more digits, another hyphen, and four more digits.
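
As a rough comparison with the loop-based scan from the previous card, the same search can be sketched with Python's re module. The names phone_pattern and found are illustrative, and finditer() is a standard re method that these cards do not otherwise cover.

```python
import re

# Same sample text as the loop-based card.
message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'

# Three digits, a hyphen, three digits, a hyphen, four digits.
phone_pattern = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# finditer() yields one Match object per non-overlapping match.
found = [m.group() for m in phone_pattern.finditer(message)]
for number in found:
    print('Phone number found: ' + number)
print('Done')
```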

4
Q

{ } and regex

A

\d{3}-\d{3}-\d{4}

The {3} means "match the preceding pattern exactly three times", so this is the same as

\d\d\d-\d\d\d-\d\d\d\d
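
A small check, assuming an illustrative sample string, that the brace form and the spelled-out form find the same text:

```python
import re

short_form = re.compile(r'\d{3}-\d{3}-\d{4}')
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

sample = 'My number is 415-555-4242.'
short_match = short_form.search(sample).group()
long_match = long_form.search(sample).group()

# Both patterns describe the same text, so the matches are identical.
print(short_match == long_match)
```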

5
Q
What module is used for regular expressions (regex)
A

import re

6
Q

re.compile()

Why would we pass it a raw string

A

>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
Turns a string value into a Regex object.

With a raw string we don't have to escape every backslash, and backslashes are common in regexes.
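
To see what the r prefix saves, here is a minimal sketch comparing a raw string with the same pattern written using doubled backslashes. The variable names are illustrative.

```python
import re

raw_pattern = r'\d\d\d-\d\d\d-\d\d\d\d'
escaped_pattern = '\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d'

# Both literals produce exactly the same characters.
same_text = raw_pattern == escaped_pattern
print(same_text)

# Either one compiles to a regex that finds the number.
mo = re.compile(escaped_pattern).search('My number is 415-555-4242.')
print(mo.group())
```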

7
Q

\d

A

Digit

8
Q

search()

A

>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-4242

9
Q

search() (what it does)

A

A Regex object's search() method searches the string it is passed for any matches to the regex. It returns None if the regex pattern is not found in the string. If the pattern is found, it returns a Match object.
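
Because search() can return None, calling group() on the result without checking will fail when nothing matched. A minimal sketch of guarding against that follows. The helper name describe is hypothetical.

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def describe(text):
    """Return a message saying whether text contains a phone number."""
    mo = phone_regex.search(text)
    if mo is None:
        # No match: there is no Match object to call group() on.
        return 'No phone number found'
    return 'Phone number found: ' + mo.group()

print(describe('My number is 415-555-4242.'))
print(describe('No digits here.'))
```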

10
Q

What does search() return

A

A Match object, or None if the pattern is not found. Match objects have a group() method that returns the actual matched text from the searched string.

11
Q

group()

Once you have searched and gotten back a Match object, you use…

A

This method to return the group named by its argument. In this case there is no argument, so it returns the entire matched text.

>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-4242

12
Q

Grouping with parentheses

\d\d\d-\d\d\d-\d\d\d\d

A

(\d\d\d)-(\d\d\d-\d\d\d\d)

>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-4242'

13
Q

Group method for multiple groups

A

groups()
notice the PLURAL
Returns a tuple

>>> mo.groups()
('415', '555-4242')
>>> areaCode, mainNumber = mo.groups()
>>> print(areaCode)
415
>>> print(mainNumber)
555-4242
14
Q

What if the text your group has to match contains literal parentheses

A

You have to escape them with a backslash: \( and \)

>>> phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
>>> mo.group(1)
'(415)'
>>> mo.group(2)
'555-4242'
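
To show why the backslashes matter, this sketch compares the escaped pattern with an unescaped version on the same text. The variable names are illustrative.

```python
import re

text = 'My phone number is (415) 555-4242.'

# \( and \) match literal parenthesis characters in the text.
escaped = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
escaped_mo = escaped.search(text)
print(escaped_mo.group(1))

# Without backslashes, the inner parentheses only create another group,
# so the pattern expects three digits followed directly by a space.
unescaped = re.compile(r'((\d\d\d)) (\d\d\d-\d\d\d\d)')
unescaped_mo = unescaped.search(text)
print(unescaped_mo)
```

In this text the digits 415 are followed by a closing parenthesis rather than a space, so the unescaped pattern finds nothing.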

15
Q

What is a pipe

A

The | character. It matches one of several alternatives:
r'Batman|Tina Fey'
will match either 'Batman' or 'Tina Fey'.

What if both are in the searched string?
search() returns whichever one occurs first.
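
A short sketch of the "first occurrence" rule: the winner is whichever alternative appears earliest in the searched string, not whichever is listed first in the pattern. The sample strings are illustrative.

```python
import re

hero_regex = re.compile(r'Batman|Tina Fey')

# 'Batman' appears first in this string.
batman_first = hero_regex.search('Batman and Tina Fey.').group()

# 'Tina Fey' appears first here, even though it is second in the pattern.
tina_first = hero_regex.search('Tina Fey and Batman.').group()

print(batman_first)
print(tina_first)
```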

16
Q

Quote about regular expressions

A

“Knowing [regular expressions] can mean the difference between solving a problem in 3 steps and solving it in 3,000 steps. When you’re a nerd, you forget that the problems you solve with a couple keystrokes can take other people days of tedious, error-prone work to slog through.”[1]

17
Q

What if you wanted to search for several strings that share the same base,
like spiderman, spiderwoman, and spidercar
string = 'spidercar has lost a wheel'

A

>>> spiderReg = re.compile(r'spider(man|woman|car)')

>>> mo = spiderReg.search(string)

>>> mo.group()
'spidercar'

18
Q

Search a part of a pipe group

A

>>> batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
>>> mo = batRegex.search('Batmobile lost a wheel')
>>> mo.group()
'Batmobile'
>>> mo.group(1)
'mobile'

19
Q

Basic pipe

A

>>> heroRegex = re.compile(r'Batman|Tina Fey')
>>> mo1 = heroRegex.search('Batman and Tina Fey.')
>>> mo1.group()
'Batman'

20
Q

>>> batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
>>> mo = batRegex.search('Batmobile lost a wheel')
>>> mo.group()
'Batmobile'
>>> mo.group(1)
'mobile'

A

The method call mo.group() returns the full matched text ‘Batmobile’, while mo.group(1) returns just the part of the matched text inside the first parentheses group, ‘mobile’.
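
As a small aside, group(0) is another way to ask for the full match, which can make the numbering easier to remember. The sketch below uses illustrative variable names.

```python
import re

bat_regex = re.compile(r'Bat(man|mobile|copter|bat)')
mo = bat_regex.search('Batmobile lost a wheel')

whole = mo.group()         # full matched text
also_whole = mo.group(0)   # group 0 is the full match as well
first_group = mo.group(1)  # text matched inside the first parentheses

print(whole, also_whole, first_group)
```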

21
Q

What if there is an optional thing you want to look for

A

Use ?

It means: match zero or one of the group preceding this question mark.

22
Q

Using ? to search for Batman or (optionally) Batwoman

A

>>> batRegex = re.compile(r'Bat(wo)?man')
>>> mo1 = batRegex.search('The Adventures of Batman')
>>> mo1.group()
'Batman'

>>> mo2 = batRegex.search('The Adventures of Batwoman')
>>> mo2.group()
'Batwoman'

23
Q

What if you want to search for a phone number that may be listed with or without an area code

A

>>> phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
>>> mo1 = phoneRegex.search('My number is 415-555-4242')
>>> mo1.group()
'415-555-4242'

>>> mo2 = phoneRegex.search('My number is 555-4242')
>>> mo2.group()
'555-4242'
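
One detail worth knowing when a group is optional: if the group did not take part in the match, group(1) returns None rather than an empty string. A minimal sketch with illustrative variable names:

```python
import re

phone_regex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')

with_area = phone_regex.search('My number is 415-555-4242')
without_area = phone_regex.search('My number is 555-4242')

# The optional group captured the area code and its hyphen.
print(with_area.group(1))

# The optional group matched nothing, so group(1) is None.
print(without_area.group(1))
```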