Chapter 7 – Pattern Matching with Regular Expressions Flashcards
Let’s say we code a function to check whether a string is a phone number
def isPhoneNumber(text):
    if len(text) != 12:
        return False
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False
    if text[3] != '-':
        return False
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    if text[7] != '-':
        return False
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True
And then we want it to scan any string to find phone numbers
message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'
for i in range(len(message)):
    chunk = message[i:i+12]
    if isPhoneNumber(chunk):
        print('Phone number found: ' + chunk)
print('Done')
Now let’s run that same search for phone numbers in one step
Using a regular expression
\d\d\d-\d\d\d-\d\d\d\d
It’s called a regex for short
It matches three digits, a hyphen, three more digits, another hyphen, and then four digits
{ } and regex
\d{3}-\d{3}-\d{4}
is the same as
\d\d\d-\d\d\d-\d\d\d\d
The number in braces says how many times to match the item before it
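A quick way to convince yourself the two spellings are equivalent is to run both against the same text. This is only a sketch: the sample sentence and the names long_form and short_form are made up for illustration.

```python
import re

# Two spellings of the same phone-number pattern.
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')

# Hypothetical sample text, not from the flashcards.
sample = 'My number is 415-555-4242.'

long_match = long_form.search(sample).group()
short_match = short_form.search(sample).group()

# Both patterns pull out the same text.
print(long_match == short_match)
```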
What module is used for regular expressions (regex)?
import re
re.compile()
Why would we use a raw string?
>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
re.compile() turns a string value into a Regex object
The r prefix means we don’t have to escape every backslash, and backslashes are common in regex
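To see what the r prefix saves you, compare the raw string with the same pattern written as an ordinary string, where every backslash has to be doubled. A minimal sketch; the variable names here are made up for illustration.

```python
import re

# Raw string: backslashes are kept exactly as typed.
raw_pattern = r'\d\d\d-\d\d\d-\d\d\d\d'

# Ordinary string: each backslash has to be written twice.
escaped_pattern = '\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d'

# Both literals produce the same text, so either one compiles to the same regex.
same_text = raw_pattern == escaped_pattern
print(same_text)

phone_regex = re.compile(raw_pattern)
```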
\d
Digit
search()
>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-4242
search()
A Regex object’s search() method searches the string it is passed for any matches to the regex. The search() method will return None if the regex pattern is not found in the string. If the pattern is found, the search() method returns a Match object.
What does search() return?
Match objects have a group() method that will return the actual matched text from the searched string.
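Because search() can return None, calling group() straight away raises an error when nothing matched. One possible way to guard against that is sketched below; the helper name find_phone and the sample strings are made up for illustration.

```python
import re

phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def find_phone(text):
    """Return the first phone number in text, or None if there is none."""
    mo = phone_regex.search(text)
    if mo is None:
        # No match: there is no Match object to call group() on.
        return None
    return mo.group()

found = find_phone('My number is 415-555-4242.')
missing = find_phone('No number here.')
print(found, missing)
```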
group()
Once you have searched and found a Match object, you call
this method to pull out the group named in the argument. In this case there is no argument, so it returns the entire matched text
>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-4242
Grouping with parentheses
\d\d\d-\d\d\d-\d\d\d\d
(\d\d\d)-(\d\d\d-\d\d\d\d)
>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-4242'
Group method for multiple groups
groups()
notice the PLURAL
Returns a tuple
>>> mo.groups()
('415', '555-4242')
>>> areaCode, mainNumber = mo.groups()
>>> print(areaCode)
415
>>> print(mainNumber)
555-4242
What if the text you want to match has parentheses in it?
You have to escape them with a backslash: \( and \)
>>> phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
>>> mo.group(1)
'(415)'
>>> mo.group(2)
'555-4242'
What is a pipe?
The | character. r'Batman|Tina Fey'
will match either 'Batman' or 'Tina Fey'.
What if both are in the searched string?
search() returns whichever one occurs first in the string as the Match object.
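The "first occurrence" rule depends on the searched text, not on the order of the alternatives in the pattern. A small sketch, with two made-up sentences, shows the same regex returning a different name each time:

```python
import re

hero_regex = re.compile(r'Batman|Tina Fey')

# 'Batman' appears first in this sentence.
first = hero_regex.search('Batman and Tina Fey.').group()

# 'Tina Fey' appears first here, even though it is the second alternative.
second = hero_regex.search('Tina Fey and Batman.').group()

print(first)
print(second)
```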
Quote about regular expressions
“Knowing [regular expressions] can mean the difference between solving a problem in 3 steps and solving it in 3,000 steps. When you’re a nerd, you forget that the problems you solve with a couple keystrokes can take other people days of tedious, error-prone work to slog through.”[1]
What if you wanted to search for strings with multiple variants of the same base?
Like spiderman, spiderwoman, and spidercar
>>> text = 'spidercar has lost a wheel'
>>> spiderRegex = re.compile(r'spider(man|woman|car)')
>>> mo = spiderRegex.search(text)
>>> mo.group()
'spidercar'
Search a part of a pipe group
>>> batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
>>> mo = batRegex.search('Batmobile lost a wheel')
>>> mo.group()
'Batmobile'
>>> mo.group(1)
'mobile'
Basic pipe
>>> heroRegex = re.compile(r'Batman|Tina Fey')
>>> mo1 = heroRegex.search('Batman and Tina Fey.')
>>> mo1.group()
'Batman'
>>> batRegex = re.compile(r'Bat(man|mobile|copter|bat)')
>>> mo = batRegex.search('Batmobile lost a wheel')
>>> mo.group()
'Batmobile'
>>> mo.group(1)
'mobile'
The method call mo.group() returns the full matched text ‘Batmobile’, while mo.group(1) returns just the part of the matched text inside the first parentheses group, ‘mobile’.
What if there is an optional part you want to look for?
Use ?
It means “match zero or one of the group preceding this question mark.”
Using ? to search for Batman or (optionally) Batwoman
>>> batRegex = re.compile(r'Bat(wo)?man')
>>> mo1 = batRegex.search('The Adventures of Batman')
>>> mo1.group()
'Batman'
>>> mo2 = batRegex.search('The Adventures of Batwoman')
>>> mo2.group()
'Batwoman'
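It can also help to look at group(1) for each match, since that shows whether the optional part was actually there. A minimal sketch; the names optional_absent and optional_present are made up for illustration.

```python
import re

bat_regex = re.compile(r'Bat(wo)?man')

mo1 = bat_regex.search('The Adventures of Batman')
mo2 = bat_regex.search('The Adventures of Batwoman')

# When the optional group matched nothing, group(1) is None.
optional_absent = mo1.group(1)

# When the optional group matched, group(1) holds its text.
optional_present = mo2.group(1)

print(optional_absent)
print(optional_present)
```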
What if you want to search for a phone number that is listed with or without an area code?
>>> phoneRegex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')
>>> mo1 = phoneRegex.search('My number is 415-555-4242')
>>> mo1.group()
'415-555-4242'
>>> mo2 = phoneRegex.search('My number is 555-4242')
>>> mo2.group()
'555-4242'
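If you need to know whether the area code was present, one option is to check the optional group on the Match object. This is only a sketch, and the helper name has_area_code is made up for illustration.

```python
import re

phone_regex = re.compile(r'(\d\d\d-)?\d\d\d-\d\d\d\d')

def has_area_code(text):
    """Return True if the first phone number in text includes an area code."""
    mo = phone_regex.search(text)
    # No number at all, or a number whose optional group did not match.
    return mo is not None and mo.group(1) is not None

with_code = has_area_code('My number is 415-555-4242')
without_code = has_area_code('My number is 555-4242')
no_number = has_area_code('No number here')
print(with_code, without_code, no_number)
```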