Chapter 7 – Pattern Matching with Regular Expressions Flashcards
Let's say we coded a function to check whether a string is a phone number
def isPhoneNumber(text):
    # A number like 415-555-4242 is exactly 12 characters long.
    if len(text) != 12:
        return False
    # Area code: three digits.
    for i in range(0, 3):
        if not text[i].isdecimal():
            return False
    # First hyphen.
    if text[3] != '-':
        return False
    # Next three digits.
    for i in range(4, 7):
        if not text[i].isdecimal():
            return False
    # Second hyphen.
    if text[7] != '-':
        return False
    # Last four digits.
    for i in range(8, 12):
        if not text[i].isdecimal():
            return False
    return True
And then we wanted to have it read any string to find phone numbers
message = 'Call me at 415-555-1011 tomorrow. 415-555-9999 is my office.'
for i in range(len(message)):
    chunk = message[i:i+12]
    if isPhoneNumber(chunk):
        print('Phone number found: ' + chunk)
print('Done')
Now let's do the same search for phone numbers in one step
Using a regular expression
\d\d\d-\d\d\d-\d\d\d\d
It’s called regex
It matches three digits, a hyphen, three more digits, another hyphen, and then four more digits
{ } and regex
\d{3}-\d{3}-\d{4}
Is the same as
\d\d\d-\d\d\d-\d\d\d\d
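A quick way to check that the two spellings are equivalent is to run both against the same text. This is only a sketch: the names long_form and short_form are made up for illustration, and it uses search() and group(), which later cards cover.

```python
import re

# Both patterns describe: 3 digits, hyphen, 3 digits, hyphen, 4 digits.
long_form = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
short_form = re.compile(r'\d{3}-\d{3}-\d{4}')  # {n} repeats the preceding item n times

text = 'Call me at 415-555-1011 tomorrow.'
long_match = long_form.search(text).group()
short_match = short_form.search(text).group()

print(long_match)
print(short_match)
```

Both prints should show the same phone number, since {3} is just shorthand for writing \d three times.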
What module is used for regular expressions (regex)?
import re
re.compile()
Turns a string value into a Regex object
>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
Why would we use a raw string (the r before the opening quote)?
So we don't have to escape every backslash, and backslashes are common in regex
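To see what the raw-string prefix saves, here is a small sketch comparing the two ways of writing the same pattern. The variable names escaped and raw are illustrative only.

```python
# Without the r prefix, every backslash has to be doubled
# so that a single backslash reaches the regex engine.
escaped = '\\d\\d\\d-\\d\\d\\d-\\d\\d\\d\\d'

# With the r prefix, backslashes are kept exactly as typed.
raw = r'\d\d\d-\d\d\d-\d\d\d\d'

same_string = (escaped == raw)
print(same_string)

# A plain '\n' is one newline character; a raw r'\n' is two characters.
plain_len = len('\n')
raw_len = len(r'\n')
print(plain_len, raw_len)
```

The two pattern strings compare equal; the raw version is simply easier to read and type.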
\d
Digit
search()
>>> phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-4242
search()
A Regex object's search() method searches the string it is passed for any matches to the regex. The search() method will return None if the regex pattern is not found in the string. If the pattern is found, the search() method returns a Match object.
What does search() return?
A Match object, or None if the pattern is not found. Match objects have a group() method that will return the actual matched text from the searched string.
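Because search() can return None, calling group() on the result without checking will raise an AttributeError when nothing matches. A minimal sketch of the check, using a hypothetical helper named find_phone that is not part of the chapter:

```python
import re

phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

def find_phone(text):
    """Return the first phone-number-shaped substring, or None (hypothetical helper)."""
    mo = phoneNumRegex.search(text)
    if mo is None:
        # No match: there is no Match object to call group() on.
        return None
    return mo.group()

hit = find_phone('My number is 415-555-4242.')
miss = find_phone('No number here.')
print(hit)
print(miss)
```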
group()
Once search() has found a match and returned a Match object, you call this method to pull out the group named by its argument. Here there is no argument, so it returns the entire matched text.
>>> print('Phone number found: ' + mo.group())
Phone number found: 415-555-4242
Grouping with parentheses
\d\d\d-\d\d\d-\d\d\d\d
(\d\d\d)-(\d\d\d-\d\d\d\d)
>>> phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My number is 415-555-4242.')
>>> mo.group(1)
'415'
>>> mo.group(2)
'555-4242'
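For comparison, passing 0 or nothing to group() still returns the whole match even when the pattern contains numbered groups. A short sketch using the same pattern, with illustrative variable names:

```python
import re

phoneNumRegex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phoneNumRegex.search('My number is 415-555-4242.')

whole = mo.group()         # no argument: the entire matched text
also_whole = mo.group(0)   # 0 means the same thing
area_code = mo.group(1)    # first set of parentheses
main_number = mo.group(2)  # second set of parentheses

print(whole, also_whole, area_code, main_number)
```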
Group method for multiple groups
groups()
notice the PLURAL
Returns a tuple
>>> mo.groups()
('415', '555-4242')
>>> areaCode, mainNumber = mo.groups()
>>> print(areaCode)
415
>>> print(mainNumber)
555-4242
What if the text you need to match contains a parenthesis?
You have to escape it with a backslash: \( and \)
>>> phoneNumRegex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
>>> mo = phoneNumRegex.search('My phone number is (415) 555-4242.')
>>> mo.group(1)
'(415)'
>>> mo.group(2)
'555-4242'
What is a pipe (|)?
r'Batman|Tina Fey'
will match either 'Batman' or 'Tina Fey'.
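A small sketch of the pipe in action, including which alternative wins when both names appear in the searched text. The name heroRegex is chosen here for illustration.

```python
import re

heroRegex = re.compile(r'Batman|Tina Fey')

# search() returns the match that starts earliest in the text,
# regardless of the order of the alternatives in the pattern.
first = heroRegex.search('Batman and Tina Fey').group()
second = heroRegex.search('Tina Fey and Batman').group()

print(first)
print(second)
```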
What if both are in the searched string?
search() returns whichever one occurs first
That is