7 Regex Flashcards
How can you add a regex object to a variable?
variable = re.compile(r'regexPattern')
What is a match object?
A match object is returned by a Regex object's search() method when the pattern is found in the string you pass to it. If the pattern is not found, search() returns None.
Name the 4 steps of Regular Expression Matching
- Import re
- Create Regex object
>>> regObj = re.compile(pattern)
- Pass message to Regex object search method
>>> mo = regObj.search('message')
- Get the matched text from the match object
>>> print(mo.group())
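The four steps can be sketched end to end as follows. The phone-number pattern and the sample message are illustrative assumptions, not part of the card.

```python
import re  # Step 1: import the module

# Step 2: create a Regex object (pattern chosen only for illustration)
phone_regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')

# Step 3: pass the string to search(); the result is a match object or None
mo = phone_regex.search('My number is 415-555-4242.')

# Step 4: read the matched text, guarding against a failed search
if mo is not None:
    print(mo.group())
```

Checking for None before calling group() avoids an AttributeError when the pattern is not found.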
What do the numbers in the group() method's parentheses indicate?
mo.group()
mo.group(0)
mo.group(1)
mo.group(2)
…
If the regex pattern has several groups, e.g.:
(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
the number selects which group's matched text is returned.
mo.group() and mo.group(0) return the entire matched text
mo.group(1) returns the text matched by (\d\d\d)
mo.group(2) returns the text matched by (\d\d\d-\d\d\d\d)
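A short sketch of group numbering, using the two-group pattern from this card. The sample message is an assumption made for illustration.

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo = phone_regex.search('My number is 415-555-4242.')

whole = mo.group()         # same as mo.group(0): the entire matched text
area_code = mo.group(1)    # text matched by the first pair of parentheses
main_number = mo.group(2)  # text matched by the second pair of parentheses

print(whole, area_code, main_number)
```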
What does mo.groups() do?
What can you do with it because of its return data type?
Returns tuple of all groups:
>>> mo.groups()
('415', '555-4242')
You can do the multiple assignment trick:
>>> areaCode, mainNumber = mo.groups()
Special characters:
| (pipe)
What should you keep in mind about it?
pipe:
matches one of several expressions, e.g. ‘Batman’ or ‘Batmobile’ with
>>> (r'Bat(man|mobile)')
When both ‘Batman’ and ‘Batmobile’ occur in the searched text, search() returns whichever occurs first
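The "first occurrence wins" behaviour can be checked with a small sketch. The sample sentence is an assumption chosen so that ‘Batmobile’ appears before ‘Batman’.

```python
import re

bat_regex = re.compile(r'Bat(man|mobile)')

# 'Batmobile' comes first in this string, so search() returns that match
mo = bat_regex.search('The Batmobile was parked while Batman slept.')

print(mo.group())   # the full matched text
print(mo.group(1))  # only the alternative matched inside the parentheses
```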
Special characters:
?
*
+
They allow optional matching:
? == 0 or 1 time
‘Batman’ or ‘Batwoman’
>>> r'Bat(wo)?man'
* == 0 or more times
‘Batman’ or ‘Batwowowoman’
>>> r'Bat(wo)*man'
No longer optional:
+ == 1 or more times
‘Batwoman’ or ‘Batwowowoman’
>>> r'Bat(wo)+man'
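To compare the three quantifiers side by side, the sketch below tests each pattern against the same three words. The variable names are illustrative, and fullmatch() is used here (instead of search()) so that a word only counts if the whole word fits the pattern.

```python
import re

optional = re.compile(r'Bat(wo)?man')  # 'wo' zero or one time
star = re.compile(r'Bat(wo)*man')      # 'wo' zero or more times
plus = re.compile(r'Bat(wo)+man')      # 'wo' one or more times

words = ['Batman', 'Batwoman', 'Batwowowoman']

# For each quantifier, collect the words that match in full
results = {
    'optional': [w for w in words if optional.fullmatch(w)],
    'star': [w for w in words if star.fullmatch(w)],
    'plus': [w for w in words if plus.fullmatch(w)],
}
print(results)
```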
Special characters:
{}
Match specific repetition patterns:
r'(Ha){3}' -> HaHaHa
r'(Ha){3,5}' -> HaHaHa | HaHaHaHa | HaHaHaHaHa
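A quick way to see which repetition counts each pattern accepts is to test strings of increasing length. This is a minimal sketch; the range is written without a space after the comma, and fullmatch() is used so that longer strings are not accepted through a partial match.

```python
import re

exactly_three = re.compile(r'(Ha){3}')
three_to_five = re.compile(r'(Ha){3,5}')  # written without a space after the comma

# Record, for 1 to 6 repetitions of 'Ha', whether each pattern matches the whole string
accepted = {}
for count in range(1, 7):
    text = 'Ha' * count
    accepted[count] = (
        bool(exactly_three.fullmatch(text)),
        bool(three_to_five.fullmatch(text)),
    )
    print(count, accepted[count])
```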
What is greedy and non-greedy (lazy) matching?
How can you switch to lazy matching?
By default, Python does greedy matching: it matches the longest possible string
r'(Ha){3,5}' -> matches HaHaHaHaHa in ‘HaHaHaHaHa’
non-greedy: matches the shortest possible string
r'(Ha){3,5}?' -> matches HaHaHa in ‘HaHaHaHaHa’
Here the ? means non-greedy (not optional, as it does when it follows a group)
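The difference shows up when both patterns are run against the same five-fold string, as in this small sketch (variable names are illustrative):

```python
import re

text = 'HaHaHaHaHa'

greedy = re.compile(r'(Ha){3,5}').search(text)  # takes as many repetitions as allowed
lazy = re.compile(r'(Ha){3,5}?').search(text)   # stops at the minimum of three

print(greedy.group())  # HaHaHaHaHa
print(lazy.group())    # HaHaHa
```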
findall() method
- Difference from search()
- return values
- Finds all matches, not only the first
- Does not return a match object, but a list of strings
>>> ['415-555-9999', '212-555-0000']
- Except if there are multiple groups in the Regex object, e.g.:
>>> re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
Then the return value is a list of tuples, e.g.:
>>> [('415', '555', '9999'), ('212', '555', '0000')]
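Both return shapes can be reproduced with one sample string. The string below is an assumption built around the two numbers shown on the card.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
with_groups = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')

strings = no_groups.findall(text)   # list of strings
tuples = with_groups.findall(text)  # list of tuples, one item per group

print(strings)
print(tuples)
```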
Shorthand character class
\d
\D
\w
\W
\s
\S
\d any digit 0-9
\D any character NOT a digit
\w any letter, numeric digit or underscore character
\W any character NOT…
\s any space, tab or newline
\S any character NOT…
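A small sketch that runs several of these classes over one string. The sample text is an assumption chosen to contain digits, an underscore, spaces and a tab.

```python
import re

text = 'Room 42_b\tis open'

digits = re.findall(r'\d', text)      # single digits
words = re.findall(r'\w+', text)      # runs of letters, digits, underscores
whitespace = re.findall(r'\s', text)  # spaces, tabs, newlines
non_word = re.findall(r'\W', text)    # everything \w does not match
chunks = re.findall(r'\S+', text)     # runs of non-whitespace characters

print(digits, words, whitespace, non_word, chunks)
```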
How can you define your own character class? E.g.
for vowels?
for letters?
for ‘.+*’
By using [square brackets]
>>> vowelRegex = re.compile(r'[aeiouAEIOU]')
>>> letterRegex = re.compile(r'[a-zA-Z]')
>>> specCharRegex = re.compile(r'[.+*]')
How can you make a negative character class?
What does it do?
By adding ^ directly after the opening square bracket
[^aeiouAEIOU]
It matches every character except those in the brackets
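The custom and negative classes can be compared on the same input, as sketched below. The sample strings are assumptions for illustration. Inside square brackets, the characters . + * are matched literally, so no backslash is needed.

```python
import re

vowel_regex = re.compile(r'[aeiouAEIOU]')
non_vowel_regex = re.compile(r'[^aeiouAEIOU]')  # negative class: anything but a vowel
spec_char_regex = re.compile(r'[.+*]')          # literal '.', '+' and '*'

text = 'RoboCop eats.'
vowels = vowel_regex.findall(text)
non_vowels = non_vowel_regex.findall(text)  # includes the space and the period
specials = spec_char_regex.findall('1+1=2.')

print(vowels, non_vowels, specials)
```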
What do the Caret and Dollar Sign do?
What's the mnemonic to remember which comes first?
Caret
>>> (r'^Hello!')
the string must start with ‘Hello!’
Dollar:
>>> (r'Hello!$')
the string must end with …
Carrots cost Dollars
What does it search for?
>>> re.compile(r'^\d+$')
For a string that consists of one or more digits from start to end
-> with nothing else before, between or after them
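A minimal sketch of the caret, the dollar sign, and both combined. The ‘Hello!’ patterns follow the cards above, and the test strings are assumptions chosen for illustration.

```python
import re

starts_with_hello = re.compile(r'^Hello!')
ends_with_hello = re.compile(r'Hello!$')
whole_string_digits = re.compile(r'^\d+$')

# Each value records whether search() found a match
checks = {
    'caret_hit': bool(starts_with_hello.search('Hello! there')),
    'caret_miss': bool(starts_with_hello.search('He said Hello!')),
    'dollar_hit': bool(ends_with_hello.search('He said Hello!')),
    'dollar_miss': bool(ends_with_hello.search('Hello! there')),
    'digits_hit': bool(whole_string_digits.search('1234567890')),
    'digits_miss': bool(whole_string_digits.search('12345xyz67890')),
}
print(checks)
```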