regex Flashcards
Digit
\d
2 alternatives (a or b)
(a|b)
Group
(...)
Range (a or b or c)
[abc]
Not (a or b or c)
[^abc]
Lower case letter from a to q
[a-q]
Upper case letter from A to Q
[A-Q]
Digit from 0 to 7
[0-7]
Look-ahead assertion
(?=...)
Negative look-ahead
(?!...)
Look-behind assertion
(?<=...)
Negative look-behind
(?<!...)
Any character except new line
.
0 or more
*
1 or more
+
Exactly 3
{3}
3 or more
{3,}
3, 4, or 5
{3,5}
Not digit
\D
Word
\w
Not word
\W
White space
\s
Not white space
\S
Start of string or line
^
Start of string
\A
End of string or line
$
End of string
\Z
Word boundary
\b
Not word boundary
\B
Octal character xxx
\xxx
Hex character hh
\xhh
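A few of the cards above can be checked with Python's re module. This is only a sketch: the sample strings and variable names are made up for illustration, and other regex flavors may treat some tokens (for example \Z) differently.

```python
import re

# Quantifier {3,5}: runs of 3 to 5 digits (greedy, so 7 digits yield a 5-digit match)
digits = re.findall(r"\d{3,5}", "12 1234 1234567")

# Negated set and range
not_abc = re.findall(r"[^abc]", "abcde")
a_to_q = re.findall(r"[a-q]", "quiz")

# Look-ahead: digits followed by " USD"; negative look-ahead: digits not followed by it
usd = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR")
not_usd = re.findall(r"\b\d+\b(?! USD)", "10 USD, 20 EUR")

# Look-behind: digits preceded by "$"; negative look-behind: digits not preceded by it
priced = re.findall(r"(?<=\$)\d+", "$15 and 30")
unpriced = re.findall(r"(?<!\$)\b\d+", "$15 and 30")

# Anchors: with re.MULTILINE, ^ and $ match at every line; \A and \Z only at the string ends
lines = "one two\nthree four"
line_starts = re.findall(r"^\w+", lines, flags=re.MULTILINE)
string_start = re.findall(r"\A\w+", lines, flags=re.MULTILINE)
line_ends = re.findall(r"\w+$", lines, flags=re.MULTILINE)
string_end = re.findall(r"\w+\Z", lines, flags=re.MULTILINE)

# Word boundary: "cat" as a whole word only
whole_word = re.findall(r"\bcat\b", "cat concat cats")

print(digits, not_abc, a_to_q)
print(usd, not_usd, priced, unpriced)
print(line_starts, string_start, line_ends, string_end, whole_word)
```

The \b in the negative look-around patterns matters: without it the engine can backtrack to a shorter run of digits that satisfies the assertion.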
Use regular expressions
import re
matches = re.findall(r'\b\d+\b', text)  # every standalone run of digits in text
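As a worked example of the findall call, here is a sketch with a made-up sample string (the value of text is an assumption, not part of the original card). Note that findall returns strings, and that \b keeps digits glued to letters from matching.

```python
import re

# Hypothetical sample input, chosen to show what \b excludes
text = "Order 66 shipped 3 boxes to unit 12B on day 7."

# Standalone runs of digits only; "12B" is skipped because no word
# boundary separates "12" from "B"
matches = re.findall(r"\b\d+\b", text)

# Convert to integers if numeric values are needed
numbers = [int(m) for m in matches]

print(matches)
print(numbers)
```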
Cleaning addresses
import re

def clean_address(address):
    # Match the street part: a house number followed by capitalized words.
    # The whitespace sits before each word so the last word is not dropped.
    pattern = r'\d+(?:\s+[A-Z][a-z]+)+'
    match = re.search(pattern, address)
    if match:
        return match.group(0).title()  # Return it capitalized
    else:
        return address  # Leave it as is if no match
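To see how a street-part pattern of this kind behaves, here is a small self-contained sketch. The sample addresses and the names STREET, samples, and results are hypothetical. The pattern is written with the whitespace before each capitalized word, which keeps the final word of the street name even when a comma follows it.

```python
import re

# House number followed by one or more capitalized words
STREET = re.compile(r"\d+(?:\s+[A-Z][a-z]+)+")

# Made-up inputs for illustration
samples = [
    "42 Baker Street, London",      # stops at the comma
    "PO Box 7",                     # number has no capitalized word after it
    "ship to 100 Main Road today",  # stops at the lowercase word
]

results = []
for address in samples:
    match = STREET.search(address)
    # Fall back to the unchanged input when no street part is found
    results.append(match.group(0) if match else address)

for original, cleaned in zip(samples, results):
    print(f"{original!r} -> {cleaned!r}")
```

Because the pattern requires [A-Z][a-z]+, all-caps or lowercase street names are left untouched; whether that is acceptable depends on the data being cleaned.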