Regex (Python re module) Flashcards

1
Q

What’s the difference between re.match(pattern, string) and re.search(pattern, string)?

A
  • re.match checks if the pattern matches at the start of the string.
  • re.search looks anywhere in the string for the first match.

Example:
Imagine we want to check if a lead’s message starts with a keyword like “Interested”:

import re
message = "Interested in your product, call me at 123-456-7890."
print(re.match(r'Interested', message))  # Match object -> matches 'Interested'
print(re.search(r'\d{3}-\d{3}-\d{4}', message))  # Match object -> matches '123-456-7890'
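
The difference shows most clearly when the pattern occurs later in the string. A minimal sketch reusing the message above; the variable names are illustrative only:

```python
import re

message = "Interested in your product, call me at 123-456-7890."
phone_pattern = r"\d{3}-\d{3}-\d{4}"

# re.match only tries position 0, so a phone number later in the string is missed.
match_result = re.match(phone_pattern, message)
# re.search scans forward and returns the first occurrence anywhere.
search_result = re.search(phone_pattern, message)

print(match_result)            # None
print(search_result.group())   # 123-456-7890
```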

2
Q

Which function returns all non-overlapping matches of a pattern in a string?

A

re.findall(pattern, string) returns a list of all matches.

Example:
Extract all the phone numbers from a lead’s notes field:

import re
notes = "Call me at 123-456-7890 or at 987-654-3210."
phone_numbers = re.findall(r'\d{3}-\d{3}-\d{4}', notes)
print(phone_numbers)  # ['123-456-7890', '987-654-3210']

3
Q

Which re function do you use to replace all occurrences of a pattern in a string?

A

re.sub(pattern, repl, string).

Example:
Normalize phone numbers by removing dashes for storage:

import re
phone = "123-456-7890"
normalized_phone = re.sub(r'-', '', phone)
print(normalized_phone)  # '1234567890'

4
Q

How can you split a string by a given regex pattern?

A

Use re.split(pattern, string) to split the string into a list at every match of the pattern.

Example:
Split a lead’s full name into first and last names:

import re
full_name = "John Doe"
name_parts = re.split(r'\s+', full_name)
print(name_parts)  # ['John', 'Doe']

5
Q

What’s the benefit of re.compile(pattern)?

A

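It compiles the regex into a reusable pattern object (re.Pattern) whose methods (match, search, findall, sub, split) can be called without passing the pattern again. Because the re module already caches recently used patterns, the speed gain is usually modest; the main benefits are reuse and readability when the same pattern is applied many times.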

Example:
Check multiple emails for validity during batch lead validation:

import re
# The dot before the top-level domain must be escaped (\.), otherwise it matches any character.
pattern = re.compile(r'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$')
emails = ["valid@example.com", "invalid@domain", "user@website.org"]
valid_emails = [email for email in emails if pattern.match(email)]
print(valid_emails)  # ['valid@example.com', 'user@website.org']

6
Q

Name a few regex meta-characters and their meanings, such as ^ or $.

A

Examples:

^ start of string
$ end of string
. any character (except newline)
[] character class
| OR
() grouping
+ one or more
* zero or more
? zero or one
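
A few of these meta-characters in action. This is only an illustrative sketch; the sample strings and variable names are made up for the example:

```python
import re

# ^ anchors at the start; | chooses between the alternatives inside ().
starts_with_greeting = bool(re.search(r"^(Hi|Hello)", "Hello there"))
print(starts_with_greeting)   # True

# . matches any single character except a newline.
dot_matches = re.findall(r"c.t", "cat cot c\nt")
print(dot_matches)            # ['cat', 'cot']

# [] matches exactly one character from the listed set.
vowels = re.findall(r"[aeiou]", "lead")
print(vowels)                 # ['e', 'a']
```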

7
Q

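What do \d, \w, and \s represent in regex?

A

\d matches any digit (0-9; for str patterns also other Unicode decimal digits unless re.ASCII is set)
\w matches any word character (A-Za-z0-9_; for str patterns also Unicode letters and digits unless re.ASCII is set)
\s matches whitespace (spaces, tabs, newlines)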

Example:
Check if a lead’s username contains only valid characters:

import re
username = "user_123"
print(re.match(r'^\w+$', username))  # Match object -> matches 'user_123'

8
Q

How do the following quantifiers differ: +, *, ?, {m,n}?

A

+ = 1 or more
* = 0 or more
? = 0 or 1 (optional)
{m,n} = between m and n times
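
The quantifiers can be compared side by side on the same inputs. A small sketch, where the sample list and the helper name `matching` are hypothetical and exist only for illustration:

```python
import re

samples = ["ct", "cat", "caat", "caaat"]

def matching(pattern):
    """Return the samples that the pattern matches in full."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(matching(r"ca+t"))      # ['cat', 'caat', 'caaat']
print(matching(r"ca*t"))      # ['ct', 'cat', 'caat', 'caaat']
print(matching(r"ca?t"))      # ['ct', 'cat']
print(matching(r"ca{2,3}t"))  # ['caat', 'caaat']
```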

9
Q

Which anchors assert the position at the start or end of the string?

A

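^ = start of string (with re.MULTILINE, also the start of each line)
$ = end of string, or just before a trailing newline (with re.MULTILINE, also the end of each line)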

Example:
Check if a lead’s message starts with “Hello” and ends with “Thanks”:

import re
message = "Hello, I'm interested. Thanks"
print(re.match(r'^Hello.*Thanks$', message))  # Match object

10
Q

What does \b (word boundary) do in regex?

A

\b asserts the position where one side is a word character and the other side is a non-word character (or start/end of the string).

Example:
Check if a lead used the word “free” (not part of another word like “freedom”):

import re
message = "Get your free trial today!"
print(re.search(r'\bfree\b', message))  # Match object
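
To see the boundary doing its job, the same pattern can be tried against words that merely contain "free". A short sketch with made-up sample sentences:

```python
import re

word = re.compile(r"\bfree\b")

standalone = bool(word.search("Get your free trial today!"))
as_prefix = bool(word.search("We value freedom."))
as_suffix = bool(word.search("A carefree offer"))

print(standalone)  # True
print(as_prefix)   # False: 'free' is followed by the word character 'd'
print(as_suffix)   # False: 'free' is preceded by the word character 'e'
```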

11
Q

How do you make a group non-capturing in a regex?

A

Use (?: … ) instead of ( … ).

Example:
Match the phrases “buy now” or “shop now” in a lead’s response:

import re
message = "I would like to shop now."
print(re.search(r'(?:buy|shop) now', message))  # Match object
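
The practical effect of (?: … ) is that the group is not recorded in the match. A brief sketch comparing both forms on the same message (variable names are illustrative):

```python
import re

message = "I would like to shop now."
capturing = re.search(r"(buy|shop) now", message)
non_capturing = re.search(r"(?:buy|shop) now", message)

print(capturing.groups())       # ('shop',)
print(non_capturing.groups())   # ()
print(non_capturing.group())    # shop now
```

This matters most with re.findall, which returns the captured groups rather than the whole match as soon as the pattern contains a capturing group.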

12
Q

Validating Email (Simple)

A

^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$

13
Q

Provide a basic regex pattern that captures US phone numbers (like +1 123 456-7890) with optional +1, parentheses, dashes, etc.

A

^(?:\+?1\s*)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}$
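
A quick way to try such a pattern is to run it over a few candidate strings. The sketch below assumes the pattern with its literal characters escaped (\+, \( and \)) and the hyphen placed last in the character class so it is not read as a range; the name `US_PHONE_RE` and the sample inputs are hypothetical:

```python
import re

US_PHONE_RE = re.compile(
    r"^(?:\+?1\s*)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}$"
)

candidates = ["+1 123 456-7890", "(123) 456-7890", "123.456.7890", "12-3456"]
matches = [c for c in candidates if US_PHONE_RE.match(c)]
print(matches)  # ['+1 123 456-7890', '(123) 456-7890', '123.456.7890']
```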

14
Q

Why do we often use a raw string (e.g., r’…’) for regex in Python?

A

It prevents Python’s backslash escapes from interfering, so you don’t need to escape them twice.
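
The word-boundary escape \b is a handy illustration, because in an ordinary string literal Python turns it into a backspace character before the regex engine ever sees it. A minimal sketch with an illustrative sample text:

```python
import re

# In a normal string literal "\b" is a single backspace character.
plain_len = len("\b")
# In a raw string the backslash survives, giving the two characters \ and b.
raw_len = len(r"\b")
print(plain_len, raw_len)  # 1 2

text = "free trial"
without_raw = re.search("\bfree\b", text)       # searches for backspace characters
with_raw = re.search(r"\bfree\b", text)         # searches for the word 'free'
double_escaped = re.search("\\bfree\\b", text)  # equivalent, but harder to read

print(without_raw)             # None
print(with_raw.group())        # free
print(double_escaped.group())  # free
```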

15
Q

How would you remove all digits from a string using re.sub?

A

re.sub(r'\d+', '', your_string)
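
A small worked example, assuming a made-up input string. Note that removing the digits leaves the surrounding spaces behind, so a follow-up cleanup step may be wanted:

```python
import re

your_string = "Order 66 shipped on day 3"

# Each run of digits is replaced by the empty string.
no_digits = re.sub(r"\d+", "", your_string)
print(repr(no_digits))  # 'Order  shipped on day '

# Optional cleanup: collapse the leftover whitespace.
tidy = " ".join(no_digits.split())
print(tidy)             # Order shipped on day
```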

16
Q

How can you make your regex more readable using verbose mode?

A

Use re.X or re.VERBOSE flag and split the pattern across multiple lines with comments, e.g.:

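import re

pattern = r"""
    ^                    # start of string
    [A-Za-z0-9._%+-]+    # local part
    @                    # separator
    [A-Za-z0-9.-]+       # domain name
    \.[A-Za-z]{2,}       # literal dot plus top-level domain
    $                    # end of string
"""
compiled = re.compile(pattern, re.VERBOSE)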