Regular Expressions in Python Flashcards
r'st\d\s\w\n{3,10}'
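r: raw-string prefix (backslashes reach the regex engine unchanged)
st: ordinary string characters, matched literally
\d: digit
\s: white space
\S: non-white space
\w: word character (letter, digit, or underscore)
\W: non-word character
\n: new line
{3,10}: the preceding element repeated 3 to 10 times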
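A minimal sketch of how these pieces behave, using a made-up sample string (not from the flashcards) chosen so that the full pattern matches:

```python
import re

# Hypothetical sample text, built only to exercise each element of the pattern.
sample = "st4 a\n\n\nend"

digits = re.findall(r"\d", sample)    # digit characters
spaces = re.findall(r"\s", sample)    # whitespace, which includes newlines
words = re.findall(r"\w+", sample)    # runs of letters, digits, underscores
full = re.search(r"st\d\s\w\n{3,10}", sample)  # the whole flashcard pattern

print(digits)
print(words)
print(full.span())
```

The quantifier {3,10} applies only to the element directly before it, here \n, so the pattern needs at least three consecutive newlines.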
Search the string to see if it starts with “The” and ends with “Spain”:
import re
txt = "The rain in Spain"
x = re.search(r"^The.*Spain$", txt)
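re.search returns a Match object on success and None otherwise, so the result is usually tested before use. A small sketch, where the second input string is an invented counter-example:

```python
import re

pattern = r"^The.*Spain$"

match = re.search(pattern, "The rain in Spain")
if match:
    print("Matched:", match.group())
else:
    print("No match")

# Hypothetical counter-example: the string does not end with "Spain".
no_match = re.search(pattern, "The rain in Portugal")
print(no_match)
```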
Print a list of all matches with “ai”
txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)
Search for the first white-space character in the string:
txt = "The rain in Spain"
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())
returns: The first white-space character is located in position: 3
Split at each white-space character:
txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)
returns: ['The', 'rain', 'in', 'Spain']
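Splitting on a single \s yields empty strings wherever two whitespace characters sit side by side. The sketch below uses an invented input with a double space and a tab to contrast \s with \s+:

```python
import re

# Hypothetical input: two spaces after "The" and a tab after "rain".
txt = "The  rain\tin Spain"

single = re.split(r"\s", txt)   # one split per whitespace character
runs = re.split(r"\s+", txt)    # one split per run of whitespace

print(single)
print(runs)
```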
Split the string only at the first occurrence:
txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)
returns: ['The', 'rain in Spain']
Replace every white-space character with the number 9:
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)
returns: The9rain9in9Spain
RegEx Functions (4)
findall, search, split, sub
findall
Returns a list containing all matches
search
Returns a Match object if there is a match anywhere in the string
split
Returns a list where the string has been split at each match
sub
Replaces one or many matches with a string
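The four functions side by side, as a rough sketch on the same sentence used in the earlier cards. The maxsplit and count keyword arguments are standard re parameters that limit how many matches are acted on:

```python
import re

txt = "The rain in Spain"

all_ai = re.findall(r"ai", txt)               # every match, as a list
first_ai = re.search(r"ai", txt)              # Match object for the first hit
parts = re.split(r"\s", txt, maxsplit=2)      # stop after two splits
two_subs = re.sub(r"\s", "9", txt, count=2)   # replace only the first two

print(all_ai)
print(first_ai.span())
print(parts)
print(two_subs)
```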
Extract the substring from the 12th to the 30th character from the variable movie which corresponds to the movie title. Store it in the variable movie_title.
Get the palindrome by reversing the string contained in movie_title.
Complete the code to print out the movie_title if it is a palindrome.
movie_title = movie[11:30]
# Obtain the palindrome
palindrome = movie_title[::-1]
# Print the word if it's a palindrome
if movie_title == palindrome:
    print(movie_title)
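The card does not show the contents of movie, so the sketch below assumes a sample string whose characters 12 through 30 form a palindrome. The helper is_palindrome is a hypothetical name introduced for illustration:

```python
def is_palindrome(text):
    """Return True when text reads the same forwards and backwards."""
    return text == text[::-1]

# Assumed sample value; the real contents of movie are not given on the card.
movie = "oh my God! desserts I stressed was an ugly movie"

# Slice indices are zero-based and the end index is exclusive,
# so [11:30] covers the 12th through the 30th character.
movie_title = movie[11:30]

if is_palindrome(movie_title):
    print(movie_title)
```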
Convert the string in the variable movie to lowercase. Print the result.
movie_lower = movie.lower()
Remove the $ signs that occur at the start and at the end of the string contained in movie_lower. Print the result.
movie_no_sign = movie_lower.strip("$")
Split the string contained in movie_no_sign into as many substrings as possible. Print the results.
movie_split = movie_no_sign.split()
To get the root of the second word contained in movie_split, select all the characters except the last one.
word_root = movie_split[1][:-1]
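The four steps above chained together, as a sketch. The value of movie is an assumption, since the cards never show it:

```python
# Assumed sample value for movie; not taken from the flashcards.
movie = "$I supposed that coming from MTV Films I should expect no less$"

movie_lower = movie.lower()              # lowercase every character
movie_no_sign = movie_lower.strip("$")   # drop leading and trailing "$"
movie_split = movie_no_sign.split()      # split on runs of whitespace
word_root = movie_split[1][:-1]          # second word minus its last letter

print(movie_no_sign)
print(word_root)
```

Calling split() with no argument splits on any run of whitespace, which is why it produces as many substrings as possible.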
Remove the tag <\i> from the end of the string. Print the result.
movie_tag = movie.removesuffix(r"<\i>")  # Python 3.9+; strip() with no argument removes only whitespace
Split the string contained in movie_tag using the commas as a separating element. Print the results.
movie_no_comma = movie_tag.split(",")
Join the list of substrings contained in movie_no_comma back together, using a space as the joining element. Print the result.
movie_join = " ".join(movie_no_comma)
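A sketch of the tag-removal, split, and join steps on an assumed value of movie. It uses str.removesuffix (Python 3.9+) for the tag, and the last lines show why rstrip can surprise: it treats its argument as a set of characters, not as a suffix.

```python
# Assumed sample value; the tag is spelled <\i> as on the card.
movie = r"the film,however,is all good<\i>"
tag = r"<\i>"

movie_tag = movie.removesuffix(tag)       # removes the exact suffix only
movie_no_comma = movie_tag.split(",")     # split at every comma
movie_join = " ".join(movie_no_comma)     # rejoin with single spaces

print(movie_join)

# rstrip keeps removing trailing characters that appear in "<\i>",
# so the final "i" of "kiwi" is lost as well (hypothetical example).
over_stripped = r"a story about a kiwi<\i>".rstrip(tag)
print(over_stripped)
```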
Split the string file into many substrings at line boundaries.
Print out the resulting variable file_split.
Complete the for-loop to split the strings into many substrings using commas as a separator element.
# Split string at line boundaries
file_split = file.splitlines()
# Print file_split
print(file_split)
# Complete for-loop to split by commas
for substring in file_split:
    substring_split = substring.split(",")
    print(substring_split)
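The same two-stage split on a small made-up value of file (the card does not show the real one), collected into a list of rows instead of printed line by line:

```python
# Hypothetical three-line, comma-separated text standing in for file.
file = "title,year\nJaws,1975\nAlien,1979"

file_split = file.splitlines()                    # one string per line
rows = [line.split(",") for line in file_split]   # one list per line

print(file_split)
print(rows)
```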
Check whether the substring actor occurs between the characters with index 37 and 41 inclusive. If it is not found, print the statement Word not found.
If actor occurs exactly twice, replace actor actor with the single substring actor.
If actor occurs three times, replace actor actor actor with the single substring actor.
for movie in movies:
    # The end index of find() is exclusive, so 42 covers index 41
    if movie.find("actor", 37, 42) == -1:
        print("Word not found")
    elif movie.count("actor") == 2:
        print(movie.replace("actor actor", "actor"))
    else:
        print(movie.replace("actor actor actor", "actor"))
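The same logic wrapped in a function so that each branch can be checked. The function name dedupe_actor and all three sample strings are assumptions made for illustration. The prefix is 37 characters long, which places actor at index 37:

```python
def dedupe_actor(movie):
    """Collapse a doubled or tripled "actor" found at index 37 to 41."""
    if movie.find("actor", 37, 42) == -1:
        return "Word not found"
    if movie.count("actor") == 2:
        return movie.replace("actor actor", "actor")
    return movie.replace("actor actor actor", "actor")

# Hypothetical inputs; prefix has exactly 37 characters.
prefix = "it's astonishing how frightening the "
twice = prefix + "actor actor norton looks"
thrice = prefix + "actor actor actor norton looks"
absent = "the rock is destined to be a star"

for movie in (twice, thrice, absent):
    print(dedupe_actor(movie))
```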