Regular Expressions in Python Flashcards
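r"st\d\s\w\n{3,10}"
r: raw string prefix (backslashes reach the regex engine unchanged)
st: the literal characters "st"
\d: digit
\s: white space
\S: non-white space
\w: word character
\W: non-word character
\n: new line
{3,10}: the preceding token repeated 3 to 10 times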
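As a quick check of the legend above, here is a minimal sketch that applies a variant of the pattern to a made-up string. The sample text is an assumption, and the newline token is dropped so that the quantifier applies to \w instead.

```python
import re

# Hypothetical sample text, chosen only to exercise each token.
text = "st4 hello and st9 ok"

# "st" matches literally, \d one digit, \s one white-space character,
# \w{3,10} between 3 and 10 word characters.
pattern = r"st\d\s\w{3,10}"

matches = re.findall(pattern, text)
print(matches)  # ['st4 hello']
```

The second candidate, "st9 ok", is skipped because "ok" has only two word characters, fewer than the minimum of three.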
Search the string to see if it starts with “The” and ends with “Spain”:
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
Print a list of all matches with “ai”
txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x)
Search for the first white-space character in the string:
txt = "The rain in Spain"
x = re.search(r"\s", txt)
print("The first white-space character is located in position:", x.start())
returns: The first white-space character is located in position: 3
Split at each white-space character:
txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x)
returns: ['The', 'rain', 'in', 'Spain']
Split the string only at the first occurrence:
txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)
returns: ['The', 'rain in Spain']
Replace every white-space character with the number 9:
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x)
returns: The9rain9in9Spain
RegEx Functions (4)
findall, search, split, sub
findall
Returns a list containing all matches
search
Returns a Match object if there is a match anywhere in the string
split
Returns a list where the string has been split at each match
sub
Replaces one or many matches with a string
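The four functions can be compared side by side on the same string used in the cards above. This is a small sketch; the variable names are illustrative, and the maxsplit and count arguments show how split and sub can be limited to the first few matches.

```python
import re

txt = "The rain in Spain"

# search returns a Match object, or None when nothing matches.
match = re.search(r"ai", txt)
print(match.group(), match.start(), match.end())  # ai 5 7

# findall returns every non-overlapping match as a list of strings.
all_matches = re.findall(r"ai", txt)
print(all_matches)  # ['ai', 'ai']

# split and sub both accept a limit on how many matches to act on.
limited_split = re.split(r"\s", txt, maxsplit=2)
limited_sub = re.sub(r"\s", "9", txt, count=2)
print(limited_split)  # ['The', 'rain', 'in Spain']
print(limited_sub)    # The9rain9in Spain
```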
Extract the substring from the 12th to the 30th character from the variable movie which corresponds to the movie title. Store it in the variable movie_title.
Get the palindrome by reversing the string contained in movie_title.
Complete the code to print out the movie_title if it is a palindrome.
movie_title = movie[11:30]
# Obtain the palindrome
palindrome = movie_title[::-1]
# Print the word if it's a palindrome
if movie_title == palindrome:
    print(movie_title)
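To run the palindrome check end to end, a value for movie is needed. The string below is a hypothetical sample (the cards do not show the real contents of movie), picked so that characters 12 through 30 read the same in both directions.

```python
# Hypothetical sample value; the cards do not show the real contents of movie.
movie = "oh my God! desserts I stressed was an ugly movie"

movie_title = movie[11:30]      # characters 12 through 30 (end index is exclusive)
palindrome = movie_title[::-1]  # a step of -1 walks the string backwards

if movie_title == palindrome:
    print(movie_title)  # desserts I stressed
```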
Convert the string in the variable movie to lowercase. Print the result.
movie_lower = movie.lower()
Remove the $ that occur at the start and at the end of the string contained in movie_lower. Print the results.
movie_no_sign = movie_lower.strip("$")
Split the string contained in movie_no_sign into as many substrings as possible. Print the results.
movie_split = movie_no_sign.split()
To get the root of the second word contained in movie_split, select all the characters except the last one.
word_root = movie_split[1][:-1]
Remove the tag <\i> from the end of the string. Print the results.
movie_tag = movie.rstrip(r"<\i>")
Split the string contained in movie_tag using commas as the separator. Print the results.
movie_no_comma = movie_tag.split(",")
Join the list of substrings contained in movie_no_comma back together using a space as the join element. Print the results.
movie_join = " ".join(movie_no_comma)
Split the string file into many substrings at line boundaries.
Print out the resulting variable file_split.
Complete the for-loop to split the strings into many substrings using commas as a separator element.
# Split string at line boundaries
file_split = file.splitlines()
# Print file_split
print(file_split)
# Complete for-loop to split by commas
for substring in file_split:
    substring_split = substring.split(",")
    print(substring_split)
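A small sketch of the same two-step split on made-up data. The contents of file below are an assumption, since the cards never show them, and rows is an illustrative name.

```python
# Hypothetical two-line sample; the real contents of file are not shown in the cards.
file = "mtv films election, a high school comedy\nspan films, a drama"

# First split on line boundaries, then split each line on commas.
file_split = file.splitlines()
rows = [line.split(",") for line in file_split]
print(rows)
```

Note that split(",") keeps the space that follows each comma, so the second field of every row starts with a blank; calling strip() on each field would remove it.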
Find if the substring actor occurs between the characters with index 37 and 41 inclusive. If it is not detected, print the statement Word not found.
Replace "actor actor" with "actor" if actor occurs exactly twice.
Replace "actor actor actor" with "actor" if actor occurs three times.
for movie in movies:
    if movie.find("actor", 37, 42) == -1:
        print("Word not found")
    elif movie.count("actor") == 2:
        print(movie.replace("actor actor", "actor"))
    else:
        print(movie.replace("actor actor actor", "actor"))
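The loop can be tried on a few invented reviews. Everything in movies below is hypothetical; the 37-character prefix is there only so that "actor" starts at index 37, which is the one position where find("actor", 37, 42) can succeed.

```python
# Hypothetical reviews; the 37-character prefix places "actor" at index 37.
prefix = "it's astonishing how frightening the "
movies = [
    prefix + "actor actor norton looks",
    "a short review with no match",
    prefix + "actor actor actor norton looks",
]

cleaned = []
for movie in movies:
    # find returns -1 unless "actor" lies fully inside indices 37..41
    if movie.find("actor", 37, 42) == -1:
        cleaned.append("Word not found")
    elif movie.count("actor") == 2:
        cleaned.append(movie.replace("actor actor", "actor"))
    else:
        cleaned.append(movie.replace("actor actor actor", "actor"))

print(cleaned)
```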
Find the index where money occurs between characters with index 12 and 50. If not found, the method should return -1.
for movie in movies:
    print(movie.find("money", 12, 51))
Find the index where money occurs between characters with index 12 and 50. If not found, it should raise an error.
for movie in movies:
    try:
        print(movie.index("money", 12, 51))
    except ValueError:
        print("substring not found")
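The difference between find and index is easiest to see on the same input. A minimal sketch, assuming two invented review strings of which only the first contains "money":

```python
# Hypothetical review strings; "money" appears only in the first one.
movies = [
    "it is about the money and nothing else",
    "a quiet film about a family",
]

results = []
for movie in movies:
    found_at = movie.find("money", 12, 51)         # -1 when absent
    try:
        indexed_at = movie.index("money", 12, 51)  # raises ValueError when absent
    except ValueError:
        indexed_at = None
    results.append((found_at, indexed_at))

print(results)  # [(16, 16), (-1, None)]
```

find suits cases where a missing substring is a normal outcome to test for; index suits cases where it should be treated as an error.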
my_string1 = "Awesome day"
my_string2 = "for biking"
Write a concatenation that returns:
Awesome day for biking
print(my_string1 + " " + my_string2)
my_string = "Awesome day"
return:
Awe
print(my_string[0:3])
my_string = "Awesome day"
return: Aweso
return: me day
print(my_string[:5])
print(my_string[5:])
my_string = "Awesome day"
return: yad emosewA
print(my_string[::-1])
Select the first 32 characters of movie1
first_part = movie1[:32]
Select from 43rd character to the end of movie1
last_part = movie1[42:]
Select from 33rd to the 42nd character of movie2
middle_part = movie2[32:42]
Find out how many characters the variable movie has.
length_string = len(movie)
Convert the numeric variable length_string to a string representation.
Then concatenate the predefined variable statement and the variable to_string, adding a space between them. Print out the result.
to_string = str(length_string)
# Predefined variable
statement = "Number of characters in this review:"
# Concatenate strings and print result
print(statement + " " + to_string)
Select the first 32 characters of the variable movie1 and assign it to the variable first_part.
first_part = movie1[:32]
Select the substring going from the 43rd character to the end of movie1. Assign it to the variable last_part.
last_part = movie1[42:]
Select the substring going from the 33rd to the 42nd character of movie2. Assign it to the variable middle_part.
middle_part = movie2[32:42]
Print the concatenation of the variables first_part, middle_part and last_part in that order.
print(first_part+middle_part+last_part)
Convert the string in the variable movie to lowercase. Print the result.
movie_lower = movie.lower()
print(movie_lower)
find all matches of a pattern
re.findall(r"regex", string)
Remove the $ that occur at the start and at the end of the string contained in movie_lower. Print the results.
movie_no_sign = movie_lower.strip("$")
print(movie_no_sign)
Split the string contained in movie_no_sign into as many substrings as possible. Print the results.
movie_split = movie_no_sign.split()
print(movie_split)
To get the root of the second word contained in movie_split, select all the characters except the last one.
word_root = movie_split[1][:-1]
print(word_root)
Remove tag <\i> from the end of the string, movie. Print the results.
movie_tag = movie.rstrip(r"<\i>")
What does rstrip() do?
It removes trailing characters. The argument is treated as a set of characters to strip, not as an exact suffix.
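Because rstrip works on a set of characters, it can remove more than the tag when the text itself ends in one of those characters. The sketch below uses invented strings to show this, alongside str.removesuffix (available from Python 3.9), which drops only the exact suffix. Naming removesuffix here is a suggestion, not something the cards use.

```python
# Hypothetical sample value for movie.
movie = r"the film,however,is all good<\i>"

# rstrip drops any trailing characters found in the set {<, \, i, >}.
movie_tag = movie.rstrip(r"<\i>")
print(movie_tag)  # the film,however,is all good

# A title ending in "i" loses those letters as well.
over_stripped = r"a tale from hawaii<\i>".rstrip(r"<\i>")
print(over_stripped)  # a tale from hawa

# removesuffix (Python 3.9+) removes only the exact suffix.
exact = r"a tale from hawaii<\i>".removesuffix(r"<\i>")
print(exact)  # a tale from hawaii
```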
Join the list of substrings contained in movie_no_comma back together using a space as the join element. Print the results.
movie_join = " ".join(movie_no_comma)
print(movie_join)
Find if the substring actor occurs between the characters with index 37 and 41 inclusive. If it is not detected, print the statement Word not found.
for movie in movies:
    if movie.find("actor", 37, 42) == -1:
        print("Word not found")
Add an elif branch inside the for loop to replace "actor actor" with "actor" if actor occurs exactly twice.
for movie in movies:
    if movie.find("actor", 37, 42) == -1:
        print("Word not found")
    elif movie.count("actor") == 2:
        print(movie.replace("actor actor", "actor"))
Complete a for-loop to split the strings into many substrings using commas as a separator element.
for substring in file_split:
    substring_split = substring.split(",")
Split the string file into many substrings at line boundaries.
file_split = file.splitlines()
Import the re module.
Write a regex that matches user mentions that start with @ and follow the pattern of, for example, @robot3!.
Find all the matches of the pattern in the sentiment_analysis variable.
# Import the re module
import re
# Write the regex
regex = r"@robot\d\W"
# Find all matches of regex
print(re.findall(regex, sentiment_analysis))
Write a regex that matches the number of user mentions given as, for example, User_mentions:9 in sentiment_analysis.
print(re.findall(r"User_mentions:\d", sentiment_analysis))
Write a regex that matches the number of retweets given as, for example, number of retweets: 4 in sentiment_analysis.
print(re.findall(r"number\sof\sretweets:\s\d", sentiment_analysis))
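The three patterns above can be tried together on a made-up string. The value of sentiment_analysis below is an assumption, since the real data is not shown in the cards, and the result variable names are illustrative.

```python
import re

# Hypothetical sample text; the real sentiment_analysis data is not shown in the cards.
sentiment_analysis = (
    "@robot9! @robot4& great day. "
    "User_mentions:2, likes: 9, number of retweets: 7"
)

mentions = re.findall(r"@robot\d\W", sentiment_analysis)
user_mentions = re.findall(r"User_mentions:\d", sentiment_analysis)
retweets = re.findall(r"number\sof\sretweets:\s\d", sentiment_analysis)

print(mentions)       # ['@robot9!', '@robot4&']
print(user_mentions)  # ['User_mentions:2']
print(retweets)       # ['number of retweets: 7']
```

Each \d matches exactly one digit, so a count such as 12 would be captured only as 1; \d+ would be needed to capture multi-digit numbers.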