RegEx and nltk Flashcards
Python Regular Expressions and Natural Language Toolkit
Import the Python Regular Expression Library
import re
What method is used in Python to find and remove characters?
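The .replace() string method.
e.g. text_string.replace('.', ' ') swaps every period for a space; pass '' as the second argument to remove the character entirely.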
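A minimal sketch of this card. The sample sentence is made up for illustration; it shows the difference between replacing a character with a space and removing it outright.

```python
# Hypothetical sample text, chosen only for illustration.
text_string = "U.S.A. is big."

# Swap every period for a space (the form shown on the card).
spaced = text_string.replace(".", " ")

# Remove every period by replacing it with an empty string.
removed = text_string.replace(".", "")

print(repr(spaced))
print(repr(removed))  # 'USA is big'
```

Note that .replace() matches literal text only; it does not understand patterns, which is where the re module comes in.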
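What does the .join() method do?
It is a Python string method, not part of RegEx. It takes a separator string and inserts it between the items of a list (or any other iterable of strings), returning one combined string.
Syntax: separator.join(iterable)
A string is itself an iterable of characters, so .join() also accepts a plain string and places the separator between its characters.
e.g. letters = "string"
" ".join(letters) returns "s t r i n g"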
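A short sketch of .join() on both a list and a plain string. The variable names are illustrative only.

```python
# Joining a list of strings with a separator.
words = ["a", "b", "c"]
dashed = "-".join(words)

# A string is iterated character by character, so the separator
# lands between each pair of characters.
spaced = " ".join("string")

# An empty separator simply concatenates the items.
glued = "".join(words)

print(dashed)  # a-b-c
print(spaced)  # s t r i n g
print(glued)   # abc
```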
What is the 'r' prefix and why is it important?
The 'r' prefix on a string literal (e.g. text = r'text') makes it a "raw" string: Python leaves backslashes as literal characters instead of interpreting them as escape sequences. This matters because backslashes play an important role in Regular Expressions (e.g. \d, \b), and the pattern should reach the re module unchanged.
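A small sketch of why the raw prefix matters, using a made-up sample sentence. Without the prefix, Python turns \b into a backspace character before the re module ever sees it.

```python
import re

# In a normal string "\n" is one newline character;
# in a raw string it stays as a backslash followed by "n".
normal_len = len("\n")
raw_len = len(r"\n")

sample = "cat concat cat"  # hypothetical sample text

# Raw string: \b reaches the regex engine as a word boundary.
with_raw = re.findall(r"\bcat\b", sample)

# Normal string: "\b" is a backspace character, so nothing matches.
without_raw = re.findall("\bcat\b", sample)

print(normal_len, raw_len)  # 1 2
print(with_raw)             # ['cat', 'cat']
print(without_raw)          # []
```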
What RegEx method can find a pattern and replace it with a defined string?
re.sub(pattern, replace_string, string_var)
e.g. re.sub(r'[a-z]', ' ', "Mike") returns "M   " (each of the three lowercase letters becomes its own space)
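A brief sketch of re.sub() on the card's "Mike" example. It contrasts replacing each matched character with replacing a whole run of matches, and shows the optional count argument.

```python
import re

# Each lowercase letter is replaced separately: three letters, three spaces.
per_char = re.sub(r"[a-z]", " ", "Mike")

# With +, the whole run "ike" is one match and becomes a single space.
per_run = re.sub(r"[a-z]+", " ", "Mike")

# count limits how many matches are replaced.
first_only = re.sub(r"[a-z]", "*", "Mike", count=1)

print(repr(per_char))  # 'M   '
print(repr(per_run))   # 'M '
print(first_only)      # M*ke
```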
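How do you negate or use NOT with Regular Expressions?
[^ ] (a caret as the first character inside square brackets matches any character NOT listed)
e.g. text = 'Test123'
pattern = r'[^0-9]'
result = re.sub(pattern, ' ', text)
print(result) # Outputs "    123" (each of the four letters becomes a space)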
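A runnable sketch of negated character classes on the card's "Test123" example. It also shows that a caret outside the brackets means something different: it anchors the match to the start of the string.

```python
import re

text = "Test123"

# Remove everything that is NOT a digit.
digits_only = re.sub(r"[^0-9]", "", text)

# Remove everything that is NOT a letter.
letters_only = re.sub(r"[^A-Za-z]", "", text)

# Outside brackets, ^ is a start-of-string anchor, not a negation.
anchored = re.sub(r"^T", "", text)

print(digits_only)   # 123
print(letters_only)  # Test
print(anchored)      # est123
```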
Format a regular expression to identify the hexadecimal codes in a string.
pattern = r'[^a-fA-F0-9]+'
re.sub(pattern, '', org_string) # removes every character that is not a hexadecimal digit
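A sketch comparing the card's strip-everything-else approach with an alternative that extracts whole codes. The sample string and the assumption that a "hex code" means a # followed by six hexadecimal digits are both illustrative, not taken from the card.

```python
import re

# Hypothetical sample text for illustration.
org_string = "Colors: #FF5733 and #00ff00, not #GGGGGG."

# The card's approach: delete every non-hex character.
# Stray letters such as the "C" in "Colors" and "a", "d" in "and" survive.
stripped = re.sub(r"[^a-fA-F0-9]+", "", org_string)

# Alternative (assumed format): pull out "#" plus exactly six hex digits.
codes = re.findall(r"#[0-9a-fA-F]{6}\b", org_string)

print(stripped)  # CFF5733ad00ff00
print(codes)     # ['#FF5733', '#00ff00']
```

The negated-class version is fine when the input contains nothing but one code and some noise; re.findall() is usually safer when the surrounding text contains ordinary words.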
What is the function and syntax for making a string lower case?
string.lower()
What is the syntax in Pandas to apply the .lower() function to a whole column?
data['col_name'].str.lower()
Identify the Pandas format for removing extra whitespace.
data['col_name'].str.strip() (removes leading and trailing whitespace only; spaces inside the text are left alone)
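A small sketch of the two Pandas cards above on a made-up DataFrame. The column name and values are hypothetical, and the last step (collapsing repeated spaces inside the text with a regex) is an addition beyond what the cards cover.

```python
import pandas as pd

# Hypothetical data for illustration.
data = pd.DataFrame({"col_name": ["  Hello World  ", "MIXED   Case"]})

# Lower-case every value in the column.
lowered = data["col_name"].str.lower()

# Strip leading and trailing whitespace only.
stripped = data["col_name"].str.strip()

# Also collapse runs of internal whitespace into one space.
collapsed = data["col_name"].str.replace(r"\s+", " ", regex=True).str.strip()

print(lowered.tolist())
print(stripped.tolist())
print(collapsed.tolist())  # ['Hello World', 'MIXED Case']
```

Each .str method returns a new Series, so assign the result back (e.g. data['col_name'] = ...) to keep the change.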