Pandas Strings Flashcards
names.str.capitalize()
A single method that capitalizes every entry while skipping over any missing values
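A minimal sketch of this card, assuming a small made-up `names` Series (the sample values are illustrative, not from the source). The missing entry passes through as missing instead of raising an error.

```python
import pandas as pd

# Hypothetical sample data; None stands in for a missing value.
names = pd.Series(["peter", "Paul", None, "MARY", "gUIDO"])

# Vectorized capitalize: applied to every string, missing values are skipped.
capitalized = names.str.capitalize()
print(capitalized)
```

Calling a plain Python `s.capitalize()` in a list comprehension over the same data would fail on the `None` entry, which is the main reason to reach for the `str` accessor.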
Using tab completion on this str attribute
will list all the vectorized string methods available to Pandas.
len() lower() translate() islower()
ljust() upper() startswith() isupper()
Methods similar to Python string methods
rjust() find() endswith() isnumeric()
center() rfind() isalnum() isdecimal()
Methods similar to Python string methods
zfill() index() isalpha() split()
strip() rindex() isdigit() rsplit()
Methods similar to Python string methods
rstrip() capitalize() isspace() partition()
lstrip() swapcase() istitle() rpartition()
Methods similar to Python string methods
match()
Call re.match() on each element, returning a boolean.
extract()
Extract capture groups from the first regex match in each element, returning the matched groups as strings.
findall()
Call re.findall() on each element
replace()
Replace occurrences of pattern with some other string
contains()
Call re.search() on each element, returning a boolean
count()
Count occurrences of pattern
split()
Equivalent to str.split(), but accepts regexps
rsplit()
Equivalent to str.rsplit(), but accepts regexps
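The regular-expression methods above can be tried on a small Series. This is a sketch only: the contents of `monte` are assumed sample names, and the patterns are chosen purely for illustration.

```python
import pandas as pd

# Hypothetical sample data.
monte = pd.Series(["Graham Chapman", "John Cleese", "Terry Gilliam",
                   "Eric Idle", "Terry Jones", "Michael Palin"])

# match(): does the pattern match at the start of each element?
starts_with_terry = monte.str.match(r"Terry")

# extract(): pull out the first capture group (here, the leading word).
first_names = monte.str.extract(r"([A-Za-z]+)", expand=False)

# contains(): does the pattern occur anywhere in each element?
has_double_l = monte.str.contains(r"ll")

# count(): how many lowercase vowels are in each element?
vowel_counts = monte.str.count(r"[aeiou]")
```

Each call returns a new Series aligned with `monte`, so the results can be used directly as boolean masks or extra columns.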
get()
Index each element
slice()
Slice each element
slice_replace()
Replace slice in each element with passed value
cat()
Concatenate strings
repeat()
Repeat values
normalize()
Return Unicode form of string
pad()
Add whitespace to left, right, or both sides of strings
wrap()
Split long strings into lines with length less than a given width
join()
Join strings in each element of the Series with passed separator
get_dummies()
Extract dummy variables as a DataFrame
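A few of the miscellaneous methods above, sketched on made-up data. The Series contents and the `|` delimiter are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical coded data, one delimited string per row.
codes = pd.Series(["B|C|D", "B|D", "A|C"])

# get_dummies(): one indicator column per distinct code.
dummies = codes.str.get_dummies("|")

# pad(): pad on the left to width 4 with a fill character.
padded = pd.Series(["a", "bb"]).str.pad(4, side="left", fillchar="*")

# repeat(): repeat each string twice.
repeated = pd.Series(["ab", "c"]).str.repeat(2)

# cat(): concatenate all elements into one string with a separator.
joined = pd.Series(["x", "y", "z"]).str.cat(sep="-")
```

Note that `cat()` with only `sep` collapses the whole Series into a single Python string, while the other calls return one result per element.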
df.str.slice(0, 3) is equivalent to
df.str[0:3]
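A quick check of this equivalence, assuming a small sample Series named `s` (hypothetical). The `str` accessor belongs to a Series or Index, so on a DataFrame it is applied to one column at a time.

```python
import pandas as pd

# Hypothetical sample data.
s = pd.Series(["Graham Chapman", "John Cleese"])

by_method = s.str.slice(0, 3)   # explicit slice() call
by_index = s.str[0:3]           # Python-style slicing on the accessor

print(by_method.equals(by_index))
```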
monte.str.split().str.get(-1)
Extract the last name of each entry by combining split() and get()
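A sketch of this chain on assumed sample names (the contents of `monte` are illustrative). `split()` turns each string into a list of words, and `get(-1)` picks the last item of each list.

```python
import pandas as pd

# Hypothetical sample data.
monte = pd.Series(["Graham Chapman", "John Cleese", "Terry Gilliam"])

words = monte.str.split()          # each element becomes a list of words
last_names = words.str.get(-1)     # last word of each list
print(last_names)
```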
recipes.ingredients.str.len().describe()
Summary statistics for the length (in characters) of each ingredient list
recipes.name[np.argmax(recipes.ingredients.str.len())]
Find which recipe has the longest ingredient list
recipes.description.str.contains('[Bb]reakfast').sum()
how many of the recipes are for breakfast food
recipes.ingredients.str.contains('[Cc]innamon').sum()
How many of the recipes list cinnamon as an ingredient
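The recipe queries above can be tried on a tiny stand-in table. Everything in this `recipes` DataFrame is made up for illustration; only the column names `name`, `ingredients`, and `description` come from the cards. The sketch uses `.iloc` with `np.argmax` so the lookup is positional regardless of the index labels, which is a small deviation from the card's `recipes.name[...]` form.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the recipes data.
recipes = pd.DataFrame({
    "name": ["Pancakes", "Chili", "Toast"],
    "ingredients": ["flour, milk, eggs, cinnamon",
                    "beans, tomatoes, beef, onion, cumin, chili powder",
                    "bread, butter"],
    "description": ["A fluffy breakfast classic",
                    "Hearty dinner",
                    "Breakfast in two minutes"],
})

# Length in characters of each ingredient string, then summary statistics.
lengths = recipes.ingredients.str.len()
print(lengths.describe())

# Recipe with the longest ingredient string (positional lookup).
longest = recipes.name.iloc[np.argmax(lengths.to_numpy())]

# Counts of recipes matching each pattern.
n_breakfast = recipes.description.str.contains("[Bb]reakfast").sum()
n_cinnamon = recipes.ingredients.str.contains("[Cc]innamon").sum()
```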
spice_list = ['salt', 'pepper', 'oregano', 'sage', 'parsley',
              'rosemary', 'tarragon', 'thyme', 'paprika', 'cumin']
import re
# Pass flags by keyword: the second positional argument of contains() is `case`, not `flags`.
spice_df = pd.DataFrame(dict((spice, recipes.ingredients.str.contains(spice, flags=re.IGNORECASE))
                             for spice in spice_list))
spice_df.head()
Search each recipe's ingredient list for every spice in the list
Returns a Boolean DataFrame with one column per spice
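A runnable sketch of the spice search, assuming a tiny made-up `recipes` DataFrame (its rows are hypothetical). The regex flag is passed as `flags=`, since the second positional parameter of `str.contains()` is `case`.

```python
import re
import pandas as pd

# Hypothetical stand-in for the recipes data.
recipes = pd.DataFrame({
    "ingredients": ["Salt, black pepper, and thyme",
                    "sugar, cinnamon, flour",
                    "PAPRIKA and cumin"],
})

spice_list = ["salt", "pepper", "oregano", "sage", "parsley",
              "rosemary", "tarragon", "thyme", "paprika", "cumin"]

# One Boolean column per spice; matching ignores case.
spice_df = pd.DataFrame({
    spice: recipes.ingredients.str.contains(spice, flags=re.IGNORECASE)
    for spice in spice_list
})
print(spice_df.head())
```

Because the result is Boolean, `spice_df.sum()` gives the number of recipes mentioning each spice.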