3rd Flashcards
move a column to become the index
df.set_index(‘c’)
Reset index
df.reset_index()
Remove index col
df.reset_index(drop = True)
Multiple index columns
df.set_index([‘c’, ‘c1’])
Select only year from date
df[‘c’].dt.year
Select only month from date
df[‘c’].dt.month
Detect missing values
df.isna()
name breed 0 false false 1 false True 2 false True 3 false False
Does df columns have NA
df.isna().any()
name false
breed True
color True
Count # of na in a column
df.isna().sum()
name 0
breed 2
color 3
Remove all NAs from a df
df.dropna()
Replace NAs with a missing value
df.fillna(0)
List of dictionaries
list_of_dicts = [
{‘name’: ‘Ginger’, ‘Breed : ‘Lab’, ‘kg’:22},
{‘name’: ‘qwuire’, ‘Breed : ‘good’, ‘kg’:12}
]
name breed kg 0 Ginger Lab 22 1 qwuire good 12
Dictionary of lists
dict_of_lists = {
‘name’ : [‘Ginger’, ‘qwuire’],
‘breed’ : [‘Lab’, ‘good’],
‘weight’ : [22, 12]
}
name breed kg 0 Ginger Lab 22 1 qwuire good 12
Write csv
pd.to_csv(‘file/path’)
Inner join
Only return rows where the values match in both tables
new_df = df1.merge(df2, on = ‘col’)
Add suffixes to joins
suffixes = (‘_x’, ‘_y’)
new_df = df1.merge(df2, on = ‘col’, suffixes = (‘_x’, ‘_y’))
Merge on multiple columns
new_df = df1.merge(df2, on = [‘col’, ‘col1’])
Merge multiple tables
new_df = df1.merge(df2, on = ‘col’) \ .merge(df3, on = ‘col1’)
Left join
returns all rows of data from the left table and only the rows from the right table that match
new_df = df1.merge(df2, on = ‘col’, how = ‘left’)
Merge on same column with two different names
new_df = df1.merge(df2, left_on = ‘df1_col’, right_on = ‘df2_col’ )
Outer join
Join both columns regardless if there is a match between the two tables
new_df = df1.merge(df2, on = ‘col’, how = ‘outer’)
Semi join
only columns from the left table, but with no duplicates. The first and then complete and moves on.
Confirm your the filtered joins
new_df = df1.merge(df2, on = ‘col’)
obs = df1[df1[‘col’].isin(new_df[‘col’])]
print(obs)
Anti Join
Only return columns from the left table and not the right table
Step 1
new_df = df1.merge(df2, on = ‘col’, how = ‘left’, indicator = True)
Step 2
df_bool = df_merge.loc[df_merge[‘_merge’] == ‘left_only’, ‘col’]
Step 3
df_left = df1[df1[‘col’].isin(df_bool)]
Concatenate
pd.concat([df, df1, df2], axis = 0)
Concatenate and ignore index
pd.concat([df, df1, df2], axis = 0, ignore_index = True)
Concatenate and set labels in index of original tables
pd.concat([df, df1, df2], axis = 0, ignore_index = False, keys = [‘1’, ‘2’, ‘3’])
name cid 1 0 jj 234 1 ss 11 2 0 1 3 0 1
Merge Ordered
Helpful for ordered or time series data as the results are sorted
df_merge = pd.merge_ordered(df1, df2, on = ‘col’, suffixes = (‘_df1’, ‘_df2’))
Merge ordered forward fill
fill missing with previous value
df_merge = pd.merge_ordered(df1, df2, on = ‘col’, suffixes = (‘_df1’, ‘_df2’), fill_method = ‘ffill’)
Merge as of
Similar to merge ordered
merges to the nearest key column and not exact matches
The merged on columns must be sorted
df_merge = pd.per_asof(df1, df2, on = ‘col’, suffixes = (‘_df1’, ‘_df2’))