3rd Flashcards by joshua leslie

move a column to become the index

df.set_index(‘c’)

How well did you know this?

Not at all

Perfectly

Reset index

df.reset_index()

How well did you know this?

Not at all

Perfectly

Remove index col

df.reset_index(drop = True)

How well did you know this?

Not at all

Perfectly

Multiple index columns

df.set_index([‘c’, ‘c1’])

How well did you know this?

Not at all

Perfectly

Select only year from date

df[‘c’].dt.year

How well did you know this?

Not at all

Perfectly

Select only month from date

df[‘c’].dt.month

How well did you know this?

Not at all

Perfectly

Detect missing values

df.isna()

           name           breed
0         false             false
1          false             True
2         false             True
3         false              False

How well did you know this?

Not at all

Perfectly

Does df columns have NA

df.isna().any()

name false
breed True
color True

How well did you know this?

Not at all

Perfectly

Count # of na in a column

df.isna().sum()

name 0
breed 2
color 3

How well did you know this?

Not at all

Perfectly

Remove all NAs from a df

df.dropna()

How well did you know this?

Not at all

Perfectly

Replace NAs with a missing value

df.fillna(0)

How well did you know this?

Not at all

Perfectly

List of dictionaries

list_of_dicts = [
{‘name’: ‘Ginger’, ‘Breed : ‘Lab’, ‘kg’:22},
{‘name’: ‘qwuire’, ‘Breed : ‘good’, ‘kg’:12}
]

              name               breed             kg 0               Ginger               Lab               22 1                qwuire               good             12

How well did you know this?

Not at all

Perfectly

Dictionary of lists

dict_of_lists = {
‘name’ : [‘Ginger’, ‘qwuire’],
‘breed’ : [‘Lab’, ‘good’],
‘weight’ : [22, 12]
}

              name               breed             kg 0               Ginger               Lab               22 1                qwuire               good             12

How well did you know this?

Not at all

Perfectly

Write csv

pd.to_csv(‘file/path’)

How well did you know this?

Not at all

Perfectly

Inner join

Only return rows where the values match in both tables

new_df = df1.merge(df2, on = ‘col’)

How well did you know this?

Not at all

Perfectly

Add suffixes to joins

Study These Flashcards

suffixes = (‘_x’, ‘_y’)

new_df = df1.merge(df2, on = ‘col’, suffixes = (‘_x’, ‘_y’))

Merge on multiple columns

Study These Flashcards

new_df = df1.merge(df2, on = [‘col’, ‘col1’])

Merge multiple tables

Study These Flashcards

new_df = df1.merge(df2, on = ‘col’) \ .merge(df3, on = ‘col1’)

Left join

Study These Flashcards

returns all rows of data from the left table and only the rows from the right table that match

new_df = df1.merge(df2, on = ‘col’, how = ‘left’)

Merge on same column with two different names

Study These Flashcards

new_df = df1.merge(df2, left_on = ‘df1_col’, right_on = ‘df2_col’ )

Outer join

Study These Flashcards

Join both columns regardless if there is a match between the two tables

new_df = df1.merge(df2, on = ‘col’, how = ‘outer’)

Semi join

Study These Flashcards

only columns from the left table, but with no duplicates. The first and then complete and moves on.

Confirm your the filtered joins

Study These Flashcards

new_df = df1.merge(df2, on = ‘col’)
obs = df1[df1[‘col’].isin(new_df[‘col’])]
print(obs)

Anti Join

Study These Flashcards

Only return columns from the left table and not the right table

Step 1
new_df = df1.merge(df2, on = ‘col’, how = ‘left’, indicator = True)

Step 2
df_bool = df_merge.loc[df_merge[‘_merge’] == ‘left_only’, ‘col’]

Step 3
df_left = df1[df1[‘col’].isin(df_bool)]

Concatenate

pd.concat([df, df1, df2], axis = 0)

Concatenate and ignore index

pd.concat([df, df1, df2], axis = 0, ignore_index = True)

Concatenate and set labels in index of original tables

pd.concat([df, df1, df2], axis = 0, ignore_index = False, keys = ['1', '2', '3']) ``` name cid 1 0 jj 234 1 ss 11 2 0 1 3 0 1 ```

Merge Ordered

Helpful for ordered or time series data as the results are sorted df_merge = pd.merge_ordered(df1, df2, on = 'col', suffixes = ('_df1', '_df2'))

Merge ordered forward fill

fill missing with previous value | df_merge = pd.merge_ordered(df1, df2, on = 'col', suffixes = ('_df1', '_df2'), fill_method = 'ffill')

Merge as of

Similar to merge ordered merges to the nearest key column and not exact matches The merged on columns must be sorted df_merge = pd.per_asof(df1, df2, on = 'col', suffixes = ('_df1', '_df2'))

3rd Flashcards

(30 cards)