pandas Flashcards

1
Q

Rename columns

A

df.rename(columns={"oldName1": "newName1", "oldName2": "newName2"})

2
Q

groupby

A

df.groupby(['col1', 'col2']).agg({'col3': 'how'})  # 'how' = aggregation name, e.g. 'sum' or 'mean'
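As a rough illustration of the card above, here is a minimal sketch on made-up data, with 'how' replaced by the concrete aggregation "sum". The sample values and the names `result` and `flat` are assumptions for the example only.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "x", "y", "z"],
    "col3": [1, 2, 3, 4],
})

# Group by two columns and sum col3 within each (col1, col2) pair
result = df.groupby(["col1", "col2"]).agg({"col3": "sum"})

# The group keys end up in a MultiIndex; reset_index() turns them back into columns
flat = result.reset_index()
print(flat)
```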

3
Q

replace NaN with zeros

A

df.fillna(0)

4
Q

create a new column for year-month

A

df['YM'] = df['Year'].astype(str) + '-' + df['Month'].astype(str)
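A small sketch of the same idea on invented data. The zero-padded variant (`YM_padded`) is an optional addition, not part of the card; it may be useful if the strings need to sort chronologically.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"Year": [2015, 2016], "Month": [3, 11]})

# Straight concatenation, as on the card: "2015-3", "2016-11"
df["YM"] = df["Year"].astype(str) + "-" + df["Month"].astype(str)

# Assumed variant: pad the month to two digits so "2015-03" sorts before "2015-11"
df["YM_padded"] = df["Year"].astype(str) + "-" + df["Month"].astype(str).str.zfill(2)
print(df)
```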

5
Q

filter by multiple values in one column and one specific value in another (e.g., 2015 and 2016 data for Germany)

A

df[df.col1.isin(values) & (df.col2 == 'text')]  # values is a list, e.g. [2015, 2016]
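The Germany example from the question could look roughly like this. The column names `year`, `country`, and `value` and the data are assumptions made for the sketch.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "year": [2014, 2015, 2016, 2016],
    "country": ["Germany", "Germany", "Germany", "France"],
    "value": [1, 2, 3, 4],
})

years = [2015, 2016]

# Parentheses around the equality test are required: & binds tighter than ==
subset = df[df["year"].isin(years) & (df["country"] == "Germany")]
print(subset)
```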

6
Q

turn df into a NumPy array

A

df.to_numpy()  # df.as_matrix() was removed in pandas 1.0
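A quick sketch of the conversion on a toy frame; the data and the name `arr` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Each DataFrame row becomes one row of the 2-D array
arr = df.to_numpy()
print(type(arr), arr.shape)
```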

7
Q

use index as a list

A

df.index.tolist()

8
Q

create a df

A

pd.DataFrame({'colName1': list1, 'colName2': list2})

9
Q

combine two lists into one df

A

pd.concat([pd.Series(list1), pd.Series(list2)], axis=1, keys=['colName1', 'colName2'])  # pd.concat needs Series/DataFrames, not plain lists
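A minimal sketch with two invented lists. pd.concat only accepts pandas objects, so each list is wrapped in a Series first.

```python
import pandas as pd

# Hypothetical sample lists
list1 = [1, 2, 3]
list2 = ["a", "b", "c"]

# axis=1 places the Series side by side; keys supplies the column names
combined = pd.concat(
    [pd.Series(list1), pd.Series(list2)],
    axis=1,
    keys=["colName1", "colName2"],
)
print(combined)
```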

10
Q

rename all columns to a certain list

A

df.columns = new_names  # a list with one name per existing column

11
Q

get a total column for cross_tab

A

df['Total'] = df.sum(axis=1)

12
Q

get a total row for cross_tab

A

pd.concat([df, df.sum().rename('Total').to_frame().T])  # df.append() was removed in pandas 2.0
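The total column and total row can be sketched together on a small crosstab. The data and the names `ct`, `with_totals`, and `builtin` are assumptions; the last line shows pd.crosstab's own margins option, which is an alternative to the cards' approach rather than part of it.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "country": ["DE", "DE", "FR", "FR", "FR"],
    "year": [2015, 2016, 2015, 2015, 2016],
})

ct = pd.crosstab(df["country"], df["year"])

# Total column: sum across each row
ct["Total"] = ct.sum(axis=1)

# Total row: sum down each column, reshape to a one-row frame, then stack it on
total_row = ct.sum().rename("Total").to_frame().T
with_totals = pd.concat([ct, total_row])

# Alternative: let crosstab add both margins itself
builtin = pd.crosstab(df["country"], df["year"], margins=True, margins_name="Total")
print(with_totals)
```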

13
Q

print names of columns

A

print(df.columns.values)

14
Q

count how many rows for each value in col1 (without using .agg)

A

df = df.groupby('col1').size()
df = df.reset_index(name='count')
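A short sketch on invented data. It stores the result under a separate name (`counts`, an assumption of this example) so the original frame is not overwritten as it is on the card.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"col1": ["a", "b", "a", "a"]})

# size() counts rows per group; reset_index(name=...) names the count column
counts = df.groupby("col1").size().reset_index(name="count")
print(counts)
```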
15
Q

create a new column for running total

A

df['cumsum'] = df['col_to_sum'].cumsum()
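A brief sketch of the running total on made-up data. The per-group variant (`cumsum_by_group`, with a hypothetical `group` column) goes beyond the card and restarts the total within each group.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "col_to_sum": [1, 2, 3, 4],
})

# Running total over the whole column, in row order
df["cumsum"] = df["col_to_sum"].cumsum()

# Assumed variant: running total that restarts for each group
df["cumsum_by_group"] = df.groupby("group")["col_to_sum"].cumsum()
print(df)
```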

16
Q

melt df (to prepare for side by side scatter plot) so that col1 is unique (goes on x-axis), col2 is the second level (different hue in plot), col3 is quantity

A

plot_data = pd.melt(df, id_vars="col1", var_name="col2", value_name="col3")
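To make the reshaping concrete, here is a sketch on a small wide table. The column names `series_a` and `series_b` and the values are invented for the example.

```python
import pandas as pd

# Hypothetical wide-format data: one x column plus one column per series
df = pd.DataFrame({
    "col1": [1, 2],
    "series_a": [10, 20],
    "series_b": [30, 40],
})

# Each former column name becomes a value in col2; its numbers go into col3
plot_data = pd.melt(df, id_vars="col1", var_name="col2", value_name="col3")
print(plot_data)
```

Each `col1` value now appears once per series, which is the long format that hue-based plotting functions generally expect.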

17
Q

mean/median/mode of col1

A

df.col1.mean() / df.col1.median() / df.col1.mode()
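One detail worth seeing in action: mean() and median() return a single number, while mode() returns a Series because several values can tie. A sketch on invented data:

```python
import pandas as pd

# Hypothetical sample data with two equally frequent values
df = pd.DataFrame({"col1": [1, 2, 2, 3, 3]})

mean_value = df.col1.mean()
median_value = df.col1.median()

# mode() returns every value tied for most frequent, sorted ascending
modes = df.col1.mode()
print(mean_value, median_value, modes.tolist())
```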