pandas Flashcards
Rename columns
df.rename(columns={"oldName1": "newName1", "oldName2": "newName2"})
groupby
df.groupby(['col1', 'col2']).agg({'col3': 'how'})
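A minimal sketch of the groupby card, assuming 'how' stands for an aggregation name such as 'sum' or 'mean'. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data; column names follow the card above.
df = pd.DataFrame({
    "col1": ["a", "a", "b", "b"],
    "col2": ["x", "x", "y", "z"],
    "col3": [1, 2, 3, 4],
})

# Sum col3 within each (col1, col2) group.
result = df.groupby(["col1", "col2"]).agg({"col3": "sum"})

# The group keys end up in a MultiIndex; reset_index() turns them back into columns.
flat = result.reset_index()
```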
replace NaN with zeros
df.fillna(0)
create a new column for year-month
df['YM'] = df['Year'].astype(str) + '-' + df['Month'].astype(str)
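A possible variant, assuming Year and Month are integer columns: zero-padding the month keeps the year-month strings in chronological order when sorted as text. The sample values are made up.

```python
import pandas as pd

# Hypothetical sample data with integer Year and Month columns.
df = pd.DataFrame({"Year": [2015, 2015], "Month": [3, 11]})

# str.zfill(2) pads single-digit months, so "2015-03" sorts before "2015-11".
df["YM"] = df["Year"].astype(str) + "-" + df["Month"].astype(str).str.zfill(2)
```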
filter by multiple values in one column and one specific value in another (e.g., 2015 and 2016 data for Germany)
df[df.col1.isin(values) & (df.col2 == 'text')]
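A small sketch of the Germany example from the card, with hypothetical column names (year, country, value). The parentheses around the equality test matter because & binds more tightly than ==.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "year": [2014, 2015, 2016, 2016],
    "country": ["Germany", "Germany", "Germany", "France"],
    "value": [10, 20, 30, 40],
})

years = [2015, 2016]

# Combine the two boolean masks element-wise with &.
mask = df["year"].isin(years) & (df["country"] == "Germany")
filtered = df[mask]
```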
turn df into a NumPy array
df.to_numpy()  # df.as_matrix() was removed in pandas 1.0
use index as a list
df.index.tolist()
create a df
pd.DataFrame({'colName1': list1, 'colName2': list2})
combine two lists into one df
pd.concat([pd.Series(list1), pd.Series(list2)], axis=1, keys=['colName1', 'colName2'])
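A quick sketch of the concat card. pd.concat only accepts pandas objects, so each plain list is wrapped in a Series first. The list contents are hypothetical.

```python
import pandas as pd

# Hypothetical input lists of equal length.
list1 = [1, 2, 3]
list2 = ["a", "b", "c"]

# axis=1 places the Series side by side; keys supplies the column names.
combined = pd.concat(
    [pd.Series(list1), pd.Series(list2)],
    axis=1,
    keys=["colName1", "colName2"],
)
```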
rename all columns from a list of names
df.columns = new_names  # len(new_names) must equal the number of columns
get a total column for cross_tab
df['Total'] = df.sum(axis=1)
get a total row for cross_tab
pd.concat([df, df.sum().rename('Total').to_frame().T])  # df.append() was removed in pandas 2.0
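A sketch that puts the total column and total row together on a small crosstab, assuming made-up country and year data. As an alternative, pd.crosstab can produce both totals itself through its margins argument.

```python
import pandas as pd

# Hypothetical raw data to cross-tabulate.
raw = pd.DataFrame({
    "country": ["DE", "DE", "FR", "FR", "FR"],
    "year": [2015, 2016, 2015, 2015, 2016],
})

ct = pd.crosstab(raw["country"], raw["year"])

# Total column: sum across each row.
ct["Total"] = ct.sum(axis=1)

# Total row: sum down each column, then append it as a one-row frame.
with_total = pd.concat([ct, ct.sum().rename("Total").to_frame().T])

# Alternative: let crosstab add both totals in one call.
margins = pd.crosstab(raw["country"], raw["year"],
                      margins=True, margins_name="Total")
```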
print names of columns
print(df.columns.values)
count how many rows for each value in col1 (without using .agg)
df = df.groupby('col1').size().reset_index(name='count')
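A minimal sketch of the row-count card on hypothetical data. size() counts every row in a group, including rows with NaN in other columns.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"col1": ["a", "b", "a", "a"]})

# size() returns a Series indexed by col1; reset_index names the count column.
counts = df.groupby("col1").size().reset_index(name="count")
```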
create a new column for running total
df['cumsum'] = df['col_to_sum'].cumsum()
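A short sketch of the running total, plus a per-group variant that restarts within each group. The group column is an assumption added for illustration and is not part of the card.

```python
import pandas as pd

# Hypothetical sample data; "group" is an illustrative extra column.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "col_to_sum": [1, 2, 3, 4],
})

# Running total over the whole column.
df["cumsum"] = df["col_to_sum"].cumsum()

# Running total that restarts for each group.
df["group_cumsum"] = df.groupby("group")["col_to_sum"].cumsum()
```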