Pandas Flashcards
Filling NA’s in one column with another
fullDf[‘forecast_date’] has NA fill with fullDf.day
fullDf[‘forecast_date’].fillna(fullDf.day, inplace=True)
calls_final.calls_tw has NaN values. replace them with 5
calls_final.calls_tw=calls_final.calls_tw.map(lambda x: x if np.isfinite(x) else 0)
Add a series of integers to df_test1[‘forecast_date’]
df_test1['add']=pd.to_timedelta(df_test1['add'], unit='D') # convert integers to days this way df_test1['forecast_date']=df_test1['forweek']+df_test1['add']
Calculate day of week from a date python
df[‘date’] is a series of dates
df['date'].dt.weekday_name #df['date'].dt.dayofweek will give numbers Monday :0
Sort dataframe ‘df’ by index and save
df.sort_index(inplace=True)
Get today’s date and
Convert into string datetime object into ‘2017-07-17’ format
Today = datetime.now() Today = Today.strftime("%Y-%m-%d")
summary for a dataframe df
train. info() #will give a summary for the entire data
train. describe() ##will give a summary for continuous variables in the data
filter df for having only values 3 and 6,9 in column A
df[df[‘A’].isin([3, 6])]
drop rows which have indices 1 and 3 in df
df.drop(df.index[[1,3]])
Convert series df[‘date’] to datetime objects df
df[‘date’]=df[‘date’].map(lambda x: pd.to_datetime(x,dayfirst=True))
df.date=pd.to_datetime(df.date,dayfirst=True)
Reset index of dataframe df
df = df.reset_index()
del df[‘index’]
Check if col1 and col2 in df are equal
df[‘col1’].equals(df[‘col2’])
Convert series df.col1 to a list
list1=df.col1.tolist()
Dropping rows by null values in a column df.col1
df = df[np.isfinite(df.col1)]
df=df[pd.notnull(df.col1)]
df=df[pd.isnull(df.col1)] ##keeping only those rows
Filtering a dataframe df by column Gender
df[df[Gender]==’Male’]
Filtering a dataframe df by two columns Gender and Year
df[(df[Gender]=='Male') & (df[Year]==2014)] #Dont forget th round brackets
Delete col1 from datafrmae df
del df[‘col1’]
df.drop(df.columns[[0, 1, 3]], axis=1) # delete columns by number
Check datatype of col1 or of whole dataframe
df. dtypes
df. col1.dtype
sort dataframe df according to col1 values
then according to col1 and col2 values
#Sorting the dataframe based on values of one column df.sort_values(by='col1',ascending=True)
#based on two columns df.sort_values(['col1', 'col2'], ascending=[True, False])
Print variable and string together
String : hello Variable: Name
print “I have %s” % Name
correlation between df.col1 and df.col2
np.corrcoef(df.col1,df.col2)[0,1]
Rename col1 of df to col2
df=df.rename(columns = {‘col1’:’col2’})
groupby by categorical col3 and aggregate mean
gb =df.groupby(df.col3)
gb.agg(‘mean’)
gb.agg({‘col1’: ‘sum’,’col2’:’mean’})
mapping df.country to create df.capital
map1={‘India’:’Delhi’,’Canada’:’Ottawa’}
df[‘capital’] = df[‘country’].map(map1)
drop duplicate rows from a df
df.drop_duplicates() #just drops duplicate rows #just keep the first one
drop duplicates from a df for a column col1
df.drop_duplicates([‘col1’]) #drops duplicates by a single column #just keep the first one
Drop duplicates from a df for a column col1.
Keeping the last one
df.drop_duplicates([‘col1’],keep=’last’) #take the last value of duplicate
pivot dataframe
df_piv = df.pivot(index=’date’,columns=’variable’,values=’value’)
all type of merges?
a1=pd.merge(dframe1,dframe2) #default merge inner join on some column
a=pd.merge(dframe1,dframe2,on=’key’) #inner join
b=pd.merge(dframe1,dframe2,on=’key’,how=’left’) #left join
c=pd.merge(dframe1,dframe2,on=’key’,how=’outer’) #outer join
d=pd.merge(df_left, df_right, on=[‘key1’, ‘key2’], how=’outer’) #on multiple keys
e=pd.merge(left, right, left_on=’key1’, right_on=’key2’)
read csv and write csv syntax?
a=pd.read_csv('lec25.csv') b=pd.read_table('lec25.csv',sep=',') c=pd.read_csv('lec25.csv',header=None) d=pd.read_csv('lec25.csv',header=None,nrows=2) dframe1.to_csv('mytextdata_out.csv')
concatenate df1 and df2
pd.concat([df1,df2])
create a dataframe df
from numpy.random import randn
df1=DataFrame(randn(25).reshape((5,5)),columns=list(‘abcde’),index=list(‘12345’))
dframe2 = DataFrame({‘key’:[‘Q’,’Y’,’Z’],’data_set_2’:[1,2,3]})
Pivot df syntax?
long to wide is pivot.
Pivot takes 3 arguments with the following names: index, columns(cat) , and values(num)
entries inside the column(cat) will be used to create new columns
index will have distinct values
values will go inside the table
p = d.pivot(index=’Item’, columns=’CType’)
If you omit values all numerical columns in the datframe will be used. multi index will be created
Unpivot
wide to long is unpivot/melt
df = pd.DataFrame({‘A’: {0: ‘a’, 1: ‘b’, 2: ‘c’}, ‘B’: {0: 1, 1: 3, 2: 5}, ‘C’: {0: 2, 1: 4, 2: 6}})
pd.melt(df, id_vars=[‘A’], value_vars=[‘B’,’C’],
var_name=”Person”, value_name=”Score”)
All variables not included in this list will become rows in a new column (which has the name given by “var_name”) if you do not specify value_vars.