Pandas Flashcards by Pallav Anand

Filling NA’s in one column with another

fullDf[‘forecast_date’] has NA fill with fullDf.day

fullDf[‘forecast_date’].fillna(fullDf.day, inplace=True)

How well did you know this?

Not at all

Perfectly

calls_final.calls_tw has NaN values. replace them with 5

calls_final.calls_tw=calls_final.calls_tw.map(lambda x: x if np.isfinite(x) else 0)

How well did you know this?

Not at all

Perfectly

Add a series of integers to df_test1[‘forecast_date’]

df_test1['add']=pd.to_timedelta(df_test1['add'], unit='D')
# convert integers to days this way
df_test1['forecast_date']=df_test1['forweek']+df_test1['add']

How well did you know this?

Not at all

Perfectly

Calculate day of week from a date python

df[‘date’] is a series of dates

df['date'].dt.weekday_name
#df['date'].dt.dayofweek will give numbers Monday :0

How well did you know this?

Not at all

Perfectly

Sort dataframe ‘df’ by index and save

df.sort_index(inplace=True)

How well did you know this?

Not at all

Perfectly

Get today’s date and

Convert into string datetime object into ‘2017-07-17’ format

Today = datetime.now()
Today = Today.strftime("%Y-%m-%d")

How well did you know this?

Not at all

Perfectly

summary for a dataframe df

train. info() #will give a summary for the entire data

train. describe() ##will give a summary for continuous variables in the data

How well did you know this?

Not at all

Perfectly

filter df for having only values 3 and 6,9 in column A

df[df[‘A’].isin([3, 6])]

How well did you know this?

Not at all

Perfectly

drop rows which have indices 1 and 3 in df

df.drop(df.index[[1,3]])

How well did you know this?

Not at all

Perfectly

Convert series df[‘date’] to datetime objects df

df[‘date’]=df[‘date’].map(lambda x: pd.to_datetime(x,dayfirst=True))
df.date=pd.to_datetime(df.date,dayfirst=True)

How well did you know this?

Not at all

Perfectly

Reset index of dataframe df

df = df.reset_index()

del df[‘index’]

How well did you know this?

Not at all

Perfectly

Check if col1 and col2 in df are equal

df[‘col1’].equals(df[‘col2’])

How well did you know this?

Not at all

Perfectly

Convert series df.col1 to a list

list1=df.col1.tolist()

How well did you know this?

Not at all

Perfectly

Dropping rows by null values in a column df.col1

df = df[np.isfinite(df.col1)]
df=df[pd.notnull(df.col1)]

df=df[pd.isnull(df.col1)] ##keeping only those rows

How well did you know this?

Not at all

Perfectly

Filtering a dataframe df by column Gender

df[df[Gender]==’Male’]

How well did you know this?

Not at all

Perfectly

Filtering a dataframe df by two columns Gender and Year

Study These Flashcards

df[(df[Gender]=='Male') &amp; (df[Year]==2014)]
#Dont forget th round brackets

Delete col1 from datafrmae df

Study These Flashcards

del df[‘col1’]

df.drop(df.columns[[0, 1, 3]], axis=1) # delete columns by number

Check datatype of col1 or of whole dataframe

Study These Flashcards

df. dtypes

df. col1.dtype

sort dataframe df according to col1 values

then according to col1 and col2 values

Study These Flashcards

#Sorting the dataframe based on values of one column 
df.sort_values(by='col1',ascending=True)

#based on two columns
df.sort_values(['col1', 'col2'], ascending=[True, False])

Print variable and string together

String : hello Variable: Name

Study These Flashcards

print “I have %s” % Name

correlation between df.col1 and df.col2

Study These Flashcards

np.corrcoef(df.col1,df.col2)[0,1]

Rename col1 of df to col2

Study These Flashcards

df=df.rename(columns = {‘col1’:’col2’})

groupby by categorical col3 and aggregate mean

Study These Flashcards

gb =df.groupby(df.col3)
gb.agg(‘mean’)

gb.agg({‘col1’: ‘sum’,’col2’:’mean’})

mapping df.country to create df.capital

Study These Flashcards

map1={‘India’:’Delhi’,’Canada’:’Ottawa’}

df[‘capital’] = df[‘country’].map(map1)

drop duplicate rows from a df

df.drop_duplicates() #just drops duplicate rows #just keep the first one

drop duplicates from a df for a column col1

df.drop_duplicates(['col1']) #drops duplicates by a single column #just keep the first one

Drop duplicates from a df for a column col1. | Keeping the last one

df.drop_duplicates(['col1'],keep='last') #take the last value of duplicate

pivot dataframe

df_piv = df.pivot(index='date',columns='variable',values='value')

all type of merges?

a1=pd.merge(dframe1,dframe2) #default merge inner join on some column a=pd.merge(dframe1,dframe2,on='key') #inner join b=pd.merge(dframe1,dframe2,on='key',how='left') #left join c=pd.merge(dframe1,dframe2,on='key',how='outer') #outer join d=pd.merge(df_left, df_right, on=['key1', 'key2'], how='outer') #on multiple keys e=pd.merge(left, right, left_on='key1', right_on='key2')

read csv and write csv syntax?

``` a=pd.read_csv('lec25.csv') b=pd.read_table('lec25.csv',sep=',') c=pd.read_csv('lec25.csv',header=None) d=pd.read_csv('lec25.csv',header=None,nrows=2) dframe1.to_csv('mytextdata_out.csv') ```

concatenate df1 and df2

pd.concat([df1,df2])

create a dataframe df

from numpy.random import randn df1=DataFrame(randn(25).reshape((5,5)),columns=list('abcde'),index=list('12345')) dframe2 = DataFrame({'key':['Q','Y','Z'],'data_set_2':[1,2,3]})

Pivot df syntax?

long to wide is pivot. Pivot takes 3 arguments with the following names: index, columns(cat) , and values(num) entries inside the column(cat) will be used to create new columns index will have distinct values values will go inside the table p = d.pivot(index='Item', columns='CType') If you omit values all numerical columns in the datframe will be used. multi index will be created

Unpivot

wide to long is unpivot/melt df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'}, 'B': {0: 1, 1: 3, 2: 5}, 'C': {0: 2, 1: 4, 2: 6}}) pd.melt(df, id_vars=['A'], value_vars=['B','C'], var_name="Person", value_name="Score") ##set the variables (columns) that we want to leave unaffected as id_vars. All variables not included in this list will become rows in a new column (which has the name given by “var_name”) if you do not specify value_vars.

Pandas Flashcards

(34 cards)