Pandas Flashcards

1
Q

Filling NAs in one column with values from another

fullDf['forecast_date'] has NAs; fill them with fullDf.day

A

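fullDf['forecast_date'] = fullDf['forecast_date'].fillna(fullDf['day'])  # assign back; inplace=True on a selected column may not update fullDf in newer pandas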
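A quick check of this card on a small hypothetical frame (the dates are made up). It shows that fillna with a Series aligns on the index, so each missing forecast_date takes the day value from the same row:

```python
import pandas as pd

# Hypothetical toy data: forecast_date has gaps that should be filled from day
fullDf = pd.DataFrame({
    "day": pd.to_datetime(["2017-07-17", "2017-07-18", "2017-07-19"]),
    "forecast_date": pd.to_datetime(["2017-07-20", None, None]),
})

# fillna(Series) aligns on the index: each NaT takes 'day' from the same row
fullDf["forecast_date"] = fullDf["forecast_date"].fillna(fullDf["day"])
print(fullDf)
```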

2
Q

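calls_final.calls_tw has NaN values. Replace them with 5.

A

calls_final['calls_tw'] = calls_final['calls_tw'].fillna(5)
# alternative that also replaces +/-inf, not just NaN (needs import numpy as np)
calls_final['calls_tw'] = calls_final['calls_tw'].map(lambda x: x if np.isfinite(x) else 5)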

3
Q

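Add a series of integers df_test1['add'] (as days) to the date column df_test1['forweek'] to get df_test1['forecast_date']

A
df_test1['add'] = pd.to_timedelta(df_test1['add'], unit='D')
# integers cannot be added to dates directly; convert them to day offsets first
df_test1['forecast_date'] = df_test1['forweek'] + df_test1['add']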
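A small worked run of the same two steps, using hypothetical dates and offsets, to show the result of adding day offsets to a date column:

```python
import pandas as pd

# Hypothetical toy data: a start date per row plus an integer number of days
df_test1 = pd.DataFrame({
    "forweek": pd.to_datetime(["2017-07-17", "2017-07-17", "2017-07-24"]),
    "add": [0, 3, 6],
})

# Turn the integers into Timedelta day offsets, then add them to the dates
df_test1["add"] = pd.to_timedelta(df_test1["add"], unit="D")
df_test1["forecast_date"] = df_test1["forweek"] + df_test1["add"]
print(df_test1["forecast_date"].dt.strftime("%Y-%m-%d").tolist())
# ['2017-07-17', '2017-07-20', '2017-07-30']
```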
4
Q

Calculate the day of the week from a date

df['date'] is a series of dates

A
df['date'].dt.day_name()
# df['date'].dt.dayofweek gives numbers: Monday=0 ... Sunday=6
# (older pandas used df['date'].dt.weekday_name, which was removed in pandas 1.0)
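Both accessors side by side on three hypothetical dates (17 July 2017 was a Monday), to show the name and the number for the same rows:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2017-07-17", "2017-07-22", "2017-07-23"])})

df["day_name"] = df["date"].dt.day_name()   # 'Monday', 'Saturday', 'Sunday'
df["day_num"] = df["date"].dt.dayofweek     # Monday=0 ... Sunday=6
print(df)
```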
5
Q

Sort dataframe ‘df’ by index and save

A

df.sort_index(inplace=True)

6
Q

Get today's date and convert the datetime object into a string in '2017-07-17' format

A
from datetime import datetime
today = datetime.now()
today = today.strftime("%Y-%m-%d")
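The same format string applied to a fixed date, so the output can be checked, plus date.isoformat() as an alternative that produces the same layout:

```python
from datetime import date, datetime

today = datetime.now().strftime("%Y-%m-%d")          # current date, always 10 characters
fixed = datetime(2017, 7, 17).strftime("%Y-%m-%d")   # '2017-07-17'
iso = date(2017, 7, 17).isoformat()                  # same YYYY-MM-DD layout
print(today, fixed, iso)
```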
7
Q

Summary for a DataFrame df

A

df.info()      # column names, dtypes, non-null counts and memory usage

df.describe()  # summary statistics (count, mean, std, min, quartiles, max) for numeric columns

8
Q

Filter df to keep only the rows where column A is 3 or 6

A

df[df['A'].isin([3, 6])]

9
Q

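Drop the rows at positions 1 and 3 in df

A

df = df.drop(df.index[[1, 3]])  # df.index[[1, 3]] looks up the labels at positions 1 and 3
# to drop by index label instead: df = df.drop([1, 3])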
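Positions and labels only coincide for a default RangeIndex. A hypothetical frame whose labels run backwards makes the difference visible:

```python
import pandas as pd

# Index labels deliberately differ from row positions
df = pd.DataFrame({"x": [10, 20, 30, 40, 50]}, index=[5, 4, 3, 2, 1])

by_position = df.drop(df.index[[1, 3]])   # 2nd and 4th rows (labels 4 and 2)
by_label = df.drop([1, 3])                # rows labelled 1 and 3
print(by_position["x"].tolist())  # [10, 30, 50]
print(by_label["x"].tolist())     # [10, 20, 40]
```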

10
Q

Convert the column df['date'] to datetime objects

A

df['date'] = df['date'].map(lambda x: pd.to_datetime(x, dayfirst=True))  # one element at a time, slow
df['date'] = pd.to_datetime(df['date'], dayfirst=True)                    # vectorized, preferred
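A check of what dayfirst=True changes, on hypothetical day/month/year strings: "01/08/2017" is read as 1 August rather than 8 January. Passing an explicit format="%d/%m/%Y" is a stricter alternative when the layout is known.

```python
import pandas as pd

df = pd.DataFrame({"date": ["17/07/2017", "01/08/2017"]})

# dayfirst=True: the first number is the day, the second the month
df["date"] = pd.to_datetime(df["date"], dayfirst=True)
print(df["date"].dt.month.tolist())  # [7, 8]
```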

11
Q

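Reset the index of DataFrame df

A

df = df.reset_index(drop=True)

# equivalent two-step version: df = df.reset_index() moves the old index into an 'index' column, then del df['index']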

12
Q

Check if col1 and col2 in df are equal

A

df['col1'].equals(df['col2'])  # True if same shape and elements; NaNs in the same positions count as equal

13
Q

Convert series df.col1 to a list

A

list1=df.col1.tolist()

14
Q

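Drop rows with null values in column df.col1

A

df = df.dropna(subset=['col1'])       # drop rows where col1 is NaN/None
df = df[pd.notnull(df['col1'])]       # same, via a boolean mask
df = df[np.isfinite(df['col1'])]      # numeric columns only; also drops +/-inf

df = df[pd.isnull(df['col1'])]        # opposite: keep only the null rows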
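The three filters differ on infinities. A hypothetical column holding a NaN and an inf shows how many rows each one keeps:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, np.nan, 3.0, np.inf], "col2": list("abcd")})

kept = df.dropna(subset=["col1"])       # removes only the NaN row
finite = df[np.isfinite(df["col1"])]    # removes the NaN row and the inf row
only_null = df[df["col1"].isnull()]     # keeps just the NaN row
print(len(kept), len(finite), len(only_null))  # 3 2 1
```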

15
Q

Filtering a dataframe df by column Gender

A

df[df['Gender'] == 'Male']

16
Q

Filtering a dataframe df by two columns Gender and Year

A
df[(df['Gender'] == 'Male') & (df['Year'] == 2014)]
# Don't forget the parentheses: & binds more tightly than ==
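The combined filter on a tiny hypothetical frame, together with DataFrame.query, which accepts the same condition as a string and avoids the parenthesis issue:

```python
import pandas as pd

df = pd.DataFrame({"Gender": ["Male", "Female", "Male"], "Year": [2014, 2014, 2015]})

# Each comparison is wrapped in parentheses because & binds more tightly than ==
subset = df[(df["Gender"] == "Male") & (df["Year"] == 2014)]

# Same filter written as a query string
via_query = df.query('Gender == "Male" and Year == 2014')
print(subset)
```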
17
Q

Delete col1 from DataFrame df

A

del df['col1']

df = df.drop(columns=['col1'])  # alternative that returns a new DataFrame
df = df.drop(df.columns[[0, 1, 3]], axis=1)  # delete columns by position

18
Q

Check datatype of col1 or of whole dataframe

A

df.dtypes       # dtype of every column

df.col1.dtype   # dtype of a single column

19
Q

Sort DataFrame df by the values of col1

then by the values of col1 and col2

A
# Sorting the dataframe based on values of one column
df.sort_values(by='col1', ascending=True)
# based on two columns: col1 ascending, then col2 descending within ties
df.sort_values(['col1', 'col2'], ascending=[True, False])
20
Q

Print a variable and a string together

String: hello   Variable: Name

A

print("hello %s" % Name)
print(f"hello {Name}")  # f-string, Python 3.6+

21
Q

correlation between df.col1 and df.col2

A

np.corrcoef(df.col1,df.col2)[0,1]
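The NumPy route reads the off-diagonal entry of the 2x2 correlation matrix; pandas' Series.corr gives the same Pearson coefficient directly. The data below is hypothetical. One practical difference: with NaNs present, np.corrcoef returns nan, while Series.corr excludes the incomplete pairs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1.0, 2.0, 3.0, 4.0], "col2": [2.0, 4.0, 5.0, 9.0]})

r_numpy = np.corrcoef(df.col1, df.col2)[0, 1]   # off-diagonal of the 2x2 matrix
r_pandas = df["col1"].corr(df["col2"])          # Pearson by default
print(round(r_numpy, 4), round(r_pandas, 4))
```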

22
Q

Rename col1 of df to col2

A

df = df.rename(columns={'col1': 'col2'})

23
Q

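Group by categorical col3 and aggregate with the mean

A

gb = df.groupby('col3')
gb.agg('mean')  # mean of every other column per group; in pandas 2.x, non-numeric columns can raise a TypeError

gb.agg({'col1': 'sum', 'col2': 'mean'})  # a different aggregation per column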
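Both forms on a small hypothetical frame with only numeric value columns, so the group results can be checked by hand:

```python
import pandas as pd

df = pd.DataFrame({
    "col3": ["a", "b", "a", "b"],
    "col1": [1, 2, 3, 4],
    "col2": [10.0, 20.0, 30.0, 50.0],
})

gb = df.groupby("col3")
means = gb.agg("mean")                            # a: col1 2.0, col2 20.0; b: col1 3.0, col2 35.0
mixed = gb.agg({"col1": "sum", "col2": "mean"})   # a: col1 4, col2 20.0; b: col1 6, col2 35.0
print(means)
print(mixed)
```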

24
Q

Map df.country to create df.capital

A

map1 = {'India': 'Delhi', 'Canada': 'Ottawa'}

df['capital'] = df['country'].map(map1)  # countries missing from map1 become NaN
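What happens to a country that is not a key in the dictionary, shown on hypothetical data:

```python
import pandas as pd

df = pd.DataFrame({"country": ["India", "Canada", "France"]})
map1 = {"India": "Delhi", "Canada": "Ottawa"}

# 'France' is not a key in map1, so its capital comes out as NaN
df["capital"] = df["country"].map(map1)
print(df)
```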

25
Q

Drop duplicate rows from df

A

df.drop_duplicates()  # drops fully duplicated rows, keeping the first occurrence

26
Q

Drop duplicates from df based on column col1

A

df.drop_duplicates(subset=['col1'])  # rows count as duplicates if col1 repeats; keeps the first occurrence

27
Q

Drop duplicates from df based on column col1.

Keep the last occurrence.

A

df.drop_duplicates(['col1'], keep='last')  # keep the last row of each duplicate group
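The three keep options side by side on a hypothetical frame where the key "x" appears twice:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["x", "x", "y"], "val": [1, 2, 3]})

first = df.drop_duplicates(subset=["col1"])                    # keep='first' is the default
last = df.drop_duplicates(subset=["col1"], keep="last")
unique_only = df.drop_duplicates(subset=["col1"], keep=False)  # drop every row whose key repeats
print(first["val"].tolist(), last["val"].tolist(), unique_only["val"].tolist())
# [1, 3] [2, 3] [3]
```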

28
Q

Pivot a DataFrame

A

df_piv = df.pivot(index='date', columns='variable', values='value')

29
Q

All types of merges?

A

a1 = pd.merge(dframe1, dframe2)  # default: inner join on all columns the two frames share
a = pd.merge(dframe1, dframe2, on='key')  # inner join on 'key'
b = pd.merge(dframe1, dframe2, on='key', how='left')  # left join
r = pd.merge(dframe1, dframe2, on='key', how='right')  # right join
c = pd.merge(dframe1, dframe2, on='key', how='outer')  # outer join
d = pd.merge(df_left, df_right, on=['key1', 'key2'], how='outer')  # on multiple keys
e = pd.merge(left, right, left_on='key1', right_on='key2')  # key columns with different names
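Which keys survive under each how= value, on two hypothetical frames that share only "Y" and "Z":

```python
import pandas as pd

dframe1 = pd.DataFrame({"key": ["X", "Y", "Z"], "data_set_1": [1, 2, 3]})
dframe2 = pd.DataFrame({"key": ["Y", "Z", "W"], "data_set_2": [4, 5, 6]})

# Sorted list of the keys present in the result of each join type
results = {
    how: sorted(pd.merge(dframe1, dframe2, on="key", how=how)["key"])
    for how in ["inner", "left", "right", "outer"]
}
for how, keys in results.items():
    print(how, keys)
# inner ['Y', 'Z']
# left ['X', 'Y', 'Z']
# right ['W', 'Y', 'Z']
# outer ['W', 'X', 'Y', 'Z']
```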

30
Q

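Read and write CSV syntax?

A
a = pd.read_csv('lec25.csv')
b = pd.read_table('lec25.csv', sep=',')  # read_table defaults to tab-separated, so set sep
c = pd.read_csv('lec25.csv', header=None)  # first line is data; columns are numbered 0, 1, ...
d = pd.read_csv('lec25.csv', header=None, nrows=2)  # read only the first 2 rows
dframe1.to_csv('mytextdata_out.csv')  # writes the index too; pass index=False to skip it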
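A round trip that can run without lec25.csv: an in-memory io.StringIO buffer stands in for the file (an assumption for illustration), and the frame contents are hypothetical. It also shows what header=None does to the header line.

```python
import io

import pandas as pd

dframe1 = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

buf = io.StringIO()                # stands in for a CSV file on disk
dframe1.to_csv(buf, index=False)   # index=False: no extra index column in the output
buf.seek(0)
back = pd.read_csv(buf)            # the header line becomes the column names

buf.seek(0)
raw = pd.read_csv(buf, header=None, nrows=2)  # header line is read as a data row; 2 rows only
print(back)
print(raw)
```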
31
Q

concatenate df1 and df2

A

pd.concat([df1,df2])

32
Q

Create a DataFrame df

A

import pandas as pd
from numpy.random import randn
df1 = pd.DataFrame(randn(25).reshape((5, 5)), columns=list('abcde'), index=list('12345'))
dframe2 = pd.DataFrame({'key': ['Q', 'Y', 'Z'], 'data_set_2': [1, 2, 3]})

33
Q

Pivot df syntax?

A

Pivot reshapes long to wide.
Pivot takes three keyword arguments: index, columns (categorical) and values (numeric).

The distinct entries in the columns column become the new column names.

The distinct values of the index column become the row index; each (index, columns) pair must occur only once, otherwise pivot raises a ValueError (use pivot_table to aggregate duplicates).
The values column fills the table.

p = d.pivot(index='Item', columns='CType')

If you omit values, all remaining columns are used and the result has a MultiIndex on the columns.
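Both pivot forms on a hypothetical long table of items, currency types and two value columns (the names Item, CType, USD and EU are illustrative only):

```python
import pandas as pd

# Long format: one row per (Item, CType) pair
d = pd.DataFrame({
    "Item": ["Item0", "Item0", "Item1", "Item1"],
    "CType": ["Gold", "Bronze", "Gold", "Silver"],
    "USD": [1, 2, 3, 4],
    "EU": [10, 20, 30, 40],
})

wide = d.pivot(index="Item", columns="CType", values="USD")  # one value column
p = d.pivot(index="Item", columns="CType")                  # all remaining columns -> MultiIndex
print(wide)      # missing (Item, CType) combinations show up as NaN
print(p.columns.tolist())
```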

34
Q

Unpivot

A

Wide to long is unpivot/melt.
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'}, 'B': {0: 1, 1: 3, 2: 5}, 'C': {0: 2, 1: 4, 2: 6}})

pd.melt(df, id_vars=['A'], value_vars=['B', 'C'],
        var_name='Person', value_name='Score')

The id_vars columns stay as identifiers. Each column in value_vars is turned into rows: its name goes into the new column named by var_name and its values into the column named by value_name. If you do not specify value_vars, every column not in id_vars is unpivoted.
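Running the card's melt call shows the shape of the result, and that leaving out value_vars melts the same two columns here because they are the only ones not in id_vars:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "b", "c"], "B": [1, 3, 5], "C": [2, 4, 6]})

long = pd.melt(df, id_vars=["A"], value_vars=["B", "C"],
               var_name="Person", value_name="Score")
print(long)   # 6 rows: columns A, Person, Score; all B rows first, then all C rows

# Without value_vars, every column not in id_vars is melted (B and C again)
long_default = pd.melt(df, id_vars=["A"], var_name="Person", value_name="Score")
print(long.equals(long_default))
```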