Pandas Flashcards

Question 1

Q

Creating a Series

Answer

A

pd.series(list)

pd.Series(data=my_list,index=labels)

Question 2

Q

Creating series along with the index

Answer

A

ser1 = pd.Series([1,2,3,4],index = [‘USA’, ‘Germany’,’USSR’, ‘Japan’])

ser1

USA 1
Germany 2
USSR 3
Japan 4

dtype: int64

Question 3

Q

fetch elements in series

Answer

A

varname[index]

ser1[‘USA’]

Question 4

Q

Create a DataFrame

Answer

A

pd.DataFrame(data, index= , columns = )

pd.DataFrame(randn(5,4),index=’A B C D E’.split(),columns=’W X Y Z’.split())

          W                X                     Y                    Z A     2.706850     0.628133     0.907969     0.503826 B     0.651118     -0.319318    -0.848077    0.605965 C     -2.018168     0.740122     0.528813    -0.589001 D     0.188695     -0.758872     -0.933237   0.955057 E     0.190794     1.978757       2.605967    0.683509

Question 5

Q

fetching column from a dataframe

Answer

A

dataframe[col_name] –> df[‘W’]

dataframe[[col1,col2 ]] –> df[[‘W’,’Z’]]

Question 6

Q

Creating a new column in a dataframe

Answer

A

df[‘new’] = df[‘W’] + df[‘Y’]

Question 7

Q

how to remove a column

Answer

A

df.drop(‘col_name’,axis=1,inplace=True)

Question 8

Q

how to remove a rows

Answer

A

df.drop(‘index’, axis=0)

Question 9

Q

Selecting Rows from a DataFrame

Answer

A

df.loc[“index”]

df.loc[“label”]

Question 10

Q

Selecting multiple Rows and column in a DataFrame

Answer

A

df.loc[row, col]

df.loc[ [row1,row2,… ] , [col1,col2,… ] ]

df.loc[[‘A’,’B’],[‘W’,’Y’]]

Question 11

Q

Conditional Selection

Answer

A

An important feature of pandas is conditional selection using bracket notation, very similar
to numpy:

dataFrame[condition]

df[df>0]

df[df[‘col’] >0]

Question 12

Q

Selecting single and multiple columns

Answer

A

dataframe[ dataframe [“col_name”] < 5] [ ‘colname’]

df[df[‘W’]>0][‘Y’]

dataframe[ dataframe [“col_name”] < 5] [ [‘col1’,col2’] ]

df[df[‘W’]>0][[‘Y’,’X’]]

Question 13

Q

Selecting single and multiple columns with multiple conditions

Answer

A

For two conditions you can use | and & with parenthesis:

dataframe[ (condition1) & (condition2)]

df[(df[‘W’]>0) & (df[‘Y’] > 1)]

Question 14

Q

how to reset index

Answer

A

Reset to default 0,1…n index

dataframe.reset_index()

df.reset_index()

Question 15

Q

how to set specific column as index in a dataframe

Answer

A

Here we can give column name or we can provide pandas series
Dataframe.set_index(“col_name”)

df.set_index(‘States’)

df.set_index(‘States’,inplace=True) to change in permanently

Question 16

Q

how to create Multi-Index in a dataframe

Answer

A

zip the 2 list convert to list of tuples

first create 2 list outside and inside index
outside = [‘G1’,’G1’,’G1’,’G2’,’G2’,’G2’]
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
#set multiindex from tuple
hier_index = pd.MultiIndex.from_tuples(hier_index)

print(pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=[‘A’,’B’]))

A B
G1 1 -0.419202 -0.516732
2 0.036076 -0.205218
3 0.148782 0.942314
G2 1 0.001280 -1.956888
2 1.069982 -2.220442
3 1.180214 0.924161

Question 17

Q

How to fetch elements from the Multi-Index dataframe

Answer

A

For index hierarchy we use df.loc[], if this was on the
columns axis, you would just use normal bracket notation df[]. Calling one level of the index
returns the sub-dataframe:

df.loc[“G1”]

df.loc[‘G1’].loc[1]

Question 18

Q

how to drop missing values

Answer

A

df.dropna() This deletes from row wise

df.dropna(axis=1) this deletes from column wise

df.dropna(thresh=2) removes rows having 2 missing values

Question 19

Q

how to replace null values

Answer

A

df.fillna(value=’FILL VALUE’)

df[‘A’].fillna(value=df[‘A’].mean())

Question 20

Q

hello

Groupby

ok

this

Answer

A

data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
'Sales':[200,120,340,124,243,350]}

df.groupby('Company')

Now you can use the .groupby() method to group rows together based off of a column
name. For instance let’s group based off of Company. This will create a DataFrameGroupBy
object

Company Person Sales
GOOG   Sam 200 
GOOG   Charlie 120
MSFT   Amy 340
MSFT   Vanessa 124
FB   Carl 243
FB   Sarah 350

df.groupby("Company").mean()

And then call aggregate methods off the object. we have to use any one aggregate methods

Question 21

Q

what are the aggregate methods used in group by

Answer

A

std(), min(), max(), count(), describe()

Question 22

Q

how to you transpose the dataframe

Answer

A

dataFrame.transpose()

Question 23

Q

what are the ways to combine dataframes together

Answer

A

Merging, Joining, and Concatenating

Question 24

Q

how to concatenate dataframes

Answer

A

Concatenation basically glues together DataFrames. Keep in mind that dimensions should
match along the axis you are concatenating on. You can use pd.concat and pass in a list of
DataFrames to concatenate together

pd.concat([dataframe1], [dataframe2], [dataframe3], ..)

pd.concat([dataframe1], [dataframe2], [dataframe3], .., axis=1)

Question 25

Q

Merging multiple dataframes

Answer

A

The merge function allows you to merge DataFrames together using a similar logic as merging SQL Tables together. For example:

pd.merge(left,right, how = “inner”, on = “col_name”)

Question 26

Q

Answer

A

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})

pd.merge(left, right, on=['key1', 'key2'])

Here above it merges side by side

Question 27

Q

how to fetch the unique values and its count

Answer

A

dataframe[“col_name”].unique() –> fetching unique values

dataframe[“col_name”].nunique() –> fetching count of unique values

df[‘col2’].value_counts() –> fetching values and count

Question 28

Q

how to Select from DataFrame using criteria from multiple columns

Answer

A

dataframe[(dataframe[‘col1’]>2) & (dataframe[‘col2’]==444)]

df[(df[‘col1’]>2) & (df[‘col2’]==444)]

Question 29

Q

how to use apply functions in pandas

Answer

A

dataframe[“col_name”].apply(function_name)

every data in the column will apply this function

df[‘col3’].apply(len)

df[‘col3’].apply(lambda x: len(x))

Question 30

Q

how to Permanently Remove` a Column

Answer

A

del df[‘col_name’]

Question 31

Q

how to Get column and index names

Answer

A

dataFrame.columns
RangeIndex(start=0, stop=2, step=1)

dataframe.index
Index([‘col2’, ‘col3’], dtype=’object’)

Question 32

Q

how to Sorting and Ordering a DataFrame

Answer

A

dataframe.sort_values(by = “colname”)
#inplace=False by default

Question 33

Q

how to Find Null Values or Check for Null Values

Answer

A

dataframe.isnull()

Question 34

Q

how to drop null rows

Answer

A

df.dropna()

Question 35

Q

how to Fill in Null values with something else

Answer

A

df.fillna(“value”)

Question 36

Q

how to create pivot table

Answer

A

data = {‘A’:[‘foo’,’foo’,’foo’,’bar’,’bar’,’bar’],
‘B’:[‘one’,’one’,’two’,’two’,’one’,’one’],
‘C’:[‘x’,’y’,’x’,’y’,’x’,’y’],
‘D’:[1,3,2,5,4,1]}
df = pd.DataFrame(data)

A   B    C  D 0 foo one x 1 1 foo one y 3 2 foo two x 2 3 bar two y 5 4 bar one x 4 5 bar one y 1

df.pivot_table(values=’D’,index=[‘A’, ‘B’],columns=[‘C’])

	 C     x     y A      B bar  one 4.0 1.0
two NaN 5.0 foo one 1.0 3.0
two 2.0 NaN

Question 37

Q

importing csv_file and exporting csv file in pandas

Answer

A

pd.read_csv(file_name)

dataframe.to_csv(file_name, index=False)

Question 38

Q

importing excel_file and exporting excel file in pandas

Answer

A

Pandas can read and write excel files, keep in mind, this only imports data. Not formulas or
images, having images or macros may cause this read_excel method to crash.

pd.read_excel(‘Excel_Sample.xlsx’,sheetname=’Sheet1’)

dataframe.to_excel(‘Excel_Sample.xlsx’,sheet_name=’Sheet1’)

Question 39

Q

how to read html tables from the website

Answer

A

Pandas read_html function will read tables off of a webpage and return a list of DataFrame
objects:
df = pd.read_html(‘website link’)