Pandas Flashcards

1
Q

Creating a Series

A

pd.Series(my_list)

pd.Series(data=my_list,index=labels)

2
Q

Creating series along with the index

A

ser1 = pd.Series([1,2,3,4], index=['USA', 'Germany', 'USSR', 'Japan'])

ser1

USA 1
Germany 2
USSR 3
Japan 4

dtype: int64

3
Q

fetch elements in series

A

varname[index]

ser1['USA']
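
A minimal sketch tying the Series cards together, reusing the ser1 values from the earlier card. The names usa_value and doubled are illustrative only and do not come from the deck.

```python
import pandas as pd

# Same values and labels as the ser1 card.
ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

# Label-based lookup returns the value stored under that label.
usa_value = ser1['USA']

# Arithmetic between two Series aligns on the index labels.
doubled = ser1 + ser1

print(usa_value)
print(doubled['Japan'])
```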

4
Q

Create a DataFrame

A

pd.DataFrame(data, index= , columns = )

pd.DataFrame(np.random.randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())

          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
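
The call above needs both numpy and pandas imported. A self-contained sketch follows; the seed value is an assumption added here so that repeated runs give the same numbers, since the card does not state one.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed, used only to make the random values repeatable.
np.random.seed(101)

# 5 rows labelled A..E and 4 columns labelled W..Z, filled with
# samples from a standard normal distribution.
df = pd.DataFrame(
    np.random.randn(5, 4),
    index='A B C D E'.split(),
    columns='W X Y Z'.split(),
)

print(df.shape)
print(df)
```
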
5
Q

fetching column from a dataframe

A

dataframe[col_name] --> df['W']

dataframe[[col1, col2]] --> df[['W','Z']]

6
Q

Creating a new column in a dataframe

A

df['new'] = df['W'] + df['Y']

7
Q

how to remove a column

A

df.drop('col_name',axis=1,inplace=True)

8
Q

how to remove rows

A

df.drop('index', axis=0)
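
A short sketch of the column and row cards, assuming a small made-up DataFrame. It shows that drop returns a new DataFrame and leaves the original untouched unless inplace=True is passed.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({'W': [1, 2, 3], 'Y': [10, 20, 30]}, index=['A', 'B', 'C'])

# Create a new column from two existing ones.
df['new'] = df['W'] + df['Y']

# axis=1 drops a column, axis=0 drops a row. Both calls return copies.
without_new = df.drop('new', axis=1)
without_a = df.drop('A', axis=0)

print(list(df.columns))         # the original still has 'new'
print(list(without_a.index))
```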

9
Q

Selecting Rows from a DataFrame

A

df.loc['label']

df.loc['A']

10
Q

Selecting multiple Rows and column in a DataFrame

A

df.loc[row, col]

df.loc[ [row1,row2,… ] , [col1,col2,… ] ]

df.loc[['A','B'],['W','Y']]
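
A minimal sketch of label-based selection, assuming a small made-up DataFrame. It also uses iloc, the position-based counterpart of loc, which the cards do not cover.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {'W': [1, 2, 3], 'X': [4, 5, 6], 'Y': [7, 8, 9]},
    index=['A', 'B', 'C'],
)

row_a = df.loc['A']                       # one row by label, as a Series
first_row = df.iloc[0]                    # the same row by integer position
subset = df.loc[['A', 'B'], ['W', 'Y']]   # several rows and columns
single = df.loc['B', 'Y']                 # one cell

print(subset)
print(single)
```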

11
Q

Conditional Selection

A

An important feature of pandas is conditional selection using bracket notation, very similar
to numpy:

dataFrame[condition]

df[df>0]

df[df['col'] > 0]

12
Q

Selecting single and multiple columns

A

dataframe[dataframe['col_name'] < 5]['col_name']

df[df['W']>0]['Y']

dataframe[dataframe['col_name'] < 5][['col1','col2']]

df[df['W']>0][['Y','X']]

13
Q

Selecting single and multiple columns with multiple conditions

A

To combine two conditions, use | (or) and & (and), with each condition wrapped in parentheses:

dataframe[(condition1) & (condition2)]

df[(df['W']>0) & (df['Y'] > 1)]
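
A small sketch of combining conditions, assuming made-up values. Each condition produces a boolean Series, and & or | combines them element by element, which is why the parentheses are needed.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame(
    {'W': [1, -2, 3, 4], 'Y': [0.5, 2.0, 1.5, -1.0]},
    index=list('ABCD'),
)

# Rows where both conditions hold.
both = df[(df['W'] > 0) & (df['Y'] > 1)]

# Rows where at least one condition holds.
either = df[(df['W'] > 0) | (df['Y'] > 1)]

print(both)
print(either)
```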

14
Q

how to reset index

A

Reset to the default 0, 1, … integer index; the old index becomes a regular column

dataframe.reset_index()

df.reset_index()

15
Q

how to set specific column as index in a dataframe

A

Pass either a column name or a pandas Series:
dataframe.set_index('col_name')

df.set_index('States')

df.set_index('States',inplace=True) to change it permanently
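
A minimal sketch of the two index cards together, assuming a made-up States column. Without inplace=True, both set_index and reset_index return new DataFrames.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({'States': ['CA', 'NY', 'WY'], 'W': [1, 2, 3]})

# Move the States column into the index. df itself is not modified.
by_state = df.set_index('States')

# Undo it: the index goes back to being an ordinary column.
restored = by_state.reset_index()

print(by_state)
print(restored)
```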

16
Q

how to create Multi-Index in a dataframe

A

Zip the two lists into a list of tuples, then build the MultiIndex from those tuples.

First create two lists for the outside and inside index levels:
outside = ['G1','G1','G1','G2','G2','G2']
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
# set the MultiIndex from the tuples
hier_index = pd.MultiIndex.from_tuples(hier_index)

print(pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=['A','B']))

             A         B
G1 1 -0.419202 -0.516732
   2  0.036076 -0.205218
   3  0.148782  0.942314
G2 1  0.001280 -1.956888
   2  1.069982 -2.220442
   3  1.180214  0.924161

17
Q

How to fetch elements from the Multi-Index dataframe

A

For an index hierarchy, use df.loc[]. If the hierarchy were on the columns axis, you would use normal bracket notation, df[]. Selecting one level of the index returns the sub-DataFrame:

df.loc['G1']

df.loc['G1'].loc[1]
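
A runnable sketch of these lookups. It assumes the same outside and inside lists as the MultiIndex card, but fills the frame with np.arange instead of random numbers so the values are predictable. Passing a tuple to loc is an alternative to chaining two loc calls.

```python
import numpy as np
import pandas as pd

outside = ['G1', 'G1', 'G1', 'G2', 'G2', 'G2']
inside = [1, 2, 3, 1, 2, 3]
hier_index = pd.MultiIndex.from_tuples(list(zip(outside, inside)))

# Assumption: deterministic values 0..11 in place of random data.
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=hier_index, columns=['A', 'B'])

g1 = df.loc['G1']              # sub-DataFrame with inner labels 1, 2, 3
chained = df.loc['G1'].loc[1]  # one row, reached level by level
row = df.loc[('G1', 1)]        # the same row, selected with a tuple
value = df.loc[('G2', 3), 'B'] # one cell

print(g1)
print(value)
```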

18
Q

how to drop missing values

A

df.dropna() drops every row that contains at least one missing value

df.dropna(axis=1) drops every column that contains at least one missing value

df.dropna(thresh=2) keeps only the rows that have at least 2 non-missing values
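
A small sketch of the three calls on made-up data. The point to notice is that thresh counts the non-missing values a row needs in order to be kept.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 0 is complete, row 1 has one gap, row 2 has two.
df = pd.DataFrame({
    'A': [1, 2, np.nan],
    'B': [5, np.nan, np.nan],
    'C': [1, 2, 3],
})

rows_complete = df.dropna()          # keeps only rows without any NaN
cols_complete = df.dropna(axis=1)    # keeps only columns without any NaN
at_least_two = df.dropna(thresh=2)   # keeps rows with 2 or more real values

print(rows_complete)
print(cols_complete)
print(at_least_two)
```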

19
Q

how to replace null values

A

df.fillna(value='FILL VALUE')

df['A'].fillna(value=df['A'].mean())
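
A minimal sketch of filling gaps, assuming made-up data. It fills with the number 0 rather than a text placeholder so the columns stay numeric; that choice is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({
    'A': [1, 2, np.nan],
    'B': [5, np.nan, np.nan],
})

# Replace every missing value in the frame with a constant.
filled_zero = df.fillna(value=0)

# Replace missing values in one column with that column's mean.
# mean() skips NaN, so the mean of A is (1 + 2) / 2.
filled_mean = df['A'].fillna(value=df['A'].mean())

print(filled_zero)
print(filled_mean)
```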

20
Q

Groupby

A
data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'],
'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'],
'Sales':[200,120,340,124,243,350]}
df = pd.DataFrame(data)

Company  Person   Sales
GOOG     Sam      200
GOOG     Charlie  120
MSFT     Amy      340
MSFT     Vanessa  124
FB       Carl     243
FB       Sarah    350

Use the .groupby() method to group rows together based on a column name. For instance, group by Company. This creates a DataFrameGroupBy object:

df.groupby('Company')

Then call an aggregate method on that object. Select the numeric column first, because recent pandas versions raise an error when asked to average a text column such as Person:

df.groupby('Company')['Sales'].mean()
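
A runnable sketch of the groupby card, using the same data. The variable names by_company, mean_sales and summary are illustrative only.

```python
import pandas as pd

data = {
    'Company': ['GOOG', 'GOOG', 'MSFT', 'MSFT', 'FB', 'FB'],
    'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
    'Sales': [200, 120, 340, 124, 243, 350],
}
df = pd.DataFrame(data)

# Grouping alone computes nothing; it returns a DataFrameGroupBy object.
by_company = df.groupby('Company')

# Aggregate the numeric column per company.
mean_sales = by_company['Sales'].mean()

# Several aggregates at once.
summary = by_company['Sales'].agg(['min', 'max', 'count'])

print(mean_sales)
print(summary)
```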

21
Q

what are the aggregate methods used in group by

A

std(), min(), max(), count(), describe()

22
Q

how do you transpose the dataframe

A

dataFrame.transpose()

23
Q

what are the ways to combine dataframes together

A

Merging, Joining, and Concatenating

24
Q

how to concatenate dataframes

A

Concatenation basically glues together DataFrames. Keep in mind that dimensions should
match along the axis you are concatenating on. You can use pd.concat and pass in a list of
DataFrames to concatenate together

pd.concat([dataframe1, dataframe2, dataframe3, ..])

pd.concat([dataframe1, dataframe2, dataframe3, ..], axis=1)
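
A short sketch with two made-up DataFrames. pd.concat takes a single list of DataFrames; with axis=1 the frames are aligned on their index, so labels that do not match produce NaN.

```python
import pandas as pd

# Hypothetical data with non-overlapping index labels.
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[2, 3])

# Default axis=0: stack the rows on top of each other.
stacked = pd.concat([df1, df2])

# axis=1: place the columns side by side, aligned on the index.
side_by_side = pd.concat([df1, df2], axis=1)

print(stacked)
print(side_by_side)
```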

25
Q

Merging multiple dataframes

A

The merge function allows you to merge DataFrames together using a similar logic as merging SQL Tables together. For example:

pd.merge(left, right, how='inner', on='col_name')

26
Q
A
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})

pd.merge(left, right, on=['key1', 'key2'])

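With on=['key1', 'key2'] and the default how='inner', the result keeps only the rows where both key1 and key2 match, with the columns of left and right placed side by side.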
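
A runnable sketch of this merge, using the same left and right frames. The outer and left variants are added here to show how the how parameter changes which rows survive.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

keys = ['key1', 'key2']

inner = pd.merge(left, right, on=keys)                   # matching key pairs only
outer = pd.merge(left, right, how='outer', on=keys)      # every key pair from both
left_join = pd.merge(left, right, how='left', on=keys)   # every row of left

print(inner)
print(len(inner), len(outer), len(left_join))
```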

27
Q

how to fetch the unique values and its count

A

dataframe['col_name'].unique() --> fetching unique values

dataframe['col_name'].nunique() --> fetching count of unique values

df['col2'].value_counts() --> fetching values and count
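
A minimal sketch of the three calls, assuming a made-up col2 column.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({'col2': [444, 555, 666, 444]})

unique_values = df['col2'].unique()   # distinct values, in order of appearance
n_unique = df['col2'].nunique()       # how many distinct values there are
counts = df['col2'].value_counts()    # occurrences per value, most frequent first

print(unique_values)
print(n_unique)
print(counts)
```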

28
Q

how to Select from DataFrame using criteria from multiple columns

A

dataframe[(dataframe['col1']>2) & (dataframe['col2']==444)]

df[(df['col1']>2) & (df['col2']==444)]

29
Q

how to use apply functions in pandas

A

dataframe['col_name'].apply(function_name)

The function is applied to every value in the column.

df['col3'].apply(len)

df['col3'].apply(lambda x: len(x))
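
A small sketch of apply with a built-in function and with a user-defined one. The data and the helper times_two are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({'col1': [1, 2, 3], 'col3': ['abc', 'de', 'ghij']})


def times_two(x):
    """Hypothetical helper: double a single value."""
    return x * 2


# apply calls the function once per value and collects the results.
lengths = df['col3'].apply(len)
doubled = df['col1'].apply(times_two)

print(lengths.tolist())
print(doubled.tolist())
```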

30
Q

how to Permanently Remove a Column

A

del df['col_name']

31
Q

how to Get column and index names

A

dataframe.columns
Index(['col2', 'col3'], dtype='object')

dataframe.index
RangeIndex(start=0, stop=2, step=1)

32
Q

how to Sort and Order a DataFrame

A

dataframe.sort_values(by='colname')
# inplace=False by default
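
A minimal sketch of sorting, assuming made-up data. Because inplace defaults to False, sort_values returns a sorted copy and the original order is preserved.

```python
import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [555, 444, 666, 444]})

ascending = df.sort_values(by='col2')
descending = df.sort_values(by='col2', ascending=False)

print(ascending)
print(descending)
```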

33
Q

how to Find Null Values or Check for Null Values

A

dataframe.isnull()

34
Q

how to drop null rows

A

df.dropna()

35
Q

how to Fill in Null values with something else

A

df.fillna('value')

36
Q

how to create pivot table

A

data = {'A':['foo','foo','foo','bar','bar','bar'],
'B':['one','one','two','two','one','one'],
'C':['x','y','x','y','x','y'],
'D':[1,3,2,5,4,1]}
df = pd.DataFrame(data)

     A    B  C  D
0  foo  one  x  1
1  foo  one  y  3
2  foo  two  x  2
3  bar  two  y  5
4  bar  one  x  4
5  bar  one  y  1

df.pivot_table(values='D',index=['A', 'B'],columns=['C'])

C          x    y
A   B
bar one  4.0  1.0
    two  NaN  5.0
foo one  1.0  3.0
    two  2.0  NaN
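
A runnable sketch of the pivot table card, using the same data. Cells are read back with loc and a tuple of the two index levels; combinations that never occur in the data come out as NaN.

```python
import pandas as pd

data = {'A': ['foo', 'foo', 'foo', 'bar', 'bar', 'bar'],
        'B': ['one', 'one', 'two', 'two', 'one', 'one'],
        'C': ['x', 'y', 'x', 'y', 'x', 'y'],
        'D': [1, 3, 2, 5, 4, 1]}
df = pd.DataFrame(data)

# Rows are indexed by (A, B), columns by the values of C, cells hold D.
# The default aggregation is the mean of D for each combination.
pivot = df.pivot_table(values='D', index=['A', 'B'], columns=['C'])

print(pivot)
print(pivot.loc[('bar', 'one'), 'x'])
```
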
37
Q

importing csv_file and exporting csv file in pandas

A

pd.read_csv(file_name)

dataframe.to_csv(file_name, index=False)
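
A small round-trip sketch. To stay self-contained it writes to an in-memory io.StringIO buffer instead of a file on disk; that substitution is an assumption made for illustration, and a file name works the same way.

```python
import io

import pandas as pd

# Hypothetical data, used only for illustration.
df = pd.DataFrame({'a': [1, 2], 'b': ['x', 'y']})

# Write CSV text into the buffer. index=False leaves out the row labels.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()

# Rewind the buffer and read the CSV back into a new DataFrame.
buffer.seek(0)
loaded = pd.read_csv(buffer)

print(csv_text)
print(loaded)
```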

38
Q

importing excel_file and exporting excel file in pandas

A

Pandas can read and write Excel files. Keep in mind that this only imports data, not formulas or images; a workbook that contains images or macros may cause read_excel to fail.

pd.read_excel('Excel_Sample.xlsx',sheet_name='Sheet1')

dataframe.to_excel('Excel_Sample.xlsx',sheet_name='Sheet1')

39
Q

how to read html tables from the website

A

The pandas read_html function reads tables off a webpage and returns a list of DataFrame
objects, one per table found (dfs[0] is the first table):
dfs = pd.read_html('website link')