Pandas Flashcards
Creating a Series
pd.series(list)
pd.Series(data=my_list,index=labels)
Creating series along with the index
ser1 = pd.Series([1,2,3,4],index = [‘USA’, ‘Germany’,’USSR’, ‘Japan’])
ser1
USA 1
Germany 2
USSR 3
Japan 4
dtype: int64
fetch elements in series
varname[index]
ser1[‘USA’]
Create a DataFrame
pd.DataFrame(data, index= , columns = )
pd.DataFrame(randn(5,4),index=’A B C D E’.split(),columns=’W X Y Z’.split())
W X Y Z A 2.706850 0.628133 0.907969 0.503826 B 0.651118 -0.319318 -0.848077 0.605965 C -2.018168 0.740122 0.528813 -0.589001 D 0.188695 -0.758872 -0.933237 0.955057 E 0.190794 1.978757 2.605967 0.683509
fetching column from a dataframe
dataframe[col_name] –> df[‘W’]
dataframe[[col1,col2 ]] –> df[[‘W’,’Z’]]
Creating a new column in a dataframe
df[‘new’] = df[‘W’] + df[‘Y’]
how to remove a column
df.drop(‘col_name’,axis=1,inplace=True)
how to remove a rows
df.drop(‘index’, axis=0)
Selecting Rows from a DataFrame
df.loc[“index”]
df.loc[“label”]
Selecting multiple Rows and column in a DataFrame
df.loc[row, col]
df.loc[ [row1,row2,… ] , [col1,col2,… ] ]
df.loc[[‘A’,’B’],[‘W’,’Y’]]
Conditional Selection
An important feature of pandas is conditional selection using bracket notation, very similar
to numpy:
dataFrame[condition]
df[df>0]
df[df[‘col’] >0]
Selecting single and multiple columns
dataframe[ dataframe [“col_name”] < 5] [ ‘colname’]
df[df[‘W’]>0][‘Y’]
dataframe[ dataframe [“col_name”] < 5] [ [‘col1’,col2’] ]
df[df[‘W’]>0][[‘Y’,’X’]]
Selecting single and multiple columns with multiple conditions
For two conditions you can use | and & with parenthesis:
dataframe[ (condition1) & (condition2)]
df[(df[‘W’]>0) & (df[‘Y’] > 1)]
how to reset index
Reset to default 0,1…n index
dataframe.reset_index()
df.reset_index()
how to set specific column as index in a dataframe
Here we can give column name or we can provide pandas series
Dataframe.set_index(“col_name”)
df.set_index(‘States’)
df.set_index(‘States’,inplace=True) to change in permanently
how to create Multi-Index in a dataframe
zip the 2 list convert to list of tuples
first create 2 list outside and inside index
outside = [‘G1’,’G1’,’G1’,’G2’,’G2’,’G2’]
inside = [1,2,3,1,2,3]
hier_index = list(zip(outside,inside))
#set multiindex from tuple
hier_index = pd.MultiIndex.from_tuples(hier_index)
print(pd.DataFrame(np.random.randn(6,2),index=hier_index,columns=[‘A’,’B’]))
A B
G1 1 -0.419202 -0.516732
2 0.036076 -0.205218
3 0.148782 0.942314
G2 1 0.001280 -1.956888
2 1.069982 -2.220442
3 1.180214 0.924161
How to fetch elements from the Multi-Index dataframe
For index hierarchy we use df.loc[], if this was on the
columns axis, you would just use normal bracket notation df[]. Calling one level of the index
returns the sub-dataframe:
df.loc[“G1”]
df.loc[‘G1’].loc[1]
how to drop missing values
df.dropna() This deletes from row wise
df.dropna(axis=1) this deletes from column wise
df.dropna(thresh=2) removes rows having 2 missing values
how to replace null values
df.fillna(value=’FILL VALUE’)
df[‘A’].fillna(value=df[‘A’].mean())
hello
Groupby
ok
this
data = {'Company':['GOOG','GOOG','MSFT','MSFT','FB','FB'], 'Person':['Sam','Charlie','Amy','Vanessa','Carl','Sarah'], 'Sales':[200,120,340,124,243,350]} df.groupby('Company')
Now you can use the .groupby() method to group rows together based off of a column
name. For instance let’s group based off of Company. This will create a DataFrameGroupBy
object
Company Person Sales GOOG Sam 200 GOOG Charlie 120 MSFT Amy 340 MSFT Vanessa 124 FB Carl 243 FB Sarah 350
df.groupby("Company").mean()
And then call aggregate methods off the object. we have to use any one aggregate methods
what are the aggregate methods used in group by
std(), min(), max(), count(), describe()
how to you transpose the dataframe
dataFrame.transpose()
what are the ways to combine dataframes together
Merging, Joining, and Concatenating
how to concatenate dataframes
Concatenation basically glues together DataFrames. Keep in mind that dimensions should
match along the axis you are concatenating on. You can use pd.concat and pass in a list of
DataFrames to concatenate together
pd.concat([dataframe1], [dataframe2], [dataframe3], ..)
pd.concat([dataframe1], [dataframe2], [dataframe3], .., axis=1)
Merging multiple dataframes
The merge function allows you to merge DataFrames together using a similar logic as merging SQL Tables together. For example:
pd.merge(left,right, how = “inner”, on = “col_name”)
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'], 'key2': ['K0', 'K1', 'K0', 'K1'], 'A': ['A0', 'A1', 'A2', 'A3'], 'B': ['B0', 'B1', 'B2', 'B3']}) right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'], 'key2': ['K0', 'K0', 'K0', 'K0'], 'C': ['C0', 'C1', 'C2', 'C3'], 'D': ['D0', 'D1', 'D2', 'D3']}) pd.merge(left, right, on=['key1', 'key2'])
Here above it merges side by side
how to fetch the unique values and its count
dataframe[“col_name”].unique() –> fetching unique values
dataframe[“col_name”].nunique() –> fetching count of unique values
df[‘col2’].value_counts() –> fetching values and count
how to Select from DataFrame using criteria from multiple columns
dataframe[(dataframe[‘col1’]>2) & (dataframe[‘col2’]==444)]
df[(df[‘col1’]>2) & (df[‘col2’]==444)]
how to use apply functions in pandas
dataframe[“col_name”].apply(function_name)
every data in the column will apply this function
df[‘col3’].apply(len)
df[‘col3’].apply(lambda x: len(x))
how to Permanently Remove` a Column
del df[‘col_name’]
how to Get column and index names
dataFrame.columns
RangeIndex(start=0, stop=2, step=1)
dataframe.index
Index([‘col2’, ‘col3’], dtype=’object’)
how to Sorting and Ordering a DataFrame
dataframe.sort_values(by = “colname”)
#inplace=False by default
how to Find Null Values or Check for Null Values
dataframe.isnull()
how to drop null rows
df.dropna()
how to Fill in Null values with something else
df.fillna(“value”)
how to create pivot table
data = {‘A’:[‘foo’,’foo’,’foo’,’bar’,’bar’,’bar’],
‘B’:[‘one’,’one’,’two’,’two’,’one’,’one’],
‘C’:[‘x’,’y’,’x’,’y’,’x’,’y’],
‘D’:[1,3,2,5,4,1]}
df = pd.DataFrame(data)
A B C D 0 foo one x 1 1 foo one y 3 2 foo two x 2 3 bar two y 5 4 bar one x 4 5 bar one y 1
df.pivot_table(values=’D’,index=[‘A’, ‘B’],columns=[‘C’])
C x y A B bar one 4.0 1.0 two NaN 5.0 foo one 1.0 3.0 two 2.0 NaN
importing csv_file and exporting csv file in pandas
pd.read_csv(file_name)
dataframe.to_csv(file_name, index=False)
importing excel_file and exporting excel file in pandas
Pandas can read and write excel files, keep in mind, this only imports data. Not formulas or
images, having images or macros may cause this read_excel method to crash.
pd.read_excel(‘Excel_Sample.xlsx’,sheetname=’Sheet1’)
dataframe.to_excel(‘Excel_Sample.xlsx’,sheet_name=’Sheet1’)
how to read html tables from the website
Pandas read_html function will read tables off of a webpage and return a list of DataFrame
objects:
df = pd.read_html(‘website link’)