Pandas Flashcards
Creating a Series
pd.Series(my_list)
pd.Series(data=my_list,index=labels)
Creating a Series along with the index
ser1 = pd.Series([1,2,3,4],index = ['USA', 'Germany', 'USSR', 'Japan'])
ser1
USA 1
Germany 2
USSR 3
Japan 4
dtype: int64
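A minimal sketch of the two constructor forms on the cards above. The values in my_list and labels are made up for illustration.

```python
import pandas as pd

my_list = [10, 20, 30]
labels = ['a', 'b', 'c']

# Without an index, pandas assigns the default positions 0, 1, 2
ser_default = pd.Series(my_list)

# With an explicit index, each value is addressed by its label
ser_labeled = pd.Series(data=my_list, index=labels)

print(list(ser_default.index))  # [0, 1, 2]
print(list(ser_labeled.index))  # ['a', 'b', 'c']
```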
Fetching elements in a Series
varname[index]
ser1['USA']
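A short sketch of label-based lookup on the ser1 Series from the earlier card. The list-of-labels and .iloc lines are extras not on the card, shown only for comparison.

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], index=['USA', 'Germany', 'USSR', 'Japan'])

usa = ser1['USA']               # single label -> scalar value
pair = ser1[['USA', 'Japan']]   # list of labels -> smaller Series
first = ser1.iloc[0]            # position-based access, independent of labels

print(usa)    # 1
print(first)  # 1
```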
Create a DataFrame
pd.DataFrame(data, index=row_labels, columns=column_labels)
from numpy.random import randn
pd.DataFrame(randn(5,4),index='A B C D E'.split(),columns='W X Y Z'.split())
          W         X         Y         Z
A  2.706850  0.628133  0.907969  0.503826
B  0.651118 -0.319318 -0.848077  0.605965
C -2.018168  0.740122  0.528813 -0.589001
D  0.188695 -0.758872 -0.933237  0.955057
E  0.190794  1.978757  2.605967  0.683509
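A self-contained sketch of the DataFrame construction above, with the imports it needs. The seed value is an arbitrary assumption added for reproducibility, so the numbers will not match the table on the card.

```python
import numpy as np
import pandas as pd
from numpy.random import randn

np.random.seed(0)  # arbitrary seed (assumption), only so reruns give the same values

# 5 rows labeled A..E and 4 columns labeled W..Z, filled with standard-normal samples
df = pd.DataFrame(randn(5, 4),
                  index='A B C D E'.split(),
                  columns='W X Y Z'.split())

print(df.shape)  # (5, 4)
```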
Fetching columns from a DataFrame
dataframe[col_name] -> df['W']
dataframe[[col1, col2]] -> df[['W','Z']]
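A small sketch of the difference between the two forms above: single brackets return a Series, double brackets return a DataFrame. The toy data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2], 'Z': [3, 4]}, index=['A', 'B'])

single = df['W']         # one column name -> Series
multi = df[['W', 'Z']]   # list of column names -> DataFrame

print(type(single).__name__)  # Series
print(type(multi).__name__)   # DataFrame
```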
Creating a new column in a dataframe
df['new'] = df['W'] + df['Y']
How to remove a column
df.drop('col_name',axis=1,inplace=True)
How to remove a row
df.drop('index_label', axis=0)
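The two drop cards above differ in one detail worth seeing in action: without inplace=True, drop returns a new DataFrame and leaves the original untouched. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2, 3], 'Y': [10, 20, 30]}, index=['A', 'B', 'C'])
df['new'] = df['W'] + df['Y']   # element-wise sum, aligned on the index

# axis=0 drops a row by label; the result is a new DataFrame, df still has row 'A'
without_row = df.drop('A', axis=0)

# axis=1 drops a column; inplace=True modifies df itself and returns None
df.drop('new', axis=1, inplace=True)

print(list(df.index))    # ['A', 'B', 'C']
print(list(df.columns))  # ['W', 'Y']
```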
Selecting Rows from a DataFrame
df.loc['index']
df.loc['label']
Selecting multiple rows and columns in a DataFrame
df.loc[row, col]
df.loc[ [row1, row2, ...] , [col1, col2, ...] ]
df.loc[['A','B'],['W','Y']]
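A sketch of the three .loc forms from the cards above, using a small made-up frame so the results are easy to check by eye.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    index=['A', 'B', 'C'],
    columns=['W', 'X', 'Y'],
)

row = df.loc['A']                        # one row label -> Series
cell = df.loc['B', 'Y']                  # row label and column label -> scalar
block = df.loc[['A', 'B'], ['W', 'Y']]   # lists of labels -> sub-DataFrame

print(cell)                    # 6
print(block.values.tolist())   # [[1, 3], [4, 6]]
```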
Conditional Selection
An important feature of pandas is conditional selection using bracket notation, very similar
to NumPy boolean indexing:
dataframe[condition]
df[df>0]
df[df['col'] >0]
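The two example forms above behave differently: a whole-frame condition keeps the shape and blanks out failing cells with NaN, while a single-column condition filters rows. A minimal sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'W': [1, -2, 3], 'Y': [-4, 5, 6]}, index=['A', 'B', 'C'])

mask = df['W'] > 0    # boolean Series, one True/False per row
rows = df[mask]       # keeps only the rows where the mask is True

masked = df[df > 0]   # same shape as df; cells failing the test become NaN

print(list(rows.index))  # ['A', 'C']
print(masked.shape)      # (3, 2)
```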
Selecting single and multiple columns from the filtered rows
dataframe[ dataframe['col_name'] < 5 ]['col_name']
df[df['W']>0]['Y']
dataframe[ dataframe['col_name'] < 5 ][['col1','col2']]
df[df['W']>0][['Y','X']]
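The chained form above filters rows first and then picks columns. As a hedged alternative not on the card, .loc can do both in one step; the sketch below uses made-up data and checks that the two give the same result.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, -2, 3], 'X': [7, 8, 9], 'Y': [4, 5, 6]},
                  index=['A', 'B', 'C'])

# Two steps: filter rows, then select columns from the filtered frame
chained = df[df['W'] > 0][['Y', 'X']]

# One step: row condition and column list passed to .loc together
single_step = df.loc[df['W'] > 0, ['Y', 'X']]

print(chained.equals(single_step))  # True
```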
Selecting rows with multiple conditions
To combine two conditions, use | (or) and & (and), with each condition wrapped in parentheses:
dataframe[ (condition1) & (condition2)]
df[(df['W']>0) & (df['Y'] > 1)]
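A small sketch of & and | on made-up data. The parentheses matter because & and | bind more tightly than comparison operators such as >.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, -2, 3], 'Y': [2, 5, 0]}, index=['A', 'B', 'C'])

# & keeps rows where both conditions hold
both = df[(df['W'] > 0) & (df['Y'] > 1)]

# | keeps rows where at least one condition holds
either = df[(df['W'] > 0) | (df['Y'] > 1)]

print(list(both.index))    # ['A']
print(list(either.index))  # ['A', 'B', 'C']
```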
How to reset the index
Reset to the default integer index 0, 1, ..., n-1
dataframe.reset_index()
df.reset_index()
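A sketch of what reset_index returns, on made-up data. The drop=True variant is an extra not on the card, shown for the case where the old labels are not needed.

```python
import pandas as pd

df = pd.DataFrame({'W': [1, 2, 3]}, index=['A', 'B', 'C'])

# The old index moves into a column named 'index'; df itself is unchanged
reset = df.reset_index()

# drop=True discards the old labels instead of keeping them as a column
dropped = df.reset_index(drop=True)

print(list(reset.columns))    # ['index', 'W']
print(list(dropped.columns))  # ['W']
```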
How to set a specific column as the index of a DataFrame
We can pass a column name or a pandas Series
dataframe.set_index('col_name')
df.set_index('States')
df.set_index('States', inplace=True) to change df permanently
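A minimal sketch of the three set_index variants above. The 'States' values and the Series passed as the index are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'States': ['CA', 'NY', 'WY'], 'W': [1, 2, 3]})

# Column name: returns a new DataFrame, 'States' leaves the columns and becomes the index
indexed = df.set_index('States')

# A Series of the same length can be used as the index directly; no column is removed
by_series = df.set_index(pd.Series(['x', 'y', 'z']))

# inplace=True changes df itself and returns None
df.set_index('States', inplace=True)

print(list(indexed.index))  # ['CA', 'NY', 'WY']
print(list(df.columns))     # ['W']
```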