Pandas Flashcards
To combine scraped CSV tables side by side (column-wise) in Pandas
DataframeName = pd.concat([df, df2, df3], axis=1, sort=False)
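A minimal sketch of the card above. The three small frames and their column names are hypothetical stand-ins for scraped tables, not data from the original.

```python
import pandas as pd

# Hypothetical stand-ins for three scraped tables
df = pd.DataFrame({"name": ["a", "b"]})
df2 = pd.DataFrame({"price": [1.0, 2.0]})
df3 = pd.DataFrame({"stock": [10, 20]})

# axis=1 places the tables side by side, aligning rows on the index
combined = pd.concat([df, df2, df3], axis=1, sort=False)
print(combined)
```

To stack the tables on top of each other instead, axis=0 (the default) would be used.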
To find whether a column has duplicate values
df['column name'].duplicated()
Displays the duplicate rows
df.loc[df.duplicated(), :]
To mark duplicates except for the first occurrence
df.loc[df.duplicated(keep='first'), :]
To mark duplicates except for the last occurrence
df.loc[df.duplicated(keep='last'), :]
To mark all duplicates as True (all will be displayed)
df.loc[df.duplicated(keep = False), :]
To drop duplicates from the DataFrame (and check the resulting shape)
df.drop_duplicates(keep='first').shape
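A small sketch comparing the keep options from the duplicate cards. The sample data is made up for illustration; rows 0, 2 and 3 are identical.

```python
import pandas as pd

# Made-up data: rows 0, 2 and 3 are identical
df = pd.DataFrame({
    "city": ["NY", "LA", "NY", "NY"],
    "year": [2020, 2021, 2020, 2020],
})

first = df.duplicated(keep="first")    # flags repeats after the first occurrence
last = df.duplicated(keep="last")      # flags repeats before the last occurrence
all_dupes = df.duplicated(keep=False)  # flags every row that has a duplicate

# drop_duplicates returns a new DataFrame; df itself is unchanged
deduped = df.drop_duplicates(keep="first")
print(deduped.shape)
```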
Save a file in NumPy
arr = np.arange(10)
np.save('file_name', arr)
Load a file in NumPy
np.load('file_name.npy')
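A round-trip sketch of saving and loading. The temporary directory is an assumption added here so the example cleans up after itself; np.save appends the .npy extension when the name lacks one.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

# The temporary directory is only for this sketch, so no file is left behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file_name")
    np.save(path, arr)               # written to disk as file_name.npy
    loaded = np.load(path + ".npy")  # read the array back

print(loaded)
```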
Drop row(s)
DataframeName.drop(['row name', 'row name'])
Transpose DataFrame (swap rows and columns)
DataframeName.T
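A short sketch of the drop and transpose cards on one small frame. The row labels r1 to r3 are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 frame with labelled rows
df = pd.DataFrame(
    np.arange(6).reshape((3, 2)),
    columns=["A", "B"],
    index=["r1", "r2", "r3"],
)

trimmed = df.drop(["r1", "r3"])  # returns a new DataFrame without those rows
flipped = df.T                   # rows become columns and vice versa

print(trimmed)
print(flipped)
```

Neither call modifies df in place, so the result has to be assigned to keep it.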
Add two DataFrames and keep values where rows and columns don't match.
df1.add(df2, fill_value = 0)
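A sketch of how fill_value behaves, using two made-up frames that only partly overlap. A cell missing from one frame is treated as 0; a cell missing from both stays NaN.

```python
import pandas as pd

# Made-up frames: they share only the cell (row "y", column "A")
df1 = pd.DataFrame({"A": [1.0, 2.0]}, index=["x", "y"])
df2 = pd.DataFrame({"A": [10.0], "B": [5.0]}, index=["y"])

# A cell present in only one frame is added to fill_value (0 here)
total = df1.add(df2, fill_value=0)
print(total)
```

Plain df1 + df2 would instead give NaN wherever either frame lacks the cell.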
Create a DataFrame of 12 values in 4 rows and 3 columns, with A, B, C as column names and 4 states as the index.
df = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('ABC'), index=['New York', 'Florida', 'California', 'Nevada'])
Create your own DataFrame from a dict (all columns must have the same length)
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})
Load a CSV that has no header row into Pandas and supply the column names
pd.read_csv('examples/ex2.csv', names=['a', 'b', 'c', 'd', 'message'])
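A self-contained sketch of the names argument. The in-memory text is a hypothetical stand-in for examples/ex2.csv, whose real contents are not shown here.

```python
import io

import pandas as pd

# Hypothetical headerless CSV content standing in for examples/ex2.csv
raw = "1,2,3,4,hello\n5,6,7,8,world\n"

# With names given, the first line is read as data, not as a header
data = pd.read_csv(io.StringIO(raw), names=["a", "b", "c", "d", "message"])
print(data)
```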
To check for duplicate rows
data.duplicated()