Pandas Basics Flashcards
Import library
pandas
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Import a csv into data frame
pandas
file = "file.csv"
df = pd.read_csv(file)
Export a data frame to csv
pandas
df.to_csv("file.csv", sep="|", index=False)
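A minimal round-trip sketch for the import and export cards. It writes to an in-memory buffer rather than a real file, and the sample frame is made up for illustration. The point it shows: a non-default separator used when writing has to be passed again when reading.

```python
import io

import pandas as pd

# Hypothetical sample frame, for illustration only
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": ["A", "B"]})

# Write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, sep="|", index=False)  # index=False leaves out the row index
csv_text = buffer.getvalue()

# Read it back; sep must match the separator used when writing
df_back = pd.read_csv(io.StringIO(csv_text), sep="|")
```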
Creating a data frame from a list of lists
pandas
data = [[1, 2, "A"], [3, 4, "B"]]
df = pd.DataFrame(data, columns=["col1", "col2", "col3"])
Creating a data frame from a dictionary
pandas
data = {"col1": [1, 2], "col2": [3, 4], "col3": ["A", "B"]}
df = pd.DataFrame(data=data)
Get number of rows and columns in a data frame
pandas
df.shape
Viewing top n rows
pandas
df.head(n)
Displaying data type of columns
pandas
df.dtypes
Modifying the data type of a column
pandas
df["col1"] = df["col1"].astype(np.int8)
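Casting to np.int8 assumes every value fits in the range -128 to 127. As a hedged alternative, pd.to_numeric with downcast="integer" lets pandas pick the smallest integer type that can hold the data. The sample values below are made up.

```python
import pandas as pd

# Hypothetical sample frame, for illustration only
df = pd.DataFrame({"col1": [1, 3], "col2": [1000, 70000]})

# Let pandas choose the smallest integer dtype that fits each column
df["col1"] = pd.to_numeric(df["col1"], downcast="integer")
df["col2"] = pd.to_numeric(df["col2"], downcast="integer")

col1_dtype = str(df["col1"].dtype)  # small values fit in int8
col2_dtype = str(df["col2"].dtype)  # 70000 needs more than 16 bits
```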
Display missing value stats and data type
pandas
df.info()
Print descriptive stats
pandas
df.describe()
Filling missing values with a specific value
pandas
df.fillna(0, inplace = True)
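A small sketch of a variation on the card above: fillna also accepts a dictionary of per-column fill values, and without inplace=True it returns a new frame. The column contents are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame with one missing value per column
df = pd.DataFrame({"col1": [1.0, np.nan], "col3": ["A", None]})

# A different fill value per column; df itself is left unchanged
filled = df.fillna({"col1": 0, "col3": "unknown"})

missing_before = int(df.isna().sum().sum())
missing_after = int(filled.isna().sum().sum())
```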
Combining data frames: join (merge)
pandas
pd.merge(df1, df2, on = "col3")
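A minimal sketch of how the how parameter changes a merge. pd.merge defaults to an inner join, so keys missing from either side are dropped; how="left" keeps every row of the first frame. df1 and df2 here are made-up examples.

```python
import pandas as pd

# Hypothetical frames that share the key column "col3"
df1 = pd.DataFrame({"col3": ["A", "B", "C"], "col1": [1, 2, 3]})
df2 = pd.DataFrame({"col3": ["A", "B", "D"], "col2": [10, 20, 40]})

# Default inner join: only keys present in both frames ("A" and "B")
inner = pd.merge(df1, df2, on="col3")

# Left join: all rows of df1; "C" has no match, so its col2 is missing
left = pd.merge(df1, df2, on="col3", how="left")
```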
Sorting a data frame
pandas
2 alternatives
df.sort_values("col1")
df.sort_values(by="Sales", ascending=False)
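A short sketch of two details the card leaves implicit: sort_values returns a new frame instead of reordering the original, and it can sort by several columns with a direction for each. The Region and Sales values are made up.

```python
import pandas as pd

# Hypothetical sample frame, for illustration only
df = pd.DataFrame({"Region": ["South", "North", "North"],
                   "Sales": [300, 120, 200]})

# Descending by one column; df keeps its original row order
by_sales = df.sort_values(by="Sales", ascending=False)

# Region ascending, then Sales descending within each region
by_both = df.sort_values(by=["Region", "Sales"], ascending=[True, False])
```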
Grouping a data frame
pandas
2 alternatives
df.groupby("Region")["Sales"].mean()
df.groupby("col3").agg({"col1": "sum", "col2": "max"})
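A minimal sketch of both grouping styles on a made-up frame. The second call uses named aggregation, which gives each output column an explicit name; the names total_sales and max_units are illustrative choices.

```python
import pandas as pd

# Hypothetical sample frame, for illustration only
df = pd.DataFrame({"Region": ["North", "South", "North"],
                   "Sales": [100, 50, 200],
                   "Units": [1, 2, 3]})

# One aggregate on one column: a Series indexed by Region
mean_sales = df.groupby("Region")["Sales"].mean()

# Named aggregation: output_name=(input_column, function_name)
summary = df.groupby("Region").agg(
    total_sales=("Sales", "sum"),
    max_units=("Units", "max"),
)
```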
Renaming columns
pandas
df.rename(columns = {"col_A":"col1"})
Deleting columns
pandas
df.drop(columns = ["col1"])
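A small sketch showing that rename and drop return new frames by default, so the result has to be assigned to keep the change. The frame below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample frame, for illustration only
df = pd.DataFrame({"col_A": [1, 2], "col2": [3, 4]})

# Neither call modifies df; each returns a modified copy
renamed = df.rename(columns={"col_A": "col1"})
dropped = renamed.drop(columns=["col1"])
```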
Adding columns (addition method)
pandas
df["col3"] = df["col1"] + df["col2"]
Adding columns (assign method)
pandas
df = df.assign(col3 = df["col1"] + df["col2"])
Filtering rows: boolean method
pandas
dfx[["b", "c"]]  # selects columns b and c (column selection, not a row filter)
df[df["col2"] > 5]
df[(df["Region"] == "North") & (df["Sales"] > 100)]
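A minimal sketch of how boolean filtering works: each comparison produces a True/False Series, conditions combine with & (and) or | (or), and each condition needs its own parentheses because those operators bind more tightly than the comparisons. The data is made up.

```python
import pandas as pd

# Hypothetical sample frame, for illustration only
df = pd.DataFrame({"Region": ["North", "South", "North"],
                   "Sales": [150, 300, 80]})

# The mask is a boolean Series aligned with the rows of df
mask = (df["Region"] == "North") & (df["Sales"] > 100)
north_big = df[mask]

# | keeps rows where at least one condition holds
either = df[(df["Region"] == "South") | (df["Sales"] < 100)]
```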