Pandas Basics Flashcards

Question 1

Q

Import library

pandas

Answer

A

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Question 2

Q

Import a CSV file into a data frame

pandas

Answer

A

file = "file.csv"
df = pd.read_csv(file)

Question 3

Q

Export a data frame to csv

pandas

Answer

A

df.to_csv("file.csv", sep="|", index=False)
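
You can check a pipe-delimited export by round-tripping it. This sketch writes to an in-memory `io.StringIO` buffer instead of `file.csv`, purely so the example creates no file, and uses a hypothetical two-column frame. It shows that the same `sep` has to be passed to `read_csv` when loading the data back.

```python
import io

import pandas as pd

# Hypothetical two-column frame used only for illustration.
df = pd.DataFrame({"col1": [1, 2], "col2": ["A", "B"]})

# Write pipe-delimited text without the index into an in-memory buffer.
buffer = io.StringIO()
df.to_csv(buffer, sep="|", index=False)
text = buffer.getvalue()
print(text)

# Reading it back needs the same separator; otherwise each line is parsed as one field.
df_back = pd.read_csv(io.StringIO(text), sep="|")
print(df_back.equals(df))
```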

Question 4

Q

Creating a data frame from a list of lists

pandas

Answer

A

data = [[1, 2, "A"], [3, 4, "B"]]
df = pd.DataFrame(data, 
           columns = ["col1", "col2", "col3"])

Question 5

Q

Creating a data frame from a dictionary

pandas

Answer

A

data = {'col1': [1, 2], 
        'col2': [3, 4], 
        'col3': ["A", "B"]}

df = pd.DataFrame(data=data)

Question 6

Q

Get number of rows and columns in a data frame

pandas

Answer

A

df.shape  # tuple: (rows, columns)

Question 7

Q

Viewing top n rows

pandas

Answer

A

df.head(n)

Question 8

Q

Displaying the data types of columns

pandas

Answer

A

df.dtypes

Question 9

Q

Modifying the data type of a column

pandas

Answer

A

df["col1"] = df["col1"].astype(np.int8)

Question 10

Q

Display missing value stats and data type

pandas

Answer

A

df.info()

Question 11

Q

Print descriptive stats

pandas

Answer

A

df.describe()

Question 12

Q

Filling missing values with a specific value

pandas

Answer

A

df.fillna(0, inplace = True)
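
`fillna` also accepts a dictionary that maps column names to fill values, which is useful when numeric and text columns need different placeholders. This is a minimal sketch on a hypothetical frame. It returns a new frame instead of modifying the original with `inplace=True`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column.
df = pd.DataFrame({"col1": [1.0, np.nan], "col2": ["x", None]})

# Per-column fill values: 0 for the numeric column, a label for the text column.
filled = df.fillna({"col1": 0, "col2": "missing"})
print(filled)
```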

Question 13

Q

Combining data frames: join (merge)

pandas

Answer

A

pd.merge(df1, df2, on = "col3")
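
By default `pd.merge` performs an inner join, so keys missing from either frame are dropped. The `how` parameter changes this. The sketch below uses hypothetical `df1`/`df2` frames that share only some `col3` keys.

```python
import pandas as pd

# Hypothetical frames: keys A and B appear in both, C only in df1, D only in df2.
df1 = pd.DataFrame({"col3": ["A", "B", "C"], "col1": [1, 2, 3]})
df2 = pd.DataFrame({"col3": ["A", "B", "D"], "col2": [10, 20, 40]})

# Default how="inner": keep only keys found in both frames.
inner = pd.merge(df1, df2, on="col3")

# how="left": keep every row of df1; unmatched rows get NaN in df2's columns.
left = pd.merge(df1, df2, on="col3", how="left")
print(inner)
print(left)
```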

Question 14

Q

Sorting a data frame

pandas

2 alternatives

Answer

A

df.sort_values("col1")
df.sort_values(by='Sales', ascending=False)
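
To sort on several keys at once, pass lists to `by` and `ascending`, with one direction per key. Here is a minimal sketch using hypothetical Region/Sales data.

```python
import pandas as pd

# Hypothetical sales data.
df = pd.DataFrame({"Region": ["North", "South", "North", "South"],
                   "Sales": [120, 80, 90, 150]})

# Region ascending, then Sales descending within each region.
ordered = df.sort_values(by=["Region", "Sales"], ascending=[True, False])
print(ordered)
```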

Question 15

Q

Grouping a data frame

pandas

2 alternatives

Answer

A

df.groupby('Region')['Sales'].mean()
df.groupby("col3").agg({"col1": "sum", "col2": "max"})
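
With `agg`, you can pass aggregation names as strings ("sum", "max"), and named aggregation also lets you choose the output column names. The frame below is hypothetical.

```python
import pandas as pd

# Hypothetical data: two groups (A, B) in col3.
df = pd.DataFrame({"col3": ["A", "B", "A", "B"],
                   "col1": [1, 2, 3, 4],
                   "col2": [10, 20, 30, 5]})

# Dictionary form: one aggregation per column, result keeps the column names.
summary = df.groupby("col3").agg({"col1": "sum", "col2": "max"})

# Named aggregation: output_name=(input_column, aggregation).
named = df.groupby("col3").agg(total=("col1", "sum"), biggest=("col2", "max"))
print(summary)
print(named)
```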

Question 16

Q

Renaming columns

pandas

Answer

A

df.rename(columns = {"col_A":"col1"})

Question 17

Q

Deleting columns

pandas

Answer

A

df.drop(columns = ["col1"])

Question 18

Q

Adding columns (addition method)

pandas

Answer

A

df["col3"] = df["col1"] + df["col2"]

Question 19

Q

Adding columns (assign method)

pandas

Answer

A

df = df.assign(col3 = df["col1"] + df["col2"])

Question 20

Q

Filtering rows: boolean method

pandas

Answer

A

df[df["col2"] > 5]
df[(df['Region'] == 'North') & (df['Sales'] > 100)]  # Wrap each condition in parentheses
df[["col1", "col2"]]  # Note: a list of names selects columns, not rows
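
Combine boolean masks with `&` (and), `|` (or) and `~` (not). Each comparison needs its own parentheses because Python's `&` and `|` bind more tightly than `==` and `>`. This sketch uses a hypothetical Region/Sales frame.

```python
import pandas as pd

# Hypothetical sales data.
df = pd.DataFrame({"Region": ["North", "South", "North", "East"],
                   "Sales": [120, 300, 80, 150]})

# "and": both conditions must hold.
north_big = df[(df["Region"] == "North") & (df["Sales"] > 100)]

# "or": either condition may hold.
either = df[(df["Region"] == "East") | (df["Sales"] > 200)]

# "not": ~ inverts the mask.
not_north = df[~(df["Region"] == "North")]
print(north_big, either, not_north, sep="\n\n")
```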

Question 21

Q

Filtering rows: from list

pandas

Answer

A

filter_list = ["A", "C"]
df[df["col3"].isin(filter_list)]

Question 22

Q

Filtering by position

pandas

Answer

A

dfx.iloc[1] #Select single row
dfx.iloc[:,1] #Select single column
dfx.iloc[1,1] #Select single cell
dfx.iloc[:2,:2] #Select group of cells
dfx.iloc[1:,1:] #Select group of cells

Question 23

Q

Filtering: selecting by label (loc)

pandas

Answer

A

dfx.loc[1] #Select single row
dfx.loc[:,'b'] #Select single column
dfx.loc[1,'b'] #Select single cell
dfx.loc[:2,['b', 'c']]  #Select group of cells (label slices include the end label)

dfx.loc['hola'] #Select single row
dfx.loc[:,'c'] #Select single column
dfx.loc['hola','b'] #Select single cell
dfx.loc[:'hola',['b', 'c']]  #Select group of cells

dfx.loc[dfx['b'] > 0] #Select rows matching a boolean condition
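
The main practical difference between `iloc` and `loc` shows up in slices. `iloc` slices by position and excludes the end, like Python lists. `loc` slices by label and includes the end label. The sketch uses a hypothetical `dfx` with the default integer index, so the two calls look alike but return different row counts.

```python
import pandas as pd

# Hypothetical frame with the default 0..3 integer index.
dfx = pd.DataFrame({"b": [1, 2, 3, 4], "c": [5, 6, 7, 8]})

first_two = dfx.iloc[:2]      # positions 0 and 1 (end excluded)
up_to_label_2 = dfx.loc[:2]   # labels 0, 1 and 2 (end included)

# loc also accepts a boolean mask for rows plus a list of column labels.
mask_rows = dfx.loc[dfx["b"] > 2, ["c"]]
print(first_two, up_to_label_2, mask_rows, sep="\n\n")
```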

Question 24

Q

Set/reset index

pandas

Answer

A

dfx.set_index('d', inplace=True)
dfx.reset_index()

Question 25

Q

Finding unique values (list, count)

pandas

Answer

A

df["col3"].unique()
df["col3"].nunique()

Question 26

Q

Apply a function to a data frame

pandas

Answer

A

def add_cols(row):
    return row.col1 + row.col2
df["col3"] = df.apply(add_cols, axis=1)
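
With `axis=1`, `apply` calls the function once per row and passes that row as a Series. This works, but it is slow on large frames. When the logic is simple arithmetic, a vectorized column expression gives the same result. The frame below is hypothetical.

```python
import pandas as pd

# Hypothetical numeric frame.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})

def add_cols(row):
    # row is a Series holding one row's values, indexed by column name.
    return row.col1 + row.col2

# Row-by-row: flexible, but runs Python code for every row.
df["col3"] = df.apply(add_cols, axis=1)

# Vectorized equivalent: operates on whole columns at once.
df["col3_vec"] = df["col1"] + df["col2"]
print(df)
```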

Question 27

Q

Apply a function to a single column

pandas

Answer

A

def square_col(num):
    return num**2
df["col3"] = df.col1.apply(square_col)
OR
data['new_column'] = data['old_column'].apply(lambda x: x * 2)

Question 28

Q

Mark duplicated rows

pandas

Answer

A

df.duplicated(keep=False)

Question 29

Q

Drop duplicated rows

pandas

Answer

A

df.drop_duplicates()
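
Both `duplicated` and `drop_duplicates` accept `keep` ("first", "last" or `False`). `drop_duplicates` also accepts `subset`, which restricts the comparison to certain columns. This is a minimal sketch on a hypothetical frame.

```python
import pandas as pd

# Hypothetical frame: rows 0 and 1 are exact duplicates; rows 0, 1, 3 share col1.
df = pd.DataFrame({"col1": [1, 1, 2, 1],
                   "col2": ["a", "a", "b", "c"]})

# keep="first" (default) flags only later copies; keep=False flags every copy.
later_copies = df.duplicated()
all_copies = df.duplicated(keep=False)

# Judge duplicates on col1 only, keeping the last occurrence of each value.
deduped = df.drop_duplicates(subset=["col1"], keep="last")
print(deduped)
```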

Question 30

Q

Frequency distribution

pandas

Answer

A

df.value_counts("col2")

Question 31

Q

Reset the index, drop the old index

Answer

A

df.reset_index()           # old index becomes a regular column
df.reset_index(drop=True)  # old index is discarded

Question 32

Q

Cross-tabulation

pandas

Answer

A

pd.crosstab(df.col1, df.col2)
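
`pd.crosstab` counts how often each combination of values occurs. With `margins=True` it also adds row and column totals labelled "All". The data below is hypothetical.

```python
import pandas as pd

# Hypothetical categorical data.
df = pd.DataFrame({"col1": ["x", "x", "y", "y", "y"],
                   "col2": ["p", "q", "p", "p", "q"]})

# Rows are col1 values, columns are col2 values, cells are counts.
table = pd.crosstab(df.col1, df.col2, margins=True)
print(table)
```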

Question 33

Q

Pivoting a dataset (to wide format)

pandas

Answer

A

pd.pivot_table(df, 
               index = ["Name"],
               columns=["Subject"], 
               values='Marks',
               fill_value=0)
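
To see what the pivot does, start from long-format data with one row per (Name, Subject) mark. The names and marks below are hypothetical. `pivot_table` aggregates with the mean by default, and `fill_value=0` fills combinations that never occur.

```python
import pandas as pd

# Hypothetical long-format marks: Bob has no Art mark.
df = pd.DataFrame({"Name": ["Ann", "Ann", "Bob"],
                   "Subject": ["Math", "Art", "Math"],
                   "Marks": [90, 75, 60]})

wide = pd.pivot_table(df,
                      index=["Name"],
                      columns=["Subject"],
                      values="Marks",
                      fill_value=0)
print(wide)
```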

Question 34

Q

Get the type of an object

pandas

Answer

A

type(df)

Question 35

Q

Drop rows with missing values

pandas

Answer

A

df.dropna()

Question 36

Q

Apply a lambda function

pandas

Answer

A

df['Sales'].apply(lambda x: x * 2)

Question 37

Q

Combining data frames: append

pandas

Answer

A

df2 = pd.concat([df, df])

Question 38

Q

Get number of rows and columns

Answer

A

df.shape  # tuple: (rows, columns)

Question 39

Q

Delete a dataframe

Answer

A

del df  # del(df) also works; the parentheses are optional

Question 40

Q

Add a caption to a dataframe

Answer

A

caption = "This is a caption"

df.style.set_caption(caption)

Question 41

Q

Import from Excel

Answer

A

data = pd.read_excel("data.xlsx")  # .xlsx files need the openpyxl package installed

Question 42

Q

Import from SQL

Answer

A

import sqlite3
conn = sqlite3.connect("database.db")
data = pd.read_sql_query("SELECT * FROM table_name", conn)

Question 43

Q

Drop rows with missing values

Answer

A

data.dropna()

Question 44

Q

Trim outliers

Answer

A

Q1 = data["column"].quantile(0.25)
Q3 = data["column"].quantile(0.75)
IQR = Q3 - Q1
data = data[(data["column"] >= Q1 - 1.5 * IQR) & (data["column"] <= Q3 + 1.5 * IQR)]
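
Here is a worked example of the 1.5 × IQR rule on a small hypothetical column. Sorted, the values are 10, 11, 12, 12, 13, 95. With pandas' default linear interpolation, Q1 = 11.25 and Q3 = 12.75, so IQR = 1.5 and the fences are 9.0 and 15.0. Only 95 falls outside them. `Series.between` is inclusive on both ends by default, so it matches the `>=`/`<=` mask above.

```python
import pandas as pd

# Hypothetical column with one obvious outlier (95).
data = pd.DataFrame({"column": [10, 12, 11, 13, 12, 95]})

Q1 = data["column"].quantile(0.25)  # linear interpolation by default
Q3 = data["column"].quantile(0.75)
IQR = Q3 - Q1
lower, upper = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR

# Keep rows inside [lower, upper]; both ends inclusive.
trimmed = data[data["column"].between(lower, upper)]
print(Q1, Q3, IQR, lower, upper)
print(trimmed)
```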

Question 45

Q

Save data to csv

Answer

A

data.to_csv("processed_data.csv", index=False)

Question 46

Q

Manipulate dates

Answer

A

data["date_column"] = pd.to_datetime(data["date_column"])
data["month"] = data["date_column"].dt.month
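
Once a column is converted with `pd.to_datetime`, the `.dt` accessor exposes date parts and helpers such as `day_name()`. Subtracting two timestamps gives a `Timedelta`. The two dates below are hypothetical.

```python
import pandas as pd

# Hypothetical dates stored as strings.
data = pd.DataFrame({"date_column": ["2024-01-15", "2024-03-02"]})

data["date_column"] = pd.to_datetime(data["date_column"])
data["month"] = data["date_column"].dt.month
data["weekday"] = data["date_column"].dt.day_name()

# Difference between two timestamps is a Timedelta.
gap = data["date_column"].iloc[1] - data["date_column"].iloc[0]
print(data)
print(gap.days)
```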

Question 47

Q

Stacking data frames vertically (concatenate)

Answer

A

merged_data = pd.concat([data1, data2], axis=0)

Question 48

Q

Pivot table

Answer

A

pd.pivot_table(data, values="value", index="category", columns="date", aggfunc="sum")

Question 49

Q

Random sample of data

Answer

A

sample = data.sample(n=100)

Question 50

Q

Merging data frames on a common column

Answer

A

merged_df = pd.merge(df1, df2, on="ID")

Question 51

Q

Joining based on index

Answer

A

result = df1.join(df2)
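
`DataFrame.join` aligns rows on the index rather than on a column. Its default `how="left"` keeps every row of `df1` and fills NaN where `df2` has no matching label. The frames below are hypothetical.

```python
import pandas as pd

# Hypothetical frames indexed by string labels; only "a" appears in both.
df1 = pd.DataFrame({"col1": [1, 2]}, index=["a", "b"])
df2 = pd.DataFrame({"col2": [10, 30]}, index=["a", "c"])

# Left join on the index: rows a and b kept, b gets NaN for col2.
result = df1.join(df2)
print(result)
```

If both frames have a column with the same name, `join` needs `lsuffix`/`rsuffix` to tell the two apart.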