Pandas Basics Flashcards

1
Q

Import library

pandas

A
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
2
Q

Import a csv into data frame

pandas

A
file = "file.csv"
df = pd.read_csv(file)
3
Q

Export a data frame to csv

pandas

A
df.to_csv("file.csv", sep = "|", index = False)
4
Q

Creating a data frame from a list of lists

pandas

A
data = [[1, 2, "A"], [3, 4, "B"]]
df = pd.DataFrame(data, 
           columns = ["col1", "col2", "col3"])
5
Q

Creating a data frame from a dictionary

pandas

A
data = {'col1': [1, 2], 
        'col2': [3, 4], 
        'col3': ["A", "B"]}

df = pd.DataFrame(data=data)
6
Q

Get number of rows and columns in a data frame

pandas

A
df.shape
7
Q

Viewing top n rows

pandas

A
df.head(n)
8
Q

Displaying data type of columns

pandas

A
df.dtypes
9
Q

Modifying the data type of a column

pandas

A
df["col1"] = df["col1"].astype(np.int8)
10
Q

Display missing value stats and data type

pandas

A
df.info()
11
Q

Print descriptive stats

pandas

A
df.describe()
12
Q

Filling missing values with a specific value

pandas

A
df.fillna(0, inplace = True)
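A minimal sketch of filling each column with its own value, assuming a small hypothetical frame with a numeric `col1` and a text `col2`. It assigns the result to a new name instead of using `inplace=True`, which leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap in each column
df = pd.DataFrame({"col1": [1.0, np.nan, 3.0], "col2": ["A", None, "C"]})

# A dict maps each column name to the value used for its missing entries
filled = df.fillna({"col1": 0, "col2": "missing"})
```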
13
Q

Combining data frames: join (merge)

pandas

A
pd.merge(df1, df2, on = "col3")
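A short sketch of how the `how` argument changes which rows survive a merge. The two frames and their values are made up for illustration; only `col3` is shared between them.

```python
import pandas as pd

# Hypothetical frames: "C" appears only in df1, "D" only in df2
df1 = pd.DataFrame({"col3": ["A", "B", "C"], "col1": [1, 2, 3]})
df2 = pd.DataFrame({"col3": ["A", "B", "D"], "col2": [10, 20, 40]})

inner = pd.merge(df1, df2, on="col3")              # default how="inner": keys in both
left = pd.merge(df1, df2, on="col3", how="left")   # every row of df1, NaN where df2 has no match
outer = pd.merge(df1, df2, on="col3", how="outer", indicator=True)  # all keys, plus a _merge column
```

The `_merge` column added by `indicator=True` records whether each row came from the left frame, the right frame, or both.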
14
Q

Sorting a data frame

pandas

2 alternatives

A
df.sort_values("col1")
df.sort_values(by='Sales', ascending=False)
15
Q

Grouping a data frame

pandas

2 alternatives

A
df.groupby('Region')['Sales'].mean()
df.groupby("col3").agg({"col1":"sum", "col2":"max"})
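As a possible variation, named aggregation lets you choose the output column names. This sketch assumes a hypothetical frame with the `col1`, `col2`, and `col3` columns used on this card; the output names `col1_sum` and `col2_max` are illustrative.

```python
import pandas as pd

# Hypothetical data: two groups in col3
df = pd.DataFrame({
    "col1": [1, 2, 3, 4],
    "col2": [10, 40, 30, 20],
    "col3": ["A", "A", "B", "B"],
})

# Each keyword is an output column: (source column, aggregation name)
summary = df.groupby("col3").agg(
    col1_sum=("col1", "sum"),
    col2_max=("col2", "max"),
)
```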
16
Q

Renaming columns

pandas

A
df.rename(columns = {"col_A":"col1"})
17
Q

Deleting columns

pandas

A
df.drop(columns = ["col1"])
18
Q

Adding columns (addition method)

pandas

A
df["col3"] = df["col1"] + df["col2"]
19
Q

Adding columns (assignment method)

pandas

A
df = df.assign(col3 = df["col1"] + df["col2"])
20
Q

Filtering rows: boolean method

pandas

A
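dfx[['b', 'c']] #Select columns (not a row filter)
df[df["col2"] > 5] #Rows where col2 is greater than 5
df[(df['Region'] == 'North') & (df['Sales'] > 100)] #Combine conditions with &, each in parentheses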
21
Q

Filtering rows: from list

pandas

A
filter_list = ["A", "C"]
df[df["col3"].isin(filter_list)]
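A small sketch, on made-up data, of keeping rows whose value is in the list and of inverting the same mask with `~` to drop them instead.

```python
import pandas as pd

# Hypothetical frame with a category column
df = pd.DataFrame({"col1": [1, 2, 3], "col3": ["A", "B", "C"]})
filter_list = ["A", "C"]

kept = df[df["col3"].isin(filter_list)]      # rows whose col3 is in the list
dropped = df[~df["col3"].isin(filter_list)]  # ~ negates the mask: rows not in the list
```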
22
Q

Filtering by position

pandas

A
dfx.iloc[1] #Select single row
dfx.iloc[:,1] #Select single column
dfx.iloc[1,1] #Select single cell
dfx.iloc[:2,:2] #Select group of cells
dfx.iloc[1:,1:] #Select group of cells
23
Q

Filtering: selecting by index

pandas

A
dfx.loc[1] #Select single row
dfx.loc[:,'b'] #Select single column
dfx.loc[1,'b'] #Select single cell
dfx.loc[:2,['b', 'c']]  #Select group of cells



dfx.loc['hola'] #Select single row
dfx.loc[:,'c'] #Select single column
dfx.loc['hola','b'] #Select single cell
dfx.loc[:'hola',['b', 'c']]  #Select group of cells


data.loc[data['condition']]
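The practical difference between `.loc` and `.iloc` slices can be sketched as follows. The frame, its string index labels, and its values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical frame with a string index
dfx = pd.DataFrame(
    {"b": [10, 20, 30], "c": [1.5, 2.5, 3.5]},
    index=["adios", "hola", "ciao"],
)

by_label = dfx.loc[:"hola", ["b", "c"]]  # label slice: the end label is included
by_position = dfx.iloc[:1, :2]           # positional slice: the end position is excluded
mask_rows = dfx.loc[dfx["b"] > 15]       # a boolean Series selects the rows where it is True
```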
24
Q

Set/reset index

pandas

A
dfx.set_index('d', inplace=True)
dfx.reset_index()
25
Q

Finding unique values (list, count)

pandas

A
df["col3"].unique()
df["col3"].nunique()
26
Q

Apply a function to a data frame

pandas

A
def add_cols(row):
    return row.col1 + row.col2
df["col3"] = df.apply(add_cols, axis=1)
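For simple arithmetic like this, a column-wise expression gives the same result as the row-wise `apply` and usually runs faster on large frames. A sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})

def add_cols(row):
    # Called once per row because axis=1
    return row.col1 + row.col2

via_apply = df.apply(add_cols, axis=1)
vectorized = df["col1"] + df["col2"]  # same values, computed column-wise
```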
27
Q

Apply a function to a single column

pandas

A
def square_col(num):
    return num**2
df["col3"] = df.col1.apply(square_col)
OR
data['new_column'] = data['old_column'].apply(lambda x: x * 2)
28
Q

Mark duplicated rows

pandas

A
df.duplicated(keep=False)
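A brief sketch of how the `keep` argument changes which rows are flagged, using a made-up frame whose first two rows are identical.

```python
import pandas as pd

# Hypothetical frame: rows 0 and 1 are identical
df = pd.DataFrame({"col1": [1, 1, 2], "col2": ["A", "A", "B"]})

all_dupes = df.duplicated(keep=False)  # flags every member of a duplicate group
later_dupes = df.duplicated()          # default keep="first": flags only the repeats
```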
29
Q

Drop duplicated rows

pandas

A
df.drop_duplicates()
30
Q

Frequency distribution

pandas

A
df.value_counts("col2")
31
Q

Reset the index, drop the old index

A
print(df.reset_index()) #Old index is kept as a new column
df.reset_index(drop=True) #Old index is discarded
32
Q

Crosstabulation

pandas

A
pd.crosstab(df.col1, df.col2)
33
Q

Pivoting a dataset (to wide format)

pandas

A
pd.pivot_table(df, 
               index = ["Name"],
               columns=["Subject"], 
               values='Marks',
               fill_value=0)
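A worked sketch of the call above on a tiny made-up marks table. The names and marks are illustrative; with no `aggfunc` given, `pivot_table` averages the values in each cell.

```python
import pandas as pd

# Hypothetical long-format data: one row per (Name, Subject)
df = pd.DataFrame({
    "Name": ["Ann", "Ann", "Bob"],
    "Subject": ["Math", "Art", "Math"],
    "Marks": [90, 80, 70],
})

# One row per Name, one column per Subject; missing combinations become 0
wide = pd.pivot_table(df,
                      index=["Name"],
                      columns=["Subject"],
                      values="Marks",
                      fill_value=0)
```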
34
Q

Get the type of an object

pandas

A
type(df)
35
Q

Drop rows with missing values

pandas

A
df.dropna()
36
Q

Apply a lambda function

pandas

A
df['Sales'].apply(lambda x: x * 2)
37
Q

Combining data frames: append

pandas

A
df2 = pd.concat([df, df])
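By default `pd.concat` keeps each frame's original index labels, so they can repeat. A sketch on hypothetical data showing `ignore_index=True` as one way to renumber the rows:

```python
import pandas as pd

# Hypothetical two-row frame
df = pd.DataFrame({"col1": [1, 2]})

stacked = pd.concat([df, df])                        # index labels repeat: 0, 1, 0, 1
renumbered = pd.concat([df, df], ignore_index=True)  # fresh index: 0, 1, 2, 3
```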
38
Q

Get number of rows and columns

A
df.shape
39
Q

Delete a dataframe

A
del df
del(df)
40
Q

Add a caption to a dataframe

A

caption = 'This is a caption'

df.style.set_caption(caption)

41
Q

Import from Excel

A

data = pd.read_excel('data.xlsx')

42
Q

Import from SQL

A

import sqlite3
conn = sqlite3.connect('database.db')
data = pd.read_sql_query('SELECT * FROM table_name', conn)

43
Q

Drop rows with missing values

A

data.dropna()

44
Q

Trim outliers

A

Q1 = data['column'].quantile(0.25)
Q3 = data['column'].quantile(0.75)
IQR = Q3 - Q1
data = data[(data['column'] >= Q1 - 1.5 * IQR) & (data['column'] <= Q3 + 1.5 * IQR)]
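The same 1.5 × IQR rule, sketched on a small made-up column with one obvious outlier. `Series.between` is used here as a shorter equivalent of the two chained comparisons; it includes both bounds by default.

```python
import pandas as pd

# Hypothetical data: 100 is far from the rest
data = pd.DataFrame({"column": [10, 11, 12, 13, 14, 100]})

q1 = data["column"].quantile(0.25)
q3 = data["column"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Keep only rows inside [lower, upper]
trimmed = data[data["column"].between(lower, upper)]
```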

45
Q

Save data to csv

A

data.to_csv('processed_data.csv', index=False)

46
Q

Manipulate dates

A

data['date_column'] = pd.to_datetime(data['date_column'])
data['month'] = data['date_column'].dt.month
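A sketch of the same conversion on made-up strings, one of which is not a valid date. Passing `errors="coerce"` is an optional choice shown here so that unparseable entries become `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical raw strings; the last one cannot be parsed
data = pd.DataFrame({"date_column": ["2024-01-15", "2024-02-29", "not a date"]})

# errors="coerce" turns unparseable values into NaT
data["date_column"] = pd.to_datetime(data["date_column"], errors="coerce")

# The .dt accessor exposes date parts of a datetime column
data["month"] = data["date_column"].dt.month
data["weekday"] = data["date_column"].dt.day_name()
```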

47
Q

Concatenating dataframes (stacking rows)

A

merged_data = pd.concat([data1, data2], axis=0)

48
Q

Pivot table

A

pd.pivot_table(data, values='value', index='category', columns='date', aggfunc='sum')

49
Q

Random sample of data

A

sample = data.sample(n=100)

50
Q

Merging data frame based on common column

A

merged_df = pd.merge(df1, df2, on='ID')

51
Q

Joining based on index

A

result = df1.join(df2)
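A sketch of what `join` does with index labels, using two made-up frames whose indexes only partly overlap. `join` is a left join by default; `how="inner"` is shown as one alternative.

```python
import pandas as pd

# Hypothetical frames: label "b" exists only in df1
df1 = pd.DataFrame({"col1": [1, 2, 3]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"col2": [10, 30]}, index=["a", "c"])

result = df1.join(df2)              # keeps every row of df1; col2 is NaN for "b"
inner = df1.join(df2, how="inner")  # keeps only labels present in both indexes
```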