Merging Dataframes + handle null values + Aggregation Flashcards

1
Q

How do you merge two DataFrames in pandas?

A

You can merge two DataFrames using the pd.merge() function. For example, newBios2 = pd.merge(newBios, noc, left_on='born_country', right_on='NOC', how='left', suffixes=['', '_nocdf']) merges newBios and noc DataFrames on the born_country and NOC columns, respectively.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is the purpose of the suffixes parameter in pd.merge()?

A

The suffixes parameter is used to differentiate columns with the same name in the two DataFrames being merged. For example, suffixes=['', '_nocdf'] will add _nocdf to the columns from the right DataFrame that have the same name as columns in the left DataFrame.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How can you handle null values in a DataFrame?

A

You can handle null values using methods like fillna(), interpolate(), and dropna(). For example, coffee.fillna(coffee['Units Sold'].mean()) replaces null values with the mean of the column, while coffee.dropna() removes rows with null values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What does the interpolate() method do?

A

The interpolate() method fills null values by identifying a pattern in the column. It is useful when you want to fill missing values based on the values of neighboring rows. For example, coffee['Units Sold'].interpolate() fills null values in the ‘Units Sold’ column by interpolation.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How do you aggregate data in pandas?

A

You can aggregate data using the groupby() method followed by an aggregation function like sum(), mean(), etc. For example, coffee.groupby(['Day', 'Coffee Type']).agg({'Units Sold': 'sum'}) groups the data by ‘Day’ and ‘Coffee Type’ and calculates the sum of ‘Units Sold’ for each group.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is the purpose of the agg() function in pandas?

A

The agg() function is used to apply one or more aggregation functions to a DataFrame or a grouped DataFrame. For example, coffee.groupby('Coffee Type').agg({'Units Sold': 'mean'}) calculates the mean of ‘Units Sold’ for each ‘Coffee Type’.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How do you pivot a DataFrame in pandas?

A

You can pivot a DataFrame using the pivot() method. For example, pivot = coffee.pivot(columns='Coffee Type', index='Day', values='Units Sold') creates a pivot table where ‘Coffee Type’ becomes the columns, ‘Day’ becomes the index, and ‘Units Sold’ are the values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What does the pivot() method do?

A

The pivot() method reshapes the DataFrame by turning unique values in a column into new columns. It is useful for creating a summary table where you want to see the relationship between two variables. For example, pivot = coffee.pivot(columns='Coffee Type', index='Day', values='Units Sold') creates a pivot table showing ‘Units Sold’ for each ‘Coffee Type’ and ‘Day’.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How do you calculate the sum of values in a pivot table?

A

You can calculate the sum of values in a pivot table using the sum() method. For example, pivot.sum() calculates the sum of values for each column, while pivot.sum(axis=1) calculates the sum of values for each row.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

How do you access a specific value in a pivot table?

A

You can access a specific value in a pivot table using the loc[] method. For example, pivot.loc['Monday', 'Latte'] retrieves the value for ‘Latte’ on ‘Monday’.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How do you load data from a CSV file into a DataFrame?

A

Use pd.read_csv(‘file_path.csv’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What does coffee.loc[3] do?

A

Returns the row at index position 3 in the DataFrame

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How to select multiple rows by index using loc?

A

coffee.loc[[2, 3, 5]]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What’s the difference between loc and iloc?

A

loc uses labels/names, iloc uses numerical indexes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How to slice rows and select specific columns using loc?

A

coffee.loc[3:8, ‘Day’]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What does sort_values() do?

A

Sorts DataFrame by specified column values

17
Q

How to sort by multiple columns with different orders?

A

df.sort_values([col1, col2], ascending=[False, True])

18
Q

What’s the purpose of iterrows()?

A

Iterates through DataFrame rows as (index, Series) pairs

19
Q

How to merge two DataFrames?

A

pd.merge(df1, df2, on=’common_column’, how=’join_type’)

20
Q

What’s the difference between inner and left joins?

A

Inner keeps only matching rows, left keeps all from left DataFrame

21
Q

How to handle missing values using fillna()?

A

df.fillna(value) replaces nulls with specified value

22
Q

What does interpolate() do?

A

Fills null values by estimating missing values between existing data points

23
Q

How to drop rows with missing values?

A

df.dropna()

24
Q

What’s the purpose of groupby()?

A

Groups data by specified columns for aggregation

25
How to calculate mean sales per coffee type?
coffee.groupby('Coffee Type')['Units Sold'].mean()
26
What's the difference between sum() and mean() in aggregation?
Sum calculates total, mean calculates average
27
How to create a pivot table in Pandas?
df.pivot(index='rows', columns='columns', values='values')
28
What does pivot.sum() calculate?
Column-wise totals of the pivot table
29
How to get row-wise totals in a pivot?
pivot.sum(axis=1)
30
What's the purpose of the suffixes parameter in merge()?
Adds suffixes to duplicate column names from merged DataFrames
31
How to rename columns after merging?
Use df.rename(columns={'old':'new'}, inplace=True)
32
What does coffee.isna().sum() show?
Count of null values in each column
33
How to filter rows where 'Units Sold' is null?
coffee[coffee['Units Sold'].isna()]
34
What's the difference between fillna(mean()) and interpolate()?
Mean fills with average, interpolate estimates based on neighbors
35
How to aggregate multiple statistics at once?
df.agg({'col': ['mean', 'sum', 'max']})
36
What's the output of coffee['Units Sold'].unique()?
Array of unique values in that column
37
How to sort a DataFrame by index?
df.sort_index()