Pandas Flashcards

1
Q

How do you import pandas?

A
import pandas as pd
2
Q

How do you load a CSV file into a dataframe?

A
df = pd.read_csv('data.csv')
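To practise without a file on disk, `read_csv` also accepts a file-like object. This minimal sketch uses `io.StringIO` with made-up data standing in for `data.csv`.

```python
import io

import pandas as pd

# Illustrative stand-in for the contents of 'data.csv'
csv_text = """name,age
Alice,30
Bob,25
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 2)
```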
3
Q

How do you load a JSON file into a dataframe?

A
df = pd.read_json('data.json')
4
Q

How do you view the first five rows of a dataframe?

A
df.head()
5
Q

How do you view the last five rows of a dataframe?

A
df.tail()
6
Q

How do you view a summary of the dataframe (index range, column dtypes, non-null counts, memory usage)?

A
df.info()
7
Q

How do you return a new dataframe with every row that contains an empty cell removed?

A
df_new = df.dropna()
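A small sketch with made-up data showing that `dropna()` drops whole rows (any row with at least one NaN/None) and leaves the original dataframe untouched:

```python
import numpy as np
import pandas as pd

# Illustrative data: row 1 has NaN in "a", row 2 has None in "b"
df = pd.DataFrame({"a": [1, np.nan, 3], "b": ["x", "y", None]})

# dropna() returns a NEW dataframe without any row that has an empty cell
df_new = df.dropna()

print(len(df))      # 3 (original unchanged)
print(len(df_new))  # 1 (only row 0 is complete)
```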
8
Q

How do you remove rows with empty cells from the original dataframe itself (in place)?

A
df.dropna(inplace=True)  # modifies df directly and returns None
9
Q

How do you replace empty cells with a value in the entire dataframe?

A
df.fillna("value", inplace=True)
10
Q

How do you replace empty cells in specified column(s) with a value?

A
df["column"] = df["column"].fillna("value")  # assign back; chained inplace=True on a column may not update df
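A minimal sketch with illustrative column names: assigning the result back behaves the same across pandas versions, whereas `df["column"].fillna(..., inplace=True)` may leave `df` unchanged under Copy-on-Write. Passing a dict to `fillna` fills several columns, each with its own value.

```python
import numpy as np
import pandas as pd

# Illustrative data with one gap in each column
df = pd.DataFrame({"city": ["Paris", np.nan, "Rome"], "pop": [2.1, np.nan, 2.8]})

# One column: fill, then assign the result back
df["city"] = df["city"].fillna("Unknown")

# Several columns: map each column name to its own fill value
df = df.fillna({"pop": 0})
print(df)
```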
11
Q

How do you calculate the MEAN of a column?

A
x = df["column"].mean()
12
Q

How do you calculate the MEDIAN of a column?

A
x = df["column"].median()
13
Q

How do you calculate the MODE of a column?

A
x = df["column"].mode()[0]
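Why the `[0]`: `mode()` always returns a Series, because a column can have several equally common values. A short sketch with made-up numbers:

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 3])

# Two values tie for most frequent, so mode() returns both (sorted)
print(s.mode().tolist())  # [2, 3]

# [0] picks the first (smallest) mode
x = s.mode()[0]
print(x)  # 2
```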
14
Q

How do you convert a column to datetime?

A
df["date"] = pd.to_datetime(df["date"])
15
Q

How do you remove NULL rows using specific column(s) as a reference?

A
df.dropna(subset=["column", ...], inplace=True)
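A sketch with illustrative data: only empty cells in the `subset` columns cause a row to be dropped; gaps elsewhere are kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Ben", None],
    "score": [90, np.nan, 75],
})

# Drop rows whose "name" is missing; NaN in "score" does not matter here
df.dropna(subset=["name"], inplace=True)
print(df)
```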
16
Q

How do you replace a specific cell value?

A
df.loc[idx, "column"] = "new_value"
17
Q

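How do you loop through the rows of a dataframe by index label?

A
for x in df.index:
    print(x)  # x is the row's index label; read its values with df.loc[x, "column"]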
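A sketch with a hypothetical `Duration` column: the index loop pairs naturally with `.loc` to read or update a cell, while `iterrows()` yields each row as a Series. For simple rules like this, a vectorised call such as `df["Duration"].clip(upper=120)` is usually faster than any loop.

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 150, 45]})

# Loop over index labels and use .loc to read/modify each row's cell
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120

# iterrows() yields (index label, row as a Series) pairs
for idx, row in df.iterrows():
    print(idx, row["Duration"])
```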
18
Q

How do you drop a specific row?

A
df.drop(idx, inplace=True)
19
Q

How do you view duplicate rows?

A
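df.duplicated()  # boolean Series: True for each row that repeats an earlier row
df[df.duplicated()]  # the duplicate rows themselves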
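A sketch with made-up data contrasting the boolean mask, the rows it selects, and `keep=False`, which flags every copy including the first occurrence:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})

# True for each row that repeats an earlier row (keep="first" by default)
mask = df.duplicated()
print(mask.tolist())  # [False, True, False]

# Use the mask to view the duplicate rows
print(df[mask])

# keep=False marks all copies, so both identical rows are shown
print(df[df.duplicated(keep=False)])
```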
20
Q

How do you drop duplicated rows?

A
df.drop_duplicates(inplace=True)
21
Q

How do you drop column(s)?

A
df.drop(columns=["column", ...], inplace=True)
22
Q

How do you convert a column to a different data type?

A
df["col"] = df["col"].astype(dtype)

Common dtypes: "int", "float", "str", "bool", "datetime64[ns]" (datetime64 needs a unit)
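A sketch with illustrative column names converting string columns to integer and datetime dtypes. Recent pandas versions reject a unit-less `"datetime64"`, hence `"datetime64[ns]"`; `pd.to_datetime` is the more flexible option when formats vary.

```python
import pandas as pd

df = pd.DataFrame({
    "qty": ["1", "2", "3"],
    "when": ["2023-01-01", "2023-02-01", "2023-03-01"],
})

# Strings of digits -> integers
df["qty"] = df["qty"].astype("int")

# ISO date strings -> datetime64 with an explicit nanosecond unit
df["when"] = df["when"].astype("datetime64[ns]")
print(df.dtypes)
```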

23
Q

How do you sort by column(s)?

A
df.sort_values(by=["col", ...], inplace=True)
24
Q

How do you round a float column to x decimal places?

A
df["col"] = df["col"].round(x)
25
Q

How do you group a DataFrame by a column and calculate the mean of another column?

A
df.groupby('column_name')['another_column'].mean().reset_index()
26
Q

How do you perform multiple aggregations (mean, sum) on a column after grouping?

A
df.groupby('column_name')['another_column'].agg(['mean', 'sum']).reset_index()
27
Q

How do you specify custom names for the aggregated columns instead of using default names like ‘mean’ or ‘sum’?

A
df.groupby('column_name')['another_column'].agg(
    custom_name_1='mean', 
    custom_name_2='sum'
).reset_index()
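A sketch of named aggregation with hypothetical `team` and `points` columns: each keyword becomes an output column name and its value is the aggregation to apply.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["a", "a", "b"],
    "points": [10, 20, 5],
})

# keyword = output column name, value = aggregation function name
summary = df.groupby("team")["points"].agg(
    avg_points="mean",
    total_points="sum",
).reset_index()
print(summary)
```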
28
Q

How do you extract the year from a datetime column?

A
df['year'] = df['datetime_column'].dt.year
29
Q

How do you extract the month from a datetime column?

A
df['month'] = df['datetime_column'].dt.month
30
Q

How do you extract the day of the week from a datetime column (where Monday is 0 and Sunday is 6)?

A
df['weekday'] = df['datetime_column'].dt.weekday
31
Q

How do you extract the day of the month from a datetime column?

A
df['day_of_month'] = df['datetime_column'].dt.day
32
Q

How do you extract the hour from a datetime column?

A
df['hour'] = df['datetime_column'].dt.hour
33
Q

How do you calculate the difference between two datetime columns (in days)?

A
df['date_diff'] = (df['datetime_column2'] - df['datetime_column1']).dt.days
34
Q

How do you filter rows where the datetime column is within a specific date range?

A
df_filtered = df[(df['datetime_column'] >= '2023-01-01') & (df['datetime_column'] <= '2023-12-31')]
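If the column also carries times of day, the upper bound `'2023-12-31'` means midnight, so later times on December 31 are excluded. A sketch with made-up timestamps comparing that bound with a half-open range ending at `'2024-01-01'`:

```python
import pandas as pd

df = pd.DataFrame({"datetime_column": pd.to_datetime([
    "2022-12-31 23:00", "2023-06-15 08:00", "2023-12-31 18:00", "2024-01-01 00:00",
])})

col = df["datetime_column"]

# Inclusive upper bound at midnight misses 2023-12-31 18:00
inclusive = df[(col >= "2023-01-01") & (col <= "2023-12-31")]

# Half-open range [2023-01-01, 2024-01-01) keeps every time on Dec 31
df_filtered = df[(col >= "2023-01-01") & (col < "2024-01-01")]
print(len(inclusive), len(df_filtered))  # 1 2
```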
35
Q

How do you get the current date and time in pandas?

A
current_datetime = pd.Timestamp.now()  # local time; pd.Timestamp.now(tz="UTC") gives UTC
36
Q

How do you add a specific number of days to a datetime column?

A
df['new_datetime'] = df['datetime_column'] + pd.Timedelta(days=7)
37
Q

How do you calculate the difference between two datetime columns in hours?

A
df['time_diff_hours'] = (df['datetime_column2'] - df['datetime_column1']).dt.total_seconds() / 3600
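A sketch with made-up timestamps contrasting the two difference cards: `.dt.days` keeps only the whole-day component, while `total_seconds()` keeps fractions.

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.to_datetime(["2023-01-01 00:00", "2023-01-01 12:00"]),
    "end": pd.to_datetime(["2023-01-03 06:00", "2023-01-02 00:00"]),
})

delta = df["end"] - df["start"]  # timedelta64 column

# Whole days only: 2 days 6 hours -> 2, 12 hours -> 0
df["date_diff"] = delta.dt.days

# Exact hours, including partial days
df["time_diff_hours"] = delta.dt.total_seconds() / 3600
print(df)
```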
38
Q

How do you extract the quarter from a datetime column?

A
df['quarter'] = df['datetime_column'].dt.quarter
39
Q

How do you convert a datetime column to a string in a specific format?

A
df['datetime_str'] = df['datetime_column'].dt.strftime('%Y-%m-%d %H:%M:%S')
40
Q

How do you extract the week number from a datetime column?

A
df['week_number'] = df['datetime_column'].dt.isocalendar().week
41
Q

How do you filter rows where the datetime column is in the last 30 days?

A
df_filtered = df[df['datetime_column'] >= pd.to_datetime('today') - pd.Timedelta(days=30)]
42
Q

How do you convert a naive datetime column (whose values are in UTC) to a different timezone?

A
df['datetime_column'] = df['datetime_column'].dt.tz_localize('UTC').dt.tz_convert('America/New_York')
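A sketch with made-up timestamps: `tz_localize` only applies to naive values, so once the column is timezone-aware any further change uses `tz_convert` alone.

```python
import pandas as pd

df = pd.DataFrame({"ts": pd.to_datetime(["2023-07-01 12:00", "2023-12-01 12:00"])})

# Naive -> declare as UTC -> convert to New York time
df["ts"] = df["ts"].dt.tz_localize("UTC").dt.tz_convert("America/New_York")

# Already tz-aware: convert directly
df["ts_london"] = df["ts"].dt.tz_convert("Europe/London")
print(df)
```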
43
Q

How do you replace all occurrences of a specific value in a column with another value?

A
df['column_name'] = df['column_name'].replace(old_value, new_value)
44
Q

How do you filter rows based on a condition applied to a column?

A
df_filtered = df[df['column_name'] > threshold]
45
Q

How do you rename columns in a DataFrame?

A
df = df.rename(columns={'old_name': 'new_name', ...})
46
Q

How do you create a new column based on applying a function to another column?

A
df['new_column'] = df['existing_column'].apply(function)
47
Q

How do you sort a DataFrame by one or more columns?

A
df_sorted = df.sort_values(by=['column_name1', 'column_name2'], ascending=[True, False])
48
Q

How do you get the unique values from a column?

A
unique_values = df['column_name'].unique()
49
Q

How do you count the number of occurrences of each unique value in a column?

A
value_counts = df['column_name'].value_counts()
50
Q

How do you calculate the cumulative sum of a column?

A
df['cumulative_sum'] = df['column_name'].cumsum()
51
Q

How do you merge two DataFrames on multiple columns?

A
df_merged = df1.merge(df2, on=['column1', 'column2'], how='inner')
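A sketch with hypothetical `sales` and `targets` tables: with several `on` columns, rows match only when all key columns are equal, and `how='inner'` drops keys missing from either side.

```python
import pandas as pd

sales = pd.DataFrame({"store": [1, 1, 2], "month": [1, 2, 1], "revenue": [100, 150, 80]})
targets = pd.DataFrame({"store": [1, 2, 2], "month": [1, 1, 2], "target": [90, 70, 60]})

# (store=1, month=2) has no target, so the inner join drops it
df_merged = sales.merge(targets, on=["store", "month"], how="inner")
print(df_merged)
```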
52
Q

How do you concatenate multiple DataFrames vertically (stacking them on top of each other)?

A
df_concat = pd.concat([df1, df2], ignore_index=True)
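A sketch with two tiny made-up frames showing what `ignore_index=True` changes: without it the result would keep the index labels 0, 1, 0, 1.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3, 4]})

# Stack vertically and renumber rows 0..n-1
df_concat = pd.concat([df1, df2], ignore_index=True)
print(df_concat.index.tolist())  # [0, 1, 2, 3]
```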
53
Q

How do you sample a random subset of rows from a DataFrame?

A
df_sample = df.sample(n=100)
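A sketch with an illustrative 1000-row frame: `random_state` makes the draw reproducible, and `frac=` samples a fraction of rows instead of a fixed count. Without `replace=True`, `n` cannot exceed the number of rows.

```python
import pandas as pd

df = pd.DataFrame({"x": range(1000)})

# Same seed -> same 100 rows every run
df_sample = df.sample(n=100, random_state=42)

# 10% of the rows
df_tenth = df.sample(frac=0.1, random_state=42)
```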
54
Q

How do you create a new column as the product of two columns?

A
df['new_column'] = df['column1'] * df['column2']
55
Q

How do you set a specific column as the index of a DataFrame?

A
df.set_index('column_name', inplace=True)
56
Q

How do you create a new DataFrame by filtering rows based on multiple conditions?

A
df_filtered = df[(df['column1'] > threshold1) & (df['column2'] < threshold2)]