Pandas Flashcards

1
Q

Create a DataFrame from a dictionary where keys are column names and values are lists of data.

A

import pandas as pd

data = {‘name’: [‘Alice’, ‘Bob’, ‘Charlie’],
‘age’: [25, 30, 35]}
df = pd.DataFrame(data)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Select a single column named ‘age’ from a DataFrame named ‘df’.

A

age_column = df[‘age’]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Filter rows where the ‘age’ column is greater than 30.

A

filtered_df = df[df[‘age’] > 30]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Add a new column named ‘income’ to the DataFrame ‘df’ with random income values.

A

import numpy as np

df[‘income’] = np.random.randint(30000, 80000, size=len(df))

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Group the DataFrame ‘df’ by the ‘gender’ column and calculate the mean age for each group.

A

grouped_df = df.groupby(‘gender’)[‘age’].mean()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Sort the DataFrame ‘df’ by the ‘age’ column in descending order.

A

sorted_df = df.sort_values(by=’age’, ascending=False)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Replace missing values in the ‘income’ column with the mean income.

A

df[‘income’].fillna(df[‘income’].mean(), inplace=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Merge two DataFrames ‘df1’ and ‘df2’ based on a common key column.

A

merged_df = pd.merge(df1, df2, on=’common_key_column’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Drop the column ‘income’ from the DataFrame ‘df’.

A

df.drop(columns=[‘income’], inplace=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Rename the column ‘age’ to ‘years’ in the DataFrame ‘df’.

A

df.rename(columns={‘age’: ‘years’}, inplace=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Calculate the sum of the ‘income’ column in the DataFrame ‘df’.

A

total_income = df[‘income’].sum()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Apply a function that converts all ‘age’ values to months in the DataFrame ‘df’.

A

df[‘age’] = df[‘age’].apply(lambda x: x * 12)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Pivot the DataFrame ‘df’ with ‘gender’ as index and ‘age’ as columns, filling NaNs with 0.

A

pivoted_df = df.pivot_table(index=’gender’, columns=’age’, fill_value=0)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Convert a column named ‘date’ to datetime format in the DataFrame ‘df’.

A

df[‘date’] = pd.to_datetime(df[‘date’])

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Select rows where the ‘gender’ column is ‘female’.

A

female_rows = df[df[‘gender’] == ‘female’]

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Create dummy variables for the ‘gender’ column in the DataFrame ‘df’.

A

dummy_df = pd.get_dummies(df, columns=[‘gender’])

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Concatenate two DataFrames ‘df1’ and ‘df2’ vertically.

A

concatenated_df = pd.concat([df1, df2])

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Randomly sample 10 rows from the DataFrame ‘df’.

A

sampled_df = df.sample(n=10)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Replace all occurrences of ‘Male’ with ‘M’ and ‘Female’ with ‘F’ in the ‘gender’ column.

A

df[‘gender’].replace({‘Male’: ‘M’, ‘Female’: ‘F’}, inplace=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Export the DataFrame ‘df’ to a CSV file named ‘data.csv’.

A

df.to_csv(‘data.csv’, index=False)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Find the unique values in the ‘category’ column of the DataFrame ‘df’.

A

unique_categories = df[‘category’].unique()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Calculate the mean, median, and standard deviation of the ‘height’ column in the DataFrame ‘df’.

A

mean_height = df[‘height’].mean()
median_height = df[‘height’].median()
std_dev_height = df[‘height’].std()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

Apply a custom function to calculate the 75th percentile of the ‘income’ column in the DataFrame ‘df’.

A

percentile_75 = df[‘income’].quantile(0.75)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

Extract the first word from each entry in the ‘name’ column of the DataFrame ‘df’.

A

first_word = df[‘name’].str.split().str.get(0)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
Q

Convert the ‘grade’ column in the DataFrame ‘df’ to categorical type.

A

df[‘grade’] = df[‘grade’].astype(‘category’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
26
Q

Remove duplicate rows from the DataFrame ‘df’ based on all columns.

A

df.drop_duplicates(inplace=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
27
Q

Fill missing values in the ‘weight’ column with the median weight in the DataFrame ‘df’.

A

median_weight = df[‘weight’].median()
df[‘weight’].fillna(median_weight, inplace=True)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
28
Q

Merge two DataFrames ‘df1’ and ‘df2’ on ‘key_column’ from ‘df1’ and ‘other_key_column’ from ‘df2’.

A

merged_df = pd.merge(df1, df2, left_on=’key_column’, right_on=’other_key_column’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
29
Q

Calculate the cumulative sum of the ‘sales’ column in the DataFrame ‘df’.

A

cumulative_sales = df[‘sales’].cumsum()

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
30
Q

Convert the timezone of the ‘timestamp’ column to ‘UTC’ in the DataFrame ‘df’.

A

df[‘timestamp’] = df[‘timestamp’].dt.tz_convert(‘UTC’)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
31
Q

Find the maximum and minimum values of the ‘temperature’ column in the DataFrame ‘df’.

A

max_temp = df[‘temperature’].max()
min_temp = df[‘temperature’].min()

32
Q

Calculate the percentage change of the ‘price’ column in the DataFrame ‘df’.

A

percentage_change_price = df[‘price’].pct_change() * 100

33
Q

Count the number of null values in each column of the DataFrame ‘df’.

A

null_counts = df.isnull().sum()

34
Q

Create a MultiIndex DataFrame from ‘df’ with ‘group’ and ‘subgroup’ as index levels.

A

df.set_index([‘group’, ‘subgroup’], inplace=True)

35
Q

Remove outliers from the ‘score’ column in the DataFrame ‘df’ using z-score method.

A

z_scores = (df[‘score’] - df[‘score’].mean()) / df[‘score’].std()
df = df[(z_scores < 3) & (z_scores > -3)]

36
Q

Check if there are any missing values in the DataFrame ‘df’.

A

has_missing_values = df.isnull().values.any()

37
Q

Convert all text in the ‘description’ column to lowercase in the DataFrame ‘df’.

A

df[‘description’] = df[‘description’].str.lower()

38
Q

Fill missing values in the DataFrame ‘df’ using forward fill method.

A

df.fillna(method=’ffill’, inplace=True)

39
Q

Calculate the correlation matrix for all numeric columns in the DataFrame ‘df’.

A

correlation_matrix = df.corr()

40
Q

Multiply each value in the ‘quantity’ column by 2 in the DataFrame ‘df’.

A

df[‘quantity’] *= 2

41
Q

Create a new column in the DataFrame ‘df’ containing lagged values of the ‘price’ column.

A

df[‘lagged_price’] = df[‘price’].shift(1)

42
Q

Identify peaks in the ‘temperature’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

A

df[‘is_peak’] = (df[‘temperature’] > df[‘temperature’].shift(1)) & (df[‘temperature’] > df[‘temperature’].shift(-1))

43
Q

Calculate the rate of change between consecutive rows for the ‘sales’ column in the DataFrame ‘df’.

A

df[‘rate_of_change’] = df[‘sales’].diff()

44
Q

Determine if there is an upward trend in the ‘price’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

A

df[‘is_upward_trend’] = (df[‘price’] > df[‘price’].shift(1)) & (df[‘price’] > df[‘price’].shift(-1))

45
Q

Calculate the difference between each value in the ‘sales’ column and the value in the previous row in the DataFrame ‘df’.

A

df[‘sales_difference’] = df[‘sales’] - df[‘sales’].shift(1)

46
Q

Identify turning points in the ‘temperature’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

A

df[‘is_turning_point’] = ((df[‘temperature’] > df[‘temperature’].shift(1)) & (df[‘temperature’] > df[‘temperature’].shift(-1))) | ((df[‘temperature’] < df[‘temperature’].shift(1)) & (df[‘temperature’] < df[‘temperature’].shift(-1)))

47
Q

Calculate the percentage change between each value in the ‘revenue’ column and the value in the previous row in the DataFrame ‘df’.

A

df[‘percentage_change’] = ((df[‘revenue’] - df[‘revenue’].shift(1)) / df[‘revenue’].shift(1)) * 100

48
Q

Detect changes in sign (positive to negative or vice versa) in the ‘value’ column of the DataFrame ‘df’ by comparing values to the previous row.

A

df[‘sign_change’] = (df[‘value’] * df[‘value’].shift(1)) < 0

49
Q

Identify local extrema (maxima and minima) in the ‘price’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

A

df[‘is_local_extrema’] = ((df[‘price’] > df[‘price’].shift(1)) & (df[‘price’] > df[‘price’].shift(-1))) | ((df[‘price’] < df[‘price’].shift(1)) & (df[‘price’] < df[‘price’].shift(-1)))

50
Q

Calculate the moving range between consecutive rows for the ‘temperature’ column in the DataFrame ‘df’.

A

df[‘moving_range’] = df[‘temperature’].diff()

51
Q

Identify peaks and troughs in the ‘value’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

A

df[‘is_peak’] = (df[‘value’] > df[‘value’].shift(1)) & (df[‘value’] > df[‘value’].shift(-1))
df[‘is_trough’] = (df[‘value’] < df[‘value’].shift(1)) & (df[‘value’] < df[‘value’].shift(-1))

52
Q

Detect significant changes in magnitude (increase or decrease) in the ‘sales’ column of the DataFrame ‘df’ by comparing values to the previous row.

A

df[‘magnitude_change’] = abs(df[‘sales’] - df[‘sales’].shift(1))

53
Q

Calculate the relative change between each value in the ‘volume’ column and the value in the previous row in the DataFrame ‘df’.

A

df[‘relative_change’] = (df[‘volume’] - df[‘volume’].shift(1)) / df[‘volume’].shift(1)

54
Q

Identify shifts or jumps in the ‘price’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

A

df[‘shifted_difference’] = df[‘price’] - df[‘price’].shift(1)

55
Q

Calculate the difference between the 7-day moving average and the actual value for the ‘temperature’ column in the DataFrame ‘df’.

A

df[‘moving_average_difference’] = df[‘temperature’].rolling(window=7).mean() - df[‘temperature’]

56
Q

Identify consecutive increases or decreases in the ‘sales’ column of the DataFrame ‘df’ by comparing values to the previous row.

A

df[‘consecutive_increase’] = (df[‘sales’] > df[‘sales’].shift(1)) & (df[‘sales’].shift(1) > df[‘sales’].shift(2))
df[‘consecutive_decrease’] = (df[‘sales’] < df[‘sales’].shift(1)) & (df[‘sales’].shift(1) < df[‘sales’].shift(2))

57
Q

Detect fluctuations or oscillations in the ‘value’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

A

df[‘is_fluctuation’] = ((df[‘value’] > df[‘value’].shift(1)) & (df[‘value’] > df[‘value’].shift(-1))) | ((df[‘value’] < df[‘value’].shift(1)) & (df[‘value’] < df[‘value’].shift(-1)))

58
Q

Calculate the difference between the z-score of each value and the z-score of the previous value for the ‘price’ column in the DataFrame ‘df’.

A

from scipy.stats import zscore
df[‘z_score’] = zscore(df[‘price’])
df[‘z_score_difference’] = df[‘z_score’] - df[‘z_score’].shift(1)

59
Q

Identify sudden changes or spikes in the ‘volume’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

A

df[‘is_spike’] = ((df[‘volume’] - df[‘volume’].shift(1)).abs() > threshold)

60
Q

Detect a reversal in trends (e.g., from increasing to decreasing) in the ‘sales’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

A

df[‘is_trend_reversal’] = ((df[‘sales’] > df[‘sales’].shift(1)) & (df[‘sales’] > df[‘sales’].shift(-1))) | ((df[‘sales’] < df[‘sales’].shift(1)) & (df[‘sales’] < df[‘sales’].shift(-1)))

61
Q

Resample the time series data in the DataFrame ‘df’ to a monthly frequency.

A

df_resampled = df.resample(‘M’).mean()
print(df_resampled)

62
Q

Extract the year and month from the ‘date’ column in the DataFrame ‘df’.

A

df[‘year’] = df[‘date’].dt.year
df[‘month’] = df[‘date’].dt.month
print(df[[‘year’, ‘month’]])

63
Q

Calculate the 7-day moving average of the ‘temperature’ column in the DataFrame ‘df’.

A

df[‘moving_average’] = df[‘temperature’].rolling(window=7).mean()
print(df[‘moving_average’])

64
Q

Fill missing values in the DataFrame ‘df’ using backward fill method.

A

df_filled = df.fillna(method=’bfill’)
print(df_filled)

65
Q

Detect outliers in the ‘value’ column of the DataFrame ‘df’ using IQR method.

A

Q1 = df[‘value’].quantile(0.25)
Q3 = df[‘value’].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df[‘value’] < Q1 - 1.5 * IQR) | (df[‘value’] > Q3 + 1.5 * IQR)]
print(outliers)

66
Q

Concatenate DataFrames ‘df1’ and ‘df2’ horizontally along columns.

A

concatenated_df = pd.concat([df1, df2], axis=1)
print(concatenated_df)

67
Q

Calculate the duration between two datetime columns ‘start_time’ and ‘end_time’ in theCalculate the duration between two datetime columns ‘start_time’ and ‘end_time’ in the DataFrame ‘df’.

A

df[‘duration’] = df[‘end_time’] - df[‘start_time’]
print(df[‘duration’])

68
Q

Apply a custom aggregation function to find the weighted average of the ‘value’ column grouped by ‘category’ in the DataFrame ‘df’.

A

weighted_avg = df.groupby(‘category’)[‘value’].apply(lambda x: np.average(x, weights=df.loc[x.index, ‘weights’]))
print(weighted_avg)

69
Q

Select rows in the DataFrame ‘df’ where the index is between ‘2023-01-01’ and ‘2023-01-07’.

A

selected_rows = df.loc[‘2023-01-01’:’2023-01-07’]
print(selected_rows)

70
Q

Flatten multi-level column headers in the DataFrame ‘df’ to single level.

A

df.columns = [‘_‘.join(col) for col in df.columns]
print(df)

71
Q

Convert the data type of the ‘price’ column to float in the DataFrame ‘df’.

A

df[‘price’] = df[‘price’].astype(float)
print(df[‘price’])

72
Q

Calculate the rolling sum of the last 3 days for the ‘sales’ column in the DataFrame ‘df’.

A

rolling_sum = df[‘sales’].rolling(window=3).sum()
print(rolling_sum)

73
Q

Convert the time zone of the ‘timestamp’ column from ‘EST’ to ‘UTC’ in the DataFrame ‘df’.

A

df[‘timestamp’] = df[‘timestamp’].dt.tz_localize(‘EST’).dt.tz_convert(‘UTC’)
print(df[‘timestamp’])

74
Q

Encode the ‘category’ column of the DataFrame ‘df’ using one-hot encoding.

A

encoded_df = pd.get_dummies(df[‘category’])
print(encoded_df)

75
Q

Find the most common value in the ‘category’ column of the DataFrame ‘df’.

A

most_common_value = df[‘category’].mode()[0]
print(most_common_value)

76
Q

Swap the levels of the hierarchical index in the DataFrame ‘df’.

A

df = df.swaplevel(axis=1)
print(df)

77
Q

Calculate the exponential moving average with a span of 10 for the ‘value’ column in the DataFrame ‘df’.

A

df[‘EMA’] = df[‘value’].ewm(span=10).mean()
print(df[‘EMA’])