Pandas Flashcards

Question 1

Q

Create a DataFrame from a dictionary where keys are column names and values are lists of data.

Answer

A

import pandas as pd

data = {‘name’: [‘Alice’, ‘Bob’, ‘Charlie’],
‘age’: [25, 30, 35]}
df = pd.DataFrame(data)

Question 2

Q

Select a single column named ‘age’ from a DataFrame named ‘df’.

Answer

A

age_column = df[‘age’]

Question 3

Q

Filter rows where the ‘age’ column is greater than 30.

Answer

A

filtered_df = df[df[‘age’] > 30]

Question 4

Q

Add a new column named ‘income’ to the DataFrame ‘df’ with random income values.

Answer

A

import numpy as np

df[‘income’] = np.random.randint(30000, 80000, size=len(df))

Question 5

Q

Group the DataFrame ‘df’ by the ‘gender’ column and calculate the mean age for each group.

Answer

A

grouped_df = df.groupby(‘gender’)[‘age’].mean()

Question 6

Q

Sort the DataFrame ‘df’ by the ‘age’ column in descending order.

Answer

A

sorted_df = df.sort_values(by=’age’, ascending=False)

Question 7

Q

Replace missing values in the ‘income’ column with the mean income.

Answer

A

df[‘income’].fillna(df[‘income’].mean(), inplace=True)

Question 8

Q

Merge two DataFrames ‘df1’ and ‘df2’ based on a common key column.

Answer

A

merged_df = pd.merge(df1, df2, on=’common_key_column’)

Question 9

Q

Drop the column ‘income’ from the DataFrame ‘df’.

Answer

A

df.drop(columns=[‘income’], inplace=True)

Question 10

Q

Rename the column ‘age’ to ‘years’ in the DataFrame ‘df’.

Answer

A

df.rename(columns={‘age’: ‘years’}, inplace=True)

Question 11

Q

Calculate the sum of the ‘income’ column in the DataFrame ‘df’.

Answer

A

total_income = df[‘income’].sum()

Question 12

Q

Apply a function that converts all ‘age’ values to months in the DataFrame ‘df’.

Answer

A

df[‘age’] = df[‘age’].apply(lambda x: x * 12)

Question 13

Q

Pivot the DataFrame ‘df’ with ‘gender’ as index and ‘age’ as columns, filling NaNs with 0.

Answer

A

pivoted_df = df.pivot_table(index=’gender’, columns=’age’, fill_value=0)

Question 14

Q

Convert a column named ‘date’ to datetime format in the DataFrame ‘df’.

Answer

A

df[‘date’] = pd.to_datetime(df[‘date’])

Question 15

Q

Select rows where the ‘gender’ column is ‘female’.

Answer

A

female_rows = df[df[‘gender’] == ‘female’]

Question 16

Q

Create dummy variables for the ‘gender’ column in the DataFrame ‘df’.

Answer

A

dummy_df = pd.get_dummies(df, columns=[‘gender’])

Question 17

Q

Concatenate two DataFrames ‘df1’ and ‘df2’ vertically.

Answer

A

concatenated_df = pd.concat([df1, df2])

Question 18

Q

Randomly sample 10 rows from the DataFrame ‘df’.

Answer

A

sampled_df = df.sample(n=10)

Question 19

Q

Replace all occurrences of ‘Male’ with ‘M’ and ‘Female’ with ‘F’ in the ‘gender’ column.

Answer

A

df[‘gender’].replace({‘Male’: ‘M’, ‘Female’: ‘F’}, inplace=True)

Question 20

Q

Export the DataFrame ‘df’ to a CSV file named ‘data.csv’.

Answer

A

df.to_csv(‘data.csv’, index=False)

Question 21

Q

Find the unique values in the ‘category’ column of the DataFrame ‘df’.

Answer

A

unique_categories = df[‘category’].unique()

Question 22

Q

Calculate the mean, median, and standard deviation of the ‘height’ column in the DataFrame ‘df’.

Answer

A

mean_height = df[‘height’].mean()
median_height = df[‘height’].median()
std_dev_height = df[‘height’].std()

Question 23

Q

Apply a custom function to calculate the 75th percentile of the ‘income’ column in the DataFrame ‘df’.

Answer

A

percentile_75 = df[‘income’].quantile(0.75)

Question 24

Q

Extract the first word from each entry in the ‘name’ column of the DataFrame ‘df’.

Answer

A

first_word = df[‘name’].str.split().str.get(0)

Question 25

Q

Convert the ‘grade’ column in the DataFrame ‘df’ to categorical type.

Answer

A

df[‘grade’] = df[‘grade’].astype(‘category’)

Question 26

Q

Remove duplicate rows from the DataFrame ‘df’ based on all columns.

Answer

A

df.drop_duplicates(inplace=True)

Question 27

Q

Fill missing values in the ‘weight’ column with the median weight in the DataFrame ‘df’.

Answer

A

median_weight = df[‘weight’].median()
df[‘weight’].fillna(median_weight, inplace=True)

Question 28

Q

Merge two DataFrames ‘df1’ and ‘df2’ on ‘key_column’ from ‘df1’ and ‘other_key_column’ from ‘df2’.

Answer

A

merged_df = pd.merge(df1, df2, left_on=’key_column’, right_on=’other_key_column’)

Question 29

Q

Calculate the cumulative sum of the ‘sales’ column in the DataFrame ‘df’.

Answer

A

cumulative_sales = df[‘sales’].cumsum()

Question 30

Q

Convert the timezone of the ‘timestamp’ column to ‘UTC’ in the DataFrame ‘df’.

Answer

A

df[‘timestamp’] = df[‘timestamp’].dt.tz_convert(‘UTC’)

Question 31

Q

Find the maximum and minimum values of the ‘temperature’ column in the DataFrame ‘df’.

Answer

A

max_temp = df[‘temperature’].max()
min_temp = df[‘temperature’].min()

Question 32

Q

Calculate the percentage change of the ‘price’ column in the DataFrame ‘df’.

Answer

A

percentage_change_price = df[‘price’].pct_change() * 100

Question 33

Q

Count the number of null values in each column of the DataFrame ‘df’.

Answer

A

null_counts = df.isnull().sum()

Question 34

Q

Create a MultiIndex DataFrame from ‘df’ with ‘group’ and ‘subgroup’ as index levels.

Answer

A

df.set_index([‘group’, ‘subgroup’], inplace=True)

Question 35

Q

Remove outliers from the ‘score’ column in the DataFrame ‘df’ using z-score method.

Answer

A

z_scores = (df[‘score’] - df[‘score’].mean()) / df[‘score’].std()
df = df[(z_scores < 3) & (z_scores > -3)]

Question 36

Q

Check if there are any missing values in the DataFrame ‘df’.

Answer

A

has_missing_values = df.isnull().values.any()

Question 37

Q

Convert all text in the ‘description’ column to lowercase in the DataFrame ‘df’.

Answer

A

df[‘description’] = df[‘description’].str.lower()

Question 38

Q

Fill missing values in the DataFrame ‘df’ using forward fill method.

Answer

A

df.fillna(method=’ffill’, inplace=True)

Question 39

Q

Calculate the correlation matrix for all numeric columns in the DataFrame ‘df’.

Answer

A

correlation_matrix = df.corr()

Question 40

Q

Multiply each value in the ‘quantity’ column by 2 in the DataFrame ‘df’.

Answer

A

df[‘quantity’] *= 2

Question 41

Q

Create a new column in the DataFrame ‘df’ containing lagged values of the ‘price’ column.

Answer

A

df[‘lagged_price’] = df[‘price’].shift(1)

Question 42

Q

Identify peaks in the ‘temperature’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

Answer

A

df[‘is_peak’] = (df[‘temperature’] > df[‘temperature’].shift(1)) & (df[‘temperature’] > df[‘temperature’].shift(-1))

Question 43

Q

Calculate the rate of change between consecutive rows for the ‘sales’ column in the DataFrame ‘df’.

Answer

A

df[‘rate_of_change’] = df[‘sales’].diff()

Question 44

Q

Determine if there is an upward trend in the ‘price’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

Answer

A

df[‘is_upward_trend’] = (df[‘price’] > df[‘price’].shift(1)) & (df[‘price’] > df[‘price’].shift(-1))

Question 45

Q

Calculate the difference between each value in the ‘sales’ column and the value in the previous row in the DataFrame ‘df’.

Answer

A

df[‘sales_difference’] = df[‘sales’] - df[‘sales’].shift(1)

Question 46

Q

Identify turning points in the ‘temperature’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

Answer

A

df[‘is_turning_point’] = ((df[‘temperature’] > df[‘temperature’].shift(1)) & (df[‘temperature’] > df[‘temperature’].shift(-1))) | ((df[‘temperature’] < df[‘temperature’].shift(1)) & (df[‘temperature’] < df[‘temperature’].shift(-1)))

Question 47

Q

Calculate the percentage change between each value in the ‘revenue’ column and the value in the previous row in the DataFrame ‘df’.

Answer

A

df[‘percentage_change’] = ((df[‘revenue’] - df[‘revenue’].shift(1)) / df[‘revenue’].shift(1)) * 100

Question 48

Q

Detect changes in sign (positive to negative or vice versa) in the ‘value’ column of the DataFrame ‘df’ by comparing values to the previous row.

Answer

A

df[‘sign_change’] = (df[‘value’] * df[‘value’].shift(1)) < 0

Question 49

Q

Identify local extrema (maxima and minima) in the ‘price’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

Answer

A

df[‘is_local_extrema’] = ((df[‘price’] > df[‘price’].shift(1)) & (df[‘price’] > df[‘price’].shift(-1))) | ((df[‘price’] < df[‘price’].shift(1)) & (df[‘price’] < df[‘price’].shift(-1)))

Question 50

Q

Calculate the moving range between consecutive rows for the ‘temperature’ column in the DataFrame ‘df’.

Answer

A

df[‘moving_range’] = df[‘temperature’].diff()

Question 51

Q

Identify peaks and troughs in the ‘value’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

Answer

A

df[‘is_peak’] = (df[‘value’] > df[‘value’].shift(1)) & (df[‘value’] > df[‘value’].shift(-1))
df[‘is_trough’] = (df[‘value’] < df[‘value’].shift(1)) & (df[‘value’] < df[‘value’].shift(-1))

Question 52

Q

Detect significant changes in magnitude (increase or decrease) in the ‘sales’ column of the DataFrame ‘df’ by comparing values to the previous row.

Answer

A

df[‘magnitude_change’] = abs(df[‘sales’] - df[‘sales’].shift(1))

Question 53

Q

Calculate the relative change between each value in the ‘volume’ column and the value in the previous row in the DataFrame ‘df’.

Answer

A

df[‘relative_change’] = (df[‘volume’] - df[‘volume’].shift(1)) / df[‘volume’].shift(1)

Question 54

Q

Identify shifts or jumps in the ‘price’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

Answer

A

df[‘shifted_difference’] = df[‘price’] - df[‘price’].shift(1)

Question 55

Q

Calculate the difference between the 7-day moving average and the actual value for the ‘temperature’ column in the DataFrame ‘df’.

Answer

A

df[‘moving_average_difference’] = df[‘temperature’].rolling(window=7).mean() - df[‘temperature’]

Question 56

Q

Identify consecutive increases or decreases in the ‘sales’ column of the DataFrame ‘df’ by comparing values to the previous row.

Answer

A

df[‘consecutive_increase’] = (df[‘sales’] > df[‘sales’].shift(1)) & (df[‘sales’].shift(1) > df[‘sales’].shift(2))
df[‘consecutive_decrease’] = (df[‘sales’] < df[‘sales’].shift(1)) & (df[‘sales’].shift(1) < df[‘sales’].shift(2))

Question 57

Q

Detect fluctuations or oscillations in the ‘value’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

Answer

A

df[‘is_fluctuation’] = ((df[‘value’] > df[‘value’].shift(1)) & (df[‘value’] > df[‘value’].shift(-1))) | ((df[‘value’] < df[‘value’].shift(1)) & (df[‘value’] < df[‘value’].shift(-1)))

Question 58

Q

Calculate the difference between the z-score of each value and the z-score of the previous value for the ‘price’ column in the DataFrame ‘df’.

Answer

A

from scipy.stats import zscore
df[‘z_score’] = zscore(df[‘price’])
df[‘z_score_difference’] = df[‘z_score’] - df[‘z_score’].shift(1)

Question 59

Q

Identify sudden changes or spikes in the ‘volume’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

Answer

A

df[‘is_spike’] = ((df[‘volume’] - df[‘volume’].shift(1)).abs() > threshold)

Question 60

Q

Detect a reversal in trends (e.g., from increasing to decreasing) in the ‘sales’ column of the DataFrame ‘df’ by comparing values to the previous and next rows.

Answer

A

df[‘is_trend_reversal’] = ((df[‘sales’] > df[‘sales’].shift(1)) & (df[‘sales’] > df[‘sales’].shift(-1))) | ((df[‘sales’] < df[‘sales’].shift(1)) & (df[‘sales’] < df[‘sales’].shift(-1)))

Question 61

Q

Resample the time series data in the DataFrame ‘df’ to a monthly frequency.

Answer

A

df_resampled = df.resample(‘M’).mean()
print(df_resampled)

Question 62

Q

Extract the year and month from the ‘date’ column in the DataFrame ‘df’.

Answer

A

df[‘year’] = df[‘date’].dt.year
df[‘month’] = df[‘date’].dt.month
print(df[[‘year’, ‘month’]])

Question 63

Q

Calculate the 7-day moving average of the ‘temperature’ column in the DataFrame ‘df’.

Answer

A

df[‘moving_average’] = df[‘temperature’].rolling(window=7).mean()
print(df[‘moving_average’])

Question 64

Q

Fill missing values in the DataFrame ‘df’ using backward fill method.

Answer

A

df_filled = df.fillna(method=’bfill’)
print(df_filled)

Answer 65

A

Q1 = df[‘value’].quantile(0.25)
Q3 = df[‘value’].quantile(0.75)
IQR = Q3 - Q1
outliers = df[(df[‘value’] < Q1 - 1.5 * IQR) | (df[‘value’] > Q3 + 1.5 * IQR)]
print(outliers)

Answer 66

A

concatenated_df = pd.concat([df1, df2], axis=1)
print(concatenated_df)

Answer 67

A

df[‘duration’] = df[‘end_time’] - df[‘start_time’]
print(df[‘duration’])

Answer 68

A

weighted_avg = df.groupby(‘category’)[‘value’].apply(lambda x: np.average(x, weights=df.loc[x.index, ‘weights’]))
print(weighted_avg)

Answer 69

A

selected_rows = df.loc[‘2023-01-01’:’2023-01-07’]
print(selected_rows)

Answer 70

A

df.columns = [‘_‘.join(col) for col in df.columns]
print(df)

Answer 71

A

df[‘price’] = df[‘price’].astype(float)
print(df[‘price’])

Answer 72

A

rolling_sum = df[‘sales’].rolling(window=3).sum()
print(rolling_sum)

Answer 73

A

df[‘timestamp’] = df[‘timestamp’].dt.tz_localize(‘EST’).dt.tz_convert(‘UTC’)
print(df[‘timestamp’])

Answer 74

A

encoded_df = pd.get_dummies(df[‘category’])
print(encoded_df)

Answer 75

A

most_common_value = df[‘category’].mode()[0]
print(most_common_value)

Answer 76

A

df = df.swaplevel(axis=1)
print(df)

Answer 77

A

df[‘EMA’] = df[‘value’].ewm(span=10).mean()
print(df[‘EMA’])

Brainscape's Knowledge GenomeTM

Pandas Flashcards

Brainscape's Knowledge Genome^TM