Working with DataFrames Flashcards
How do you apply a function to a specific column in a DataFrame?
Use df['column'].apply(func).
df['A_squared'] = df['A'].apply(lambda x: x**2)
print(df)
How do you apply a function across multiple columns?
Use df.apply() with axis=1 so the function receives one row at a time.
df['Sum'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(df)
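For simple arithmetic, row-wise apply can usually be replaced by column arithmetic. A minimal sketch, assuming a small sample DataFrame (the data values are hypothetical; the column names A and B follow the cards):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the cards.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Row-wise apply: the lambda is called once per row.
df["Sum_apply"] = df.apply(lambda row: row["A"] + row["B"], axis=1)

# Vectorized equivalent: the addition runs on whole columns at once.
df["Sum_vec"] = df["A"] + df["B"]

print(df)
```

Both columns hold the same values; the vectorized form is typically faster on large frames because it avoids a Python-level call per row.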
How do you use .map() to transform a column using a dictionary?
Use df['column'].map(mapping_dict).
mapping = {1: 'One', 2: 'Two'}
df['Mapped_A'] = df['A'].map(mapping)
print(df)
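One detail worth remembering with a dictionary .map(): values that have no key in the dictionary become NaN. A small sketch, assuming a hypothetical column that contains an unmapped value (the "Other" fallback label is illustrative):

```python
import pandas as pd

# Hypothetical data: 3 has no entry in the mapping.
df = pd.DataFrame({"A": [1, 2, 3]})
mapping = {1: "One", 2: "Two"}

# Unmapped values come back as NaN.
df["Mapped_A"] = df["A"].map(mapping)

# One way to supply a fallback label for unmapped values.
df["Mapped_filled"] = df["A"].map(mapping).fillna("Other")

print(df)
```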
How do you replace values in a column using .replace()?
Use df['column'].replace({old_value: new_value}).
df['A'] = df['A'].replace({1: 100, 2: 200})
print(df)
How do you use .transform() to apply a function to a column while maintaining its shape?
Use df['column'].transform(func).
df['Normalized_A'] = df['A'].transform(lambda x: (x - x.mean()) / x.std())
print(df)
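Shape preservation matters most after a groupby. The sketch below contrasts an aggregation with a transform; the grouping column G and the data are hypothetical additions for illustration:

```python
import pandas as pd

# Hypothetical data: "G" is an illustrative grouping column, not from the cards.
df = pd.DataFrame({"G": ["x", "x", "y"], "A": [1.0, 3.0, 5.0]})

# An aggregation collapses to one value per group (2 rows here).
group_means = df.groupby("G")["A"].mean()

# transform broadcasts the group result back to the original rows (3 rows).
df["Group_Mean_A"] = df.groupby("G")["A"].transform("mean")
df["Centered_A"] = df["A"] - df["Group_Mean_A"]

print(group_means)
print(df)
```

Because the transformed result has the same index as df, it can be assigned straight back as a new column.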
How do you calculate row-wise means for selected columns?
Use .mean(axis=1).
df['Row_Mean'] = df[['A', 'B']].mean(axis=1)
print(df)
How do you filter rows based on conditions across multiple columns?
Use boolean indexing with conditions.
filtered_df = df[(df['A'] > 1) & (df['B'] < 5)]
print(filtered_df)
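Each condition needs its own parentheses because & binds more tightly than the comparison operators. The sketch below, on hypothetical data, also shows df.query() as an alternative spelling of the same filter:

```python
import pandas as pd

# Hypothetical sample data; column names follow the cards.
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 6, 2]})

# Boolean indexing: parenthesize each comparison before combining with &.
filtered_df = df[(df["A"] > 1) & (df["B"] < 5)]

# The same filter written as a query string.
queried_df = df.query("A > 1 and B < 5")

print(filtered_df)
```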
How do you compute the rank of values in a column?
Use df['column'].rank().
df['Rank'] = df['A'].rank(ascending=False)
print(df)
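Ties are where .rank() tends to surprise: by default tied values share the average of their ranks. A short sketch on hypothetical data comparing three tie-handling methods:

```python
import pandas as pd

# Hypothetical data with a tie (two rows share the value 20).
df = pd.DataFrame({"A": [10, 20, 20, 30]})

# Default method="average": tied values share the mean of their ranks.
df["Rank_avg"] = df["A"].rank(ascending=False)

# method="min": tied values all take the lowest rank in the tie.
df["Rank_min"] = df["A"].rank(ascending=False, method="min")

# method="dense": like "min", but ranks increase by 1 between groups.
df["Rank_dense"] = df["A"].rank(ascending=False, method="dense")

print(df)
```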
How do you rename index labels in a DataFrame?
Use df.rename(index={old_label: new_label}).
df.rename(index={0: 'Row_0', 1: 'Row_1'}, inplace=True)
print(df)
How do you reset the index of a DataFrame?
Use df.reset_index().
df.reset_index(drop=True, inplace=True)
print(df)
How do you set a column as the index of a DataFrame?
Use df.set_index('column_name').
df.set_index('A', inplace=True)
print(df)
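set_index() and reset_index() are inverses of each other unless drop=True is passed. A minimal round-trip sketch on hypothetical data, using the non-inplace form so each step stays visible:

```python
import pandas as pd

# Hypothetical sample data; column names follow the cards.
df = pd.DataFrame({"A": [10, 20, 30], "B": [1, 2, 3]})

# A moves from the columns into the index.
indexed = df.set_index("A")

# reset_index() moves A back into the columns and restores a 0..n-1 index.
restored = indexed.reset_index()

# With drop=True the old index (A) is discarded instead of restored.
dropped = indexed.reset_index(drop=True)

print(indexed, restored, dropped, sep="\n\n")
```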
How do you filter rows where a column’s value is in a list?
Use .isin().
filtered_df = df[df['A'].isin([1, 2])]
print(filtered_df)
How do you drop rows based on a condition?
Use df[~condition] or df.drop().
df = df[~(df['A'] > 2)]
print(df)
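The df.drop() variant mentioned in the answer works on index labels, so the condition is first turned into the labels of the matching rows. A sketch on hypothetical data showing that both forms keep the same rows:

```python
import pandas as pd

# Hypothetical sample data; the column name follows the cards.
df = pd.DataFrame({"A": [1, 2, 3, 4]})

# Keep rows where the condition is False.
kept_mask = df[~(df["A"] > 2)]

# Drop the index labels of rows where the condition is True.
kept_drop = df.drop(df[df["A"] > 2].index)

print(kept_mask)
print(kept_drop)
```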
How do you find the maximum value in a column and its corresponding row?
Use df['column'].max() and .idxmax().
max_val = df['A'].max()
max_row = df.loc[df['A'].idxmax()]
print(max_val, max_row)
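.idxmax() returns an index label rather than a position, which is why it pairs with .loc. A small sketch, assuming a hypothetical DataFrame with string index labels:

```python
import pandas as pd

# Hypothetical data with string labels to make the label/position difference visible.
df = pd.DataFrame({"A": [3, 9, 5], "B": [30, 90, 50]}, index=["x", "y", "z"])

max_val = df["A"].max()        # the largest value in A
max_label = df["A"].idxmax()   # the index label of that value
max_row = df.loc[max_label]    # the full row, returned as a Series

print(max_val, max_label)
print(max_row)
```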
How do you append a new row to a DataFrame?
Use pd.concat() with a one-row DataFrame; df.append() was deprecated in pandas 1.4 and removed in pandas 2.0.
new_row = {'A': 10, 'B': 20}
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
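When several rows need to be added, a common pattern is to collect them in a list and concatenate once instead of growing the DataFrame row by row. A sketch assuming pandas 2.x, where DataFrame.append is no longer available; the data values are hypothetical:

```python
import pandas as pd

# Hypothetical starting data; column names follow the cards.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Collect the new rows as dictionaries first.
new_rows = [{"A": 10, "B": 20}, {"A": 11, "B": 21}]

# Build one DataFrame from the list and concatenate a single time.
# ignore_index=True renumbers the result 0..n-1.
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)

print(df)
```

Concatenating once avoids copying the whole DataFrame for every added row.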