pandas2 Flashcards

Question 1

Q

Get basic info about a DataFrame. Give two methods; one of them also lists categorical data (unique values, top value, and its count).

Answer

A

df.describe(include='all') and df.info()
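A minimal sketch on a small made-up DataFrame (the column names `price` and `brand` are hypothetical) showing what `include='all'` adds: for object columns, `describe()` reports `unique`, `top`, and `freq` next to the usual numeric statistics.

```python
import pandas as pd

# Hypothetical data: one numeric and one categorical column
df = pd.DataFrame({
    "price": [10.0, 12.5, 9.0, 15.0],
    "brand": ["A", "B", "A", "C"],
})

# Statistics for every column; object columns get unique/top/freq
summary = df.describe(include="all")
print(summary)

# Dtypes, non-null counts and memory usage (prints, returns None)
df.info()
```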

Question 2

Q

Sort by values, first by col1 and then by col2, with col1 ascending and col2 descending.

Answer

A

df = df.sort_values(by=['col1', 'col2'], ascending=[True, False]).reset_index(drop=True)
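A quick check on made-up numbers: rows are ordered by `col1` ascending, and ties within `col1` are broken by `col2` descending; `reset_index(drop=True)` then renumbers the rows 0..n-1.

```python
import pandas as pd

# Hypothetical data with ties in col1
df = pd.DataFrame({
    "col1": [2, 1, 2, 1],
    "col2": [5, 7, 9, 3],
})

# col1 ascending, then col2 descending within each col1 value
df = df.sort_values(by=["col1", "col2"], ascending=[True, False]).reset_index(drop=True)
print(df)
```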

Question 3

Q

Create a dummy variable manually (map a two-valued column to 1/0).

Answer

A

df['col'] = df['col'].map({'value1': 1, 'value2': 0})
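One thing worth knowing about `map` with a dict: any value not listed in the dict becomes NaN. A sketch with hypothetical values `yes`/`no`/`maybe`, using that behaviour to spot categories that were not encoded:

```python
import pandas as pd

# Hypothetical column with an unexpected third category
df = pd.DataFrame({"col": ["yes", "no", "yes", "maybe"]})

# "maybe" is not in the dict, so it becomes NaN
df["col"] = df["col"].map({"yes": 1, "no": 0})
print(df["col"].isnull().sum())  # number of unmapped values
```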

Question 4

Q

x is a DataFrame; you want to use its column names as an array (e.g. to build a list of columns).

Answer

A

x.columns.values
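A small sketch (the DataFrame `x` and its columns `a`, `b` are made up): `.values` gives a NumPy array of the labels, while `.tolist()` gives a plain Python list if that is more convenient.

```python
import pandas as pd

x = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

names = x.columns.values        # NumPy array of column labels
name_list = x.columns.tolist()  # plain Python list of the same labels
print(names, name_list)
```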

Question 5

Q

Delete columns from a DataFrame.

Answer

A

df = df.drop(columns=['col1', 'col2'])  # columns= already implies axis=1; drop() returns a new df
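A minimal check on a one-row made-up DataFrame: `drop()` does not modify the original in place by default, so the result has to be reassigned (or `inplace=True` used).

```python
import pandas as pd

df = pd.DataFrame({"col1": [1], "col2": [2], "col3": [3]})

# Reassign, because drop() returns a new DataFrame
df = df.drop(columns=["col1", "col2"])
print(df.columns.tolist())
```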

Question 6

Q

Get the value of a column's 99th percentile (to use later to remove outliers).

Answer

A

threshold = df['col'].quantile(0.99)
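The threshold is then typically used in a boolean mask that keeps only rows below it. A sketch on made-up data (100 ordinary values plus one extreme outlier); whether to use `<` or `<=` is a judgment call.

```python
import pandas as pd

# Hypothetical data: 0..99 plus one extreme value
df = pd.DataFrame({"col": list(range(100)) + [10_000]})

threshold = df["col"].quantile(0.99)

# Keep rows strictly below the 99th percentile, then renumber them
df_no_outliers = df[df["col"] < threshold].reset_index(drop=True)
print(threshold, len(df), len(df_no_outliers))
```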

Question 7

Q

Create dummy variables for all categorical columns at once.

Answer

A

new_df = pd.get_dummies(df, drop_first=True)
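A sketch with hypothetical columns `price` and `brand`: only object/category columns are encoded, numeric columns pass through unchanged, and `drop_first=True` drops the first category of each encoded column to avoid the dummy-variable trap (perfect multicollinearity with the intercept).

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10, 20, 30],
    "brand": ["A", "B", "C"],
})

# brand_A is dropped; it is implied when brand_B and brand_C are both 0
new_df = pd.get_dummies(df, drop_first=True)
print(new_df.columns.tolist())
```

Recent pandas versions return the dummies as booleans; `.astype(int)` converts them to 0/1 if needed.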

Question 8

Q

Rearrange the columns of a DataFrame. In what order should they be?

Answer

A

df.columns.values  # print the current column names
# copy the output array into cols below and rearrange the names
cols = [PASTE HERE]
new_df = df[cols]

Order: dependent variable first, then numeric independent variables, then dummies.
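Instead of copying and pasting the array, the order can also be built programmatically. A sketch assuming hypothetical column names: `log_price` as the dependent variable, `mileage` as the only numeric predictor, and every other column treated as a dummy.

```python
import pandas as pd

df = pd.DataFrame({
    "brand_B": [0, 1],
    "mileage": [120, 80],
    "log_price": [9.2, 9.8],
    "brand_C": [1, 0],
})

target = "log_price"   # hypothetical dependent variable
numeric = ["mileage"]  # hypothetical numeric predictors
# Assumption: everything else is a dummy, kept in its existing order
dummies = [c for c in df.columns if c != target and c not in numeric]

cols = [target] + numeric + dummies
new_df = df[cols]
print(new_df.columns.tolist())
```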

Question 9

Q

After a linear regression on a log-transformed target, create a DataFrame to check the test results (predictions vs. targets).

Answer

A

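import numpy as np
import pandas as pd

# y_hat_test: predictions on the test inputs (log scale)
y_test = y_test.reset_index(drop=True)  # match the new 0..n-1 index
df_pf = pd.DataFrame(np.exp(y_hat_test), columns=['Prediction'])  # undo the log
df_pf['Target'] = np.exp(y_test)
df_pf['Residual'] = df_pf['Target'] - df_pf['Prediction']
df_pf['Difference_perc'] = np.absolute(df_pf['Residual'] / df_pf['Target'] * 100)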
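A self-contained sketch with made-up log-scale values standing in for `y_test` and `y_hat_test` (in practice these come from the train/test split and the fitted model). The non-sequential index on `y_test` mimics what a train/test split leaves behind; without `reset_index`, the `Target` assignment would align on index labels and produce NaNs.

```python
import numpy as np
import pandas as pd

# Hypothetical log-scale targets (shuffled index) and predictions
y_test = pd.Series(np.log([100.0, 200.0, 400.0]), index=[7, 2, 5])
y_hat_test = np.log([110.0, 180.0, 400.0])

y_test = y_test.reset_index(drop=True)

df_pf = pd.DataFrame(np.exp(y_hat_test), columns=["Prediction"])
df_pf["Target"] = np.exp(y_test)
df_pf["Residual"] = df_pf["Target"] - df_pf["Prediction"]
df_pf["Difference_perc"] = np.absolute(df_pf["Residual"] / df_pf["Target"] * 100)

# Sorting by the percentage error shows the best and worst predictions
print(df_pf.sort_values(by="Difference_perc"))
```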

Question 10

Q

Set the maximum number of rows to display to n.

Answer

A

pd.options.display.max_rows = n
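A sketch with a hypothetical `n = 10`: the option can be read back with `pd.get_option`, and `pd.option_context` changes it only temporarily inside a `with` block. Setting `display.max_rows` to `None` removes the limit entirely.

```python
import pandas as pd

n = 10  # hypothetical row limit

pd.options.display.max_rows = n
print(pd.get_option("display.max_rows"))

# Temporary override, restored when the with-block exits
with pd.option_context("display.max_rows", 100):
    inside = pd.get_option("display.max_rows")

after = pd.get_option("display.max_rows")
```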

Question 11

Q

Display floats with 2 decimal places.

Answer

A

pd.set_option('display.float_format', lambda x: '%.2f' % x)
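This option changes only how floats are displayed; the stored values keep full precision. A sketch on made-up values, resetting the option at the end:

```python
import pandas as pd

pd.set_option("display.float_format", lambda x: "%.2f" % x)

df = pd.DataFrame({"val": [3.14159, 2.71828]})
text = df.to_string()  # uses the display option
print(text)

# The underlying data is unchanged
print(df["val"].iloc[0])

pd.reset_option("display.float_format")
```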

Question 12

Q

Check for null values (count per column).

Answer

A

df.isnull().sum()
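A sketch on made-up data: summing once gives the null count per column, summing twice gives the total, and `dropna()` is one common follow-up that drops every row containing a null.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": ["x", None, None],
})

per_column = df.isnull().sum()   # nulls per column
total = df.isnull().sum().sum()  # nulls in the whole DataFrame
df_clean = df.dropna()           # rows with any null removed
print(per_column, total, len(df_clean))
```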

Question 13

Q

Turn the confusion matrix from statsmodels into a pandas DataFrame. Assume the fitted model is results_log.

Answer

A

cm_df = pd.DataFrame(results_log.pred_table())
cm_df.columns = ['Predicted 0','Predicted 1']
cm_df = cm_df.rename(index={0: 'Actual 0',1:'Actual 1'})
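If only the actual labels and predicted probabilities are at hand (for example without statsmodels), a table with the same layout (rows = actual, columns = predicted) can be built with `pd.crosstab`. This is a swapped-in technique, not what `pred_table()` does internally; the 0.5 cutoff mirrors `pred_table()`'s default `threshold`. The labels and probabilities below are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical actual labels and predicted probabilities
actual = np.array([0, 0, 1, 1, 1, 0])
prob = np.array([0.2, 0.6, 0.7, 0.4, 0.9, 0.1])

predicted = (prob > 0.5).astype(int)

cm_df = pd.crosstab(
    pd.Series(actual, name="Actual"),
    pd.Series(predicted, name="Predicted"),
)
cm = cm_df.to_numpy()

# Accuracy = correct predictions (diagonal) / all observations
accuracy = np.trace(cm) / cm.sum()
print(cm_df, accuracy)
```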

Question 14

Q

Select all rows and the first and second columns.

Answer

A

df.iloc[:, 0:2]  # positions 0 and 1; the slice end is excluded (1:3 would give the 2nd and 3rd columns)
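A quick check of `iloc`'s zero-based, end-exclusive slicing on a made-up three-column DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

first_two = df.iloc[:, 0:2]     # positions 0 and 1 -> columns a, b
second_third = df.iloc[:, 1:3]  # positions 1 and 2 -> columns b, c
print(first_two.columns.tolist(), second_third.columns.tolist())
```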

(14 cards)