pandas2 Flashcards

1
Q

Get basic info about a df. Give two methods. One of them can also summarize categorical data (unique values, top value and its count).

A

df.describe(include='all') (also covers categorical columns) and df.info()
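
A minimal sketch of both calls on made-up data (the column names `price` and `brand` are illustrative only). It shows the extra `unique`, `top` and `freq` rows that `include='all'` adds for non-numeric columns.

```python
import pandas as pd

# Made-up data: one numeric column and one categorical column.
df = pd.DataFrame({
    "price": [10.0, 12.5, 9.0, 12.5],
    "brand": ["A", "B", "A", "A"],
})

# include="all" adds unique/top/freq rows for the non-numeric column.
summary = df.describe(include="all")
print(summary)

# info() prints dtypes and non-null counts; it returns None.
df.info()
```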

2
Q

sort by values first in col1 then in col2, col1 ascending and col2 descending.

A

df = df.sort_values(by=['col1', 'col2'], ascending=[True, False]).reset_index(drop=True)
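
A small worked example on made-up numbers, as a sketch of how the per-column `ascending` list behaves: ties in `col1` are broken by `col2` in descending order, and `reset_index(drop=True)` renumbers the rows.

```python
import pandas as pd

# Made-up data with ties in col1 so the second sort key matters.
df = pd.DataFrame({"col1": [2, 1, 2, 1], "col2": [5, 7, 9, 3]})

sorted_df = (
    df.sort_values(by=["col1", "col2"], ascending=[True, False])
      .reset_index(drop=True)  # discard the old row labels
)
print(sorted_df)
```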

3
Q

create dummy variable manually

A

df['col'] = df['col'].map({'value1': 1, 'value2': 0})
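
A short sketch on made-up data. One thing worth remembering: values missing from the mapping dict become NaN, so the extra value `'other'` below (an illustrative assumption) ends up as a null.

```python
import pandas as pd

# Made-up column; "other" is deliberately absent from the mapping.
df = pd.DataFrame({"col": ["value1", "value2", "value1", "other"]})

df["col"] = df["col"].map({"value1": 1, "value2": 0})
print(df)  # the unmapped row shows NaN
```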

4
Q

x is a df. You want its column names as an array so you can use the individual names.

A

x.columns.values

5
Q

delete columns

A

df.drop(columns=['col1', 'col2'])  # or df.drop(['col1', 'col2'], axis=1)
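
A minimal sketch on made-up data showing that `drop` returns a new DataFrame and leaves the original untouched unless the result is assigned back.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

# drop returns a new DataFrame; df itself still has all three columns.
trimmed = df.drop(columns=["col1", "col2"])
print(trimmed.columns.tolist())
print(df.columns.tolist())
```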

6
Q

the value of the 99th percentile (to use later to remove outliers)

A

threshold = df['col'].quantile(0.99)
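
A sketch of the follow-up step on made-up data (the integers 1 to 100). Filtering with a strict `<` is an assumption here; whether to keep rows equal to the threshold is a judgment call.

```python
import pandas as pd

# Made-up data: the integers 1..100.
df = pd.DataFrame({"col": list(range(1, 101))})

threshold = df["col"].quantile(0.99)  # linear interpolation by default

# Keep only rows below the threshold (strict "<" is an assumption).
df_trimmed = df[df["col"] < threshold]
print(threshold, len(df_trimmed))
```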

7
Q

create dummy variables for all categorical columns

A

new_df = pd.get_dummies(df, drop_first=True)
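
A small sketch on made-up data. With `drop_first=True` the first category (`A` here) is dropped and acts as the baseline, which avoids perfectly collinear dummies in a regression.

```python
import pandas as pd

# Made-up data: one numeric column and one categorical column.
df = pd.DataFrame({"price": [10, 20, 30], "brand": ["A", "B", "C"]})

# Numeric columns pass through; each remaining category gets its own column.
new_df = pd.get_dummies(df, drop_first=True)
print(new_df.columns.tolist())
```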

8
Q

rearrange the columns of a df. What order should they be in?

A
df.columns.values
# copy the output array
cols = [PASTE HERE]  # and rearrange
new_df = df[cols]

Order: dependent variable, numeric independent variables, dummies
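
A sketch of the same steps on made-up data. The column names `log_price`, `mileage` and `brand_B` are hypothetical stand-ins for a dependent variable, a numeric predictor and a dummy.

```python
import pandas as pd

# Made-up frame whose columns start in an inconvenient order.
df = pd.DataFrame({
    "brand_B": [0, 1],
    "mileage": [50, 80],
    "log_price": [9.1, 8.7],
})
print(df.columns.values)  # inspect the current order

# Dependent variable first, then numeric predictors, then dummies.
cols = ["log_price", "mileage", "brand_B"]
new_df = df[cols]
print(new_df.columns.tolist())
```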

9
Q

create a df after linear regression to check test results

A
# np.exp assumes the target was log-transformed before fitting
y_test = y_test.reset_index(drop=True)  # align the index with the predictions
df_pf = pd.DataFrame(np.exp(y_hat_test), columns=['Prediction'])
df_pf['Target'] = np.exp(y_test)
df_pf['Residual'] = df_pf['Target'] - df_pf['Prediction']
df_pf['Difference_perc'] = np.absolute(df_pf['Residual'] / df_pf['Target'] * 100)
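
A runnable sketch of this table with made-up stand-ins for `y_test` and `y_hat_test` (no model is actually fitted here). Both are assumed to be on the log scale. The shuffled index on `y_test` shows why the reset matters: without it the `Target` column would be filled by label and come out as NaN.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for a test target and model predictions (log scale).
y_test = pd.Series(np.log([100.0, 200.0, 400.0]), index=[7, 2, 5])
y_hat_test = np.log([110.0, 180.0, 400.0])

y_test = y_test.reset_index(drop=True)  # align with the 0..n-1 index below
df_pf = pd.DataFrame(np.exp(y_hat_test), columns=["Prediction"])
df_pf["Target"] = np.exp(y_test)
df_pf["Residual"] = df_pf["Target"] - df_pf["Prediction"]
df_pf["Difference_perc"] = np.absolute(df_pf["Residual"] / df_pf["Target"] * 100)
print(df_pf)
```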
10
Q

set number of rows to display to n

A

pd.options.display.max_rows = n

11
Q

display floats with 2 decimal places

A

pd.set_option('display.float_format', lambda x: '%.2f' % x)
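
A sketch of the effect on made-up numbers. The option only changes how floats are displayed; the stored values keep full precision. Resetting the option at the end is an optional extra.

```python
import pandas as pd

pd.set_option("display.float_format", lambda x: "%.2f" % x)

df = pd.DataFrame({"x": [3.14159, 2.0]})
rendered = df.to_string()  # same text that print(df) would show
print(rendered)

# The underlying values are unchanged; only the display is rounded.
stored_value = df["x"].iloc[0]

pd.reset_option("display.float_format")  # restore the default display
```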

12
Q

check for nulls

A

df.isnull().sum()
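
A small sketch on made-up data: `isnull()` gives a boolean frame, and `sum()` counts the `True` values per column.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in "a" and two in "b".
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": ["x", None, None]})

null_counts = df.isnull().sum()  # one count per column
print(null_counts)
```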

13
Q

turn confusion matrix from statsmodels to pd df. Assume model is results_log

A
cm_df = pd.DataFrame(results_log.pred_table())
cm_df.columns = ['Predicted 0','Predicted 1']
cm_df = cm_df.rename(index={0: 'Actual 0',1:'Actual 1'})
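
A sketch of the labelling step that runs without a fitted model: the hard-coded array below is a made-up stand-in for the output of `results_log.pred_table()`, where rows are actual classes and columns are predicted classes.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for results_log.pred_table():
# rows = actual class, columns = predicted class.
pred_table = np.array([[50.0, 10.0],
                       [5.0, 35.0]])

cm_df = pd.DataFrame(pred_table)
cm_df.columns = ["Predicted 0", "Predicted 1"]
cm_df = cm_df.rename(index={0: "Actual 0", 1: "Actual 1"})
print(cm_df)
```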
14
Q

select all rows and first and second columns

A

df.iloc[:, 0:2]  # positions start at 0 and the stop is excluded
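
A quick check on made-up data of how position-based slicing works: `iloc` counts from 0 and excludes the stop, so `0:2` gives the first and second columns while `1:3` gives the second and third.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

first_and_second = df.iloc[:, 0:2]  # columns at positions 0 and 1
second_and_third = df.iloc[:, 1:3]  # columns at positions 1 and 2
print(first_and_second.columns.tolist())
print(second_and_third.columns.tolist())
```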
