Supervised Learning Flashcards
What is Reinforcement Learning?
A software agent learns to optimize its behaviour based on rewards and punishments.
How to do visual EDA (VEDA) for a quantitative variable grouped by a categorical variable?
Using sns.barplot(x="day", y="total_bill", data=tips), where tips is a DataFrame.
OR tips.boxplot(column='total_bill', by='day')
How to do VEDA for binary categorical variables?
plt.figure()
sns.countplot(x='education', hue='party', data=df, palette='RdBu')
plt.xticks([0, 1], ['No', 'Yes'])
plt.show()
How to do pair-wise VEDA for 4 quantitative variables?
pd.plotting.scatter_matrix(df, c=y, figsize=[8, 8], marker='D')
What is accuracy?
Fraction of correct predictions.
How to access predictor values after removing target values from df?
df.drop('target', axis=1).values
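A minimal sketch of splitting a DataFrame into predictors and target. The column names and values here are hypothetical toy data, not from any real dataset.

```python
import pandas as pd

# Hypothetical toy data; 'target' is the column to predict.
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0],
    "feature_b": [0.5, 0.1, 0.9],
    "target": [0, 1, 0],
})

X = df.drop("target", axis=1).values  # 2-D NumPy array of predictors
y = df["target"].values               # 1-D NumPy array of labels

print(X.shape, y.shape)
```

Note that drop returns a new DataFrame, so df itself still contains the target column.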
How to turn a 1-D array of values into the 2-D format sklearn expects?
X.reshape(-1, 1)  (X must be a NumPy array; for a plain list, use np.array(X) first)
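A small illustration of what the reshape does, assuming the values start out as a plain Python list (the numbers are made up):

```python
import numpy as np

values = [1.5, 2.0, 3.5, 4.0]   # plain Python list (no reshape method)
X = np.array(values)            # 1-D array, shape (4,)
X_col = X.reshape(-1, 1)        # 2-D column, shape (4, 1); -1 lets NumPy infer the row count

print(X.shape, X_col.shape)
```

The result has one row per sample and one column per feature, which is the layout sklearn estimators expect for X.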
How to generate pairwise feature correlation VEDA?
sns.heatmap(df.corr(), square=True, cmap='RdYlGn')
What are a and b in y = ax + b?
a is the slope and b is the y-intercept.
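As a quick check of the slope and intercept roles, here is a sketch that recovers a and b from points lying exactly on a line. The data are made up, and np.polyfit is just one convenient way to fit a straight line.

```python
import numpy as np

# Points generated from y = 2x + 1 (made-up example values).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

# A degree-1 polynomial fit returns the coefficients highest power first: slope, then intercept.
a, b = np.polyfit(x, y, deg=1)

print(a, b)
```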
How to do k-fold cv with sklearn?
cross_val_score(reg, X, y, cv=k)
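A minimal runnable sketch of the call above, assuming synthetic data from make_regression and a plain LinearRegression as reg (both are illustrative choices):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data, used only for illustration.
X, y = make_regression(n_samples=100, n_features=3, noise=10.0, random_state=0)

reg = LinearRegression()
cv_scores = cross_val_score(reg, X, y, cv=5)  # one score per fold; R^2 by default for regressors

print(cv_scores)
print(cv_scores.mean())
```

The mean of the fold scores is the usual single-number summary.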
Why should regularization be used?
To penalize large coefficients and avoid over-fitting
What is Ridge regression?
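Linear regression with L2 regularization: the loss is the OLS loss plus alpha times the sum of squared coefficients, so the hyperparameter alpha weighs the penalty, not the OLS term. Should be first choice for regression over lasso.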
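To see the effect of alpha, the sketch below fits Ridge at a few alpha values on synthetic data and records the size of the coefficient vector. The data and alpha values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression data, used only for illustration.
X, y = make_regression(n_samples=50, n_features=5, noise=5.0, random_state=0)

norms = []
for alpha in [0.01, 1.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    norms.append(np.linalg.norm(ridge.coef_))  # overall coefficient magnitude

print(norms)  # the magnitude shrinks as alpha grows
```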
What is Lasso regression?
Linear regression with L1 regularization (a penalty of alpha times the sum of absolute coefficient values), which can shrink coefficients to exactly 0 and so remove unimportant features. Great for feature selection.
How to specify parameters for Lasso and access its coefficients?
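lasso = Lasso(alpha=0.4)  # normalize=True was removed in scikit-learn 1.2; scale the features beforehand instead
lasso.fit(X, y)  # coef_ exists only after fitting
lasso.coef_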
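A sketch of the full workflow on synthetic data, assuming a recent scikit-learn where feature scaling is done with a StandardScaler step in a pipeline. The data and the pipeline layout are illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 6 features, only 2 of which actually drive y.
X, y = make_regression(n_samples=100, n_features=6, n_informative=2,
                       noise=1.0, random_state=0)

# Scale the features, then fit the lasso.
model = make_pipeline(StandardScaler(), Lasso(alpha=0.4))
model.fit(X, y)

# make_pipeline names each step after its lowercased class name.
coefs = model.named_steps["lasso"].coef_
print(coefs)
```

Coefficients printed as exactly 0 mark features the lasso has dropped.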
When is accuracy (the fraction of correct predictions) a poor metric?
When there is class imbalance: a model can score high accuracy while never correctly labeling the low-frequency class. If the majority class is 99% of the data, then 99% accuracy can be achieved by a model that always picks the majority class.
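The 99% case can be checked by hand. This sketch uses made-up labels (99 negatives, 1 positive) and a "model" that always predicts the majority class:

```python
# Made-up labels: 99 majority-class samples (0) and 1 minority-class sample (1).
y_true = [0] * 99 + [1]
y_pred = [0] * 100  # always predict the majority class

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

# Fraction of true minority samples that were predicted correctly.
minority_hits = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
minority_recall = minority_hits / y_true.count(1)

print(accuracy, minority_recall)  # 0.99 0.0
```

Accuracy looks excellent even though the minority class is never identified.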