Midterm Flashcards
when to use ML?
too hard to hardcode
automation
problem is changing frequently
when to not use ML?
algorithm already exists
not enough data
ethical concerns
requires explanations, not just predictions
what is the no free lunch theorem?
no single model works best on every dataset
what does regularization do?
lowers variance without raising bias much
how to pick hyperparameters?
grid search, random search
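a minimal sketch of grid search in plain Python — the score function here is a hypothetical stand-in for "train a model and return its validation score":

```python
import itertools

def score(lr, reg):
    # dummy scoring function for illustration only;
    # in practice this would train and evaluate a model
    return -((lr - 0.1) ** 2 + (reg - 1.0) ** 2)

grid = {"lr": [0.01, 0.1, 1.0], "reg": [0.1, 1.0, 10.0]}

best_params, best_score = None, float("-inf")
# try every combination of hyperparameter values
for lr, reg in itertools.product(grid["lr"], grid["reg"]):
    s = score(lr, reg)
    if s > best_score:
        best_params, best_score = {"lr": lr, "reg": reg}, s

print(best_params)  # the combination with the highest score
```

random search instead samples a fixed number of random combinations, which scales better when the grid is large.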
what is softmax regression?
train one logistic regression per class
normalize the outputs so the class probabilities sum to 1
higher temperature flattens the distribution, making predictions less confident
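a small sketch of softmax with temperature, showing that the probabilities always sum to 1 and that a higher temperature flattens (de-peaks) the distribution:

```python
import math

def softmax(logits, temperature=1.0):
    # dividing logits by a larger temperature flattens the distribution
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
p1 = softmax(logits, temperature=1.0)
p5 = softmax(logits, temperature=5.0)
# both sum to 1; the hotter distribution is less peaked (less confident)
print(max(p1), max(p5))
```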
what is logistic regression?
turns linear regression into a classifier by passing its output through a sigmoid
outputs a probability
hard margin SVM
trying to perfectly separate the two classes
soft margin SVM
allow some points to be misclassified, as long as the mistakes are not too large
kernels in svm
lets the model capture nonlinear relationships without explicitly transforming the data (the kernel trick)
OvR vs OvO
OvR - train C models, one per class, each predicting whether an example belongs to that class
OvO - train models to compare between each possible pair of classes
MSE and MAE formulas
take the difference, square it (MSE) or take its absolute value (MAE), sum up, divide by n
MSE = (1/n) * sum((y - yhat)^2), MAE = (1/n) * sum(|y - yhat|)
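the two formulas written out as code, with a tiny worked example:

```python
def mse(y_true, y_pred):
    # mean of squared differences
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

def mae(y_true, y_pred):
    # mean of absolute differences
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
print(mse(y_true, y_pred))  # (1 + 0 + 4) / 3
print(mae(y_true, y_pred))  # (1 + 0 + 2) / 3 = 1.0
```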
advantages of MSE and MAE
MSE - differentiable, good for learning
MAE - interpretable (same units as the target), simple, less sensitive to outliers
L1 regularization
penalizes the sum of absolute weight values; drives the least important weights to exactly 0 (sparsity)
L2 regularization
penalizes the sum of squared weights; keeps all weights small without forcing them to 0
Elastic net
combines l1 and l2
l0 regularization
penalizes the number of nonzero weights, favoring solutions with as many zeros as possible
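the four penalty terms side by side, as a sketch (alpha here is an assumed elastic-net mixing parameter, not a value from the course):

```python
def l1_penalty(w):
    # sum of absolute values; pushes small weights to exactly 0
    return sum(abs(x) for x in w)

def l2_penalty(w):
    # sum of squares; shrinks all weights but rarely to exactly 0
    return sum(x ** 2 for x in w)

def l0_penalty(w):
    # count of nonzero weights; not differentiable, usually approximated
    return sum(1 for x in w if x != 0)

def elastic_net(w, alpha=0.5):
    # weighted mix of L1 and L2 (alpha is a hypothetical mixing parameter)
    return alpha * l1_penalty(w) + (1 - alpha) * l2_penalty(w)

w = [0.0, -2.0, 3.0]
print(l1_penalty(w), l2_penalty(w), l0_penalty(w))  # 5.0 13.0 2
```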
what do gini impurity and entropy measure?
how impure the data is
a lower number means the data is more pure
we want splits that produce purer child nodes in decision trees; lower impurity means a better split
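both impurity measures computed directly from their definitions, showing that pure data scores 0 and an even two-class mix scores the maximum:

```python
import math

def gini(labels):
    # 1 - sum of squared class proportions
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

def entropy(labels):
    # -sum of p * log2(p) over classes
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

pure = ["a", "a", "a", "a"]
mixed = ["a", "a", "b", "b"]
print(gini(pure), entropy(pure))    # 0.0 for both: perfectly pure
print(gini(mixed), entropy(mixed))  # 0.5 and 1.0: max impurity for 2 classes
```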
pros and cons of decision trees
pros
- not much feature engineering
- interpretable
- applies to many tasks
cons
- can overfit
- high variance
what is bagging
training many models of the same type, each on a different bootstrap sample of the data
samples are drawn with replacement
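a sketch of the bootstrap step: each model gets a sample the same size as the dataset, drawn with replacement, so some points repeat and others are left out.

```python
import random

def bootstrap_sample(data, rng):
    # sample with replacement, same size as the original dataset
    return [rng.choice(data) for _ in range(len(data))]

rng = random.Random(0)
data = list(range(10))
sample = bootstrap_sample(data, rng)
print(len(sample))  # 10 - same size as the original data
```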
what is boosting
training a sequence of weak learners, each one fitted to correct the errors of the previous models
types of kernels
quadratic (the degree-2 polynomial), polynomial, RBF (radial basis function)
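the kernel functions written out as a sketch (degree, c, and gamma are the usual hyperparameters; the values below are just for illustration):

```python
import math

def linear_kernel(x, y):
    # plain dot product
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, c=1.0):
    # degree=2 gives the quadratic kernel
    return (linear_kernel(x, y) + c) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # similarity decays with squared distance
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [1.0, 2.0], [2.0, 0.5]
print(polynomial_kernel(x, y))  # (3 + 1)^2 = 16
print(rbf_kernel(x, x))         # 1.0: identical points, maximal similarity
```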
when to use cross validation?
when the data set is small
when we want to do hyperparameter tuning
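a sketch of how k-fold cross validation partitions the data: each fold serves once as the validation set while the rest are used for training.

```python
def kfold_indices(n, k):
    # partition indices 0..n-1 into k roughly equal folds
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(10, 3)
print(folds)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```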
why have a validation set?
to make sure the model is not overfitted (hyperparameters)
what is inductive bias
bias from assumptions that the model makes based on its design
type 1 error
false positive, predicted positive when it should have been negative
type 2 error
false negative, predicted negative when it should have been positive
accuracy
correct predictions divided by total predictions
recall
also called true positive rate
TP / (TP + FN)
how many actual positives were correctly identified
false positive rate
FP / (FP + TN)
precision
TP / (TP + FP)
how many predicted positives were actually positive?
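all four metrics computed from the confusion-matrix counts, with a small worked example:

```python
def metrics(tp, fp, fn, tn):
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),   # of predicted positives, how many correct
        "recall": tp / (tp + fn),      # true positive rate
        "fpr": fp / (fp + tn),         # false positive rate
    }

m = metrics(tp=8, fp=2, fn=4, tn=6)
print(m)  # accuracy 0.7, precision 0.8, recall ~0.667, fpr 0.25
```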
how to set up confusion matrix
actual on top, predicted on the side
positive positive in the top left
why does precision recall tradeoff exist
precision and recall tend to trade off against each other
predicting positive more often catches more true positives (higher recall) but also admits more false positives (lower precision)
ensemble learning for classifiers
hard voting - predict the class with the most votes
soft voting - predict the class with the highest probability
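a sketch of both voting schemes; note they can disagree, since soft voting weighs how confident each model is:

```python
from collections import Counter

def hard_vote(predictions):
    # predictions: one predicted class label per model
    return Counter(predictions).most_common(1)[0][0]

def soft_vote(probabilities):
    # probabilities: one per-class probability list per model
    n_classes = len(probabilities[0])
    avg = [sum(p[c] for p in probabilities) / len(probabilities)
           for c in range(n_classes)]
    return avg.index(max(avg))

print(hard_vote(["cat", "dog", "cat"]))  # "cat" - majority wins
# two models lean toward class 1, but one is very confident in class 0,
# so the averaged probabilities favor class 0
print(soft_vote([[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]))  # 0
```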
what is the base rate fallacy?
occurs when the positive class is rare
even an accurate classifier's positive predictions are then mostly false positives, because negatives vastly outnumber positives