Midterm Flashcards

1
Q

when to use ML?

A

too hard to hardcode the rules
you want to automate a task
the problem changes frequently

2
Q

when to not use ML?

A

algorithm already exists
not enough data
ethical concerns
requires explanations, not just predictions

3
Q

what is the no free lunch theorem?

A

no single model works best on every dataset

4
Q

what does regularization do?

A

lowers variance without raising bias much

5
Q

how to pick hyperparameters?

A

grid search, random search
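A minimal sketch of the two strategies over a toy search space (the hyperparameter names `C` and `gamma` and the value lists are illustrative, not from the card):

```python
import itertools
import random

# Hypothetical hyperparameter space (names and values are made up for illustration)
space = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}

# Grid search: try every combination (3 * 2 = 6 candidates)
grid_candidates = [dict(zip(space, vals))
                   for vals in itertools.product(*space.values())]

# Random search: sample a fixed budget of combinations at random
rng = random.Random(0)
random_candidates = [{k: rng.choice(v) for k, v in space.items()}
                     for _ in range(4)]

print(len(grid_candidates))    # 6
print(len(random_candidates))  # 4
```

Grid search cost grows multiplicatively with each added hyperparameter; random search keeps a fixed budget, which is why it scales better to large spaces.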

6
Q

what is softmax regression?

A

train one logistic-regression-style score per class
apply softmax so the class probabilities sum to 1
higher temperature flattens the distribution (less confident predictions)
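A small sketch of softmax with a temperature parameter, showing both properties from the card (probabilities sum to 1; higher temperature means less confident):

```python
import math

def softmax(scores, temperature=1.0):
    # Scale scores by the temperature, exponentiate, normalize to sum to 1
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])                   # peaked distribution
flat = softmax([2.0, 1.0, 0.1], temperature=10.0)  # flatter, less confident
```

With temperature 10, the largest probability drops well below its temperature-1 value, while each distribution still sums to 1.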

7
Q

what is logistic regression?

A

turns linear regression into a classifier by passing the linear score through a sigmoid
outputs a probability

8
Q

hard margin SVM

A

tries to perfectly separate the two classes with the widest possible margin; only works if the data is linearly separable

9
Q

soft margin SVM

A

allow some points to be misclassified, as long as the mistakes are not too large

10
Q

kernels in svm

A

uses the kernel trick to capture non-linear relationships without explicitly transforming the data into a higher-dimensional space

11
Q

OvR vs OvO

A

OvR - train C models, each predicting whether a point belongs to that class or not
OvO - train one model for each possible pair of classes
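The number of models each strategy trains, sketched for C classes (C models for OvR, one per unordered pair for OvO):

```python
def num_models(c, strategy):
    # OvR: one binary model per class
    if strategy == "ovr":
        return c
    # OvO: one binary model per unordered pair of classes
    if strategy == "ovo":
        return c * (c - 1) // 2
    raise ValueError(f"unknown strategy: {strategy}")

print(num_models(4, "ovr"))  # 4
print(num_models(4, "ovo"))  # 6
```

OvO trains more (but smaller) models; each one only sees the data for its two classes.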

12
Q

MSE and MAE formulas

A

take the difference, square it (MSE) or take its absolute value (MAE), sum over all points, divide by n

13
Q

advantages of MSE and MAE

A

MSE - differentiable, good for learning
MAE - result is interpretable, simple, less sensitive to outliers

14
Q

L1 regularization

A

pushes the least important weights to exactly 0, producing sparse models

15
Q

L2 regularization

A

shrinks all weights toward 0 (small, but usually not exactly 0)

16
Q

Elastic net

A

combines the L1 and L2 penalties with a mixing ratio

17
Q

l0 regularization

A

penalizes the number of nonzero weights, so it seeks as many exact zeros as possible (hard to optimize directly)

18
Q

what do gini impurity and entropy measure?

A

how mixed the class labels at a node are
a lower value means purer data
decision trees prefer splits that produce purer children
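Both measures, sketched from class probabilities at a node (0 means pure for both):

```python
import math

def gini(class_probs):
    # Gini impurity: 1 minus the sum of squared class probabilities
    return 1.0 - sum(p ** 2 for p in class_probs)

def entropy(class_probs):
    # Entropy in bits: -sum of p * log2(p) over classes with p > 0
    return -sum(p * math.log2(p) for p in class_probs if p > 0)

print(gini([1.0, 0.0]))     # 0.0 - pure node
print(gini([0.5, 0.5]))     # 0.5 - maximally impure for two classes
print(entropy([0.5, 0.5]))  # 1.0 - maximally impure for two classes
```

A split is scored by how much it reduces the (weighted) impurity of the child nodes compared to the parent.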

19
Q

pros and cons of decision trees

A

pros
- not much feature engineering
- interpretable
- applies to many tasks
cons
- can overfit
- high variance

20
Q

what is bagging

A

train many models of the same type, each on a different bootstrap sample of the data
bootstrap samples are drawn with replacement
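What "with replacement" means in practice, as a small sketch: each model gets a sample of the same size as the dataset, but with duplicates, so some points are repeated and others are left out.

```python
import random

def bootstrap_sample(data, rng):
    # Draw len(data) points WITH replacement: duplicates are expected
    return [rng.choice(data) for _ in data]

rng = random.Random(0)
data = list(range(10))
sample = bootstrap_sample(data, rng)
print(len(sample))  # 10 - same size as the original dataset
```

On average a bootstrap sample leaves out roughly a third of the original points, which is what makes the ensemble's models different from each other.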

21
Q

what is boosting

A

train a sequence of weak learners, each one trained to correct the errors of the previous models

22
Q

types of kernels

A

linear, polynomial (quadratic is the degree-2 case), RBF

23
Q

when to use cross validation?

A

when the data set is small
when we want to do hyperparameter tuning

24
Q

why have a validation set?

A

to tune hyperparameters and detect overfitting without touching the test set

25
Q

what is inductive bias

A

bias from assumptions that the model makes based on its design

26
Q

type 1 error

A

false positive, predicted positive when it should have been negative

27
Q

type 2 error

A

false negative, predicted negative when it should have been positive

28
Q

accuracy

A

correct predictions / total predictions, i.e. (TP + TN) / total

29
Q

recall

A

also called true positive rate
TP / (TP + FN)
how many of the actual positives were found

30
Q

false positive rate

A

FP / (FP + TN)

31
Q

precision

A

TP / (TP + FP)
how many of the predicted positives were actually positive
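The metrics from the last few cards, computed together from the four confusion-matrix counts (the example counts are made up for illustration):

```python
def metrics(tp, fp, fn, tn):
    # Precision: of everything predicted positive, the fraction truly positive
    precision = tp / (tp + fp)
    # Recall (true positive rate): of all actual positives, the fraction found
    recall = tp / (tp + fn)
    # False positive rate: of all actual negatives, the fraction flagged positive
    fpr = fp / (fp + tn)
    # Accuracy: correct predictions over all predictions
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, fpr, accuracy

# Hypothetical counts: 12 actual positives, 88 actual negatives
print(metrics(tp=8, fp=2, fn=4, tn=86))  # (0.8, 0.666..., 0.0227..., 0.94)
```

Note the denominators: precision divides by the predicted positives (TP + FP), recall by the actual positives (TP + FN).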

32
Q

how to set up confusion matrix

A

actual on top, predicted on the side (conventions vary, so label the axes)
true positives in the top left

33
Q

why does precision recall tradeoff exist

A

precision and recall are roughly inversely related
lowering the decision threshold predicts more positives, which raises recall but lowers precision

34
Q

ensemble learning for classifiers

A

hard voting - predict the class with the most votes
soft voting - average the models' predicted probabilities, predict the class with the highest average
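Both voting rules as a minimal sketch; the example shows they can disagree, since soft voting weighs how confident each model is:

```python
from collections import Counter

def hard_vote(predictions):
    # predictions: one predicted class label per model; pick the most common
    return Counter(predictions).most_common(1)[0][0]

def soft_vote(prob_lists):
    # prob_lists: one probability vector per model; average, then take argmax
    n = len(prob_lists)
    avg = [sum(column) / n for column in zip(*prob_lists)]
    return max(range(len(avg)), key=avg.__getitem__)

print(hard_vote(["cat", "dog", "cat"]))  # "cat"

# Two of three models prefer class 1, but model 0 is very confident in class 0:
# averages are [0.583..., 0.416...], so soft voting picks class 0
print(soft_vote([[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]))  # 0
```

Soft voting generally works better when the models output well-calibrated probabilities; hard voting is the fallback when they only output labels.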

35
Q

what is the base rate fallacy?

A

occurs when the positive label is rare
even a good classifier then produces mostly false positives among its positive predictions
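A worked Bayes' rule example of the fallacy; the base rate, recall, and false positive rate below are assumed numbers for illustration, not from the card:

```python
# Assumed: 1% of cases are actually positive; the classifier catches 90% of
# them (recall) and wrongly flags 5% of negatives (false positive rate)
base_rate = 0.01
recall = 0.90  # P(predict + | actually +)
fpr = 0.05     # P(predict + | actually -)

# Total probability of predicting positive
p_pos = base_rate * recall + (1 - base_rate) * fpr

# Bayes' rule: P(actually + | predicted +)
precision = base_rate * recall / p_pos
print(round(precision, 3))  # 0.154
```

Despite 90% recall and only a 5% false positive rate, about 85% of positive predictions are false positives, because the 99% of negatives vastly outnumber the 1% of positives.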