Midterm Flashcards
when to use ML?
too hard to hardcode
automation
problem is changing frequently
when to not use ML?
algorithm already exists
not enough data
ethical concerns
requires explanations, not just predictions
what is the no free lunch theorem?
no single model works best on every dataset
what does regularization do?
lowers variance without raising bias much
how to pick hyperparameters?
grid search, random search
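a minimal sketch of grid search in plain Python — the score function here is a hypothetical stand-in for "train a model and return its validation score":

```python
import itertools

def score(lr, reg):
    # dummy scoring function for illustration only;
    # in practice this would train and evaluate a model
    return -((lr - 0.1) ** 2 + (reg - 1.0) ** 2)

grid = {"lr": [0.01, 0.1, 1.0], "reg": [0.1, 1.0, 10.0]}

best_params, best_score = None, float("-inf")
# try every combination of hyperparameter values
for lr, reg in itertools.product(grid["lr"], grid["reg"]):
    s = score(lr, reg)
    if s > best_score:
        best_params, best_score = {"lr": lr, "reg": reg}, s

print(best_params)  # the combination with the highest score
```

random search instead samples a fixed number of random combinations, which scales better when the grid is large.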
what is softmax regression?
train one logistic regression per class
normalize the outputs so the class probabilities sum to 1
higher temperature flattens the distribution, making predictions less confident
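a small sketch of softmax with temperature, showing that the probabilities always sum to 1 and that a higher temperature flattens (de-peaks) the distribution:

```python
import math

def softmax(logits, temperature=1.0):
    # dividing logits by a larger temperature flattens the distribution
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
p1 = softmax(logits, temperature=1.0)
p5 = softmax(logits, temperature=5.0)
# both sum to 1; the hotter distribution is less peaked (less confident)
print(max(p1), max(p5))
```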
what is logistic regression?
turns linear regression into a classifier by passing its output through a sigmoid
outputs a probability
hard margin SVM
trying to perfectly separate the two classes
soft margin SVM
allow some points to be misclassified, as long as the mistakes are not too large
kernels in svm
lets the model capture nonlinear relationships without explicitly transforming the data (the kernel trick)
OvR vs OvO
OvR - train C models, one per class, each predicting whether an example belongs to that class
OvO - train models to compare between each possible pair of classes
MSE and MAE formulas
take the difference, square it (MSE) or take its absolute value (MAE), sum up, divide by n
MSE = (1/n) * sum((y - yhat)^2), MAE = (1/n) * sum(|y - yhat|)
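the two formulas written out as code, with a tiny worked example:

```python
def mse(y_true, y_pred):
    # mean of squared differences
    n = len(y_true)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n

def mae(y_true, y_pred):
    # mean of absolute differences
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n

y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
print(mse(y_true, y_pred))  # (1 + 0 + 4) / 3
print(mae(y_true, y_pred))  # (1 + 0 + 2) / 3 = 1.0
```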
advantages of MSE and MAE
MSE - differentiable, good for learning
MAE - interpretable (same units as the target), simple, less sensitive to outliers
L1 regularization
penalizes the sum of absolute weight values; drives the least important weights to exactly 0 (sparsity)
L2 regularization
penalizes the sum of squared weights; keeps all weights small without forcing them to 0
Elastic net
combines l1 and l2
l0 regularization
penalizes the number of nonzero weights, favoring solutions with as many zeros as possible
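the four penalty terms side by side, as a sketch (alpha here is an assumed elastic-net mixing parameter, not a value from the course):

```python
def l1_penalty(w):
    # sum of absolute values; pushes small weights to exactly 0
    return sum(abs(x) for x in w)

def l2_penalty(w):
    # sum of squares; shrinks all weights but rarely to exactly 0
    return sum(x ** 2 for x in w)

def l0_penalty(w):
    # count of nonzero weights; not differentiable, usually approximated
    return sum(1 for x in w if x != 0)

def elastic_net(w, alpha=0.5):
    # weighted mix of L1 and L2 (alpha is a hypothetical mixing parameter)
    return alpha * l1_penalty(w) + (1 - alpha) * l2_penalty(w)

w = [0.0, -2.0, 3.0]
print(l1_penalty(w), l2_penalty(w), l0_penalty(w))  # 5.0 13.0 2
```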
what do gini impurity and entropy measure?
how impure the data is
a lower number means the data is more pure
we want splits that produce purer child nodes in decision trees; lower impurity means a better split
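both impurity measures computed directly from their definitions, showing that pure data scores 0 and an even two-class mix scores the maximum:

```python
import math

def gini(labels):
    # 1 - sum of squared class proportions
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return 1.0 - sum((k / n) ** 2 for k in counts.values())

def entropy(labels):
    # -sum of p * log2(p) over classes
    n = len(labels)
    counts = {c: labels.count(c) for c in set(labels)}
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

pure = ["a", "a", "a", "a"]
mixed = ["a", "a", "b", "b"]
print(gini(pure), entropy(pure))    # 0.0 for both: perfectly pure
print(gini(mixed), entropy(mixed))  # 0.5 and 1.0: max impurity for 2 classes
```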
pros and cons of decision trees
pros
- not much feature engineering
- interpretable
- applies to many tasks
cons
- can overfit
- high variance
what is bagging
training many models of the same type, each on a different bootstrap sample of the data
samples are drawn with replacement
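a sketch of the bootstrap step: each model gets a sample the same size as the dataset, drawn with replacement, so some points repeat and others are left out.

```python
import random

def bootstrap_sample(data, rng):
    # sample with replacement, same size as the original dataset
    return [rng.choice(data) for _ in range(len(data))]

rng = random.Random(0)
data = list(range(10))
sample = bootstrap_sample(data, rng)
print(len(sample))  # 10 - same size as the original data
```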
what is boosting
training a sequence of weak learners, each one fitted to correct the errors of the previous models
types of kernels
quadratic (the degree-2 polynomial), polynomial, RBF (radial basis function)
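the kernel functions written out as a sketch (degree, c, and gamma are the usual hyperparameters; the values below are just for illustration):

```python
import math

def linear_kernel(x, y):
    # plain dot product
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, c=1.0):
    # degree=2 gives the quadratic kernel
    return (linear_kernel(x, y) + c) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # similarity decays with squared distance
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [1.0, 2.0], [2.0, 0.5]
print(polynomial_kernel(x, y))  # (3 + 1)^2 = 16
print(rbf_kernel(x, x))         # 1.0: identical points, maximal similarity
```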
when to use cross validation?
when the data set is small
when we want to do hyperparameter tuning
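a sketch of how k-fold cross validation partitions the data: each fold serves once as the validation set while the rest are used for training.

```python
def kfold_indices(n, k):
    # partition indices 0..n-1 into k roughly equal folds
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

folds = kfold_indices(10, 3)
print(folds)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```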
why have a validation set?
to make sure the model is not overfitted (hyperparameters)
what is inductive bias
bias from assumptions that the model makes based on its design
type 1 error
false positive, predicted positive when it should have been negative
type 2 error
false negative, predicted negative when it should have been positive
accuracy
correct predictions divided by total predictions
recall
also called true positive rate
TP / (TP + FN)
how many actual positives were correctly identified
false positive rate
FP / (FP + TN)
precision
TP / (TP + FP)
how many predicted positives were actually positive?
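all four metrics computed from the confusion-matrix counts, with a small worked example:

```python
def metrics(tp, fp, fn, tn):
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),   # of predicted positives, how many correct
        "recall": tp / (tp + fn),      # true positive rate
        "fpr": fp / (fp + tn),         # false positive rate
    }

m = metrics(tp=8, fp=2, fn=4, tn=6)
print(m)  # accuracy 0.7, precision 0.8, recall ~0.667, fpr 0.25
```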
how to set up confusion matrix
actual on top, predicted on the side
positive positive in the top left
why does precision recall tradeoff exist
precision and recall tend to trade off against each other
predicting positive more often catches more true positives (higher recall) but also admits more false positives (lower precision)
ensemble learning for classifiers
hard voting - predict the class with the most votes
soft voting - predict the class with the highest probability
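a sketch of both voting schemes; note they can disagree, since soft voting weighs how confident each model is:

```python
from collections import Counter

def hard_vote(predictions):
    # predictions: one predicted class label per model
    return Counter(predictions).most_common(1)[0][0]

def soft_vote(probabilities):
    # probabilities: one per-class probability list per model
    n_classes = len(probabilities[0])
    avg = [sum(p[c] for p in probabilities) / len(probabilities)
           for c in range(n_classes)]
    return avg.index(max(avg))

print(hard_vote(["cat", "dog", "cat"]))  # "cat" - majority wins
# two models lean toward class 1, but one is very confident in class 0,
# so the averaged probabilities favor class 0
print(soft_vote([[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]]))  # 0
```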
what is the base rate fallacy?
occurs when the positive class is rare
even an accurate classifier's positive predictions are then mostly false positives, because negatives vastly outnumber positives