Midterm Review Flashcards
Linear regression formula:
y = b0 + b1x
where…
y = output
b0 = y-intercept
b1 = slope
x = input
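A minimal sketch of this model in code (the toy data and use of numpy are illustrative, not from the course):

```python
import numpy as np

# Toy data: x = input, y = output (values are made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# np.polyfit with degree 1 returns [slope, intercept] = [b1, b0]
b1, b0 = np.polyfit(x, y, 1)

# Predictions from y = b0 + b1 * x
y_hat = b0 + b1 * x
print("intercept b0:", b0, "slope b1:", b1)
```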
Three steps in fitting a linear regression model:
- Define the model: choose the ‘class’ of functions that relates the inputs (x) to the output (y)
- Define your training loss
- Find the function in your class that gives the smallest training loss
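A sketch of the three steps written out explicitly, assuming a squared-error training loss (introduced below); the helper names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Step 1: define the model class, here linear functions of x
def model(x, b0, b1):
    return b0 + b1 * x

# Step 2: define the training loss, here the sum of squared residuals
def training_loss(params, x, y):
    b0, b1 = params
    residuals = y - model(x, b0, b1)
    return np.sum(residuals ** 2)

# Step 3: find the function in the class with the smallest training loss
result = minimize(training_loss, x0=[0.0, 0.0], args=(x, y))
b0, b1 = result.x
```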
What is a training loss function?
Measures the deviation of the model's fit from the observed data
Large loss = poor fit
What are residuals?
Errors from the model fit: data = fit + residual
What are the two main types of loss function?
- Least Absolute Deviation (LAD, L1 loss): minimize the sum of the absolute values of the residuals; robust to outliers
- Ordinary Least Squares (OLS, L2 loss): minimize the sum of the squared residuals; outliers have an outsized influence
(The same L1/L2 norms used as penalties on the coefficients give Lasso, which can eliminate coefficients entirely, and Ridge, which shrinks but does not eliminate them.)
When should you use L1-norm vs. L2-norm?
The L1-norm is preferred when outliers should not dominate the fit, or (as a penalty) when there are lots of features and you are unsure whether they are all necessary, since Lasso can zero some out; the L2-norm is preferred when all the features must be kept in the model, since Ridge only shrinks their coefficients.
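A small sketch contrasting the two losses on the same residuals (the numbers are made up to show the effect of an outlier):

```python
import numpy as np

residuals = np.array([0.5, -1.0, 0.2, 8.0])  # last value is an outlier

lad_loss = np.sum(np.abs(residuals))  # L1 / LAD: the outlier contributes 8
ols_loss = np.sum(residuals ** 2)     # L2 / OLS: the outlier contributes 64

# Squaring lets a single outlier dominate the total loss, which is why
# LAD is the more robust choice when outliers are present.
print(lad_loss, ols_loss)
```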
What is the coefficient of determination and how is it calculated?
The proportion of variance in the response variable explained by the model; it measures the quality of the fit of a linear regression model, where 0 = no fit and 1 = perfect fit.
It is calculated as 1 - (Residual Sum of Squares / Total Sum of Squares), i.e. 1 - RSS/TSS
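A direct translation of the formula into code (a sketch; the function name is mine):

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - RSS/TSS."""
    rss = np.sum((y - y_hat) ** 2)        # residual sum of squares
    tss = np.sum((y - np.mean(y)) ** 2)   # total sum of squares
    return 1.0 - rss / tss
```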
What are robust statistics?
Statistics that are not greatly influenced by the inclusion of outliers.
For example, as measures of central tendency, mean is non-robust while median is robust.
L1-norm is robust.
L2-norm is non-robust.
What is classification?
Prediction where target variable Y is categorical
- often binary
- Y can take on a value in a set {o1, o2, …}
- Y is discrete and finite
What is logistic regression?
A form of classification in which an S-shaped sigmoid function is used to calculate the probability of each potential output.
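A minimal sketch of the sigmoid function used in the binary case (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    """S-shaped function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Binary logistic regression models P(Y=1 | x) = sigmoid(b0 + b1 * x)
```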
What is the natural logistic regression classification rule?
Binary classification: F(X|B) = 1 if P(Y=1|X,B) >= 0.5, and 0 otherwise
- If the sigmoid output >= 0.5, predict 1
- If the sigmoid output < 0.5, predict 0
Since sigmoid(0) = 0.5, this simplifies to the linear rule B0 + B·X >= 0
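A sketch of the rule, showing why the 0.5 threshold on the probability is the same as a threshold of 0 on the linear score (names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, b0, b1, threshold=0.5):
    p = sigmoid(b0 + b1 * x)   # P(Y=1 | x, B)
    return 1 if p >= threshold else 0

# Because sigmoid(0) = 0.5, "p >= 0.5" holds exactly when b0 + b1 * x >= 0,
# so the rule reduces to checking the sign of the linear score.
```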
What is regularization?
Optimizing the likelihood plus a penalty on the size of the parameters, which prevents the optimal coefficients from diverging to infinity (as can happen, for example, with perfectly separable data).
What is the major flaw of accuracy rate as a metric for evaluating classifiers?
Can be misleading when classes are imbalanced. Baseline accuracy must be established.
What are Type I and Type II errors in classification?
Type I: False positive (Predicted +, True -)
Type II: False negative (Predicted -, True +)
How to calculate precision and recall?
Precision = #True Positive / #Predicted Positive
Recall = #True Positive / #Class Positive
How to calculate F-measure?
2 * [(Precision * Recall) / (Precision + Recall)]
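These formulas as a sketch in code (the counts in the example call are made up):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F-measure from confusion-matrix counts."""
    precision = tp / (tp + fp)   # true positives / predicted positives
    recall = tp / (tp + fn)      # true positives / actual (class) positives
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1

# Example: 40 true positives, 10 false positives, 20 false negatives
print(precision_recall_f1(40, 10, 20))  # approximately (0.8, 0.667, 0.727)
```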
How to interpret the ROC curve?
The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies. The area under it (AUROC) answers: if I randomly generate a positive example and a negative example, what is the probability my classifier puts them "in the right order" (scores the positive one higher)?
What is the range of AUROC values?
0.5 (worst, completely random) - 1 (best, perfect classification)
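A sketch that computes AUROC directly from its pairwise interpretation (a brute-force estimate for illustration, not how libraries compute it):

```python
def auroc_from_pairs(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs ranked in the right order
    (ties count as half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# A random scorer gives about 0.5; a perfect ranker gives 1.0.
print(auroc_from_pairs([0.9, 0.8, 0.4], [0.3, 0.2, 0.7]))
```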
What is the most important tool to limit overfitting?
Withholding a test dataset.
High training error and high testing error means the model is _____?
Underfit
Low training and high testing error means the model is ______?
Overfit
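A sketch using scikit-learn (assumed available; the data is synthetic) to withhold a test set and compare training vs. testing accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic data for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Withhold 25% of the data as a test set; fit only on the training portion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier().fit(X_train, y_train)

train_acc = accuracy_score(y_train, clf.predict(X_train))
test_acc = accuracy_score(y_test, clf.predict(X_test))
# Low training error but much higher testing error suggests overfitting;
# high error on both suggests underfitting.
print(train_acc, test_acc)
```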
Two ways to combat overfitting in decision trees:
- Bagging
- Random Forest
What is bagging?
Bootstrap Aggregation
Bootstrap B datasets, fit a deep (high-variance) tree to each dataset, and average their predictions.
When is bagging appropriate?
Regression.
Not always appropriate in classification, since bagging a bad classifier can make the averaged prediction worse.
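A sketch of bagging by hand with scikit-learn trees (assumed available; the dataset is synthetic):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

B = 50  # number of bootstrap datasets
rng = np.random.default_rng(0)
trees = []
for _ in range(B):
    # Bootstrap: resample the rows with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # Fit a deep (high-variance) tree to each bootstrap sample
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Aggregate: average the B predictions to reduce variance
y_hat = np.mean([tree.predict(X) for tree in trees], axis=0)
```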
What is Random Forest?
Bagged estimator, but only on a random subsample of the features to decorrelate the trees.
Why are random forests popular among data scientists?
- Low bias and low variance
- Can measure feature importance
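A sketch with scikit-learn's RandomForestClassifier (assumed available; the data is synthetic), showing the feature-subsampling idea and the built-in feature importance:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)

# Each tree sees a bootstrap sample AND only a random subset of features
# at each split (max_features), which decorrelates the trees.
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt",
                                random_state=0)
forest.fit(X, y)

# One importance score per feature
print(forest.feature_importances_)
```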