Midterm Review Flashcards
Linear regression formula:
y = b0 + b1x
where…
y = output
b0 = y-intercept
b1 = slope
x = input
Three steps in fitting a linear regression model:
- Define the model: choose the ‘class’ of functions that relates the inputs (x) to the output (y)
- Define your training loss
- Find the function in your class that gives the smallest training loss
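A minimal sketch of the three steps for the one-input case, assuming the model class is straight lines y = b0 + b1x and the training loss is the sum of squared residuals. The function names and the toy data are illustrative assumptions, not part of the course material.

```python
def squared_loss(b0, b1, xs, ys):
    """Step 2: training loss = sum of squared residuals for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def fit_simple_ols(xs, ys):
    """Step 3: closed-form (b0, b1) that minimizes squared_loss over all lines."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx              # slope
    b0 = y_mean - b1 * x_mean   # y-intercept
    return b0, b1


# Toy data generated from y = 1 + 2x (made up for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

b0, b1 = fit_simple_ols(xs, ys)
fitted_loss = squared_loss(b0, b1, xs, ys)
other_loss = squared_loss(0.0, 2.0, xs, ys)  # a different line in the same class
```

On this toy data the fitted line recovers b0 = 1 and b1 = 2 with zero loss, while the other candidate line has a larger loss.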
What is a training loss function?
Measures how far the model's fitted values deviate from the observed data
Large loss = poor fit
What are residuals?
Errors from the model fit: data = fit + residual
What are the two main types of loss function?
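- Least Absolute Deviation (LAD, L1-norm): minimize the sum of absolute values of residuals (less sensitive to outliers). The L1-norm is also the penalty used in Lasso, which can shrink coefficients to exactly zero (eliminates features)
- Ordinary Least Squares (OLS, L2-norm): minimize the sum of squared residuals (more sensitive to outliers). The L2-norm is also the penalty used in Ridge, which shrinks coefficients but doesn’t eliminate them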
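A small sketch comparing the two losses on the same residuals. The candidate line y = 2x and the data (with one deliberate outlier) are made-up assumptions for illustration.

```python
def residuals(b0, b1, xs, ys):
    """Residual for each point: observed y minus fitted value b0 + b1*x."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def lad_loss(res):
    """L1 loss: sum of absolute residuals."""
    return sum(abs(r) for r in res)


def ols_loss(res):
    """L2 loss: sum of squared residuals."""
    return sum(r ** 2 for r in res)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 3.0, 7.0, 18.0]          # the last point is an outlier relative to y = 2x
res = residuals(0.0, 2.0, xs, ys)   # residuals are 1, -1, 1, 10

l1 = lad_loss(res)
l2 = ols_loss(res)

# Fraction of each loss that comes from the single outlier.
outlier_share_l1 = abs(res[-1]) / l1
outlier_share_l2 = res[-1] ** 2 / l2
```

The outlier accounts for a larger share of the squared loss than of the absolute loss, which is why a squared-loss fit is pulled toward outliers more strongly.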
When should you use L1-norm vs. L2-norm?
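As a loss, the L1-norm should be used when outliers should not dominate the fit, whereas the L2-norm is the standard choice when outliers are not a concern. As a penalty, the L1-norm (Lasso) should be used when there are lots of features and you are unsure whether they are all necessary, whereas the L2-norm (Ridge) should be used when all the features must be considered.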
What is the coefficient of determination and how is it calculated?
The proportion of the variance in the response variable that is explained by the model. It measures the quality of the fit of a linear regression model, where 0 = no fit and 1 = perfect fit.
It is calculated as 1 - (Residual Sum of Squares / Total Sum of Squares) or (1 - RSS / TSS)
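A short sketch of the 1 - RSS / TSS calculation. The function name and the toy values are assumptions chosen to show the two reference points (perfect fit, and predicting the mean everywhere).

```python
def r_squared(ys, fitted):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_mean = sum(ys) / len(ys)
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # residual sum of squares
    tss = sum((y - y_mean) ** 2 for y in ys)             # total sum of squares
    return 1 - rss / tss


ys = [1.0, 2.0, 3.0, 4.0]

r2_perfect = r_squared(ys, ys)                     # fitted values equal the data
r2_mean_only = r_squared(ys, [2.5, 2.5, 2.5, 2.5]) # always predict the mean of ys
```

A perfect fit gives RSS = 0 and so R^2 = 1. Predicting the mean everywhere gives RSS = TSS and so R^2 = 0.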
What are robust statistics?
Statistics that are not greatly influenced by the inclusion of outliers.
For example, as measures of central tendency, mean is non-robust while median is robust.
L1-norm is robust.
L2-norm is non-robust.
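A quick illustration of mean versus median with one outlier, using Python's standard statistics module. The numbers are made up for illustration.

```python
import statistics

clean = [10, 11, 12, 13, 14]
with_outlier = [10, 11, 12, 13, 1000]   # one value replaced by an outlier

# How far each statistic moves when the outlier is introduced.
mean_shift = statistics.mean(with_outlier) - statistics.mean(clean)
median_shift = statistics.median(with_outlier) - statistics.median(clean)
```

The median does not move at all, while the mean shifts by a large amount. This mirrors the loss functions: the mean minimizes the sum of squared deviations (L2) and the median minimizes the sum of absolute deviations (L1).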
What is classification?
Prediction where target variable Y is categorical
- often binary
- Y can take on a value in a set {o1, o2, …}
- Y is discrete and finite
What is logistic regression?
A form of classification in which an S-shaped sigmoid function is used to calculate the probability of each potential output.
What is the natural logistic regression classification rule?
Binary classification: F(X|B) = 1 if P(Y=1|X,B) >= 0.5, 0 otherwise
If sigmoid function >= 0.5, predict 1
If sigmoid function < 0.5, predict 0
Can be simplified to the linear rule B0 + BX >= 0, because the sigmoid equals 0.5 exactly when its input is 0
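A minimal sketch of the rule for a single input, checking that thresholding the probability at 0.5 gives the same labels as thresholding the linear score at 0. The coefficient values and inputs are hypothetical, chosen only for illustration.

```python
import math


def sigmoid(z):
    """S-shaped function mapping any real z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_by_probability(x, b0, b1):
    """Predict 1 if P(Y=1 | x) >= 0.5, else 0."""
    return 1 if sigmoid(b0 + b1 * x) >= 0.5 else 0


def predict_by_linear_score(x, b0, b1):
    """Equivalent rule: predict 1 if b0 + b1*x >= 0, else 0."""
    return 1 if b0 + b1 * x >= 0 else 0


b0, b1 = -3.0, 1.5                    # hypothetical fitted coefficients
xs = [0.0, 1.0, 2.0, 3.0, 4.0]        # linear scores: -3, -1.5, 0, 1.5, 3

labels_by_probability = [predict_by_probability(x, b0, b1) for x in xs]
labels_by_score = [predict_by_linear_score(x, b0, b1) for x in xs]
```

Both rules give the same labels because the sigmoid is increasing and passes through 0.5 at an input of 0.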
What is regularization?
Optimizing the likelihood plus a penalty on the size of the parameters, which prevents the optimal parameter values from diverging to infinity (for example, when the classes are perfectly separable).
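A sketch of why the penalty matters, assuming a logistic model with one coefficient and no intercept, a squared (L2) penalty, and perfectly separable toy data. The penalty weight lam = 0.1 and the grid of candidate coefficients are arbitrary choices for illustration.

```python
import math


def neg_log_likelihood(b1, xs, ys):
    """Negative log-likelihood of the model P(Y=1 | x) = sigmoid(b1 * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = b1 * x
        # -log P(y | x) is log(1 + exp(-z)) when y = 1 and log(1 + exp(z)) when y = 0.
        total += math.log1p(math.exp(-z if y == 1 else z))
    return total


def penalized_loss(b1, xs, ys, lam):
    """Negative log-likelihood plus an L2 penalty on the coefficient size."""
    return neg_log_likelihood(b1, xs, ys) + lam * b1 ** 2


# Perfectly separable toy data: every negative x has label 0, every positive x has label 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

candidates = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
unpenalized = [neg_log_likelihood(b, xs, ys) for b in candidates]
penalized = [penalized_loss(b, xs, ys, lam=0.1) for b in candidates]

# Candidate with the smallest penalized loss.
best_penalized = candidates[penalized.index(min(penalized))]
```

Without the penalty the loss keeps decreasing as the coefficient grows, so there is no finite optimum. With the penalty the smallest loss on this grid is at a moderate coefficient value.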
What is the major flaw of accuracy rate as a metric for evaluating classifiers?
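It can be misleading when classes are imbalanced: a classifier that always predicts the majority class can still score a high accuracy. A baseline accuracy must be established for comparison.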
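A small illustration with made-up labels, assuming 95 negatives and 5 positives and a classifier that always predicts the majority class.

```python
# Imbalanced toy labels: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100                      # always predict the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Baseline accuracy: the share of the most common class.
majority_count = max(y_true.count(0), y_true.count(1))
baseline_accuracy = majority_count / len(y_true)

# Number of actual positives the classifier managed to find.
positives_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
```

The accuracy looks high, but it is no better than the baseline, and the classifier finds none of the positives.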
What are Type I and Type II errors in classification?
Type I: False positive (Predicted +, True -)
Type II: False negative (Predicted -, True +)
How are precision and recall calculated?
Precision = #True Positive / #Predicted Positive
Recall = #True Positive / #Actual Positive
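A minimal sketch that counts the four outcomes and applies the two formulas. The function name and the toy labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1     # true positive
        elif p == 1 and t == 0:
            fp += 1     # false positive (Type I error)
        elif p == 0 and t == 1:
            fn += 1     # false negative (Type II error)
        else:
            tn += 1     # true negative
    return tp, fp, fn, tn


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

precision = tp / (tp + fp)   # true positives / predicted positives
recall = tp / (tp + fn)      # true positives / actual positives
```

Here 2 of the 3 predicted positives are correct (precision 2/3), and 2 of the 4 actual positives are found (recall 1/2).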