Question 1

Linear regression formula:

Accepted Answer

y = b0 + b1 * x, where:

y = output
b0 = y-intercept
b1 = slope
x = input

Question 2

Three steps in fitting a linear regression model:

Accepted Answer

1. Define the model: choose the 'class' of functions that relates the inputs (x) to the output (y).
2. Define your training loss.
3. Find the function in your class that gives the smallest training loss.
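The three steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the model class is straight lines, assumes a sum-of-squared-residuals loss, and uses made-up data. The names `model`, `training_loss`, and `fit` are hypothetical.

```python
from statistics import mean

# Step 1: model class (assumed here: straight lines y = b0 + b1 * x)
def model(b0, b1, x):
    return b0 + b1 * x

# Step 2: training loss (assumed here: sum of squared residuals)
def training_loss(b0, b1, xs, ys):
    return sum((y - model(b0, b1, x)) ** 2 for x, y in zip(xs, ys))

# Step 3: find the line with the smallest training loss.
# For this model and loss the minimizer has a closed form.
def fit(xs, ys):
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up training data for illustration only
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]
b0, b1 = fit(xs, ys)
print(b0, b1, training_loss(b0, b1, xs, ys))
```

Nudging either coefficient away from the fitted values should only increase the training loss, which is a quick sanity check on step 3.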

Question 3

What is a training loss function?

Accepted Answer

Measures the deviation of the model's fitted values from the observed data.

Large loss = poor fit

Question 4

What are residuals?

Accepted Answer

Errors from the model fit: data = fit + residual
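A small sketch of the identity data = fit + residual, using made-up observations and hypothetical coefficients `b0` and `b1`:

```python
# Made-up observations and hypothetical line coefficients
xs = [1.0, 2.0, 3.0]
ys = [2.5, 3.5, 6.0]
b0, b1 = 0.5, 1.75

fits = [b0 + b1 * x for x in xs]               # [2.25, 4.0, 5.75]
residuals = [y - f for y, f in zip(ys, fits)]  # [0.25, -0.5, 0.25]

# Each observation is recovered as fit + residual
recovered = [f + r for f, r in zip(fits, residuals)]
print(recovered == ys)
```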

Question 5

What are the two main types of loss function?

Accepted Answer

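1. Least Absolute Deviation (LAD, L1-norm): minimize the sum of absolute values of residuals (less sensitive to outliers).
2. Ordinary Least Squares (OLS, L2-norm): minimize the sum of squared residuals (large residuals are penalized heavily).

Note: Lasso and Ridge are not loss functions on the residuals. They apply the same norms as penalties on the coefficients: Lasso (L1 penalty) can shrink coefficients to exactly zero and so eliminate features, while Ridge (L2 penalty) shrinks coefficients but does not eliminate them.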
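To make the difference concrete, here is a small sketch that evaluates both losses on the same made-up residuals, one of which is an outlier. The function names `lad_loss` and `ols_loss` are hypothetical.

```python
def lad_loss(residuals):
    """Sum of absolute residuals (L1-norm of the residual vector)."""
    return sum(abs(r) for r in residuals)

def ols_loss(residuals):
    """Sum of squared residuals (squared L2-norm of the residual vector)."""
    return sum(r ** 2 for r in residuals)

# Made-up residuals; the last one is an outlier
residuals = [1.0, -1.0, 10.0]

lad = lad_loss(residuals)  # 12.0
ols = ols_loss(residuals)  # 102.0

# Share of each loss contributed by the outlier alone
lad_share = abs(residuals[-1]) / lad
ols_share = residuals[-1] ** 2 / ols
print(lad, ols, lad_share, ols_share)
```

The outlier accounts for a larger share of the squared loss than of the absolute loss, which is why a fit that minimizes squared residuals is pulled harder toward outlying points.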

Question 6

When should you use L1-norm vs. L2-norm?

Accepted Answer

Use the L1-norm when outliers should have little influence on the fit or, as a penalty, when there are lots of features and you are unsure whether they are all necessary (it can drop some entirely). Use the L2-norm when all the features must be considered.

Question 7

What is the coefficient of determination and how is it calculated?

Accepted Answer

The proportion of the variance in the response variable that is explained by the model. It measures the quality of the fit of a linear regression model, where 0 = no fit and 1 = perfect fit.

It is calculated as 1 - (Residual Sum of Squares / Total Sum of Squares) or (1 - RSS / TSS)
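A minimal sketch of the calculation, using made-up observations and fitted values. The function name `r_squared` is hypothetical.

```python
from statistics import mean

def r_squared(ys, fits):
    """Coefficient of determination: 1 - RSS / TSS."""
    y_bar = mean(ys)
    rss = sum((y - f) ** 2 for y, f in zip(ys, fits))  # residual sum of squares
    tss = sum((y - y_bar) ** 2 for y in ys)            # total sum of squares
    return 1 - rss / tss

# Made-up observations and fitted values
ys = [2.5, 3.5, 6.0]
fits = [2.25, 4.0, 5.75]

r2 = r_squared(ys, fits)
print(r2)
```

Two boundary cases are worth checking: fitted values equal to the observations give 1, and predicting the mean of the observations everywhere gives 0.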

Question 8

What are robust statistics?

Accepted Answer

Statistics that are not greatly influenced by the inclusion of outliers.

For example, as measures of central tendency, mean is non-robust while median is robust.

L1-norm is robust.
L2-norm is non-robust.
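The mean/median contrast can be checked on made-up data with Python's standard `statistics` module:

```python
from statistics import mean, median

# Made-up sample, then the same sample with one outlier added
data = [1, 2, 3, 4, 5]
with_outlier = data + [100]

mean_shift = mean(with_outlier) - mean(data)        # mean moves from 3 to about 19.17
median_shift = median(with_outlier) - median(data)  # median moves from 3 to 3.5
print(mean_shift, median_shift)
```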

Question 9

What is classification?

Accepted Answer

Prediction where the target variable Y is categorical:

- Y is often binary
- Y can take on a value in a set {o1, o2, ...}
- Y is discrete and finite

Question 10

What is logistic regression?

Accepted Answer

A form of classification in which an S-shaped sigmoid function is used to calculate the probability of each potential output.

Question 11

What is the natural logistic regression classification rule?

Accepted Answer

Binary classification: F(X, B) = 1 if P(Y=1 | X, B) >= 0.5, 0 otherwise.

If the sigmoid function >= 0.5, predict 1.
If the sigmoid function < 0.5, predict 0.

Can be simplified to a linear rule: predict 1 when B0 + BX >= 0.
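The equivalence between thresholding the sigmoid at 0.5 and checking the sign of the linear term can be sketched as follows. This assumes a single input and hypothetical coefficients; the function names are illustrative only.

```python
import math

def sigmoid(z):
    """S-shaped function mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(b0, b1, x):
    """Predict 1 when the modeled P(Y=1 | x) is at least 0.5, else 0."""
    return 1 if sigmoid(b0 + b1 * x) >= 0.5 else 0

def classify_linear(b0, b1, x):
    """Same rule without the sigmoid: sigmoid(z) >= 0.5 exactly when z >= 0."""
    return 1 if b0 + b1 * x >= 0 else 0

# Hypothetical coefficients; the decision boundary sits at x = 1.5
b0, b1 = -3.0, 2.0
inputs = [1.0, 1.5, 2.0]
labels = [classify(b0, b1, x) for x in inputs]
labels_linear = [classify_linear(b0, b1, x) for x in inputs]
print(labels, labels_linear)
```

The linear form avoids evaluating the exponential at all, which is the practical reason for the simplification.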

Question 12

What is regularization?

Accepted Answer

Optimizing the likelihood plus a penalty on the size of the parameters, which prevents the optimal solution from having infinitely large parameters.
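One setting where this matters, offered as an illustration rather than the only reading of the card: when the classes are perfectly separable, the logistic likelihood keeps improving as the coefficient grows, so there is no finite optimum. The sketch below assumes a one-coefficient model, an L2 penalty, made-up data, and a hypothetical penalty weight `lam`.

```python
import math

def neg_log_likelihood(b, xs, ys):
    """Logistic negative log-likelihood for the model P(Y=1 | x) = sigmoid(b * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        z = b * x
        # -log(sigmoid(z)) when y == 1, -log(1 - sigmoid(z)) when y == 0
        total += math.log1p(math.exp(-z)) if y == 1 else math.log1p(math.exp(z))
    return total

def penalized_loss(b, xs, ys, lam):
    """Negative log-likelihood plus an L2 penalty on the coefficient."""
    return neg_log_likelihood(b, xs, ys) + lam * b ** 2

# Made-up, perfectly separable data: negative x is class 0, positive x is class 1
xs = [-1.0, 1.0]
ys = [0, 1]
lam = 0.1  # hypothetical penalty weight

for b in (1.0, 10.0):
    print(b, neg_log_likelihood(b, xs, ys), penalized_loss(b, xs, ys, lam))
```

Without the penalty, the larger coefficient always looks better; with it, the loss eventually rises again, so a finite minimizer exists.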

Question 13

What is the major flaw of accuracy rate as a metric for evaluating classifiers?

Accepted Answer

Can be misleading when classes are imbalanced. A baseline accuracy (for example, always predicting the majority class) must be established for comparison.
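A quick sketch with made-up labels shows how a classifier that never predicts the minority class can still score high accuracy:

```python
# Made-up imbalanced labels: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5

# A "classifier" that always predicts the majority class
y_pred = [0] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
positives_found = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
print(accuracy, positives_found)  # 0.95 0
```

Any real model on this data would need to beat 0.95 before its accuracy says anything useful.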

Question 14

What are Type I and Type II errors in classification?

Accepted Answer

Type I: False positive (Predicted +, True -)
Type II: False negative (Predicted -, True +)

Question 15

How are precision and recall calculated?

Accepted Answer

Precision = #True Positive / #Predicted Positive
Recall = #True Positive / #Class Positive
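Both ratios can be computed directly from label lists. The sketch below uses made-up labels with 1 as the positive class, and the function name `precision_recall` is hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall), treating 1 as the positive class.

    Assumes at least one predicted positive and one actual positive;
    otherwise the corresponding ratio is undefined (division by zero).
    """
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(1 for p in y_pred if p == 1)
    class_pos = sum(1 for t in y_true if t == 1)
    return true_pos / predicted_pos, true_pos / class_pos

# Made-up labels: 4 actual positives, 3 predicted positives, 2 of them correct
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)
```

Here precision is 2/3 (two of three predicted positives are correct) and recall is 1/2 (two of four actual positives are found).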

Midterm Review Flashcards

(26 cards)