Linear Regression Flashcards
Linear regression
models the relationship between an independent (explanatory) variable $X$ and a real-valued dependent variable $Y$.
Intercept of the line
this is the constant term of the linear equation
$Y = B_0 + B_1 X$
$B_0$ is the intercept: even when $X = 0$, the prediction is still $B_0$.
$B_1$ is the slope of the line: the increase in $Y$ per unit increase in $X$.
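A minimal sketch (with made-up data) fitting this line with numpy.polyfit:

```python
# Minimal sketch: fit Y = B_0 + B_1 * X on made-up data.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, deg=1)  # polyfit returns the highest-degree coefficient first
print(f"intercept B_0 = {b0:.2f}, slope B_1 = {b1:.2f}")
```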
R^2
measures how well the linear regression fits the data, on a scale from 0 to 1; it takes both the variance of $Y$ and the residual error into account.
It says how much of the variance in $Y$ the model explains.
For simple linear regression, $R^2$ is the squared correlation between $X$ and $Y$.
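A quick sketch (same kind of made-up data) showing two equivalent ways to get $R^2$ for a simple linear regression:

```python
# Sketch: R^2 as 1 - SS_res/SS_tot, and as the squared correlation (simple regression only).
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

b1, b0 = np.polyfit(x, y, deg=1)
y_hat = b0 + b1 * x

ss_res = np.sum((y - y_hat) ** 2)      # variation not explained by the line
ss_tot = np.sum((y - y.mean()) ** 2)   # total variation of Y
print(1 - ss_res / ss_tot)             # R^2
print(np.corrcoef(x, y)[0, 1] ** 2)    # squared correlation, same value here
```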
hypothesis testing
uses the standard error of the estimated coefficients
Evaluate how likely it would be to obtain a model as extreme as the one computed if there were no real relationship.
Null hypothesis: there is no linear relation (the slope is 0).
The model assumes a normally distributed noise term added to the linear relation.
Look at the p-values for the intercept and the slope: a p-value < 0.05 means such an estimate would be very unlikely under the null hypothesis, so the coefficient is considered significant at the 95% level.
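A sketch using statsmodels (an assumed library choice, with made-up data) to get standard errors and p-values for both coefficients:

```python
# Sketch: p-values for intercept and slope with statsmodels OLS.
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])

X = sm.add_constant(x)        # adds the intercept column
results = sm.OLS(y, X).fit()
print(results.params)         # [intercept, slope]
print(results.bse)            # standard errors of the coefficients
print(results.pvalues)        # p-values for H0: coefficient = 0
```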
standard error
the standard error of a statistic is the standard deviation of its sampling distribution, or an estimate of that standard deviation.
For example, the standard error of the mean is a measure of the dispersion of sample means around the population mean.
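One common estimate for the standard error of the mean: $SE_{\bar{x}} = s / \sqrt{n}$, where $s$ is the sample standard deviation and $n$ the sample size.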
Residual
A residual is the vertical distance between a data point and the regression line. Each data point has one residual. They are positive if they are above the regression line and negative if they are below the regression line. In other words, the residual is the error that isn’t explained by the regression line.
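A tiny sketch (made-up data) computing residuals from a fitted line:

```python
# Sketch: residuals = observed y minus fitted y.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.2, 1.9, 3.2, 3.8])
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)   # positive above the line, negative below
print(residuals)
```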
Confounder
A predictor can be significant in a simple linear regression (p < 0.05) but not significant in a multiple linear regression (p > 0.05).
In statistics, a confounder is a variable that influences both the dependent variable and the independent variable, causing a spurious association. The independent variables are the explanatory variables $X$; the dependent variable is the response $Y$ that we predict.
If two correlated variables are both used to predict $Y$, the accuracy might not be impacted, but the coefficients associated with each variable might no longer be meaningful (see the sketch below).
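A sketch (simulated data) of two nearly identical predictors: the fit stays accurate, but the individual coefficients become hard to interpret:

```python
# Sketch: collinear predictors -> good R^2 but unstable, hard-to-interpret coefficients.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)      # x2 is almost a copy of x1
y = 3 * x1 + rng.normal(scale=0.1, size=200)    # only x1 truly matters

X = np.column_stack([x1, x2])
model = LinearRegression().fit(X, y)
print(model.coef_)          # the effect may be split arbitrarily between x1 and x2
print(model.score(X, y))    # R^2 remains high regardless
```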
Categorical variables
use dummy variables coded 0 and 1
Don't use more values (0, 1, 2, ...), as that would imply an ordinal relationship, which isn't the case !
A binary categorical variable can be added to the multiple linear model as a single dummy: it then also shifts the intercept for that category ! (See the sketch below.)
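A sketch (hypothetical categories) of dummy encoding with pandas:

```python
# Sketch: dummy (0/1) encoding of a categorical predictor.
import pandas as pd

df = pd.DataFrame({"city": ["Paris", "Lyon", "Paris", "Nice"],
                   "y": [1.0, 2.0, 1.5, 3.0]})
dummies = pd.get_dummies(df["city"], drop_first=True)  # drop one level to avoid redundancy
print(dummies)   # 0/1 columns, one per remaining category
```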
non-linear
a polynomial term (e.g. a squared variable, $X^2$) can be added to introduce nonlinearity; the model stays linear in its coefficients (see the sketch below)
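A sketch (simulated data) adding a squared term, i.e. fitting a degree-2 polynomial:

```python
# Sketch: y ~ b0 + b1*x + b2*x^2, still linear in the coefficients.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.2, size=50)

b2, b1, b0 = np.polyfit(x, y, deg=2)   # highest-degree coefficient first
print(b0, b1, b2)
```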
Adjusted R^2
unlike plain $R^2$, which never decreases when predictors are added, adjusted $R^2$ penalizes extra predictors and goes down if adding them does not bring much gain
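One common form, with $n$ observations and $p$ predictors: $\bar{R}^2 = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$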
Occam's razor
choose the simplest model if a higher degree or more variables does not bring much gain.
bias-variance tradeoff
The bias of the method is the error caused by the simplifying assumptions built into the method.
The variance of the method is how much the model will change based on the sampled data.
The irreducible error is error in the data itself, so no model can capture this error.
There is a tradeoff between the bias and variance of a model. High-variance methods are accurate on the training set but overfit noise in the data, so they don't generalize well to new data. High-bias methods are too simple to fit the training data closely, but are better at generalizing to new test data.
Generalize a model
cross-validation (k-fold: split the data into K folds, train on K-1 folds and test on the remaining one, repeat for each fold; try different parameters or models and take the one with the best average performance; see the sketch at the end)
step-wise selection (forward selection: start with models using one predictor, find the best one-predictor model based on a performance metric, then, keeping that predictor fixed, move to models with two predictors, and so on.
backward selection: the opposite of the above; start with a model containing all predictors and remove them one by one.
Not guaranteed to find the best subset of predictors.)
regularization (Lasso and Ridge: Lasso drives some coefficients to 0 and thus performs variable selection at the same time; Ridge shrinks coefficients without setting them to 0, penalizing models that lean too heavily on many variables; see the sketch below)
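A sketch (simulated data) combining 5-fold cross-validation with Lasso and Ridge from scikit-learn; the alpha values are arbitrary illustrations:

```python
# Sketch: k-fold cross-validation of Lasso and Ridge regression.
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, 1.5, 0.0]) + rng.normal(scale=0.5, size=100)

for model in (Lasso(alpha=0.1), Ridge(alpha=1.0)):
    scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validated R^2
    print(type(model).__name__, scores.mean())

print(Lasso(alpha=0.1).fit(X, y).coef_)  # Lasso drives some coefficients exactly to 0
```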