2.1: Multiple linear regression Flashcards
When is multiple linear regression appropriate?
It is the go-to technique in both statistics and machine learning when we want to predict a numerical (continuous) outcome variable from several independent variables.
In a simple linear regression, what does βxi represent?
yi = α + βxi + εi
What would β0 represent?
α (alpha) is of course the intercept, and β (beta) is the regression slope/coefficient.
xi represents the value of the independent variable, which is usually fixed by the researcher. At times, however, such as with gender, xi is a variable that is not fixed by the researcher and therefore does not represent a fixed value.
β0 represents the intercept (just different notation for α).
If we wanted to generate a linear model of ACT scores vs SATV scores in R, how would we write this?
lm(ACT ~ SATV, data = psych::sat.act)
The left side has the dependent variable, the right side has all the predictor variables.
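For several predictors we simply add them on the right-hand side. A minimal sketch, assuming SATQ as a second predictor column in psych::sat.act:
# multiple predictors are joined with + on the right-hand side
lm(ACT ~ SATV + SATQ, data = psych::sat.act)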
What is meant by the predictable and unpredictable parts of linear regression models?
In yi = β0 + β1xi1 + β2xi2 + ··· + βpxip + εi,
β0 + β1xi1 + β2xi2 + ··· + βpxip is the predictable part (the multivariable analogue of α + βxi).
The unpredictable part is εi, the error of the model: every deviation of yi from the model's prediction that we cannot explain from the model.
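A minimal R sketch of this split, using the ACT ~ SATV model from above: fitted() returns the predictable part and resid() the estimated errors:
fit <- lm(ACT ~ SATV, data = psych::sat.act)
head(fitted(fit))  # the predictable part: b0 + b1 * SATV
head(resid(fit))   # the unpredictable part: the estimated errors e_i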
How do we model εi?
With a random variable of which we only know that the expected value equals 0 and the variance equals sigma epsilon squared (σε²).
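A small simulation sketch of this assumption, with hypothetical values for the coefficients and for sigma epsilon:
set.seed(1)
n <- 100
x <- rnorm(n)
e <- rnorm(n, mean = 0, sd = 2)  # errors: expected value 0, variance sigma^2 = 4
y <- 1.5 + 0.8 * x + e           # hypothetical beta0 = 1.5, beta1 = 0.8
c(mean(e), var(e))               # roughly 0 and 4 in a sample this size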
How does R find the betas in the regression analysis?
By minimizing the sum of squared errors: we obtain the estimated coefficients ˆβ0, ˆβ1, ˆβ2, …, ˆβp by minimizing the prediction errors in the sample.
We do this by defining a function S that is the sum of the squared errors:
S(β) = ∑ εi² = ∑ (β0 + β1xi1 + β2xi2 + ··· + βpxip − yi)²
In other words, the coefficients are found by minimizing the sum of squared errors: the sum of the squared differences between the predicted scores and the observed scores.
We can then find the regression coefficients, the slopes and the intercept, by setting the derivative of S with respect to each of those coefficients equal to 0. So all the derivatives are set equal to 0:
dS/dβ0 = 0, dS/dβ1 = 0, …, dS/dβj = 0, …, dS/dβp = 0
How do we calculate these derivatives when minimizing the sum of squared errors?
Using the chain rule (which we can verify ourselves), the derivatives with respect to the regression coefficients look like this:
dS/dβj = ∑ 2(ˆβ0 + ˆβ1xi1 + ˆβ2xi2 + ··· + ˆβpxip − yi) xij
except for the intercept (β0), where the trailing xij factor is simply 1.
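As a sanity check (a sketch with simulated, hypothetical data): at the least squares solution all of these derivatives are 0, i.e. the residuals sum to zero against every column of the design matrix:
set.seed(1)
x1 <- rnorm(20)
x2 <- rnorm(20)
y <- 2 + x1 - 0.5 * x2 + rnorm(20)
fit <- lm(y ~ x1 + x2)
# dS/dbeta_j = 2 * sum((yhat_i - y_i) * x_ij) = -2 * sum(resid_i * x_ij)
colSums(resid(fit) * model.matrix(fit))  # all ~ 0 up to rounding error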
This is quite a tedious task. How do we make this easier for ourselves?
Through matrix algebra. We start by writing an equation for each of the observations:
y1 = ˆβ0 + ˆβ1x11 + ˆβ2x12 + ···+ ˆβpx1p + ˆε1
y2 = ˆβ0 + ˆβ1x21 + ˆβ2x22 + ···+ ˆβpx2p + ˆε2
…
yn = ˆβ0 + ˆβ1xn1 + ˆβ2xn2 + ···+ ˆβpxnp + ˆεn
This allows us to write this as a matrix equation in the form
(y1)   (1 x11 x12 ··· x1p) (ˆβ0)   (ˆε1)
(y2) = (1 x21 x22 ··· x2p) (ˆβ1) + (ˆε2)
(⋮ )   (⋮   ⋮   ⋮      ⋮ ) ( ⋮ )   ( ⋮ )
(yn)   (1 xn1 xn2 ··· xnp) (ˆβp)   (ˆεn)
If we multiply this out, we arrive at exactly the predictable part of these equations. If we substitute symbols for these arrays, it takes the familiar form of:
y = Xˆβ + ˆε
In the previously described equation
y = Xˆβ + ˆε
What is X commonly referred to as?
The design matrix or model matrix
How can we make a model matrix in R?
R has a handy function model.matrix() that returns the design matrix of a model:
model.matrix(y ~ GRP + AGE)
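A minimal runnable sketch, reusing the toy y, GRP, and AGE values from the next card:
y   <- c(8, 2, 7, 6, 8)
GRP <- c(0, 0, 1, 0, 1)
AGE <- c(45, 23, 18, 32, 24)
model.matrix(y ~ GRP + AGE)  # first column is the intercept column of 1s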
From this matrix, how can we find ˆβ?
Quite easily, it’s also the least squares estimate.
- Multiply each side of the equation y = Xˆβ + ˆε by X transposed (ignoring the ˆε part):
X′y = X′Xˆβ, so ˆβ = (X′X)^−1 X′y
X′X is a (p + 1) × (p + 1) matrix, i.e. a square matrix. If it has an inverse, we can pre-multiply both sides of the equation X′y = X′Xˆβ by the inverse of this matrix. When we do that, we get rid of X′X and end up with ˆβ = (X′X)^−1 X′y. This is the least squares estimator; it is the estimate you would obtain if you took S, the sum of the squared prediction errors, set the derivatives of the function equal to 0, and solved for β.
How would you calculate this in R given that:
X = cbind(rep(1,5),
GRP=c(0,0,1,0,1),
AGE=c(45,23,18,32,24))
y = c(8,2,7,6,8)
X.X <- t(X) %*% X                # X'X: a (p+1) x (p+1) square matrix
X.y <- t(X) %*% y                # X'y
( beta_hat <- solve(X.X, X.y) )  # solves X'X beta = X'y; outer parentheses print the result
coef(lm(y ~ X - 1))              # same estimates; the intercept column is already in X
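Note that solve(X.X, X.y) solves the normal equations directly rather than explicitly inverting X′X, which is numerically more stable. An equivalent one-liner using base R's crossprod():
solve(crossprod(X), crossprod(X, y))  # crossprod(X) = t(X) %*% X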
How do you decide on the quality of this estimator (ˆβ = (X′X)^−1 X′y)? (2)
Two important aspects of determining this:
1) Bias of the estimate: e.g., when estimating cognitive ability in a high-stakes setting, you don't want the estimate to be biased.
2) Reliability of the estimate: you want a very small sampling error.
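A small simulation sketch of both aspects, with hypothetical true coefficients: the average of the estimates over many samples reveals bias (here there is none), and their standard deviation is the sampling error:
set.seed(1)
est <- replicate(1000, {
  x <- rnorm(50)
  y <- 2 + 0.5 * x + rnorm(50)  # true beta1 = 0.5
  coef(lm(y ~ x))["x"]
})
mean(est)  # close to 0.5: the estimator is (approximately) unbiased
sd(est)    # the sampling error: smaller means a more reliable estimate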
How can you test whether a