2.1: Multiple linear regression Flashcards

1
Q

When is multiple linear regression appropriate?

A

It is the go-to technique, in both statistics and machine learning applications, when we want to predict a numerical (continuous) outcome variable from several independent variables.

2
Q

In a simple linear regression, what does βxi represent?

yi = α + βxi + εi

What would β0 represent?

A

α (alpha) is the intercept and β (beta) is the regression slope/coefficient.

In βxi, xi is the value of the independent variable, which is usually fixed by the researcher. At times, however (for example gender), xi is not fixed by the researcher and therefore does not represent a fixed value.

β0 represents the intercept (just different notation for α).

3
Q

If we wanted to generate a linear model of ACT scores vs SATV scores in R, how would we write this?

A

lm(ACT ~ SATV, psych::sat.act)

The left side of the formula has the dependent variable, the right side has all the predictor variables; the second argument is the data frame (here the sat.act data from the psych package).
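As a quick usage sketch (assuming the psych package is installed), we can store the fit and inspect it:

fit <- lm(ACT ~ SATV, psych::sat.act)
coef(fit)      # estimated intercept and SATV slope
summary(fit)   # coefficients with standard errors, t-tests and R-squared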

4
Q

What is meant by the predictable and unpredictable parts of linear regression models?

A

In yi = β0 + β1xi1 + β2xi2 + … + βpxip + εi,

β0 + β1xi1 + β2xi2 + … + βpxip is the predictable part (the multiple-predictor version of α + βxi).

The unpredictable part is εi, the error of the model: every deviation of yi from the predictable part that the model cannot explain.
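As an illustration (a sketch reusing the sat.act example from the earlier card, assuming the psych package is available), fitted() returns the predictable part and residuals() the unpredictable part:

fit <- lm(ACT ~ SATV, psych::sat.act)
head(fitted(fit))     # predictable part: b0 + b1 * SATV for each observation
head(residuals(fit))  # unpredictable part: observed ACT minus the fitted value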

5
Q

How do we model εi?

A

With a random variable of which we only know that the expected value is equal to 0 and the variance is equal to sigma epsilon squared (σε^2).
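A small simulation sketch (with made-up parameter values) that generates data under exactly this assumption, drawing the errors from a distribution with mean 0 and variance σε^2:

set.seed(1)
n <- 100
x <- rnorm(n)
e <- rnorm(n, mean = 0, sd = 2)   # errors: expected value 0, variance sigma_eps^2 = 4
y <- 1 + 0.5 * x + e              # made-up beta0 = 1, beta1 = 0.5
coef(lm(y ~ x))                   # estimates should come out close to 1 and 0.5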

6
Q

How does R find the betas in the regression analysis?

A

By minimizing the sum of squared errors: we obtain the estimated coefficients ˆβ0, ˆβ1, ˆβ2, …, ˆβp by minimizing the prediction errors in the sample.

We define a function S that is the sum of the squared errors:

S(β) = Σ εi^2 = Σ (β0 + β1xi1 + β2xi2 + … + βpxip − yi)^2

In other words, the coefficients are found by minimizing the sum of squared errors: the sum of the squared differences between the predicted scores and the observed scores.

We can then find the regression coefficients, the slopes and the intercept, by setting the derivative of S with respect to each of those coefficients equal to 0. So all the derivatives are set equal to 0:

dS/dβ0 = 0, dS/dβ1 = 0, …, dS/dβj = 0, …, dS/dβp = 0
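A sketch of the same idea done numerically (not how lm() works internally, just an illustration): write S(β) as an R function, minimise it with a general-purpose optimiser, and compare with lm().

set.seed(2)
x1 <- rnorm(50); x2 <- rnorm(50)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(50)

# S(beta): sum of squared prediction errors for a given coefficient vector
S <- function(beta) sum((beta[1] + beta[2] * x1 + beta[3] * x2 - y)^2)

optim(c(0, 0, 0), S)$par   # coefficients that minimise S
coef(lm(y ~ x1 + x2))      # lm() gives (essentially) the same values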

7
Q

How do we calculate these derivatives when minimizing the sum of squared errors?

A

Using the chain rule; the derivatives with respect to the regression coefficients look like this:

dS/dβj = Σ 2(ˆβ0 + ˆβ1xi1 + ˆβ2xi2 + ··· + ˆβpxip − yi) xij

except for the intercept (ˆβ0), where the trailing xij factor is simply 1.
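A quick numeric check of this (a self-contained sketch with simulated data): evaluated at the least squares estimates, each of these derivatives is zero up to rounding error.

set.seed(3)
x1 <- rnorm(40); x2 <- rnorm(40)
y  <- 1 + 2 * x1 - x2 + rnorm(40)

b    <- coef(lm(y ~ x1 + x2))            # least squares estimates
pred <- b[1] + b[2] * x1 + b[3] * x2

# dS/d(beta_j) = sum of 2 * (prediction - y) * x_j; for the intercept the x_j factor is 1
c(sum(2 * (pred - y) * 1),
  sum(2 * (pred - y) * x1),
  sum(2 * (pred - y) * x2))              # all effectively 0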

8
Q

This is quite a tedious task. How do we make this easier for ourselves?

A

Through matrix algebra. We start by writing an equation for each of the observations:

y1 = ˆβ0 + ˆβ1x11 + ˆβ2x12 + ··· + ˆβpx1p + ˆε1
y2 = ˆβ0 + ˆβ1x21 + ˆβ2x22 + ··· + ˆβpx2p + ˆε2
…
yn = ˆβ0 + ˆβ1xn1 + ˆβ2xn2 + ··· + ˆβpxnp + ˆεn

This allows us to write the whole system as one matrix equation:

(y1)   (1 x11 x12 … x1p)   (ˆβ0)   (ˆε1)
(y2) = (1 x21 x22 … x2p) · (ˆβ1) + (ˆε2)
(… )   (… …   …  …  … )   (… )   (… )
(yn)   (1 xn1 xn2 … xnp)   (ˆβp)   (ˆεn)

(the coefficient vector ˆβ has p + 1 entries, one for each column of the matrix, while y and ˆε have n entries).

If we multiply this out, we arrive at exactly the predictable part of the equations above. If we substitute symbols for the matrices and vectors, it takes the familiar form of:

y = Xˆβ + ˆε
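A sketch checking this decomposition on simulated data: the model matrix times the estimated coefficients, plus the estimated errors, reproduces y exactly.

set.seed(4)
d   <- data.frame(x1 = rnorm(30), x2 = rnorm(30))
d$y <- 3 + 2 * d$x1 - d$x2 + rnorm(30)

fit <- lm(y ~ x1 + x2, data = d)
X   <- model.matrix(fit)                                    # the n x (p+1) matrix of 1s and predictors
all.equal(as.numeric(X %*% coef(fit) + resid(fit)), d$y)    # TRUE: y = X * beta-hat + eps-hat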

9
Q

In the previously described equation
y = Xˆβ + ˆε,
what is X commonly referred to as?

A

The design matrix or model matrix

10
Q

How can we make a model matrix in R?

A

R has a handy function model.matrix() that returns the design matrix of a model:

model.matrix(y ~ GRP + AGE)
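A minimal sketch with made-up GRP and AGE vectors (the same values used in a later card), showing the matrix that is returned:

GRP <- c(0, 0, 1, 0, 1)
AGE <- c(45, 23, 18, 32, 24)
y   <- c(8, 2, 7, 6, 8)

model.matrix(y ~ GRP + AGE)
#   (Intercept) GRP AGE
# 1           1   0  45
# 2           1   0  23
# ...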

11
Q

From this matrix, how can we find the betas?

A

Quite easily; the result is also the least squares estimate.
- Multiply each side of the equation y = Xˆβ + ˆε by X transposed (X′):

X′y = X′Xˆβ (ignoring the error part), so that ˆβ = (X′X)^−1 X′y

X′X is a (p + 1) × (p + 1) matrix, i.e. a square matrix. If it has an inverse, we can pre-multiply both sides of X′y = X′Xˆβ by the inverse of X′X. When we do that, the X′X on the right-hand side drops out and we end up with ˆβ = (X′X)^−1 X′y. This is the least squares estimator: it is the estimate you would obtain if you took S, the sum of the squared prediction errors, set the derivatives of that function equal to 0, and solved for β.

12
Q

How would you calculate this in R given that:

X = cbind(rep(1,5),
GRP=c(0,0,1,0,1),
AGE=c(45,23,18,32,24))

A

y = c(8,2,7,6,8)
X.X = t(X) %*% X                 # X'X
X.y = t(X) %*% y                 # X'y
( beta_hat = solve(X.X, X.y) )   # least squares estimate (X'X)^-1 X'y

coef(lm(y ~ X - 1))              # same estimates from lm(); "-1" because X already contains the 1s column
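Note: solve(X.X, X.y) solves the linear system X′Xβ = X′y directly, which is numerically more stable than explicitly forming the inverse via solve(X.X) %*% X.y, although both give the same least squares estimate.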

13
Q

How do you decide on the quality of this estimator? (ˆβ = (X′X)^−1 X′y) (2)

A

Two important aspects determine this:
1) Bias of the estimate: e.g. when estimating cognitive ability in a high-stakes setting, we don't want the estimate to be biased.

2) Reliability of the estimate: you want a very small sampling error.

14
Q

How can you test whether a

A