Lecture notes 2 Multivariate regression? Flashcards
Why may multivariate data be better than bi-variate?
Bivariate can be biased as it does not take into account other relevant variables for explaining the relationship.
How do you interpret coefficients in multi-variate?
you partially differentiate them W.R.T variable you are interpreting.
-What is a difference when interpreting coefficients compared to the bivariate case?
- In multi-variate case it is the change in Y from a change in xi holding all else constant.
In a Y = B0 + X1B1 + XKBK
how do we interpret B2
What should you remember as it is multivariate?
A 1 unit change in X1 causes a B1 unit change in Y (HOLDING ALL ELSE CONSTANT)
In Y = B0 + B1log(X1) … Bk how
How do we interpret Beta 1 for small change and non small change!
What should you remember as it is multivariate?
for small change a 1% increase in X1 causes a B1/100 increase in Y
For a big change must take exact values of log(x) and then interpret beta
HOLDING ALL ELSE CONSTANT!
How do you interpret log of outcome
log(Y) = B0 + X1B1 + BkXk
What should you remember as it is multivariate?
for small change a 100 x beta
for big change you must take expectation of beta.
so it is 100 x exp(beta -1)
(HOLDING ALL CONSTANT)
How do you intepret a log log model?
log(Y) = B0 + B1log(X) + B2log(Xk)
For small changes a 1% increase in X is a Beta % increase in Y
For big changes will have to take exact.
How do you interpret a squared model like
Y = X1B1 + X^2B2 …. XkBk
a 1 unit change in X causes a B1 + 2XB2 change in Y
How do you find where the conditional expectation peaks?
You = the derivative to 0
What are the assumptions made for Multivariate regression models?
Same CLRM as ever:
- E(Xi | ei) = 0
- V(xi | ei) = sigma squared
-Cov(ei , ej) = 0
-Ei varies N(0, sigma squared)
1.What is an alternative method to get the b1 estimate in a model like
Y = B0 + B1X1….bkxik
2.What is this method essentially doing?
3.. What does this theoretically explain?
- Partition regression
You first do a regression of Y on everything but X1 and save the residual. Call this U
(This is portion of Y not explained by any other variables in the model).
Then regress X1 on all explanatory variables and save residual call this V
This explains the portion of Xi that is not explained by the other variables.
Then regress you on U on VB1 and the b1 is the coefficient of X1
- You are removing all effects from everything other than X1.
- This theoreticlly epxlains why we can interpret multivariate coefficients as holding constant.
what are the steps for a hypothesis test for a multivariate regression on one variable
State hypothesis:
B0 =
B1 not equaled to
Find CV, this will be from T table with
S. L and DOF.
Then compute T stat which is estimate - hypothesis / Se(b)
Compare to CV from t tables
Then reject or accept H0.
How do you deal with the scenario when you are hypothesis testing more than 1 coefficient for calculating t stat?
- What happens if there is a se(100b2)
There are two se(b1) + se (b2)
This is the same as square root of variance.
So it is V(b1) + V(b2) + 2Cov(b1,b2)
- You square the 100 and put it outside the V then put 100 out the cov
How to test multiple restrictions in hypothesis testing?
What are the steps and what is a trick?
Use an F test:
F test = (RSS^r - RSS^u /d ) / RSS^u / DOF
Then find CV. C = 0.05 or something
CV F d, DOF numerator and denominator degrees of freedom.
Then compare this to F test to accept or reject H0.
What is another f-test formula that can be used?
When should it be used?
Overall significance test
R^2u / k / (1-R^2u / DOF)
It should be used the TSS of unrestricted and restricted are the same hence all coefficients are 0
How do you work out adjusted r^2?
1 - (RSS/DOF) / TSS / (n-1)
What is an additive dummy variable?
What does it look like graphically?
Eg Y = B0 + B1X1
B1 = 1 or 0 otherwise for something
As the dummy can be switched on or off the graph is just an upwards shift of the dummy being switched off. Shift of B1
Can we use more than one categorical variable (that are the same thing).
NO!
This will mean there will not be 1 solution and regression will not work.
Graphically what would an equation with multiple categorical dummy variables look like?
It would be many lines all shifted from each other to account for the different dummies.