Lecture 3 Flashcards
What is multiple variable regression?
Regression with more than one explanatory variable, which lets us estimate the effect of each variable on Y while holding the other variables in the model constant
What is the problem of having too many variables in a regression model?
Including too many variables can break the causal interpretation of the coefficients. This occurs when one variable in the model is itself caused by another variable that is also in the model
What are the 4 assumptions of the classical multivariable regression model?
- E(e_i|X) = 0, conditional on all X variables.
- V(e_i|X) = sigma^2. Across different values of X, e has the same spread: we assume the spread of the data around the regression line is constant (homoskedasticity).
- Cov(e_i, e_j|X) = 0 for i ≠ j. The regression error for person i tells us nothing about the regression error for person j.
- e_i|X ~ N(0, sigma^2). The regression error follows a normal distribution.
How do you minimise the sum of squared residuals, and how would you do an example?
- Square the residuals from the regression line and sum them.
- Find the first-order conditions (FOCs) with respect to each coefficient.
- Substitute the residual e_i back into each FOC; you are left with Sum e_i = 0 and Sum e_i x_i1 = 0 (see the sketch below).
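A minimal numerical sketch of this minimisation, assuming simulated data and NumPy rather than the course's Stata workflow: it solves the first-order conditions (the normal equations) and checks that the residuals satisfy Sum e_i = 0 and Sum e_i x_i1 = 0.

```python
import numpy as np

# Simulated data (hypothetical): y depends on x1 and x2 plus noise
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

# Design matrix with a column of ones for the intercept
X = np.column_stack([np.ones(n), x1, x2])

# Solve the first-order conditions X'Xb = X'y (the normal equations)
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b          # residuals

print("b:", b)
print("sum of residuals:", e.sum())            # ~0 (FOC w.r.t. the intercept)
print("sum of e_i * x_i1:", (e * x1).sum())    # ~0 (FOC w.r.t. b1)
```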
What is partitioned regression used for?
It is a theoretical device used to understand what the coefficient B1 means, by isolating the relationship with one variable while holding the other variables constant.
What are the 3 steps involved in conducting a partitioned regression? Do this using an example.
Step 1: Regress Yi on (1, Xi2, Xi3, …, Xik). This removes from Y the part explained by X2, …, Xk and creates the residual U^i = Yi - Gamma0 - Gamma2 Xi2 - … - GammaK Xik.
Step 2: Regress Xi1 on (1, Xi2, …, Xik). We estimate the part of X1 that is related to X2 through Xk; by saving the residual V^i we remove from X1 the part that is explained by X2, X3, …, Xk.
Step 3: Regress U^i on V^i:
U^i = B1 V^i + n_i
B1 represents the effect of X1 on Y after we control for X2, …, Xk: it captures the effect of a change in X1 once all the other X's are held fixed (see the sketch below).
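A short sketch of these three steps, assuming simulated data and the same NumPy setup as above (not the lecture's data): it partials X2 out of both Y and X1 and confirms that the residual-on-residual coefficient equals b1 from the full multiple regression.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x2 = rng.normal(size=n)
x1 = 0.6 * x2 + rng.normal(size=n)          # x1 correlated with x2
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

def ols(X, y):
    """Return OLS coefficients for y on X (X already includes a constant)."""
    return np.linalg.solve(X.T @ X, X.T @ y)

ones = np.ones(n)
b_full = ols(np.column_stack([ones, x1, x2]), y)   # multiple regression: [b0, b1, b2]

# Step 1: regress Y on (1, X2), keep the residual U_hat
X2 = np.column_stack([ones, x2])
u_hat = y - X2 @ ols(X2, y)

# Step 2: regress X1 on (1, X2), keep the residual V_hat
v_hat = x1 - X2 @ ols(X2, x1)

# Step 3: regress U_hat on V_hat (both residuals have mean ~0, so no constant needed)
b1_partitioned = (v_hat @ u_hat) / (v_hat @ v_hat)

print("b1 from full regression:", b_full[1])
print("b1 from partitioned regression:", b1_partitioned)   # same number
```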
What is the interpretation of B in a partitioned regression and a normal regression?
- In a partitioned regression model, B1 is the relationship between Y and X1 after we 'remove' the effects of (1, X2, …, Xk) from both Y and X1.
- In the normal regression model, B1 represents the partial effect of X1 on Y after we account for the other variables in the model.
Explain how B1 can have a causal interpretation.
B1 has a causal interpretation if we believe that, after we control for X2, …, Xk, the people being compared are similar.
E.g. say X1 = education and Y = wages; after we control for X2, …, Xk, if we believe that college graduates and high school graduates are comparable, then we can attach a causal interpretation to B1.
What is the interpretation of B in each model:
- Y = B0 + X1B1 + … + XkBk + e
- Y = B0 + log(X1)B1 + … + XkBk + e
- log(Y) = B0 + X1B1 + … + XkBk + e
- log(Y) = B0 + log(X1)B1 + … + XkBk + e
- Y = B0 + X1B1 + X1^2B2 + … + XkBk + e
- B1 = the change in Y when X1 increases by one unit, holding X2, …, Xk constant.
- B1/100 = the change in Y when X1 increases by 1%, holding X2, …, Xk constant (equivalently, B1 is the change in Y when X1 increases by 100%).
- 100·B1 is approximately the percentage change in Y when X1 increases by one unit, holding X2, …, Xk constant (exactly, Y changes by (e^B1 - 1)×100%).
- B1 is the percentage change in Y when X1 increases by 1%, holding X2, …, Xk constant (an elasticity).
- When X1 increases by one unit, Y changes by approximately B1 + 2B2X1 (the model differentiated with respect to X1), holding X2, …, Xk constant. A summary follows below.
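A compact LaTeX summary of these interpretations, holding X2, …, Xk fixed (a sketch of the derivative-based reasoning, using the usual approximate log rules):

```latex
\begin{align*}
Y = \beta_0 + \beta_1 X_1 + \dots
  &: \quad \Delta Y \approx \beta_1\, \Delta X_1 \\
Y = \beta_0 + \beta_1 \log(X_1) + \dots
  &: \quad \Delta Y \approx \tfrac{\beta_1}{100}\, (\%\Delta X_1) \\
\log(Y) = \beta_0 + \beta_1 X_1 + \dots
  &: \quad \%\Delta Y \approx 100\,\beta_1\, \Delta X_1 \\
\log(Y) = \beta_0 + \beta_1 \log(X_1) + \dots
  &: \quad \%\Delta Y \approx \beta_1\, (\%\Delta X_1) \\
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \dots
  &: \quad \frac{\partial Y}{\partial X_1} = \beta_1 + 2\beta_2 X_1
\end{align*}
```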
What are the 4 steps in conducting a t-test?
- Write down the null and alternative hypotheses: H0: B1 = B1(bar), H1: B1 ≠ B1(bar).
- Choose a significance level c and find the associated critical values t(c/2, DoF) and t(1-c/2, DoF), where DoF = n - (K+1) and K is the number of explanatory variables (so K+1 estimated parameters including the intercept).
- Compute the test statistic: T = (b1 - B1(bar)) / SE(b1), where SE(b1) is taken from the Stata output rather than computed by hand.
- Reject the null if T < t(c/2, DoF) or T > t(1-c/2, DoF), i.e. if |T| > t(1-c/2, DoF); see the sketch below.
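A small sketch of these four steps, assuming SciPy for the critical values; the coefficient, its standard error, n and K here are made-up numbers standing in for Stata output.

```python
from scipy import stats

# Hypothetical values standing in for regression output
b1, se_b1 = 0.08, 0.03       # estimated coefficient and its standard error
b1_null = 0.0                # value of B1 under H0
n, K = 120, 4                # sample size and number of explanatory variables

dof = n - (K + 1)            # degrees of freedom
c = 0.05                     # significance level

t_stat = (b1 - b1_null) / se_b1
t_crit = stats.t.ppf(1 - c / 2, dof)    # upper critical value t(1-c/2, DoF)

print("t statistic:", round(t_stat, 3))
print("critical value:", round(t_crit, 3))
print("reject H0:", abs(t_stat) > t_crit)
```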
How would you compute the test statistic of a more complicated t-test, where you test something like B1 + B2 = 1?
- In this case use V(b1 + b2) = V(b1) + V(b2) + 2Cov(b1, b2) as the variance.
- Write down the null hypothesis B1 + B2 = c.
- Test statistic: T = (b1 + b2 - c) / sqrt(V(b1) + V(b2) + 2Cov(b1, b2)).
T has a t distribution with DoF = n - (K+1). Then follow the normal approach by comparing T with the critical values (see the sketch below).
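A sketch of this combined test using made-up estimates for b1, b2 and their variances and covariance (in practice these come from the estimated covariance matrix reported by Stata):

```python
import numpy as np
from scipy import stats

# Hypothetical estimates and covariance terms for b1 and b2
b1, b2 = 0.6, 0.3
var_b1, var_b2, cov_b1b2 = 0.04, 0.05, -0.01
c = 1.0                       # H0: B1 + B2 = 1
n, K = 120, 4
dof = n - (K + 1)

# V(b1 + b2) = V(b1) + V(b2) + 2 Cov(b1, b2)
var_sum = var_b1 + var_b2 + 2 * cov_b1b2
t_stat = (b1 + b2 - c) / np.sqrt(var_sum)

t_crit = stats.t.ppf(0.975, dof)     # 5% two-sided critical value
print("t statistic:", round(t_stat, 3))
print("reject H0 at 5%:", abs(t_stat) > t_crit)
```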
What are the 3 steps in conducting an F-test when you are testing more than one restriction?
- Set up H0: Bj = Bj(bar) for all j = 1, …, d; H1: Bj ≠ Bj(bar) for at least one j.
- Pick a significance level c and find the critical value from F(d, DoF), where DoF is the degrees of freedom of the unrestricted model.
- Compute F = [(RSS(r) - RSS(u))/d] / [RSS(u)/DoF]; see the sketch below.
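A sketch of the F-test on simulated data (hypothetical, not from the lecture): the unrestricted model includes X2 and X3, the restricted model drops them (so d = 2), and the statistic is compared with the F critical value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 150
x1, x2, x3 = rng.normal(size=(3, n))
y = 1.0 + 0.8 * x1 + 0.1 * x2 + rng.normal(size=n)

def rss(X, y):
    """Residual sum of squares from an OLS fit of y on X."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return e @ e

ones = np.ones(n)
X_u = np.column_stack([ones, x1, x2, x3])   # unrestricted model
X_r = np.column_stack([ones, x1])           # restricted model: B2 = B3 = 0

d = 2                                       # number of restrictions
dof = n - X_u.shape[1]                      # n - (K + 1) for the unrestricted model

F = ((rss(X_r, y) - rss(X_u, y)) / d) / (rss(X_u, y) / dof)
F_crit = stats.f.ppf(0.95, d, dof)          # 5% critical value from F(d, DoF)

print("F statistic:", round(F, 3))
print("reject H0:", F > F_crit)
```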
How would you use RSS and TSS with an F-test?
- F statistic: [(RSS(r) - RSS(u))/d] / [RSS(u)/DoF].
- Using the equation R^2 = 1 - RSS/TSS, if the restricted and unrestricted models have the same TSS we can write the F statistic in terms of R^2.
- Substituting RSS = (1 - R^2)TSS into the F statistic gives [(R^2(u) - R^2(r))/d] / [(1 - R^2(u))/DoF] (derivation sketched below).
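A one-step LaTeX sketch of that substitution, assuming both models share the same TSS so that RSS = (1 - R^2)TSS in each case:

```latex
\[
F = \frac{(RSS_r - RSS_u)/d}{RSS_u/DoF}
  = \frac{\big[(1-R^2_r) - (1-R^2_u)\big]\,TSS/d}{(1-R^2_u)\,TSS/DoF}
  = \frac{(R^2_u - R^2_r)/d}{(1-R^2_u)/DoF}
\]
```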
When can the equation [(R^2(u) - R^2(r))/d] / [(1 - R^2(u))/DoF] not be used?
This R^2 formula does not apply if the null hypothesis sets coefficients equal to non-zero values.
What are the three properties of the OLS estimator?
- Linear estimator: bj = Sum_i(w_ji Yi). This just means the OLS estimator can be written as a linear function of the Yi.
- Unbiasedness: E(bj|X) = Bj. OLS being unbiased means that if we computed the OLS estimator over many repeated samples, the average of those estimates would equal what we want to estimate, which is B (see the Monte Carlo sketch after this list).
- BLUE (Best Linear Unbiased Estimator): OLS has the smallest variance amongst all linear unbiased estimators.
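A small Monte Carlo sketch of the unbiasedness property, assuming a known true B1 and repeated simulated samples (illustrative only): the average of the OLS slope estimates across samples should be close to the true B1.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 100, 5000
true_b = np.array([1.0, 2.0])      # true intercept and slope

estimates = np.empty(reps)
for r in range(reps):
    x = rng.normal(size=n)
    y = true_b[0] + true_b[1] * x + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    estimates[r] = b[1]            # store the slope estimate from this sample

print("true B1:", true_b[1])
print("average of OLS estimates:", estimates.mean())   # close to 2.0
```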