Exam Notes Flashcards

1
Q

What does ANOVA stand for?

A

Analysis of Variance

2
Q

What distribution is used in a t-test, and what are its degrees of freedom?

A

The t distribution, with n-k-1 degrees of freedom

k = number of regressors (excluding the intercept)

n = number of observations

3
Q

What does H0:Bj=0 imply?

A

That xj has no partial effect on y once the other regressors are held fixed. Testing it tells us whether the regressor in question is statistically significant.

4
Q

Show the difference in H1 between a one-sided and a two-sided t-test.

A

H1: Bj > 0 or H1: Bj < 0 (one-sided)

H1: Bj ≠ 0 (two-sided)

5
Q

At what point do you reject the null hypothesis?

A

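When the t statistic is more extreme than the critical value c: t > c for H1: Bj > 0, t < -c for H1: Bj < 0, and |t| > c for a two-sided test.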

6
Q

What is the critical value used for a two-sided test?

A

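The (1 - alpha/2) percentile of the t distribution with n-k-1 degrees of freedom (e.g. the 97.5th percentile for alpha = 0.05).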
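A minimal sketch of looking up this critical value in Python, assuming SciPy is available (`two_sided_critical_value` is a hypothetical helper, not from the notes):

```python
from scipy import stats


def two_sided_critical_value(alpha: float, n: int, k: int) -> float:
    """Critical value c for a two-sided t test: the (1 - alpha/2) percentile of t(n - k - 1)."""
    df = n - k - 1
    return float(stats.t.ppf(1 - alpha / 2, df))


# Example: 5% test with n = 30 observations and k = 2 regressors (27 degrees of freedom)
c = two_sided_critical_value(0.05, n=30, k=2)
print(round(c, 3))
```

With alpha = 0.05 and 27 degrees of freedom this gives about 2.052; as the degrees of freedom grow, the value approaches the normal critical value of 1.96.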

7
Q

What do we assume when rejecting nulls?

A

Unless stated otherwise, we assume the alternative is two-sided.

8
Q

What is the formula for the t statistic when testing H0: Bj = Aj?

A

t = (Bj - Aj)/se(Bj), where Bj is the OLS estimate.

9
Q

What is the formula for confidence intervals?

A

Bj ± c·se(Bj)

where c is the (1 - alpha/2) percentile of the t(n-k-1) distribution.
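A short sketch of the interval calculation, assuming SciPy; the estimate, standard error and sample size are made-up numbers for illustration:

```python
from scipy import stats


def confidence_interval(beta_hat, se, n, k, alpha=0.05):
    """(1 - alpha) CI: beta_hat +/- c * se, with c the (1 - alpha/2) percentile of t(n - k - 1)."""
    c = stats.t.ppf(1 - alpha / 2, n - k - 1)
    return beta_hat - c * se, beta_hat + c * se


# Hypothetical estimate: beta_hat = 0.5 with se = 0.2, n = 30, k = 2
low, high = confidence_interval(0.5, 0.2, n=30, k=2)
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

Here the interval is roughly [0.090, 0.910]; since it excludes 0, H0: Bj = 0 would be rejected at the 5% level.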

10
Q

How is the p-value calculated?

A

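Compute the t statistic, then find the probability, under H0, of a t statistic at least as extreme as the one observed. For a two-sided test, p = P(|T| > |t|), where T follows t(n-k-1). The p-value is the smallest significance level at which H0 would be rejected.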
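The same calculation as a sketch, assuming SciPy (`stats.t.sf` is the upper-tail probability, 1 - cdf); the numbers are hypothetical:

```python
from scipy import stats


def t_test(beta_hat, se, n, k, a=0.0):
    """t statistic for H0: Bj = a and its two-sided p-value from t(n - k - 1)."""
    t_stat = (beta_hat - a) / se
    df = n - k - 1
    # P(|T| >= |t|) under H0: twice the upper-tail probability (sf = 1 - cdf)
    p_value = 2 * stats.t.sf(abs(t_stat), df)
    return t_stat, p_value


t_stat, p = t_test(0.5, 0.2, n=30, k=2)
print(f"t = {t_stat:.2f}, p = {p:.4f}")
```

With t = 2.5 and 27 degrees of freedom the p-value lies between 0.01 and 0.05, so H0 is rejected at 5% but not at 1%.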

11
Q

Show the derivation of SE(B1-B2)

A

See notes
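One standard way to write the derivation, as a sketch: it uses the variance of a difference of two random variables, with s12 denoting the estimated covariance between the two estimates (notation assumed here, not taken from the notes):

```latex
\begin{aligned}
\operatorname{Var}(\hat\beta_1 - \hat\beta_2) &= \operatorname{Var}(\hat\beta_1) + \operatorname{Var}(\hat\beta_2) - 2\operatorname{Cov}(\hat\beta_1, \hat\beta_2) \\
\operatorname{se}(\hat\beta_1 - \hat\beta_2) &= \sqrt{\operatorname{se}(\hat\beta_1)^2 + \operatorname{se}(\hat\beta_2)^2 - 2 s_{12}}
\end{aligned}
```

Because s12 is not usually shown in regression output, it is often easier to reparameterise the model so that B1 - B2 appears as a single coefficient.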

12
Q

How do you test B1 = B2?

A

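Define θ1 = B1 - B2, so H0: B1 = B2 becomes H0: θ1 = 0.

In y = B0 + B1x1 + B2x2 + u, substitute B1 = θ1 + B2 to get y = B0 + θ1·x1 + B2(x1 + x2) + u.

Then regress y on x1 and (x1 + x2): the coefficient on x1 is θ1, so test its significance with the usual t test.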

13
Q

What is the point of a joint significance test?

A

It tests whether a group of regressors is jointly significant, i.e. whether there is any warrant for including them in the model. Basically: is the increase in SSR when they are dropped too large to be due to chance?

14
Q

What is the formula for the F-test?

A

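F = [(SSRr - SSRur)/q] / [SSRur/(n-k-1)]

Where q is the number of restrictions (the number of regressors excluded from the restricted model).

k is the number of regressors in the unrestricted model, excluding the intercept (the -1 accounts for the intercept).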
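A sketch of the SSR form in Python, assuming SciPy; the SSR values are invented for illustration:

```python
from scipy import stats


def f_test(ssr_r, ssr_ur, q, n, k):
    """F statistic for q exclusion restrictions.

    k is the number of regressors in the unrestricted model, excluding the intercept.
    """
    df_denom = n - k - 1
    f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_denom)
    p_value = stats.f.sf(f_stat, q, df_denom)   # upper-tail probability of F(q, n - k - 1)
    return f_stat, p_value


# Hypothetical: dropping q = 3 of k = 5 regressors raises the SSR from 180 to 198, n = 100
f_stat, p = f_test(198.0, 180.0, q=3, n=100, k=5)
print(f"F = {f_stat:.3f}, p = {p:.4f}")
```

Here F is about 3.13 with (3, 94) degrees of freedom, which is significant at 5% but not at 1%.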

15
Q

If you don’t have the SSRs, how can you run an F-test?

A

Use the R-squared form (valid only when both models have the same dependent variable):

F = [(R2ur - R2r)/q] / [(1 - R2ur)/(n-k-1)]

16
Q

What is the formula for overall significance?

A

F = (R2/k) / [(1 - R2)/(n-k-1)]

If only one exclusion restriction is tested (q = 1), then F = t^2.
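A sketch of the overall-significance F from R² and of the F = t² link, assuming SciPy; the R², n and k values are hypothetical:

```python
from scipy import stats


def overall_f(r2, n, k):
    """F for H0: all k slope coefficients are zero, from the R-squared of the model."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# Hypothetical: R^2 = 0.25 with k = 3 regressors and n = 64 observations
n, k = 64, 3
f_stat = overall_f(0.25, n, k)
p = stats.f.sf(f_stat, k, n - k - 1)
print(f"F = {f_stat:.3f}, p = {p:.5f}")

# With a single restriction (q = 1), F = t^2, so the critical values match too:
df = n - k - 1
print(stats.f.ppf(0.95, 1, df), stats.t.ppf(0.975, df) ** 2)
```

Both printed critical values come out at about 4.00, illustrating that with one restriction the F test and the two-sided t test are equivalent.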

17
Q

What is:

  • Type 1 error
  • Type 2 error
A

Type 1: rejecting the null when it is true. Its probability is the significance level (size of the test), alpha.

Type 2: failing to reject (accepting) a null that is false. The power of the test is 1 - P(type 2 error).

18
Q

What are the marginal effects and elasticities?

A

See notes and seminar for interpretation

19
Q

What does a qualitative variable do?

A

It describes features of the data that are not quantifiable.

20
Q

What can a dummy independent variable do?

A

It allows the intercept or the slope to shift for particular groups or periods in the data.

For example, an oil crisis, a financial crisis or a drought.

21
Q

Show a model containing a single dummy variable.

A

y = B0 + δ0·d + B1x + u, where d is a 0/1 dummy and δ0 is the shift in the intercept for the d = 1 group.

22
Q

What formula should you use to interpret a coefficient when the dependent variable is in logs?

A

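For small coefficients, 100·coeff approximates the percentage change in y. When the coefficient is over 0.2 in absolute value, use the exact formula 100·(e^coeff - 1).

Keep the sign: if the coefficient is negative, plug it in with its minus sign, which gives a negative percentage change.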
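A quick comparison of the approximation and the exact formula (plain Python; the coefficient values are illustrative):

```python
import math


def pct_change_exact(coeff):
    """Exact % change in y for a one-unit change in x when the dependent variable is log(y)."""
    return 100 * (math.exp(coeff) - 1)


for coeff in (0.05, 0.3, -0.3):
    approx = 100 * coeff   # the usual approximation, fine for small coefficients
    print(f"coeff = {coeff:+.2f}: approx {approx:+.1f}%, exact {pct_change_exact(coeff):+.1f}%")
```

For 0.3 the exact effect is about +35.0% against the approximate +30%, and for -0.3 it is about -25.9% rather than -30%; for 0.05 the two barely differ (about 5.1% against 5.0%).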

23
Q

What must you remember if you are making seasonal dummies for a quarterly data set?

A

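You must include only 3 of the 4 quarterly dummies when the model has an intercept; including all four causes perfect collinearity (the dummy variable trap). Let Q1 be represented by the intercept: it is known as the base category, and the other three coefficients are measured relative to it.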
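A sketch, assuming NumPy and a made-up 12-quarter sample, of why only three quarterly dummies can be used alongside an intercept: with all four, the dummy columns add up to the intercept column, so the regressor matrix loses full rank and OLS cannot separate the coefficients.

```python
import numpy as np

# Hypothetical sample: 3 years of quarterly data, quarters 1..4
quarter = np.tile([1, 2, 3, 4], 3)
intercept = np.ones(quarter.size)
dummies = np.column_stack([(quarter == q).astype(float) for q in (1, 2, 3, 4)])

# Intercept + all four dummies: the dummies sum to the intercept column (perfect collinearity)
X_trap = np.column_stack([intercept, dummies])
# Intercept + Q2, Q3, Q4 dummies: Q1 is the base category
X_ok = np.column_stack([intercept, dummies[:, 1:]])

print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])  # rank 4, but 5 columns
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])      # rank 4 with 4 columns
```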

24
Q

How do you put a dummy for multiple categories?

A

Imagine everyone is either:

  • HS dropout
  • HS grad
  • College Grad

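And you want to compare high-school and college graduates with high-school dropouts.

You would include two dummy variables:

hsgrad = 1 if the person's highest qualification is a high-school diploma, 0 otherwise

colgrad = 1 if college graduate, 0 otherwise.

HS dropouts are the base group: both dummies are 0, so they are captured by the intercept, and the coefficients on hsgrad and colgrad measure differences relative to dropouts.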

25
Q

How can dummy variables be used in an interaction term? Give an example.

A

Suppose you want to see the impact on some dependent variable of being both married and female.

You would create three dummies:

  • Female dummy
  • Married dummy
  • Female*Married dummy

These have coefficients

a1, a2, a3

respectively.

The formal model is as follows:

y = B0 + a1·fem + a2·married + a3·(fem·married) + B1x + u

Each group's line is then:

Single male:

B0 + B1x

Single female:

B0 + a1 + B1x

Married female:

B0 + a1 + a2 + a3 + B1x

Married male:
B0 + a2 + B1x
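A sketch, assuming NumPy and simulated data (the coefficients in the data-generating process are invented), that checks these group intercepts. The x regressor is left out so the check is exact: with a full set of interacted dummies and nothing else, OLS reproduces each group's sample mean.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
female = rng.integers(0, 2, n)
married = rng.integers(0, 2, n)
# Hypothetical data-generating process (no x, so the check below is exact)
y = 10 + 1.0 * female + 2.0 * married - 3.0 * female * married + rng.normal(0, 1, n)

X = np.column_stack([np.ones(n), female, married, female * married])
b0, a1, a2, a3 = np.linalg.lstsq(X, y, rcond=None)[0]


def cell_mean(f, m):
    """Sample mean of y for the group with female == f and married == m."""
    return y[(female == f) & (married == m)].mean()


print(b0, cell_mean(0, 0))                  # single male
print(b0 + a1, cell_mean(1, 0))             # single female
print(b0 + a2, cell_mean(0, 1))             # married male
print(b0 + a1 + a2 + a3, cell_mean(1, 1))   # married female
```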

26
Q

What happens if dummy variables are interacted with continuous variables?

A

It allows the model to differ by both intercept and slope: in y = B0 + δ0·d + B1x + δ1·(d·x) + u, the d = 1 group has intercept B0 + δ0 and slope B1 + δ1.

27
Q

What does a coefficient mean in the LPM (linear probability model)?

A

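Bj is the change in the probability of success, P(y = 1|x), when xj increases by one unit, holding other factors fixed. E.g. Bj = 0.05 means success is 5 percentage points more likely.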

28
Q

What are the pros and cons of the LPM?

A

Pros:

  • Easy to estimate and interpret.
  • Requires fewer distributional assumptions than a probit or logit model.

Cons:

  • Predicted probabilities can be below 0 or above 1.
  • Assuming the effect is linear may be restrictive.
  • It violates the assumption of homoscedasticity,

as Var(y|x) = p(x)(1 - p(x)),

and p(x) changes with x.

29
Q

Explain the problem of overfitting the model.

A

This is where you include irrelevant variables in the regression.

OLS remains unbiased, but the standard errors of the coefficients will be too large (the estimates are less precise).

This means that the t statistics will be too small, so some parameters that are genuinely significant may be deemed insignificant.

30
Q

What is the problem of Underfitting the model?

A

This is where you omit a relevant variable.

This makes the OLS estimates biased: omitted variable bias.

It will also make the standard errors incorrect, so hypothesis tests will not be valid.

If B2 (the coefficient on the omitted variable x2) and Cov(x1, x2) have the same sign: upward bias (over-estimated).

If B2 and Cov(x1, x2) have different signs: downward bias (under-estimated).
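A small simulation, assuming NumPy, illustrating the sign rule (all parameter values are hypothetical). Here B2 = 2 > 0 and Cov(x1, x2) > 0, so omitting x2 should bias the slope on x1 upwards, towards B1 + B2·δ1, where δ1 is the slope from regressing x2 on x1:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
beta1, beta2 = 1.0, 2.0                    # hypothetical true coefficients
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # x2 positively correlated with x1
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)


def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    xc = x - x.mean()
    return (xc @ (y - y.mean())) / (xc @ xc)


b1_short = slope(x1, y)          # underfitted model: x2 omitted
delta1 = slope(x1, x2)           # slope from regressing x2 on x1
print(b1_short, beta1 + beta2 * delta1)   # both close to 2.0, well above beta1 = 1
```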

31
Q

Derive the bias from underfitting a model.

A

See notes, but ensure you can do this.
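A sketch of the standard derivation, written with x1 as the included regressor, x2 as the omitted one, and δ̃1 the slope from regressing x2 on x1 (this notation is assumed, not taken from the notes). The last step uses E(u | x1, x2) = 0:

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + u_i \quad \text{(true model)} \\
\tilde\beta_1 &= \frac{\sum_i (x_{i1}-\bar x_1)\, y_i}{\sum_i (x_{i1}-\bar x_1)^2} \quad \text{(short regression of } y \text{ on } x_1\text{)} \\
&= \beta_1 + \beta_2 \frac{\sum_i (x_{i1}-\bar x_1)\, x_{i2}}{\sum_i (x_{i1}-\bar x_1)^2} + \frac{\sum_i (x_{i1}-\bar x_1)\, u_i}{\sum_i (x_{i1}-\bar x_1)^2} \\
&= \beta_1 + \beta_2 \tilde\delta_1 + \frac{\sum_i (x_{i1}-\bar x_1)\, u_i}{\sum_i (x_{i1}-\bar x_1)^2} \\
E(\tilde\beta_1 \mid x) &= \beta_1 + \beta_2 \tilde\delta_1
\quad\Rightarrow\quad \operatorname{Bias}(\tilde\beta_1) = \beta_2 \tilde\delta_1
\end{aligned}
```

Since δ̃1 has the same sign as the sample covariance of x1 and x2, the sign of the bias follows the same-sign/different-sign rule.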

32
Q

How do you interpret the coefficient B1 in a:

1) log-log model
2) log-lin model
3) lin-log model

A

1) B1 is the elasticity of y with respect to x.
2) 100·B1 is the (approximate) percentage change in y for a one-unit change in x; use the exact formula 100·(e^B1 - 1) when |B1| is over 0.2.
3) B1/100 is the (approximate) change in y for a 1% increase in x.
i.e. divide the coefficient by 100.

33
Q

What are the benefits of using logs in the model?

A
  • Slope coefficients are invariant to rescaling the variables (e.g. changing units), and they measure percentage changes.
  • ln(y) has a narrower distribution, so outliers have less influence.
  • Log-log models give elasticity estimates directly.
  • The conditional distribution of y is often heteroscedastic (or skewed); taking logs often reduces this.
34
Q

Give two examples of variables you should use in:

  • logs
  • level
A
  • LOGS:
    • Dollar amounts
    • Population
  • LEVELS:
    • Variables measured in years
    • Variables in percent or proportions
35
Q

If there is a quadratic term in a regression, how do you interpret the effect of x?

A

In y = B0 + B1x + B2x^2 + u: Δy/Δx ≈ B1 + 2B2x, so the effect of x on y depends on the value of x.
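A sketch of the marginal effect and of the turning point x* = -B1/(2B2), where the effect changes sign, using hypothetical estimates (plain Python):

```python
def marginal_effect(b1, b2, x):
    """Approximate dy/dx in y = b0 + b1*x + b2*x**2 + u, evaluated at x."""
    return b1 + 2 * b2 * x


def turning_point(b1, b2):
    """Value of x where the marginal effect is zero: x* = -b1 / (2*b2)."""
    return -b1 / (2 * b2)


# Hypothetical estimates, e.g. wage on experience: b1 = 0.3, b2 = -0.006
print(marginal_effect(0.3, -0.006, 10))   # effect of one more unit of x at x = 10
print(turning_point(0.3, -0.006))         # beyond x*, the effect of x becomes negative
```

These hypothetical estimates give an effect of about 0.18 at x = 10 and a turning point at x = 25.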

36
Q

How do you interpret the effect of a variable when there is an interaction term?

A

Imagine a model of: y = B0 + B1x1 + B2x2 + B3x1x2 + u

Differentiating gives Δy/Δx1 = B1 + B3x2, so the effect of x1 depends on x2.

To report a single effect, evaluate it at an interesting value of x2, usually its mean.

37
Q

If you have a relationship that is nonlinear in the parameters, how would you estimate it?

A

You would use nonlinear least squares: apply the principle of minimising the RSS (residual sum of squares) to obtain the parameter estimates, usually with an iterative numerical algorithm.
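A sketch of nonlinear least squares using SciPy's `curve_fit`, which searches numerically for the parameters that minimise the sum of squared residuals. The exponential model and simulated data are assumptions for illustration; with an additive error, y = a·exp(bx) + u cannot be made linear by taking logs:

```python
import numpy as np
from scipy.optimize import curve_fit


def model(x, a, b):
    """Hypothetical model, nonlinear in b: y = a * exp(b * x)."""
    return a * np.exp(b * x)


rng = np.random.default_rng(1)
x = np.linspace(0, 2, 50)
y = model(x, 2.0, 0.5) + rng.normal(0, 0.05, x.size)   # additive error

# curve_fit iterates from the starting values p0 towards the (a, b) that minimise
# the residual sum of squares sum((y - model(x, a, b))**2)
params, cov = curve_fit(model, x, y, p0=(1.0, 0.1))
a_hat, b_hat = params
rss = np.sum((y - model(x, a_hat, b_hat)) ** 2)
print(a_hat, b_hat, rss)
```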

38
Q

If you are asked to design an experiment to test educational attainment, what would you do? Give the pros and cons.

A

A randomised controlled experiment, in which students are randomly assigned to private or state schools.

The treatment group goes to private school.
The control group goes to state school.

The effect on educational attainment is the difference in mean attainment between the two groups.

Pros: a social experiment mimics a clinical trial, and the estimated effect does not require an economic theory or model.

Cons:
You need to stick rigidly to the randomised assignment rule.

  • Attrition: people can drop out of the experiment (or not comply with their assignment).
    The estimate could then capture the intention-to-treat effect rather than the actual effect of treatment.

  • Experimental effects: participants may behave differently because they know they are being studied, and you cannot offer the control group a placebo.

39
Q

How does the adjusted R squared statistic work?

A

Adjusted R-squared takes into account the number of regressors in the equation. It only increases when a new regressor adds enough explanatory power: adding a variable raises R̄² if and only if its t statistic is greater than 1 in absolute value.

R̄² = 1 - (1 - R²)(n-1)/(n-k-1)

or R̄² = 1 - [SSR/(n-k-1)]/[SST/(n-1)]
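A sketch checking that the two formulas agree, with hypothetical sums of squares (plain Python):

```python
def adjusted_r2(ssr, sst, n, k):
    """R-bar^2 = 1 - [SSR/(n-k-1)] / [SST/(n-1)]."""
    return 1 - (ssr / (n - k - 1)) / (sst / (n - 1))


def adjusted_r2_from_r2(r2, n, k):
    """Equivalent form: 1 - (1 - R^2)(n-1)/(n-k-1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


ssr, sst, n, k = 60.0, 100.0, 50, 4   # hypothetical values
r2 = 1 - ssr / sst                     # ordinary R^2 = 0.4
print(adjusted_r2(ssr, sst, n, k), adjusted_r2_from_r2(r2, n, k))
```

With R² = 0.4, n = 50 and k = 4, adjusted R² is about 0.347, below R² because of the penalty for the four regressors.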

40
Q

When can R bar squared not be used?

A

It cannot be used to compare models whose dependent variables take different forms, such as y and log(y), because the SST differs between them (they measure the variation in different variables).

41
Q

What are the MLR assumptions?

What happens if these assumptions are violated?

A

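1) The model must be linear in parameters.
2) The errors must have constant variance (homoscedasticity).
3) The error term must be independent of the regressors (zero conditional mean).
4) There must be no correlation between the error terms (no autocorrelation).

Violating the assumptions makes the OLS residuals show systematic (non-random) patterns. Depending on which assumption fails, OLS may be biased (3) or have incorrect standard errors and be inefficient (2 and 4).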

42
Q

What are four causes of model misspecification?

What informal tests can you do for these?

A
  • Omitting relevant variables
  • Including irrelevant variables
  • Using the wrong functional form (e.g. a linear regression when the true relationship is nonlinear)
  • Specifying a static model when the true model is dynamic

Plot a time-series of the residuals.

Scatter diagram of ut vs ut-1.

43
Q

Explain the RESET test.

A

The RESET test is a general test for functional form misspecification.

If the model satisfies the zero conditional mean assumption, no nonlinear combination of the regressors added to the equation should be significant.

So the test obtains the fitted values ŷ from the original model and estimates an auxiliary regression that adds ŷ², ŷ³ (and possibly higher powers) to the equation.

Since ŷ is a linear combination of the regressors, its powers are nonlinear combinations of them.

See notes for equation.

It then uses an F test of the joint significance of the coefficients on the powers of ŷ.

The test has low power against omitted variables, because the powers of ŷ are only proxies for them.

Rejecting H0 is evidence of model misspecification, but failing to reject does not mean that the model is definitely correctly specified.

H0: there is no misspecification of the functional form.
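A sketch of the RESET procedure with NumPy and SciPy, under these assumptions: `X` already contains an intercept column, the powers ŷ² and ŷ³ are added, and the data are simulated from a quadratic relationship that a straight-line model misses, so the test should reject:

```python
import numpy as np
from scipy import stats


def ols_fit(X, y):
    """Return the SSR and fitted values from an OLS regression of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    fitted = X @ beta
    return np.sum((y - fitted) ** 2), fitted


def reset_test(X, y, max_power=3):
    """RESET: add yhat^2, ..., yhat^max_power and F-test their joint significance."""
    n, p = X.shape                               # p columns, including the intercept
    ssr_r, yhat = ols_fit(X, y)                  # restricted = original model
    powers = np.column_stack([yhat ** j for j in range(2, max_power + 1)])
    ssr_ur, _ = ols_fit(np.column_stack([X, powers]), y)
    q = powers.shape[1]
    df = n - p - q                               # n - k - 1 for the auxiliary regression
    f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)
    return f_stat, stats.f.sf(f_stat, q, df)


# Simulated data: the true relationship is quadratic, but we fit a straight line
rng = np.random.default_rng(0)
x = rng.uniform(0, 3, 200)
y = 1 + x + 2 * x ** 2 + rng.normal(0, 1, 200)
X = np.column_stack([np.ones_like(x), x])
f_stat, p = reset_test(X, y)
print(f"RESET F = {f_stat:.1f}, p = {p:.2g}")
```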

44
Q

Explain the tests for normality.

A

Normality issues:

If the error terms are non-normal, the OLS estimators are not exactly normally distributed, so exact hypothesis tests cannot be carried out with confidence in small samples.

If the sample is large (a common rule of thumb is n > 30),

the OLS estimators are approximately normal by the central limit theorem, so you can still use the t distribution.

The same holds for F tests: in large samples they are only approximately, not exactly, valid.

Normality tests check whether the skewness is zero and the kurtosis is 3, i.e. E(u^4) = 3σ^4.

H0: the variable being tested (e.g. the residuals) is normally distributed.

Informal Tests:

  • Generate a histogram of the residuals to inspect their shape.
  • Use qnorm (Stata), which plots the quantiles of the variable against the quantiles of the normal distribution.

Formal Test: a test based on skewness and a test based on kurtosis, with the two statistics then combined into a joint test.
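A sketch of one such combined statistic, the Jarque-Bera form JB = n/6·(S² + (K - 3)²/4), which is approximately χ²(2) under normality. It assumes NumPy and SciPy and uses simulated residuals; Stata's `sktest` combines the skewness and kurtosis tests differently, so its numbers may not match exactly:

```python
import numpy as np
from scipy import stats


def jarque_bera(resid):
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), approximately chi2(2) under H0: normality."""
    e = resid - resid.mean()
    n = e.size
    m2 = np.mean(e ** 2)
    skewness = np.mean(e ** 3) / m2 ** 1.5          # 0 for a normal distribution
    kurtosis = np.mean(e ** 4) / m2 ** 2            # 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return jb, stats.chi2.sf(jb, 2)


rng = np.random.default_rng(0)
normal_resid = rng.normal(size=500)
skewed_resid = rng.exponential(size=500) - 1       # clearly non-normal (right-skewed)
print(jarque_bera(normal_resid), jarque_bera(skewed_resid))
```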

45
Q

Explain the test for, and the problem of, heteroscedasticity.

A

Heteroscedasticity is a situation where the error variance is not constant. We cannot observe the errors directly, so the OLS residuals are used as a proxy.

Could use Bartlett’s Test or White’s Test

We use the LM test;

Var(u) = σ² exp(b1z1 + b2z2 + … + bkzk)
σ² is the error variance (estimated using s, the standard error of the regression)
The z's are functions of the x's

H0: b1 = b2 = … = bk = 0, i.e. the variance is constant: it does not change with the x's or z's.

Example: Prob > chi2 = 0.7316, so we fail to reject H0; there is no evidence of heteroscedasticity.

1) Estimate the model by OLS.
2) Save the residuals.
3) Regress the squared residuals on the explanatory variables (auxiliary regression).
4) Test the joint significance of the coefficients (b) in the auxiliary regression, using its coefficient of determination: LM = n·R², which is χ²(k) under H0.

The variance of the residuals should not depend on the independent variables.

You are checking whether the variance of the residuals can be explained by the variables in the OLS regression.
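A sketch of the n·R² version of the LM test (the Koenker form of Breusch-Pagan) with NumPy and SciPy, using the regressors themselves as the z's. This is an assumption: Stata's `estat hettest` by default uses the fitted values and may compute the statistic differently. The data are simulated with an error standard deviation that grows with x:

```python
import numpy as np
from scipy import stats


def breusch_pagan_lm(X, y):
    """LM = n * R^2 from regressing the squared OLS residuals on X; chi2(k) under H0."""
    n, p = X.shape                                     # X includes an intercept column
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    u2 = (y - X @ beta) ** 2                           # squared residuals proxy the error variance
    gamma = np.linalg.lstsq(X, u2, rcond=None)[0]      # auxiliary regression
    resid_aux = u2 - X @ gamma
    r2_aux = 1 - (resid_aux @ resid_aux) / np.sum((u2 - u2.mean()) ** 2)
    lm = n * r2_aux
    return lm, stats.chi2.sf(lm, p - 1)                # k = p - 1 slope regressors


# Simulated data where the error standard deviation grows with x
rng = np.random.default_rng(0)
x = rng.uniform(1, 5, 500)
X = np.column_stack([np.ones_like(x), x])
y = 1 + 2 * x + rng.normal(0, x)
lm, p = breusch_pagan_lm(X, y)
print(f"LM = {lm:.1f}, p = {p:.2g}")
```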

46
Q

How can you fix heteroscedasticity?

A

You can re-specify the model.

You can transform the data (e.g. take logs).

Or you could use WLS (weighted least squares).

Instead of minimising the RSS, WLS minimises a weighted RSS.

Each squared residual is weighted by the inverse of its error variance, so observations with higher variance receive less weight.
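A sketch of WLS with NumPy, assuming (hypothetically) that the error standard deviation is known and proportional to x. Dividing each observation, including the intercept column, by its standard deviation and running OLS minimises the weighted RSS with weights 1/Var(u_i):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(1, 5, n)
sigma = x                                   # hypothetical: error s.d. proportional to x, assumed known
y = 1 + 2 * x + rng.normal(0, sigma)
X = np.column_stack([np.ones(n), x])

# WLS minimises sum(w_i * u_i^2) with w_i = 1 / sigma_i^2. Equivalently, divide each
# row (intercept column included) by sigma_i and run OLS on the transformed data.
w = 1 / sigma
beta_wls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("OLS:", beta_ols, "WLS:", beta_wls)
```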

47
Q

What assumption is violated under the condition of autocorrelation?

Give some examples of informal tests.

A

E(ut·us) = 0 for all t ≠ s

There should be no correlation between the error terms in different time periods.

Informal tests:

  • Time-series plot of the OLS residuals
  • Scatter plot of ut against ut-1
48
Q

Explain a t-test for AR(1).

A
  • You must ensure that the regressors are strictly exogenous.
  • Regress the OLS residuals ût on their first lag ût-1, then t-test the coefficient on ût-1 (H0: ρ = 0) to see whether there is a significant relationship.

However, this only detects first-order (AR(1)) autocorrelation.

49
Q

Explain the Durbin-Watson test

A

DW = Σ(ût - ût-1)² / Σût² ≈ 2 - 2ρ̂ = 2(1 - ρ̂)

So if DW is close to 2, ρ̂ ≈ 0: no autocorrelation
If DW is close to 0, ρ̂ > 0: positive autocorrelation
If DW is close to 4, ρ̂ < 0: negative autocorrelation

H0: no serial correlation is present

It is not valid for dynamic models (when a lagged dependent variable is among the regressors).
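A sketch of the DW statistic with NumPy, applied to simulated AR(1) errors (the helper `ar1_errors` and the ρ values are hypothetical) to show the 2(1 - ρ̂) approximation:

```python
import numpy as np


def durbin_watson(resid):
    """DW = sum_{t>=2} (u_t - u_{t-1})^2 / sum_t u_t^2, roughly 2(1 - rho_hat)."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)


def ar1_errors(rho, n, rng):
    """Hypothetical AR(1) errors: u_t = rho * u_{t-1} + e_t, starting from u_0 = 0."""
    u = np.zeros(n)
    e = rng.normal(size=n)
    for t in range(1, n):
        u[t] = rho * u[t - 1] + e[t]
    return u


rng = np.random.default_rng(0)
for rho in (0.0, 0.7, -0.7):
    u = ar1_errors(rho, 2000, rng)
    rho_hat = np.sum(u[1:] * u[:-1]) / np.sum(u[:-1] ** 2)
    print(f"rho = {rho:+.1f}: DW = {durbin_watson(u):.2f}, 2(1 - rho_hat) = {2 * (1 - rho_hat):.2f}")
```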

50
Q

Explain the Breusch-Godfrey/LM test.

A

Here you estimate an auxiliary regression.

You regress ût on its lags ût-1, …, ût-p (together with the original regressors).

H0: the coefficients on the lagged residuals in this regression are all zero, i.e. no serial correlation is present.

This test is based on F-test principles: it compares the RSS of the unrestricted and restricted models.

51
Q

How can you deal with autocorrelation?

A

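1) Estimate the Cochrane-Orcutt estimator:
yt = b1 + b2xt + ut, with ut = ρut-1 + et

Estimate ρ by regressing the OLS residuals ût on ût-1, then quasi-difference the data:

yt’ = yt - ρyt-1

xt’ = xt - ρxt-1

then you run yt’ = b1’ + b2xt’ + et, where b1’ = b1(1 - ρ) (this can be iterated until ρ converges).

2) The model can be respecified.
The autocorrelation tests may have failed because of:

  • Wrong functional form
  • Omitted variables
  • Misspecifying a dynamic model as a static one.

Cochrane-Orcutt corrects for AR(1) errors, but the estimates will still be biased if the model itself is misspecified.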
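A sketch of iterated Cochrane-Orcutt with NumPy, assuming a single regressor and AR(1) errors; the first observation is dropped in the quasi-differencing, and the data are simulated with ρ = 0.6:

```python
import numpy as np


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


def cochrane_orcutt(y, x, max_iter=50, tol=1e-8):
    """Iterated Cochrane-Orcutt for y_t = b1 + b2*x_t + u_t with u_t = rho*u_{t-1} + e_t."""
    X = np.column_stack([np.ones_like(x), x])
    b = ols(X, y)                                    # start from plain OLS
    rho = 0.0
    for _ in range(max_iter):
        u = y - X @ b                                # residuals of the original equation
        rho_new = (u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])   # regress u_t on u_{t-1}
        # Quasi-difference the data (the first observation is lost)
        y_star = y[1:] - rho_new * y[:-1]
        x_star = x[1:] - rho_new * x[:-1]
        b1_prime, b2 = ols(np.column_stack([np.ones_like(x_star), x_star]), y_star)
        b = np.array([b1_prime / (1 - rho_new), b2])  # recover b1 from b1' = b1 * (1 - rho)
        converged = abs(rho_new - rho) < tol
        rho = rho_new
        if converged:
            break
    return b, rho


# Simulated data with AR(1) errors, rho = 0.6 (hypothetical values)
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
e = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + e[t]
y = 1 + 2 * x + u
b_co, rho_hat = cochrane_orcutt(y, x)
print(b_co, rho_hat)
```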

52
Q

Why would an instrument variable be used in a regression?

What causes a breakdown of the assumption?

How does an Instrumental Variable Work?

A

Essentially, the classical linear regression model assumes cov(xt, ut) = 0. If this fails, OLS is biased and inconsistent, which is why an instrumental variable is used.

This can be broken by:

  • Omitted variables
  • Errors in variables (measurement error)
  • Endogeneity

An instrumental variable z isolates the part of the variation in x that is uncorrelated with the error ut.

53
Q

What are the criteria for valid instruments?

A

An instrument z must be:

Relevant: it must be correlated with the endogenous variable, cov(z, x) ≠ 0.
Exogenous: it must not be correlated with the error term, cov(z, u) = 0.

54
Q

Prove the exogeneity of the IV.

A

See notes

55
Q

Prove why IV is less efficient than OLS.

A

Because the variance of the IV estimator is higher: Var(b1_IV) ≈ σ²/(n·σx²·ρ²xz), while Var(b1_OLS) ≈ σ²/(n·σx²), and the squared correlation ρ²xz between x and z is less than 1.

56
Q

Derive the equation for the b1 estimate.

A

see notes

57
Q

Explain how to implement TSLS.

A

Start from the structural equation: a normal OLS-type model that contains an endogenous variable.

First stage: regress the endogenous variable on all the exogenous variables (the instruments and the exogenous regressors), i.e. explain it using only exogenous information.

Second Stage:
Run the original regression again, replacing the endogenous variable with its predicted values from the first stage.

These predicted values have had the endogeneity purged out of them.
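A sketch of the two stages done by hand with NumPy on simulated data (all parameter values are invented). By construction x is endogenous and z is a valid instrument, so OLS should overestimate the true slope of 2 while 2SLS should not. Standard errors from a manual second stage would be wrong, so only coefficients are computed:

```python
import numpy as np


def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]


rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                     # instrument: relevant and exogenous by construction
v = rng.normal(size=n)
u = 0.8 * v + rng.normal(size=n)           # error shares v with x, so x is endogenous
x = 1.0 * z + v
y = 1 + 2 * x + u                          # hypothetical true slope = 2
const = np.ones(n)

b_ols = ols(np.column_stack([const, x]), y)

# First stage: regress the endogenous x on the exogenous information (const, z)
Z = np.column_stack([const, z])
x_hat = Z @ ols(Z, x)
# Second stage: rerun the original regression with x replaced by x_hat
b_2sls = ols(np.column_stack([const, x_hat]), y)

# With one instrument, 2SLS equals the simple IV estimator cov(z, y) / cov(z, x)
b_iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]
print(f"OLS slope {b_ols[1]:.3f}, 2SLS slope {b_2sls[1]:.3f}, IV slope {b_iv:.3f}")
```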

58
Q

When is IV worse than OLS?

A

When |corr(z, u)/corr(z, x)| > |corr(x, u)|, i.e. when the asymptotic bias of IV is larger than that of OLS (both are scaled by σu/σx).

59
Q

What is important about R squared in IV?

A

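The zero conditional mean assumption has been violated (x is correlated with u), so the IV residuals are not orthogonal to x and the variation in y (SST) cannot be broken into ESS and RSS as normal. The IV R-squared can even be negative and has no natural interpretation.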

60
Q

What are structural equations?

A

They characterise the underlying economic theory. Each endogenous variable is expressed in terms of both endogenous and exogenous variables.

Reduced-form equations, by contrast, arise when you solve the system for the endogenous variables.

In a reduced form, you express the endogenous variables solely in terms of the exogenous variables.

61
Q

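What is important about the second-stage standard errors?

What does it mean if there is only one endogenous variable and one instrument?

A

If the second stage is run manually by OLS, the reported standard errors are not correct, because the residuals are computed with the fitted values rather than the actual endogenous variable. Use a dedicated 2SLS command instead.

2SLS = IV: with one endogenous variable and one instrument, 2SLS reduces to the simple IV estimator.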

62
Q

How many instruments do you need?

A

You need at least as many instruments as endogenous variables (the order condition).

63
Q

Explain errors in variables in the measurement of the dependent variable.

A

yt = bXt + et: this is the true model

Yt = yt + ut

Yt is the observed proxy, yt is the true value and ut is the measurement error.

The new model is then Yt = bXt + (et + ut)

Yt = bXt + wt, where wt = et + ut

Provided the measurement error ut is uncorrelated with Xt, the OLS estimate of b is still unbiased, so OLS can still be applied; the larger error variance only makes the estimates less precise.

Measurement error in an independent variable is much worse than measurement error in the dependent variable.

64
Q

What are the assumptions for errors in variables?

A

E(et) = E(ut) = 0

Var(et) = σ²e
Var(ut) = σ²u
Var(xt) = σ²x

Cov(et, Xt) = Cov(ut, Xt) = Cov(et, ut) = 0

65
Q

Explain errors in variables in the measurement of the independent variable.

A

yt = Bxt + et

xt is a latent, unobserved variable.

Xt = xt + ut

where xt is the unobserved true x value, Xt is the observed value and ut is the measurement error in x.

Substituting xt = Xt - ut gives yt = B(Xt - ut) + et = BXt + wt, the model we now estimate (remember wt is its error term).

wt = et - But

Cov(Xt, wt) should be equal to zero, but in this case

Cov(Xt, wt) = Cov(xt + ut, et - But) = -Bσ²u ≠ 0, which means the OLS estimate is biased (and inconsistent).

66
Q

What is meant by attenuation bias?

A

It is the bias of the OLS slope estimate towards zero caused by measurement error in the independent variable.

The greater the variance of the measurement error in x (relative to the variance of the true x), the closer the estimated slope will be to zero.

67
Q

Derive the attenuation bias estimate

A

plim B̂ = Cov(Xt, yt)/Var(Xt) = Bσ²x/(σ²x + σ²u) = B/(1 + σ²u/σ²x)

σ²u is the variance of the measurement error in the independent variable, and σ²x is the variance of the true x.
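A simulation sketch of attenuation bias with NumPy, assuming B = 2 and equal variances for the true x and the measurement error, so the formula B/(1 + σ²u/σ²x) predicts a slope of 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
B = 2.0
sigma2_x, sigma2_u = 1.0, 1.0                          # hypothetical variances: true x, error in x
x_true = rng.normal(0, np.sqrt(sigma2_x), n)           # x_t (unobserved)
X_obs = x_true + rng.normal(0, np.sqrt(sigma2_u), n)   # X_t = x_t + u_t (observed)
y = B * x_true + rng.normal(size=n)

slope = np.cov(X_obs, y)[0, 1] / np.var(X_obs, ddof=1)   # OLS slope of y on X_obs
predicted = B / (1 + sigma2_u / sigma2_x)                 # attenuation formula
print(slope, predicted)
```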

68
Q

How can you test for endogeneity or errors in variables?

A

1) y1 = b0 + b1y2 + b2z1 + b3z2 + u1

Where z1 and z2 are exogenous and z3 and z4 are instrumental variables (excluded from this equation).

You can then do a Hausman (1978) test. This directly compares the OLS and 2SLS estimates.

You can then see whether the difference is statistically significant.

Null: OLS and 2SLS are both consistent, meaning all regressors are exogenous.

Alternative: OLS and 2SLS differ, because only 2SLS is consistent. This means y2 is endogenous.

2) y2 = π0 + π1z1 + π2z2 + π3z3 + π4z4 + v2

This is the reduced-form equation for y2, explaining it using only exogenous information.

Since every z is uncorrelated with u1, y2 is uncorrelated with u1 in (1) if and only if v2 is uncorrelated with u1.
This would mean the variable is exogenous or measured without error.

3) You then add the reduced-form residuals v̂2 to equation (1),
giving: y1 = b0 + b1y2 + b2z1 + b3z2 + δ1v̂2 + error

If δ1 is significantly different from zero (t test), y2 and u1 are correlated.
This means the null of exogeneity/no measurement error is rejected.

The OLS estimates of b0 to b3 from (3) are identical to the 2SLS estimates of (1).

THE HAUSMAN TEST ASSUMES VALID INSTRUMENTAL VARIABLES.

WEAK OR INVALID IVs GIVE INCORRECT RESULTS.
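A sketch of the regression-based version of steps 2 and 3 with NumPy, on simulated data. For brevity it uses one exogenous regressor z1 and two instruments z3, z4, and the helper `ols_with_se` is hypothetical. It also checks that the coefficient on y2 matches 2SLS:

```python
import numpy as np


def ols_with_se(X, y):
    """OLS coefficients and conventional (homoscedastic) standard errors."""
    n, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    s2 = (e @ e) / (n - p)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return b, se


# Simulated data: y2 is endogenous because u1 depends on v2 (hypothetical parameters)
rng = np.random.default_rng(0)
n = 2000
z1, z3, z4 = rng.normal(size=(3, n))
v2 = rng.normal(size=n)
u1 = 0.5 * v2 + rng.normal(size=n)
y2 = 0.5 * z1 + 1.0 * z3 + 1.0 * z4 + v2
y1 = 1 + 2 * y2 + 1 * z1 + u1
const = np.ones(n)

# Step 2: reduced form for y2 on all exogenous variables; keep the residuals v2_hat
Z = np.column_stack([const, z1, z3, z4])
v2_hat = y2 - Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]

# Step 3: add v2_hat to the structural equation and t-test its coefficient (delta1)
b, se = ols_with_se(np.column_stack([const, y2, z1, v2_hat]), y1)
t_delta1 = b[3] / se[3]

# 2SLS for comparison: the second stage uses the fitted values y2_hat = y2 - v2_hat
y2_hat = y2 - v2_hat
b_2sls = np.linalg.lstsq(np.column_stack([const, y2_hat, z1]), y1, rcond=None)[0]
print(f"t on v2_hat = {t_delta1:.1f}; b1 control function = {b[1]:.4f}, b1 2SLS = {b_2sls[1]:.4f}")
```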

69
Q

What must an IV be to be used? Explain how to check both.

A

RELEVANCE:
It must be relevant: it must explain a substantial part of the variation in the endogenous x variable.

So the correlation must be strong.

If not (a weak instrument), the 2SLS estimator can be badly biased and statistical inference will be unreliable.

With a single endogenous variable, a first-stage F statistic of less than 10 indicates a weak instrument. The first stage is the regression of the endogenous variable on the exogenous information.

Test with a t test for one instrument or an F test for multiple instruments.

EXOGENEITY:
The order condition for identification (m = number of instruments, k = number of endogenous regressors):
  • m = k: the equation is exactly identified, and you get unique estimates.
  • m > k: the equation is overidentified, and you get multiple estimates.
  • m < k: the equation is underidentified and cannot be estimated.

70
Q

What is the main drawback of fixed effects estimation?

A

It introduces an extra parameter for each individual or group which:

1) Can be difficult for Stata to handle when there are many individuals or groups
2) Causes a big loss of degrees of freedom, making the estimator much less efficient.

71
Q

In brief terms, how does random effects work?

A

We can treat the individual unobserved effects as random. Therefore, we can say that each is drawn independently from a probability distribution.

72
Q

What is the error called in panel data methods?

A

It is called the idiosyncratic error.