EXAM Prep Flashcards

1
Q

Provide a statement of the Gauss-Markov theorem concerning the optimal properties of the OLS estimators.

A

The SRF derived using OLS is, on its own, of limited use beyond the fact that it minimises the sum of squared residuals. What we are interested in is making inferences about the PRF using our SRF; in other words, we want to know the precision and accuracy of the SRF estimators. The GM theorem addresses these optimal properties.

The GM theorem states: provided the CLRM assumptions hold, the OLS estimators are the Best Linear Unbiased Estimators (BLUE). (LINEAR means the estimators are linear functions of the data.)

These estimators then have following properties:

UNBIASED - on average, the estimates of beta equal the true value of beta: E(beta hat) = beta.

BEST (efficient) - the OLS estimators have the minimum variance among all linear unbiased estimators. If the estimator is efficient, we minimise the probability that it lies a long way from the true value of beta.

CONSISTENT - the estimates converge to their true values as the sample size increases to infinity. (Consistency is a separate large-sample property rather than part of the BLUE result itself.)

In addition, the coefficient estimates are normally distributed, given the assumption that the disturbances are normally distributed. This normality assumption enables us to derive the probability, or sampling, distributions of beta hat (normal) and sigma hat squared.
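A compact restatement of the three BLUE properties in notation (a sketch using the standard bivariate model Y_i = alpha + beta*X_i + u_i; the weight notation w_i is mine, not from the card):

```latex
\hat{\beta}_{OLS} = \sum_i w_i Y_i,\quad w_i = \frac{X_i - \bar{X}}{\sum_j (X_j - \bar{X})^2}
\quad \text{(linear: a linear function of the } Y_i\text{)}
E(\hat{\beta}_{OLS}) = \beta \quad \text{(unbiased)}
\operatorname{Var}(\hat{\beta}_{OLS}) \le \operatorname{Var}(\tilde{\beta})
\ \text{ for any other linear unbiased estimator } \tilde{\beta} \quad \text{(best)}
```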

2
Q

A brief discussion concerning the GM theorem’s underpinning assumptions.

A

We need certain assumptions about the manner in which the Yi are generated. The PRF shows that Yi depends on Xi and ui, so we must be specific about how Xi and ui are generated in order to make statistical inferences about the true Yi and beta. This is why the following assumptions are critical.

The CLRM assumptions are as follows:

  1. The model is correctly specified (correct variables, no measurement error, and the correct functional form)
  2. The specification is linear in the parameters

The remaining assumptions concern the unobservable error term:

  3. E(ut) = 0: the errors have zero mean (remember the graph)
  4. Var(ut) = sigma squared: the errors have a constant variance (homoscedasticity); together with the zero-mean assumption, this says there should be no pattern in our residuals (remember the graph)
  5. Cov(ui, uj) = 0 for i not equal to j: no autocorrelation; given Xi, the deviations of any two Y values from their mean do not exhibit any pattern
  6. Cov(ui, Xi) = 0: the disturbance and the explanatory variable are uncorrelated
  7. ui is normally distributed (needed for hypothesis testing rather than for the BLUE result itself)

The important question is: how realistic are these assumptions?

The real world is messy, which means OLS is not always BLUE.
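A compact restatement of the model and the error-term assumptions in notation (a summary sketch; the subscript convention is mine):

```latex
Y_t = \alpha + \beta X_t + u_t
E(u_t) = 0,\qquad \operatorname{Var}(u_t) = \sigma^2,\qquad
\operatorname{Cov}(u_t, u_s) = 0 \ (t \neq s),\qquad
\operatorname{Cov}(u_t, X_t) = 0,\qquad u_t \sim N(0, \sigma^2)
```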

3
Q

Proof that the OLS estimators, within the CLRM framework, are the BLUEs.

A
Derivation of the estimates
Proving linearity (3.A2)
Proving unbiasedness (3.A2)
Proving E(sigma hat squared) = sigma squared, i.e. deriving an unbiased estimator of Var(ut)
Deriving the SEs of the OLS estimators
Showing that these SEs are the minimum (efficiency)
4
Q

Derivation of the OLS estimators

A

Minimise the residual sum of squares: take the first derivatives of the RSS with respect to alpha and beta, set them equal to zero, and solve the resulting normal equations.
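A worked sketch of the derivation for the bivariate model (the standard result, written out here for revision):

```latex
RSS = \sum_i (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2
\frac{\partial RSS}{\partial \hat{\alpha}} = -2\sum_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0,\qquad
\frac{\partial RSS}{\partial \hat{\beta}} = -2\sum_i X_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0
\Rightarrow\quad
\hat{\beta} = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2},\qquad
\hat{\alpha} = \bar{Y} - \hat{\beta}\bar{X}
```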

5
Q

Discuss the statement "there is always a trade-off between Type I and Type II errors when selecting a significance level".

A

This concerns hypothesis testing. There are always two hypotheses:

Ho: the statistical hypothesis being tested
Ha: the remaining feasible outcomes other than those stated in the null hypothesis.

Ho is rejected when the test statistic is statistically significant at the chosen level of significance.

                        Reality
                 Ho is true       Ho is false
Significant      Type I error     Correct
Insignificant    Correct          Type II error

A Type I error is rejecting a true null hypothesis, whereas a Type II error is failing to reject a false null hypothesis.

The probability of a Type I error is simply alpha, the significance level we choose. For instance, with a 5% significance test, there is only a 5% chance that a result as extreme as (or more extreme than) the one observed could have occurred by pure chance. If we reduce alpha, we reduce the probability of a Type I error in our test. On the other hand, we also reduce the probability of rejecting the null hypothesis at all, which means we increase the probability of a Type II error.
So there is a trade-off between Type I and Type II errors: if you move to a tighter test (a smaller alpha), you increase the probability of a Type II error, i.e. when the null really is false, you are more likely to fail to reject it.

The only way to reduce both the Type I and Type II error probabilities simultaneously is to increase the sample size, in other words to reduce the magnitude of the SEs of the estimates.
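The trade-off in notation (a brief sketch; beta_II here denotes the Type II error probability, a label I am adding to avoid confusion with the regression beta):

```latex
P(\text{Type I}) = P(\text{reject } H_0 \mid H_0 \text{ true}) = \alpha,\qquad
P(\text{Type II}) = P(\text{fail to reject } H_0 \mid H_0 \text{ false}) = \beta_{II}
\text{Power} = 1 - \beta_{II};\quad
\text{for a fixed sample, lowering } \alpha \text{ moves the critical value outward and raises } \beta_{II}
```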

6
Q

Discuss the autocorrelation problem.

A

Autocorrelation refers to a scenario in which the disturbance terms in the PRF are correlated with one another, which is a violation of a CLRM assumption.
In cross-sectional data it is known as spatial autocorrelation, whereas in time series it is known as serial correlation. It may be induced by specification bias or an incorrect functional form, or by various data transformations (e.g. interpolation).

Consequences: although the OLS estimators remain linear, unbiased and consistent, they no longer have the minimum variance among all linear unbiased estimators when autocorrelation is present.

Consider a first-order autoregressive, AR(1), process: ut = rho*u(t-1) + et.

Then var(beta hat) under the classical formula < var(beta hat) under AR(1).
In finance and economics rho tends to be positive, so under AR(1) the true variance of beta hat is higher than the classical formula suggests.

The consequences of using OLS when AR(1) (with positive rho) is present: the residual variance is likely to underestimate the true variance, so R squared is likely to be overestimated; var(beta hat) is likely to be underestimated, hence the SEs are incorrect and the usual tests of significance are no longer valid.
GLS, however, is BLUE because it capitalises on the additional information about rho:
var(beta hat OLS under AR(1)) > var(beta hat GLS),
so GLS provides improved statistical power.

Detection: graphical inspection of the residuals, to see whether they show any pattern (if so, the model has not picked it up), or formal tests such as the Durbin-Watson (DW) and Breusch-Godfrey tests.

DW d statistic = sum_(t=2..n) (uhat_t - uhat_(t-1))^2 / sum_(t=1..n) uhat_t^2

0 ---- dL ---- dU ---- (4 - dU) ---- (4 - dL) ---- 4
reject Ho | no decision | do not reject Ho | no decision | reject Ho

where Ho: rho = 0 and Ha: rho > 0, rho < 0, or rho not equal to 0, depending on which end of the range is being examined.

The Breusch-Godfrey test is superior to DW (but requires a large sample because it is an asymptotic test).
Assume ut follows a pth-order AR process, that is
ut = rho1*u(t-1) + ... + rhop*u(t-p) + et
Ho: rho1 = rho2 = ... = rhop = 0

Run the auxiliary regression
uhat_t = a1 + a2*Xt + rho1*uhat(t-1) + ... + rhop*uhat(t-p) + et
and obtain its R squared.
If the sample size is large (technically infinite), Breusch and Godfrey showed that
(n - p)*R^2 is approximately chi-squared with p degrees of freedom.

Remedial measures, assuming autocorrelation is present:
first decide whether it is pure autocorrelation or due to misspecification;
if the specification has been fixed and pure autocorrelation remains =>

the main options are: GLS, FGLS/EGLS, the Newey-West correction, or simply continuing with OLS.

When rho is known, apply the GLS (quasi-differencing) transformation Y*t = Yt - rho*Y(t-1), X*t = Xt - rho*X(t-1) for t >= 2, and transform the first observation on Y and X as well.

Prais-Winsten transformation of the first observation: Y*1 = Y1*sqrt(1 - rho squared), X*1 = X1*sqrt(1 - rho squared).
This transformation matters because it makes the estimator on the transformed data the GLS estimator, i.e. it restores the minimum-variance property.

When rho is unknown, several methods can be applied to estimate it:

  1. when the sample is large, rho hat is approximately 1 - d/2 (using the DW d statistic)
  2. estimating rho from the residuals, assuming the sample residuals are consistent estimators of the true disturbances
  3. the Cochrane-Orcutt iterative procedure

Monte Carlo studies suggest that a weighted average of these rho estimates is superior to using any one estimate alone.

A more general and often superior method is to use HAC (Newey-West) standard errors.
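A minimal sketch of detection and correction in practice, assuming statsmodels is available; the simulated series y, x and the lag choices are illustrative assumptions, not from the card:

```python
# Detect autocorrelation (DW and Breusch-Godfrey) and apply a Newey-West
# (HAC) correction to the OLS standard errors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):                      # AR(1) errors with rho = 0.6
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
ols_res = sm.OLS(y, X).fit()

print("DW d:", durbin_watson(ols_res.resid))        # well below 2 => positive autocorrelation
lm_stat, lm_pval, _, _ = acorr_breusch_godfrey(ols_res, nlags=2)
print("Breusch-Godfrey LM p-value:", lm_pval)

# Newey-West (HAC): same point estimates, corrected standard errors
hac_res = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print("HAC SEs:", hac_res.bse)
```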

7
Q

Discuss the following specification errors:

  1. the omission of a relevant regressor
  2. the inclusion of a superfluous regressor
  3. an error of measurement in the regressor
  4. an error of measurement in the regressand
A

All of these errors can produce misleading point estimates and/or invalidate hypothesis testing.

  1. Omitting a relevant variable (one that has a non-zero coefficient when included in the regression) is referred to as underfitting, and its consequences are as follows:
    - the retained coefficients are biased and inconsistent, and the bias does not disappear as the sample size gets larger
    - even when this is not the case, i.e. the omitted variable is uncorrelated with the other regressors, the intercept remains biased, so any forecast made is biased
    - the variance of the disturbance term is incorrectly estimated (it is overestimated, since the omitted variable's contribution is swept into the residuals)
    - therefore the SEs of the coefficients are all biased upwards, because the disturbance variance is overestimated

Detection of this problem relies on the usual diagnostic checks, such as the t ratios, adjusted R squared, the signs of the estimated coefficients and the DW statistic. If the results do not look encouraging (e.g. a low R squared, very few statistically significant coefficients, a DW statistic that is too low), this indicates that the model may have missed an important variable, used the wrong functional form, or failed to remove serial correlation.

  2. Inclusion of a superfluous regressor, i.e. overfitting the model with an unnecessary variable:
  • the OLS estimators of the (overfitted) model are unbiased and consistent, and the expected coefficient of the irrelevant variable is 0
  • the disturbance variance is correctly estimated
  • the usual hypothesis-testing procedures remain valid
  • the problem is that the estimated coefficients of the overfitted model are inefficient, so there is a greater probability of making a Type II error in hypothesis testing
    Overfitting often brings multicollinearity and reduced degrees of freedom => SE(beta hat) is inflated => less powerful hypothesis testing. Hence the consequences are not as severe as underfitting.

Detection: a t-test on the suspected coefficient, which can be further ascertained with an F-test on a group of suspected variables.

  3. Error of measurement in the regressor: in reality, financial and economic data on the Xs and Ys are hardly ever available in accurate form, for a variety of reasons (non-response, computing and reporting errors, etc.).
    Suppose the true Xi* is not observable and we instead observe
    Xi = Xi* + wi.
    The SRF error term then becomes zi = ui - beta*wi.
    Looking at cov(zi, Xi), this equals -beta*sigma_w squared, which is not equal to 0 (see the sketch after point 4).
    This violates the CLRM assumption that ui and Xi are uncorrelated. In that case the OLS estimators are biased and inconsistent, meaning they remain biased even as the sample size increases indefinitely.
  4. Error of measurement in the regressand: Yi = Yi* + ei.

We cannot observe Yi*, so we use the measured value instead; however, this inflates the error-term variance to Var(ui + ei).
Therefore errors of measurement in the dependent variable still give unbiased estimates of the parameters, but their variances are larger than in the case where there are no errors of measurement in the regressand.
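A brief sketch of the covariance result quoted in point 3, assuming wi has variance sigma_w squared and is uncorrelated with Xi* and ui:

```latex
Y_i = \alpha + \beta X_i^{*} + u_i,\qquad X_i = X_i^{*} + w_i
\Rightarrow\ Y_i = \alpha + \beta X_i + (u_i - \beta w_i) = \alpha + \beta X_i + z_i
\operatorname{Cov}(z_i, X_i)
 = \operatorname{Cov}(u_i - \beta w_i,\ X_i^{*} + w_i)
 = -\beta\,\operatorname{Var}(w_i) = -\beta\sigma_w^2 \neq 0
```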

8
Q

Discuss the approaches to statistical hypothesis testing. Compare and contrast these inter-related methodologies.

A

It is important to form the hypothesis prior to the empirical investigation, otherwise the result risks circular reasoning.
There are always two hypotheses formed:
Ho: the statistical hypothesis being tested
Ha: the remaining feasible outcomes other than Ho.
Choosing the level of significance is another crucial decision, in essence a trade-off between committing Type I and Type II errors; classical statistics concentrates on the Type I error. It is common to choose from the 1%, 5% and 10% levels.
1. Derive the distribution of the test statistic under Ho.
2. Compute the sample test statistic and compare it, to see whether we can reject Ho.
3. Assess how likely the observed statistic is under Ho.

There are three means of performing a hypothesis test:
1. confidence interval
2. test of significance
3. p-value
All three approaches investigate the support for Ho given our sample statistic, and all should lead to the same conclusion.

Underlying the confidence-interval approach is the concept of interval estimation: an interval estimator is an interval constructed in such a manner that it has a specified probability of including the true value of the unknown parameter within its limits. If the value specified by Ho lies inside the confidence interval, Ho is not rejected; otherwise Ho is rejected.

In the test-of-significance procedure we develop a test statistic and examine its sampling distribution under Ho. The test statistic usually follows a well-defined probability distribution such as the t or F distribution. These distributions are used because they take account of the degrees of freedom (and as the d.o.f. increase to infinity the t distribution approaches the normal). We find the critical values given the level of significance and the d.o.f., and thereby determine the rejection and non-rejection regions.
Once the test statistic is computed from the data at hand, it is compared against the critical value, i.e. we check whether it lies in the rejection or non-rejection zone.

The third approach is the p-value: the p-value is the significance level at which we would be indifferent between rejecting and failing to reject Ho, i.e. the lowest level at which Ho can be rejected. It is easily calculated once the test statistic is computed, and it is then compared with the chosen level of significance for the test.
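A minimal sketch showing the three approaches agreeing on a t-test of Ho: beta = 0 in a simple regression; the simulated data and the 5% level are illustrative assumptions:

```python
# Confidence interval, test of significance and p-value for the OLS slope.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

# OLS slope, its standard error, and the t statistic
beta_hat = np.cov(x, y, bias=True)[0, 1] / np.var(x)
alpha_hat = y.mean() - beta_hat * x.mean()
resid = y - alpha_hat - beta_hat * x
sigma2_hat = resid @ resid / (n - 2)
se_beta = np.sqrt(sigma2_hat / ((x - x.mean()) @ (x - x.mean())))
t_stat = beta_hat / se_beta
dof = n - 2
t_crit = stats.t.ppf(0.975, dof)                                  # 5% two-sided critical value

ci = (beta_hat - t_crit * se_beta, beta_hat + t_crit * se_beta)   # 1. CI: reject if 0 lies outside
reject_by_test = abs(t_stat) > t_crit                             # 2. test of significance
p_value = 2 * stats.t.sf(abs(t_stat), dof)                        # 3. p-value: reject if p < 0.05

print(ci, reject_by_test, p_value)   # the three criteria give the same decision
```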

9
Q

Discuss the maximum likelihood estimator.

A

ML is a standard method of fitting the parameters of a density function; it is a method of point estimation with stronger theoretical foundations than the method of OLS. If the ut are normally distributed, then the ML and OLS estimators of the regression coefficients, the beta hats, are identical.
The ML estimator of Var(ut) is the sum of squared residuals over n, which is biased, whereas the OLS estimator, the sum of squared residuals over (n - 2), is unbiased. However, as the sample size n gets larger, the two estimators of sigma squared converge. Hence there is little loss in using OLS with the additional normality assumption, rather than ML with its extra mathematical complexity.
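The two variance estimators in notation (a sketch for the bivariate model, so the OLS divisor is n - 2):

```latex
\tilde{\sigma}^2_{ML} = \frac{1}{n}\sum_{t=1}^{n}\hat{u}_t^2,\qquad
\hat{\sigma}^2_{OLS} = \frac{1}{n-2}\sum_{t=1}^{n}\hat{u}_t^2
E\!\left(\tilde{\sigma}^2_{ML}\right) = \frac{n-2}{n}\,\sigma^2 \ \longrightarrow\ \sigma^2 \ \text{as } n \to \infty
```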

10
Q

Discuss the implications of heteroscedasticity for the OLS estimation methodology.

A

Heteroscedasticity is a violation of the CLRM assumption of a homoscedastic disturbance term.

  1. It does not destroy the unbiasedness and consistency properties of the OLS estimators.
  2. But they are no longer minimum variance (efficient), so they are no longer BLUE.
  3. The BLUE estimators can be found by the method of weighted least squares, provided the heteroscedastic error variances, sigma_i squared, are known (see the sketch below).
  4. In the presence of heteroscedasticity, the variances of the OLS estimators are not given by the usual OLS formulas; if we use them anyway, the usual hypothesis tests can lead to invalid conclusions.
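A minimal sketch of the two standard responses, assuming statsmodels is available; the simulated data and the assumed weighting 1/x^2 are illustrative, not from the card:

```python
# Weighted least squares (when the error-variance structure is assumed known)
# versus heteroscedasticity-robust (White) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(1, 5, size=n)
u = rng.normal(scale=0.5 * x)                    # error variance grows with x
y = 1.0 + 2.0 * x + u
X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                         # usual (invalid) SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")        # same betas, robust SEs
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()     # weights proportional to 1/sigma_i^2

print(ols.bse, robust.bse, wls.bse)
```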
11
Q

Discuss the rationale for the Hausman test for simultaneity.

A

Simultaneous equation models have more than one endogenous variable, which means that an endogenous variable in one equation appears as an explanatory variable in another equation of the system. As a consequence, such an endogenous explanatory variable becomes stochastic and is usually correlated with the disturbance term of the equation in which it appears as an explanatory variable. This violates the CLRM, so the OLS estimators are not BLUE and are not consistent, i.e. regardless of the sample size they remain biased.

This leads to the idea of testing for simultaneity, which is essentially a test of whether an (endogenous) regressor is correlated with the error term. If it is, a simultaneity problem exists, and alternative methods to OLS, such as 2SLS or instrumental variables, must be used. However, if these alternative methods are applied when there is in fact no simultaneity problem, the resulting estimators are not efficient; hence the Hausman test is carried out before adopting an estimation technique.

Hausman test:
demand function: Q = a + b*P + c*I + d*R + u1
supply function: Q = e + k*P + u2

Assume I and R are exogenous; P and Q are of course endogenous (P depends on Q and Q depends on P).
If there were no simultaneity, P in the supply function should be uncorrelated with u2.

Step 1: regress P on I and R (the reduced form) and obtain the fitted values and residuals,
so that P = Phat + vhat.
Step 2: substitute into the supply function and regress Q on Phat and vhat; the residual here is u2.

Q = e + k*Phat + k*vhat + u2

Under the null hypothesis of no simultaneity, vhat and u2 should be uncorrelated. We therefore assess the coefficient on vhat: if it is insignificant, we conclude that there is no simultaneity problem.
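A minimal sketch of the two-step procedure above, assuming statsmodels; the data-generating process is a made-up illustration in which P and Q are determined jointly:

```python
# Hausman simultaneity check: regress P on the exogenous variables, then test
# the significance of the reduced-form residual vhat in the Q equation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
I_ = rng.normal(size=n)                          # exogenous income
R = rng.normal(size=n)                           # exogenous shifter
u1, u2 = rng.normal(size=n), rng.normal(size=n)
P = 1.0 + 0.8 * I_ + 0.5 * R + u1 - u2           # P depends on both disturbances
Q = 2.0 + 0.6 * P + u2                           # so P is correlated with u2

# Step 1: reduced-form regression of P on the exogenous variables
step1 = sm.OLS(P, sm.add_constant(np.column_stack([I_, R]))).fit()
v_hat = step1.resid

# Step 2: regress Q on Phat and vhat; a significant vhat coefficient
# signals simultaneity
step2 = sm.OLS(Q, sm.add_constant(np.column_stack([step1.fittedvalues, v_hat]))).fit()
print(step2.tvalues, step2.pvalues)
```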

12
Q

Consequences of multicollinearity, at least three methods of detecting multicollinearity, and remedial measures to counteract its implications.

A

MC refers to a situation in which there is either an exact or an approximately exact linear relationship among the X variables, which violates a CLRM assumption.

Consequences are as follows

  1. If there is perfect collinearity among the Xs, their regression coefficients are indeterminate and their SEs are not defined.
  2. If collinearity is high but not perfect, estimation of the regression coefficients is possible, but their SEs tend to be very large.
  3. As a result, the population values of the coefficients cannot be estimated precisely. However, if the objective is to estimate a linear combination of these coefficients (an estimable function), this can be done even in the presence of perfect MC.

Detecting MC
Although there is no single definitive method for detecting MC, there are several indicators of it (see the sketch after this list):

  1. A very high R squared combined with many insignificant t ratios.
  2. Examining the pairwise correlation coefficients between pairs of X variables (most informative in the two-explanatory-variable case); a high value suggests MC is present.
  3. Examining the partial correlation coefficients.
  4. If R squared is high but the partial correlations are also high, MC may not be readily detectable.
  5. Regressing each X variable on the remaining explanatory variables and obtaining the R squared values; a high value suggests that the variable in question is highly correlated with the rest of the Xs.
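A minimal sketch of indicator 5 via variance inflation factors, VIF_j = 1/(1 - R_j^2), assuming statsmodels and pandas are available; the column names and the rule-of-thumb threshold of 10 are illustrative:

```python
# Auxiliary-regression check for multicollinearity using VIFs.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)      # nearly collinear with x1
x3 = rng.normal(size=n)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

vifs = {col: variance_inflation_factor(X.values, i)
        for i, col in enumerate(X.columns)}
print(vifs)   # VIFs for x1 and x2 come out very large (rule of thumb: > 10)
```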

Remedies

There is no sure method for eliminating this problem; however, a few rules of thumb are outlined below:

  1. using extraneous or a priori information
  2. combining cross-sectional and time-series data
  3. omitting a highly collinear variable
  4. transforming the data
  5. obtaining additional or new data
13
Q

Compare and contrast simple and continuously compounded stock market returns.

A

Data manipulation: prices vs returns.

Unit root and cointegration tests suggest using returns rather than prices, because we would like to invoke the GM theorem and one of its assumptions requires a stable (stationary) variable.

The cost of switching from prices to returns is the loss of long-run …… (listen to the recording).

The reason we use log (continuously compounded) returns is that they are additive over time; the disadvantage is that they are not additive cross-sectionally. Arithmetic (simple) returns, by contrast, are cross-sectionally additive, and therefore useful for handling portfolio returns.
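The two return definitions and their aggregation properties in notation (a sketch; the portfolio weights w_i are my notation):

```latex
R_t = \frac{P_t - P_{t-1}}{P_{t-1}},\qquad
r_t = \ln\!\frac{P_t}{P_{t-1}} = \ln(1 + R_t)
\text{Log returns add over time: } r_{t,t+k} = r_{t+1} + r_{t+2} + \cdots + r_{t+k}
\text{Simple returns add across assets: } R_{p,t} = \sum_i w_i R_{i,t}
```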

14
Q

Data mining errors with regard to specification

A

Data mining refers to selecting the final regressors from a larger group of candidate regressors without theoretical justification. The resulting relationship between Y and the Xs is therefore quite possibly spurious.

The true level of significance is alpha* = 1 - (1 - alpha)^(c/k), where c is the number of candidate regressors and k the number finally selected.
Hence alpha* can be hugely different from the nominal alpha used, so stating conclusions at the (1 - alpha)% confidence level is spurious.
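A quick illustrative calculation (the numbers c = 15, k = 3 and alpha = 5% are assumptions, not from the card):

```latex
\alpha^{*} = 1 - (1 - 0.05)^{15/3} = 1 - 0.95^{5} \approx 0.226
```

So a nominally 5% test behaves more like a 23% test after this amount of search.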

15
Q

Discuss several methods to address the dilemma of autocorrelation with respect to OLS estimation.

A

In the presence of autocorrelation, the OLS estimators, although unbiased, consistent and asymptotically normally distributed, are not efficient. Therefore the usual inference procedures based on the t, F and chi-squared tests are no longer appropriate. On the other hand, feasible GLS and HAC (heteroscedasticity- and autocorrelation-consistent) methods produce estimators that are efficient, but the finite-sample, or small-sample, properties of these estimators are not well documented. This means that in a small sample FGLS and HAC might actually do worse than OLS. Griliches and Rao found that if the sample is relatively small and the autocorrelation coefficient rho is less than 0.3, then OLS is as good as or better than FGLS.
This suggests that when the sample is small and rho is below 0.3, OLS may be the preferred choice. But what counts as a small sample is a relative question, and practical judgement is necessary.

16
Q

Test for exogeneity

A

Even though it is the researcher's responsibility to specify which variables in the model are exogenous and which are endogenous, what if we do not know? A Hausman-type test can be utilised to deal with this issue.
Suppose we have a three-equation model in three endogenous variables Y1, Y2 and Y3, and suppose there are three exogenous variables X1, X2 and X3. The first equation is

Y1=a+a2Y2+a3Y3+bX1+u1

Step 1: estimate the reduced-form equations for Y2 and Y3 (regress each on the exogenous variables) and obtain the corresponding predicted values Y2hat and Y3hat.

Step 2: estimate the augmented regression
Y1 = a + a2*Y2 + a3*Y3 + b*X1 + g2*Y2hat + g3*Y3hat + u1

and test Ho: g2 = g3 = 0.
If Ho is rejected, Y2 and Y3 can be deemed endogenous variables.

17
Q

Describe the concepts of identification and simultaneous equation bias.

A
  1. The problem of identification precedes the problem of estimation; applying OLS directly to a structural equation with an endogenous regressor gives biased estimates (simultaneous equation bias).
  2. The identification problem asks whether one can obtain unique numerical estimates of the structural coefficients from the estimated reduced-form coefficients.
  3. If this can be done, an equation in a system of simultaneous equations is identified.
  4. An identified equation can be exactly (just) identified or overidentified. Exactly identified means unique values of the structural coefficients can be obtained; overidentified means there may be more than one value for one or more structural parameters.
  5. The identification problem arises because the same set of data may be compatible with different sets of structural coefficients, that is, with different models. Therefore, in a regression of price on quantity only, it is difficult to tell whether one is estimating the supply function or the demand function, because price and quantity enter both equations. (See the sketch below.)
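A small sketch of point 2, using a standard demand-supply example in which income I shifts demand but not supply (the coefficient labels are mine): the supply slope is recoverable from the reduced-form coefficients, while the demand slope is not.

```latex
\text{Demand: } Q = a_0 + a_1 P + a_2 I + u,\qquad \text{Supply: } Q = b_0 + b_1 P + v
\text{Reduced forms: } P = \pi_0 + \pi_1 I + w_1,\qquad Q = \pi_2 + \pi_3 I + w_2
\pi_1 = \frac{a_2}{b_1 - a_1},\qquad \pi_3 = \frac{b_1 a_2}{b_1 - a_1}
\Rightarrow\ b_1 = \frac{\pi_3}{\pi_1}\quad \text{(supply slope identified; the demand slope is not)}
```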
18
Q

Explain three methodologies of non-linear least squares estimation.

A

Direct search (grid search)

This method does not require the use of calculus, but if the NLRM involves several parameters it becomes cumbersome and computationally expensive. For example, with 5 parameters and 25 alternative values each, the error sum of squares has to be computed 25^5 = over 9.7 million times. Secondly, there is no guarantee that the final set of parameter values selected will give the absolute minimum of the error sum of squares.

Direct optimisation
Essentially the same approach used to derive the least squares estimators: differentiate the error sum of squares with respect to the parameters and set the derivatives to zero. If the resulting equations cannot be solved analytically, the following method is used.

Iterative linearisation method
Linearise the equation (via a Taylor expansion) around some initial parameter values, estimate the linearised model by OLS, update the parameter values, and repeat until the estimates converge.
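A minimal sketch of the iterative linearisation idea for an assumed non-linear model y = b1*exp(b2*x) + u (the model, data and starting values are illustrative, not from the card):

```python
# Gauss-Newton style iterative linearisation: at each step, regress the current
# residuals on the Jacobian of the model (the linearised equation) and update.
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0.0, 2.0, 100)
y = 1.5 * np.exp(0.8 * x) + rng.normal(scale=0.1, size=x.size)

b = np.array([1.0, 0.5])                      # initial parameter values
for _ in range(50):
    f = b[0] * np.exp(b[1] * x)               # model at current parameters
    resid = y - f
    # Jacobian of f with respect to (b1, b2): the linearisation around b
    J = np.column_stack([np.exp(b[1] * x), b[0] * x * np.exp(b[1] * x)])
    step, *_ = np.linalg.lstsq(J, resid, rcond=None)   # OLS on the linearised model
    b = b + step
    if np.linalg.norm(step) < 1e-8:           # stop when the update is negligible
        break

print(b)   # should end up close to (1.5, 0.8)
```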