Lecture 3 Flashcards
We now want to test hypotheses about the Bj's: why, and what do we test?
So far we have only estimated the Bj's; now we use the data to judge whether a given hypothesis about them is likely to be false
- e.g. H0: B1 = 0
But for hypothesis testing we also need to know the full sampling distribution of the estimator, not just its mean and variance
What do we know about the distribution of u?
- why is this even relevant?
MLR.4 - E[u|xi] = 0
MLR.5 - Var[u|xi] = o^2
- the estimator can be written as the true parameter plus a weighted sum of the errors ui, with weights wij that depend on the sample Xn, so the variability in the Bj estimator is driven by the error term
- therefore, since Bj^ depends on the ui, its distribution (conditional on Xn) inherits its shape from the distribution of the errors ui, which are random
MLR.6
Assumption of Normality, the population error u is independent of xi, and is normally distributed with mean 0 and variance o^2
- u ~ N(0, o^2)
- MLR.6 implies MLR.4 and MLR.5 - a much stronger assumption
- we have now made a very specific distributional assumption for u - the familiar bell shaped curve
Is normality a reasonable assumption?
- how does it tie in with the CLT?
Since each error term can be viewed as the sum of many small independent influences with similar distributions, the CLT suggests u will tend to be normally distributed
- assumption may be violated in applications, but maintained for convenience of statistical inference
Normal sampling distributions under MLR.1-MLR.6
Bj^ ~ N(Bj, Var(Bj^|Xn))
(Bj^ - Bj)/sd(Bj^|Xn) ~ N(0,1)
The standardised random variable has mean 0 and variance 1 under MLR.1-4; adding MLR.6 makes it exactly normally distributed.
- result holds regardless of Xn
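This can be checked with a small Monte Carlo sketch (the DGP, sample size, and all parameter values below are made up for illustration): with normal errors, the slope estimator standardised by its true conditional sd should look like a N(0,1) draw.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 5000
beta = np.array([1.0, 2.0])             # hypothetical true B0, B1
sigma = 1.5                             # true sd of u

x = rng.normal(size=n)                  # fix Xn across replications
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
sd_b1 = sigma * np.sqrt(XtX_inv[1, 1])  # sd(B1^|Xn)

z = np.empty(reps)
for r in range(reps):
    u = rng.normal(0.0, sigma, size=n)      # MLR.6: u ~ N(0, o^2)
    y = X @ beta + u
    b = np.linalg.solve(X.T @ X, X.T @ y)   # OLS estimates
    z[r] = (b[1] - beta[1]) / sd_b1         # standardised slope

print(round(z.mean(), 2), round(z.std(), 2))  # close to 0 and 1
```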
Can we directly use the result
(Bj^ - Bj)/sd(Bj^|Xn) ~ N(0,1)
No, as the denominator depends on o = sd(u), which is unknown; but we can use the estimator o^ in its place
- using this in place of o gives us the standard error se(Bj^)
t(Bj^) = (Bj^ - Bj)/se(Bj^) ~ t(n-k-1)
The t distribution allows us to perform hypothesis tests even without knowing the true error standard deviation o
Replacing o with o^ takes us from the standard normal to the _
T distribution
- also bell shaped, but more spread out (heavier tails) than the N(0,1)
- since o^ is itself an estimator that varies across samples, it introduces additional variability into the standardised statistic; the t distribution's heavier tails account for this
- as df = n - k - 1 grows (roughly df > 120), the t distribution becomes indistinguishable from the standard normal
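A quick way to see this convergence, using scipy's t and normal quantile functions (the df values are arbitrary):

```python
from scipy.stats import norm, t

# Two-tailed 5% critical values: t shrinks toward the normal as df grows.
for df in (5, 30, 120, 1000):
    print(df, round(t.ppf(0.975, df), 3))   # e.g. df=5 gives 2.571
print("N(0,1)", round(norm.ppf(0.975), 3))  # 1.96
```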
So what does the t statistic tend to look like in practice?
Under H0: Bj = 0,
t(Bj^) = Bj^/se(Bj^)
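A minimal sketch of computing these t statistics by hand on simulated data (the DGP and every parameter value are invented for illustration, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)      # hypothetical DGP
X = np.column_stack([np.ones(n), x])

b = np.linalg.solve(X.T @ X, X.T @ y)       # OLS estimates
resid = y - X @ b
k = X.shape[1] - 1                          # number of slope regressors
sigma2_hat = resid @ resid / (n - k - 1)    # o^2 estimator
se = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X).diagonal())
t_stats = b / se                            # t(Bj^) = Bj^/se(Bj^) under H0: Bj = 0
```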
Standard approach to hypothesis testing
- Choose a null hypothesis
- Choose an alternative hypothesis
- Find a good test statistic for testing H0 against H1
- Choose a significance level for the test
- Choose a critical value c, so the rejection rule t > c means we make Type I errors at the chosen significance level:
Pr(t > c | H0 true) = significance level
To reduce the probability of a Type I error we must
Increase the critical value, i.e. choose a lower significance level
What do we do for a 2 tailed test?
Rejection rule is:
|t| > c, where c now puts half the significance level in each tail
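For instance, with scipy one can compare the one- and two-tailed 5% critical values (df = 60 is an arbitrary choice):

```python
from scipy.stats import t

df, alpha = 60, 0.05
c_one = t.ppf(1 - alpha, df)        # one-tailed: reject if t > c_one
c_two = t.ppf(1 - alpha / 2, df)    # two-tailed: reject if |t| > c_two
print(round(c_one, 3), round(c_two, 3))   # two-tailed c is larger
```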
What is a p value?
- high or low means what?
- what do we do when it is two tailed?
Smallest significance level at which we can still reject H0
I.e., tells us the probability of obtaining a test statistic as extreme as or more extreme than the observed value, assuming H0 is true
- a lower p value is stronger evidence against H0: it means such an extreme result would be very unlikely if H0 were true
- if two tailed, multiply the one-tailed p value by 2
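A short sketch of the p value calculation (the df and observed t statistic are illustrative numbers):

```python
from scipy.stats import t

df, t_obs = 60, 2.1
p_one = t.sf(t_obs, df)             # Pr(T > t_obs | H0), one-tailed
p_two = 2 * t.sf(abs(t_obs), df)    # two-tailed: double the tail area
print(round(p_one, 4), round(p_two, 4))
```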
What’s the use of confidence intervals?
Confidence intervals give a range of values within which the true population parameter is likely to fall, offering a way to understand the precision of an estimation beyond simple hypothesis testing
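A minimal sketch of building a 95% interval as Bj^ +/- c * se(Bj^), with a hypothetical estimate, standard error, and df:

```python
from scipy.stats import t

b_hat, se, df = 0.5, 0.1, 60         # made-up regression output
c = t.ppf(0.975, df)                 # two-tailed 5% critical value
ci = (b_hat - c * se, b_hat + c * se)
print(tuple(round(v, 3) for v in ci))
```

The interval excludes 0 exactly when |Bj^/se(Bj^)| > c, i.e. when the two-tailed 5% test rejects H0: Bj = 0, which is the duality between confidence intervals and hypothesis tests.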
Sometimes we want to test more than one restriction/ hypothesis, which then includes multiple parameters, what do we do?
We use a new statistic to test these joint hypotheses
- can’t rely on individual t statistics for each parameter as each tests a hypothesis about a single parameter in isolation, which doesn’t account for the joint impact of multiple parameters together
Use F statistics
Example of a joint null hypothesis and its alternative (tested at the 5% level):
H0: B3 = B4 = 0
H1: H0 is not true
Why don't t stats work for multiple exclusion restrictions?
If we want to test a joint hypothesis, we need to avoid relying on individual tests only, as each test has a certain error rate, and combining them without proper control leads to an overall error rate which is too high
- not accounting for size control
What is size control?
The size of a test is the probability of making a Type I error; each individual test carries its own chance of a Type I error, so the combined error rate across several tests can exceed 5%
E.g. the probability that each test correctly fails to reject the null could be 95%, so the probability neither rejects is 0.95 x 0.95 = 0.9025; there is therefore a 1 - 0.9025 = 9.75% chance that at least one of the tests rejects even if both parameters are truly 0, so the probability of rejecting the null exceeds 5%
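The arithmetic above, spelled out (assuming the two tests are independent):

```python
alpha = 0.05
p_neither_rejects = (1 - alpha) ** 2        # 0.95 * 0.95 = 0.9025
p_at_least_one = 1 - p_neither_rejects      # exceeds the nominal 5%
print(round(p_at_least_one, 4))             # -> 0.0975
```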
How does F statistic work with different model fits
- unrestricted vs restricted?
- SSR?
Unrestricted:
- lsalary = B0 + B1years + B2… + u
Restricted:
- lsalary = B0 + B1years + B2gamesyr + u
Our test statistic will compare the fit of the restricted and unrestricted model
- it's an algebraic fact that the SSR can only increase when x's are dropped, so SSRr >= SSRur; the question is whether SSR increases by enough to conclude the restrictions under H0 are false
F statistic
F = ((SSRr - SSRur)/(dfr - dfur)) / (SSRur/dfur)
  = ((SSRr - SSRur)/q) / (SSRur/(n - k - 1))
where q = dfr - dfur is the number of restrictions and dfur = n - k - 1
If F > c, reject H0
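The formula can be sketched as a small helper; the SSR values, q, n, and k below are made-up numbers, not from the salary example:

```python
def f_stat(ssr_r, ssr_ur, q, n, k):
    # F = [(SSRr - SSRur)/q] / [SSRur/(n - k - 1)]
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / (n - k - 1))

# q = 2 restrictions, unrestricted model with k = 2 regressors, n = 53
print(f_stat(120.0, 100.0, q=2, n=53, k=2))   # -> 5.0
```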
What's the p value in F tests?
p value = Pr(F > Fobs | H0 is true)
- Stata automatically reports the p value with each test
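The same calculation can be reproduced with scipy's F distribution (the observed F and degrees of freedom below are hypothetical):

```python
from scipy.stats import f

F_obs, q, df_ur = 5.0, 2, 50    # q restrictions, df_ur = n - k - 1
p = f.sf(F_obs, q, df_ur)       # upper-tail Pr(F > F_obs | H0)
print(round(p, 4))
```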