Chapter 2 Flashcards
Formula: dependent/explained/response variable y
y = β0 + β1·x + u
(y is the dependent/explained/response variable, x the explanatory variable, u the error term)
Assumption 1 u
E(u) = 0
we normalize unobserved factors to have on average a value of 0 in the population
How should x be related to u (Assumption 1)
- correlation coefficient: if x and u are uncorrelated, then they are not linearly related
- U is mean independent of x
Assumption 2
- conditional mean independence assumption
- E(u∣X)=0
To what can violations of the conditional mean independence assumption lead?
to biased parameter estimates and inefficient hypothesis tests in regression analysis
What does the explanatory variable not contain (Assumption 2)?
information about the mean of the unobserved factors
Population regression function: Formula/calculation
E(y|x) = β0 + β1·x
How can the average value of the dependent variable be expressed (PRF)?
as a linear function of the independent variable
How does a one-unit increase in x change the average value of y (PRF)?
by beta 1
Describe OLS
- fits a linear line onto the data
- estimates the parameters in such a way that the sum of the squared values of the residuals is minimized
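The mechanics can be sketched in a few lines of Python: a hand-rolled OLS fit via the closed-form estimates that minimize the sum of squared residuals (the data here are made up for illustration).

```python
# Hand-rolled OLS for simple regression: the closed-form estimates that
# minimize the sum of squared residuals (illustrative, made-up data).

def ols_fit(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    # slope = sample covariance of (x, y) / sample variance of x
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    b0 = ybar - b1 * xbar          # intercept: line passes through (xbar, ybar)
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
b0, b1 = ols_fit(x, y)
print(b0, b1)  # intercept ~0.10, slope ~1.98
```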
Definition: Residual
the actual value of y minus the predicted value of y, where the predicted value is based on the model parameters
Formula: Residual
ûi = yi − ŷi
Formula: RSS
RSS = Σ ûi² = Σ (yi − ŷi)²
Formula: fitted or predicted values
ŷi = β̂0 + β̂1·xi
Formula: Deviations from regression line (=residuals)
ûi = yi − ŷi = yi − β̂0 − β̂1·xi
What does the average of the residuals/deviations from regression equal to?
zero
What does the covariance between residuals and regressors equal, and what does it imply?
- zero
- it implies the residuals contain no linear information about x; this holds by construction (a first-order condition of OLS), not as evidence that the model is correct
OLS estimates are chosen to make the residuals add up to what, for any data set?
zero
Property 1 of OLS means:
the average of residuals is zero
the sample average of the fitted values is the same as the sample average of the yi
Formula: First algebraic property of OLS regression
Σ ûi = 0
Formula: second algebraic property of OLS regression
Σ xi·ûi = 0
Formula: third algebraic property of OLS regression
ȳ = β̂0 + β̂1·x̄ (the point of sample averages (x̄, ȳ) lies on the regression line)
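The three algebraic properties above can be checked numerically. The sketch below fits OLS by the closed-form estimates on made-up data and verifies each one; they hold by construction (first-order conditions of the minimization), up to floating-point rounding.

```python
# Verify the three algebraic properties of OLS on made-up data.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(abs(sum(resid)) < 1e-9)                                # True: residuals sum to zero
print(abs(sum(xi * ri for xi, ri in zip(x, resid))) < 1e-9)  # True: zero covariance with x
print(abs(ybar - (b0 + b1 * xbar)) < 1e-9)                   # True: (xbar, ybar) on the line
```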
What is SST?
a measure of the total sample variation in the yi, i.e. how spread out the yi are in the sample
What does dividing SST by n-1 give us?
the sample variance of y
Formula: total sum of squares
SST = Σ (yi − ȳ)²
Formula: Explained sum of squares
SSE = Σ (ŷi − ȳ)²
Formula: Residual sum of squares
SSR = Σ ûi²
Decomposition of total variation
SST = SSE + SSR
Goodness-of-fit measure (R-squared)
R² = SSE/SST = 1 − SSR/SST
Value of Goodness-of-fit measure (R-squared)
between 0 and 1
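The decomposition and the bounds on R-squared can be confirmed on any fitted sample; a minimal sketch on made-up data:

```python
# R-squared from the SST = SSE + SSR decomposition (made-up data).
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 2.3, 2.9, 4.4, 4.8, 6.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

sst = sum((yi - ybar) ** 2 for yi in y)               # total sum of squares
sse = sum((yh - ybar) ** 2 for yh in yhat)            # explained sum of squares
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # residual sum of squares

r2 = sse / sst
print(abs(sst - (sse + ssr)) < 1e-9)  # True: the decomposition holds
print(0 <= r2 <= 1)                   # True: R-squared is bounded
```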
OLS: What happens if data points all lie on the same line?
OLS perfect fit
R squared = 1
What happens if R squared is close to zero?
- poor fit of OLS line
- very little of the variation in the yi is captured by the variation in the ŷi, which all lie on the OLS regression line
True or False: High R squared means that regression has causal interpretation
False
Why are estimated regression coefficients random variables?
because they are calculated from a random sample
What are the assumptions for SLR?
- Linear in parameters
- Random sampling
- Sample variation in the explanatory variable
- Zero conditional mean
- Homoskedasticity
Describe Assumption 1 of SLR
- linear in parameters: in the population model, y = β0 + β1·x + u
Describe Assumption 2 of SLR
- random sampling: the data {(xi, yi): i = 1, …, n} are a random sample from the population model
Describe Assumption 3 of SLR
- sample variation in the explanatory variable: the xi are not all the same value
Describe Assumption 4 of SLR
- zero conditional mean: E(u|x) = 0
What is a very weak SLR Assumption and why?
Assumption 3
- there is variation in xi
What is a very strong SLR Assumption and why?
Assumption 4
- conditional on xi, E(ui) = 0, i.e. E(ui|xi) = 0
SLR 2 + SLR 4:
E(ui | x1, …, xn) = 0, i.e. each error has zero mean conditional on all sampled x values
Fixed in repeated samples: why are they not always very realistic?
- one does not choose values of education and then searches for individuals with those values
How can we treat xi if we assume SLR 2 and SLR 4?
as nonrandom
True or False: under SLR1-SLR4, OLS estimators are unbiased
True
Interpretation of unbiasedness of OLS: what is crucial?
Assumptions SLR1-SLR4
Interpretation of unbiasedness of OLS: how can the estimated coefficients be?
- smaller or larger than the true values, depending on the particular sample that results from the random draw
- but on average equal to the values that characterize the true relationship between y and x in the population
What does “on average” mean in the context of SLR unbiasedness?
= if sampling was repeated
ie if drawing the random sample and doing the estimation was repeated many times
True or False: in a given sample, estimates may differ considerably from true values (SLR unbiasedness)
True
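"On average over repeated samples" can be made concrete with a small Monte Carlo sketch: simulate many random samples from an assumed population model (the true β values below are made up) and average the slope estimates.

```python
import random

# Monte Carlo illustration of unbiasedness: repeat "draw a random sample,
# estimate beta1" many times and average the estimates (simulated data;
# the true parameters are assumptions of this sketch).
random.seed(1)
beta0, beta1 = 1.0, 0.5
x = [float(i) for i in range(1, 21)]

def estimate_slope():
    # population model y = beta0 + beta1*x + u, with u ~ N(0, 1), so E(u|x) = 0
    y = [beta0 + beta1 * xi + random.gauss(0, 1) for xi in x]
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
           sum((xi - xbar) ** 2 for xi in x)

estimates = [estimate_slope() for _ in range(5000)]
avg = sum(estimates) / len(estimates)
print(abs(avg - beta1) < 0.01)  # True: the average estimate is close to beta1
```

Individual estimates scatter noticeably around 0.5, which is exactly the point of the "in a given sample, estimates may differ considerably from true values" card.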
Formula: SLR5
Var(u|x) = σ² (homoskedasticity)
What role does SLR5 play in showing the unbiasedness?
- it plays no role
- it simplifies the variance calculations
What is sigma squared (SLR5)?
- the unconditional variance of u
- the error variance
Formula: sigma squared (SLR5)
σ² = Var(u|x) = E(u²|x) = E(u²) = Var(u) (since E(u) = 0)
Formula: Summarizing SLR4 and SLR5
E(y|x) = β0 + β1·x and Var(y|x) = σ²
What does Homoskedasticity mean?
the variance of the errors is constant across all levels of the independent variable(s)
What happens when Homoskedasticity is satisfied?
- OLS estimators are unbiased and efficient
- hypothesis tests and confidence intervals are valid
What does Heteroskedasticity mean?
the variance of the errors is not constant across all levels of the independent variable(s)
How are the OLS estimators in the presence of Heteroskedasticity?
- still unbiased
- no longer efficient
- the usual standard errors are invalid, which distorts hypothesis tests and confidence intervals
What are the methods of testing homoskedasticity in Stata?
- Visual Inspection
- Breusch-Pagan test
- White test
Describe Visual inspection (method of testing for homoskedasticity in Stata)
- predict the residuals (predict r, res)
- plot your residuals against your independent variables (scatter r x)
- in case of a multivariate regression, predict the fitted values (predict yhat, xb) and plot the residuals against the fitted values
Describe the Breusch-Pagan test (method of testing for homoskedasticity in Stata)
- we test whether the estimated variance of the residuals depends on the values of the independent variables
- run the regression and type “estat hettest” directly after the reg command
Describe the White test (method of testing for homoskedasticity in Stata)
- similar to the BP test
- allows the independent variables to have a nonlinear effect on the error variance
- run the regression and type “imtest, white” directly after the reg command
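Outside Stata, the idea behind the Breusch-Pagan test can be sketched by hand for one regressor: regress the squared residuals on x and compute the LM statistic n·R² of that auxiliary regression. The data below are simulated for illustration; in practice `estat hettest` does this for you.

```python
import random

def ols(x, y):
    xbar, ybar = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1

def breusch_pagan_lm(x, y):
    # Step 1: regress y on x, keep the squared residuals.
    b0, b1 = ols(x, y)
    u2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    # Step 2: regress u^2 on x; LM = n * R^2 of this auxiliary regression,
    # asymptotically chi2(1) under homoskedasticity (5% cutoff ~3.84).
    c0, c1 = ols(x, u2)
    u2bar = sum(u2) / len(u2)
    sse = sum((c0 + c1 * xi - u2bar) ** 2 for xi in x)
    sst = sum((v - u2bar) ** 2 for v in u2)
    return len(x) * sse / sst

random.seed(2)
x = [i / 10 for i in range(1, 201)]
y_hom = [1 + 2 * xi + random.gauss(0, 1) for xi in x]       # constant error variance
y_het = [1 + 2 * xi + xi * random.gauss(0, 1) for xi in x]  # variance grows with x
print(breusch_pagan_lm(x, y_hom))  # typically below 3.84: no evidence against H0
print(breusch_pagan_lm(x, y_het))  # above 3.84: reject homoskedasticity
```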
Under SLR1-SLR5 we obtain a variance of OLS estimators (formula and explanations)
- Var(β̂1) = σ² / Σ(xi − x̄)²
- Var(β̂0) = σ² · (n⁻¹ Σ xi²) / Σ(xi − x̄)²
- the larger the error variance σ², the larger the sampling variance; the more sample variation in x, the smaller the sampling variance
Problem: the error variance is unknown. Why?
- we do not know the error variance σ² because we do not observe the errors, ui
- what we observe are the residuals
- luckily we can use these residuals to form an estimate of the error variance
Theorem 2.3 (Unbiasedness of error variance): Formula & explanation
σ̂² = SSR / (n − 2) = (1/(n − 2)) Σ ûi²
under SLR1-SLR5, E(σ̂²) = σ²; dividing by n − 2 rather than n corrects for the two degrees of freedom used to estimate β̂0 and β̂1
Compare SE for beta and mean: Formula & Explanation
se(β̂1) = σ̂ / √(Σ(xi − x̄)²), analogous to the standard error of a sample mean, se(ȳ) = σ̂y/√n; both shrink as the sample grows, but se(β̂1) also falls with more sample variation in x
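Putting the last two cards together, σ̂² and se(β̂1) can be computed directly from the residuals; a sketch on made-up data:

```python
# Estimate the error variance from the residuals and form se(b1):
# sigma2_hat = RSS / (n - 2), se(b1) = sigma_hat / sqrt(SST_x). Made-up data.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.2, 2.9, 4.1, 4.8, 6.2, 6.8, 8.1, 8.9]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
b0 = ybar - b1 * xbar
rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))

sigma2_hat = rss / (n - 2)           # two df lost estimating b0 and b1
se_b1 = (sigma2_hat / sst_x) ** 0.5  # shrinks with more variation in x
print(sigma2_hat, se_b1)
```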
True or False: another OLS assumption is that the error terms or even the dependent or independent variables are normally distributed
False
- OLS only requires errors to be i.i.d., but normality is required neither for unbiased and efficient OLS estimates nor for the calculation of standard errors
What is necessary for convenient hypothesis testing?
a normal distribution
When the errors are normally distributed… ?
- the test statistic follows a t-distribution
- we can use familiar cut-off values
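A t test of H0: β1 = 0 using such a familiar cutoff can be sketched as follows (made-up data; 2.306 is the two-sided 5% critical value of the t distribution with n − 2 = 8 degrees of freedom):

```python
# t test for H0: beta1 = 0 in a simple regression (made-up data).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.8, 4.3, 5.9, 8.4, 9.7, 12.1, 14.2, 16.3, 17.8, 20.4]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sst_x = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sst_x
b0 = ybar - b1 * xbar
rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
se_b1 = (rss / (n - 2) / sst_x) ** 0.5

t = b1 / se_b1  # under normal errors, t follows a t(n - 2) distribution
print(abs(t) > 2.306)  # True: exceeds the 5% two-sided cutoff for t(8)
```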