Quiz 1 Flashcards
Linear model assumptions
(1) Linear in parameters
(2) Random sampling
(3) No perfect collinearity
(4) Zero conditional mean assumption: The mean of the population error is zero for all values of x
(5) Homoscedasticity: The variance of the population error is the same for all values of x
(6) Population error is independent of predictor variables and normally distributed with mean 0
What if assumption 6 does not hold?
This is OK in large samples: by the Central Limit Theorem, the OLS estimators are approximately normally distributed, so the usual inference is approximately valid
What if assumption 5 does not hold?
Can use weighted least squares or robust methods (such as heteroscedasticity-robust standard errors), or try transformations of Y
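As a minimal sketch of weighted least squares for one predictor, the function below minimizes the weighted sum of squared errors. The function name, the data, and the choice of weights 1/x^2 (which assumes the error standard deviation grows in proportion to x) are all illustrative assumptions.

```python
def weighted_least_squares(x, y, w):
    """Fit y = b0 + b1*x by minimizing sum_i w_i * (y_i - b0 - b1*x_i)^2."""
    total_w = sum(w)
    # Weighted means of x and y.
    x_w = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_w = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted cross-product and weighted sum of squares around those means.
    sxy = sum(wi * (xi - x_w) * (yi - y_w) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    b0 = y_w - b1 * x_w
    return b0, b1

# Hypothetical data; weights 1/x^2 are an assumption about the error variance.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.3, 7.6, 10.8]
w = [1.0 / xi ** 2 for xi in x]
b0, b1 = weighted_least_squares(x, y, w)
```

With equal weights this reduces to ordinary least squares; observations with larger assumed error variance get less influence on the fit.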
How to check whether homoscedasticity holds
Plot residuals against predicted or predictor values, or use the Breusch-Pagan test.
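A rough sketch of the test idea for one predictor: regress the squared residuals on x and use n times the R^2 of that auxiliary regression as the statistic. This is the studentized (Koenker) variant of Breusch-Pagan, not necessarily the version used in the course; the function names and data are hypothetical.

```python
def simple_ols(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def r_squared(x, y):
    """R^2 of the simple regression of y on x."""
    b0, b1 = simple_ols(x, y)
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ssr / sst

def breusch_pagan_lm(x, y):
    """LM statistic: n * R^2 from regressing squared residuals on x."""
    b0, b1 = simple_ols(x, y)
    sq_resid = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, sq_resid)

# Hypothetical data whose error spread grows with x.
x = [float(i) for i in range(1, 11)]
y = [2.0 + 1.5 * xi + ((-1) ** i) * 0.3 * xi for i, xi in enumerate(x)]
lm = breusch_pagan_lm(x, y)
# With one predictor, compare lm to a chi-squared distribution with 1 degree
# of freedom (5% critical value about 3.84); large values suggest
# heteroscedasticity.
```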
Total sum of squares
sum_i (y_i - y bar)^2
Sum of squares explained
sum_i (y hat_i - y bar)^2
Sum of squares residuals
sum_i (y_i - y hat_i)^2
R^2 (formula and interpretation)
R^2 = SSE/SST = 1 - SSR/SST, where SSE is the explained sum of squares and SSR is the residual sum of squares. It is the coefficient of determination: the proportion of the variance in y explained by the model
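A small worked example, on made-up data, of the three sums of squares and R^2 for a simple regression. Here `sse` is the explained sum of squares and `ssr` the residual sum of squares; all variable names and numbers are illustrative.

```python
# Hypothetical data for a one-predictor regression.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
n = len(x)

# Ordinary least squares fit of y = b0 + b1*x.
x_bar = sum(x) / n
y_bar = sum(y) / n
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total
sse = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual

r2 = sse / sst  # equals 1 - ssr/sst because sst = sse + ssr for OLS with an intercept
```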
Goodness of fit evaluation for logistic regression
Compare observed vs expected outcomes for each covariate pattern and use Pearson’s chi-squared (X^2) test. If there are many covariate patterns, group observations by deciles of predicted risk instead (Hosmer-Lemeshow). You can think of the test statistic as a sum of squared residuals. You can also plot the residuals against the observations to identify outliers.
Concerns with Hosmer-Lemeshow technique
Don’t choose the number of groups G too small, and it needs large data sets. With very large data sets you may get a large test statistic C despite good fit (so check the table of observed and expected counts)
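A minimal sketch of the grouped statistic, assuming the fitted probabilities are already available from some logistic model. The function name, the equal-size grouping rule, and the data are hypothetical; real software may handle ties and group boundaries differently.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic C for 0/1 outcomes y and fitted probabilities p."""
    # Order observations by fitted risk, then cut into roughly equal groups.
    pairs = sorted(zip(p, y), key=lambda t: t[0])
    n = len(pairs)
    c = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        observed = sum(yi for _, yi in chunk)  # observed events in the group
        expected = sum(pi for pi, _ in chunk)  # expected events in the group
        p_bar = expected / n_g                 # mean fitted risk in the group
        c += (observed - expected) ** 2 / (n_g * p_bar * (1.0 - p_bar))
    return c

# Hypothetical fitted probabilities and outcomes, split into 4 groups of 5.
p = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50,
     0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.92, 0.95]
y = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
c_stat = hosmer_lemeshow(y, p, groups=4)
# C is usually compared to a chi-squared distribution with G - 2 degrees of freedom.
```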
What do we need to consider about B_0 hat and B_1 hat?
They may not be independent; may need to consider their covariance
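For simple regression under assumptions 1-5, the covariance can be written out explicitly. The block below is a standard result stated for illustration; x_0 is a hypothetical point at which the fitted line is evaluated, and sigma^2 is the error variance.

```latex
\operatorname{Var}(\hat\beta_0 + \hat\beta_1 x_0)
  = \operatorname{Var}(\hat\beta_0) + x_0^2 \operatorname{Var}(\hat\beta_1)
  + 2 x_0 \operatorname{Cov}(\hat\beta_0, \hat\beta_1),
\qquad
\operatorname{Cov}(\hat\beta_0, \hat\beta_1)
  = -\frac{\bar{x}\,\sigma^2}{\sum_i (x_i - \bar{x})^2}
```

The covariance is zero only when x bar is zero, which is why the variance of a fitted value cannot in general be built from the two variances alone.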
What three properties should estimators have?
Unbiased, meaning the expected value is equal to the population value for any population value
Consistent, meaning they converge in probability to the population value as sample size grows without bound
Efficient, meaning it has the lowest variance among the estimators being compared (for example, among all unbiased estimators)
Method of moments: general concept
Replace expected values with their sample means
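One way to see the idea, sketched for simple regression: the population conditions E[u] = 0 and E[x*u] = 0, with u = y - b0 - b1*x, are replaced by their sample averages and solved for b0 and b1. The function name and data are hypothetical.

```python
def mom_simple_regression(x, y):
    """Solve the sample moment conditions mean(u) = 0 and mean(x*u) = 0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xy = sum(xi * yi for xi, yi in zip(x, y)) / n
    mean_xx = sum(xi * xi for xi in x) / n
    # Sample covariance over sample variance (both with divisor n).
    b1 = (mean_xy - mean_x * mean_y) / (mean_xx - mean_x ** 2)
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data.
b0, b1 = mom_simple_regression([1.0, 2.0, 3.0, 4.0], [2.9, 5.1, 7.0, 9.1])
```

The resulting estimates coincide with the OLS estimates for this model.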
Gauss-Markov Theorem
If assumptions 1-5 hold, the OLS estimator is BLUE (the best linear unbiased estimator)
Which assumptions are necessary for inference using the OLS estimator?
Exact (finite-sample) inference requires assumptions 1-6
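What is the method of moments estimator for the variance of the error, and why?
We use (1/(n-p-1)) * sum_i (eta_i hat)^2, where eta_i hat is the i-th residual and p is the number of predictors. Dividing by n instead would bias the estimate downward, because estimating the p+1 coefficients uses up p+1 degrees of freedom.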
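A small simulation sketch of the degrees-of-freedom point, assuming a simple regression (p = 1) with true error variance 1. The function names, seed, sample size, and true coefficients are arbitrary illustrative choices.

```python
import random

def fit_residuals(x, y):
    """Residuals from the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    b0 = y_bar - b1 * x_bar
    return [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

def error_variance(residuals, p):
    """Sum of squared residuals divided by n - p - 1."""
    return sum(e ** 2 for e in residuals) / (len(residuals) - p - 1)

random.seed(0)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
reps = 2000
unbiased_total = 0.0
naive_total = 0.0
for _ in range(reps):
    # True model: y = 1 + 2x + error, with error variance 1.
    y = [1.0 + 2.0 * xi + random.gauss(0.0, 1.0) for xi in x]
    resid = fit_residuals(x, y)
    unbiased_total += error_variance(resid, p=1)
    naive_total += sum(e ** 2 for e in resid) / len(resid)  # divides by n

unbiased_avg = unbiased_total / reps  # should land near the true value 1
naive_avg = naive_total / reps        # systematically too small
```

With n = 5 and p = 1, the divide-by-n version has expected value (n-p-1)/n = 0.6 times the true variance, so its average stays well below 1 however many replications are run.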
What can you do if the zero conditional mean assumption does not hold?
Nothing, at least within OLS: the coefficient estimates are biased. You are screwed