Quiz 1 Flashcards
Linear model assumptions
(1) Linear in parameters
(2) Random sampling
(3) No perfect collinearity
(4) Zero conditional mean assumption: The mean of the population error is zero for all values of x
(5) Homoscedasticity: The variance of the population error is the same for all values of x
(6) Population error is independent of predictor variables and normally distributed with mean 0
What if assumption 6 does not hold?
This is OK in large samples because, by the Central Limit Theorem, the sampling distribution of the beta estimators is approximately normal anyway
What if assumption 5 does not hold?
Can use weighted least squares or robust methods, or try transformations of Y
How to identify homoscedasticity
Plot residuals against predicted or predictor values, or use the Breusch-Pagan test.
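A minimal sketch of both checks in Python, on simulated data (all variable names and values here are illustrative): fit OLS, inspect the residuals, and run the Breusch-Pagan test via statsmodels.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 200)
    y = 1 + 2 * x + rng.normal(0, 1 + 0.5 * x)  # error spread grows with x

    X = sm.add_constant(x)
    fit = sm.OLS(y, X).fit()

    # For the visual check, scatter fit.fittedvalues against fit.resid.
    # Breusch-Pagan: a small p-value is evidence of heteroscedasticity.
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
    print(lm_pvalue)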
Total sum of squares
sum(y_i - y bar)^2
Sum of squares explained
sum(y hat_i - y bar)^2
Sum of squares residuals
sum(y_i - y hat_i)^2
R^2 (formula and interpretation)
R^2 = SSE/SST = 1 - SSR/SST. The coefficient of determination: the proportion of the variance in y explained by the model
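As a sketch, the three sums of squares and R^2 computed directly in numpy (the data here are made up for illustration):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    b1, b0 = np.polyfit(x, y, 1)           # OLS slope and intercept
    y_hat = b0 + b1 * x

    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
    sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
    ssr = np.sum((y - y_hat) ** 2)         # residual sum of squares

    print(sse / sst, 1 - ssr / sst)        # both give R^2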
Goodness of fit evaluation for logistic regression
Compare observed vs expected outcomes for each covariate pattern and use Pearson’s X^2 test. If there are many covariate patterns, use deciles of risk (Hosmer-Lemeshow). You can think of the test statistic as a sum of squared residuals. You can also plot the residuals against the observations to identify outliers.
Concerns with Hosmer-Lemeshow technique
Don’t choose the number of groups G too small; large data sets are needed; and with very large data sets you may get a large C hat (the test statistic) despite good fit (so check the table)
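A hedged sketch of the deciles-of-risk version (numpy/scipy only; hosmer_lemeshow is a hypothetical helper, and y and p_hat are assumed to be arrays of 0/1 outcomes and fitted probabilities from a logistic model):

    import numpy as np
    from scipy import stats

    def hosmer_lemeshow(y, p_hat, g=10):
        order = np.argsort(p_hat)              # sort by fitted risk
        chi2 = 0.0
        for idx in np.array_split(order, g):   # g groups of roughly equal size
            obs = y[idx].sum()                 # observed events in the group
            exp = p_hat[idx].sum()             # expected events in the group
            n_g = len(idx)
            chi2 += (obs - exp) ** 2 / (exp * (1 - exp / n_g))
        return chi2, stats.chi2.sf(chi2, df=g - 2)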
What do we need to consider about B_0 hat and B_1 hat?
They may not be independent; may need to consider their covariance
What three properties should estimators have?
Unbiased, meaning the expected value of the estimator equals the population value, whatever that population value is
Consistent, meaning they converge in probability to the population value as sample size grows without bound
Efficient, meaning it has the lowest variance among all unbiased estimators
Method of moments: general concept
Replace expected values with their sample means
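A tiny illustration, assuming a normal sample: replace E[X] and E[X^2] with sample averages and solve for the parameters (simulated data; the values 5 and 2 are arbitrary).

    import numpy as np

    x = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=1000)
    m1 = x.mean()               # sample analog of E[X]
    m2 = (x ** 2).mean()        # sample analog of E[X^2]

    mu_hat = m1                 # E[X] = mu
    sigma2_hat = m2 - m1 ** 2   # Var(X) = E[X^2] - (E[X])^2
    print(mu_hat, sigma2_hat)   # close to 5 and 4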
Gauss-Markov Theorem
If assumptions 1-5 hold, the OLS estimator is BLUE (the best linear unbiased estimator)
Which assumptions are necessary for inference using the OLS estimator?
Requires assumptions 1-6
What is the method of moments estimator for the variance, and why?
We use sigma hat^2 = (1/(n-p-1)) sum(epsilon hat_i)^2 because dividing by n would be biased downward: estimating the p+1 betas uses up degrees of freedom.
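As a sketch, the corrected estimator in numpy (sigma2_hat is a hypothetical helper; p counts the slope coefficients, so n - p - 1 is the residual degrees of freedom):

    import numpy as np

    def sigma2_hat(residuals, p):
        n = len(residuals)
        return np.sum(residuals ** 2) / (n - p - 1)  # SSR over residual df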
What can you do if the zero conditional mean assumption does not hold?
Nothing within OLS: the beta estimates will be biased. You are screwed
What do you need to do if testing combinations of the beta hats?
Need to account for covariance between the beta hats
When can you use a robust method of inference, and what is the general idea?
In the presence of heteroscedasticity, when the sample size is sufficiently large. The method estimates the variance of beta hat directly from the fitted residuals (a sandwich estimator), so the standard errors remain valid without modeling the variance.
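A minimal statsmodels sketch on simulated data (names illustrative): cov_type="HC1" requests a sandwich covariance estimate built from the fitted residuals.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, 300)
    y = 1 + 0.5 * x + rng.normal(0, 0.5 + 0.3 * x)  # heteroscedastic errors

    robust_fit = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HC1")
    print(robust_fit.bse)  # robust standard errors for the beta hats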
When can you use weighted least squares, and what is the general idea?
In the presence of heteroscedasticity. Weights each observation by the reciprocal of its variance. Weights can be developed by assuming a linear relationship between x and the variance, or by estimating the variance function with a regression.
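A hedged statsmodels sketch, assuming Var(y|x) is proportional to x so the weights are 1/x (in practice the variance model must itself be chosen or estimated):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x = rng.uniform(1, 10, 300)
    y = 1 + 0.5 * x + rng.normal(0, np.sqrt(x))  # error variance proportional to x

    wls_fit = sm.WLS(y, sm.add_constant(x), weights=1.0 / x).fit()
    print(wls_fit.params, wls_fit.bse)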
What increases the variance of a beta estimate?
Multicollinearity: correlation between predictors in the model
X values concentrated together (little variation in the predictors)
Larger variance of the error in the outcome
What happens to a beta estimate when you omit a variable?
The estimate from the smaller model becomes equal to the coefficient for that variable, plus the coefficient on the omitted variable times the slope from regressing the omitted variable on the included one.
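A quick simulation sketch of this formula (numpy; the coefficients 2 and 3 and the 0.8 relationship are made up): the short-regression slope lands at beta1 + beta2 * delta1, where delta1 is the slope of the omitted variable on the included one.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100_000
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)       # omitted variable, related to x1
    y = 1 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

    delta1 = np.polyfit(x1, x2, 1)[0]        # slope of x2 on x1, about 0.8
    short_slope = np.polyfit(x1, y, 1)[0]    # y on x1 alone, omitting x2
    print(short_slope, 2.0 + 3.0 * delta1)   # both approximately 4.4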
Adjusted R^2: purpose and formula
Way to compare models with different numbers of covariates, since it penalizes added parameters; only a rough guide. R^2_adj = 1 - (SSR/(n-p-1))/(SST/(n-1))
Residual mean squared: purpose and formula
Deciding how many covariates to put in the model: residual mean squared = SSR/(n-p-1). Graph the average residual mean squared for the models at each possible number of covariates, looking for a plateau. (The single model with the smallest residual mean squared may not have that number of covariates, however.)
Model comparison options
Adjusted R^2
Residual mean squared
Mallow's Cp
AIC and corrected AIC
Cross-validation
Mallow’s Cp: purpose and formula
Similar to the residual mean squared technique. Specify a maximal model and compare it to a model with p parameters, using the formula Cp = SSR_p/(sigma hat^2_max) - (n - 2p). If p parameters are enough, then E(Cp) = p. Can graph Cp vs number of covariates.
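As a sketch (mallows_cp is a hypothetical helper; ssr_p is the candidate model's SSR and sigma2_max the maximal model's residual mean squared):

    def mallows_cp(ssr_p, sigma2_max, n, p):
        # E(Cp) is about p when a model with p parameters is adequate
        return ssr_p / sigma2_max - (n - 2 * p)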
Leave one out cross-validation: purpose and formula.
Determine if a model is overfitted by using it for prediction. For each model, compute the predicted residual sum of squares (PRESS): the sum of squared errors when each observation is predicted from a fit that excludes it. The model with the lowest PRESS is best. May not be feasible in very large data sets.
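For linear models PRESS has a shortcut that avoids refitting n times: PRESS = sum((e_i / (1 - h_ii))^2), with h_ii the hat-matrix diagonal. A hedged numpy sketch (press is a hypothetical helper; X must include the intercept column):

    import numpy as np

    def press(X, y):
        hat = X @ np.linalg.inv(X.T @ X) @ X.T           # hat (projection) matrix
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta
        return np.sum((resid / (1 - np.diag(hat))) ** 2)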
Types of linear transformations
Scaling, centering, standardization
Disadvantages of categorization of variables
Not sure if the categories are “right”, eats df, can only easily compare to the single reference group, may have residual confounding
Possible nonlinear transformations
log, polynomial
Disadvantages of polynomial approach
May not fit well at the extremes; not sensitive to local nonlinearities; can be affected by outliers
Idea behind penalized spline regression
Instead of choosing the number of knots, keep all the knots and restrict the sum of squared betas for the spline part to be at most a constant C. This penalizes roughness, controlled by the smoothing parameter lambda (lambda = 0 gives the unpenalized fit using all knots; larger lambda gives a smoother fit).
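A hedged numpy sketch of a penalized linear spline fit (penalized_spline is a hypothetical helper; the truncated-line basis and ridge-type penalty follow the idea above, with lam = 0 reproducing the unpenalized all-knots fit):

    import numpy as np

    def penalized_spline(x, y, knots, lam):
        basis = [np.ones_like(x), x] + [np.maximum(x - k, 0) for k in knots]
        X = np.column_stack(basis)
        D = np.eye(X.shape[1])
        D[0, 0] = D[1, 1] = 0.0   # leave intercept and slope unpenalized
        # ridge-type solution: (X'X + lam * D)^-1 X'y
        return np.linalg.solve(X.T @ X + lam * D, X.T @ y)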
Goal of quadratic splines
First derivatives equal at knots; gives smoother fits
Ridge and lasso regression concepts
Ridge: Constrains the sum of squared coefficients (an L2 penalty) to be at most a fixed value, shrinking them toward zero
Lasso: Constrains the sum of absolute values of the coefficients (an L1 penalty) instead, which can shrink some coefficients exactly to zero
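A minimal sklearn sketch of the contrast (simulated data; the alpha values are arbitrary): ridge shrinks every coefficient but keeps them nonzero, while lasso zeroes many of them.

    import numpy as np
    from sklearn.linear_model import Ridge, Lasso

    rng = np.random.default_rng(5)
    X = rng.normal(size=(200, 10))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)  # 2 true signals

    print(Ridge(alpha=1.0).fit(X, y).coef_)  # all shrunk, none exactly zero
    print(Lasso(alpha=0.1).fit(X, y).coef_)  # several coefficients exactly zero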
How do you get the variance in maximum likelihood estimation
In large samples, take the inverse of the observed information matrix; the parameter estimates are asymptotically normal with that variance
Inferences in logistic regression
Wald-based inference using the z statistic, since the beta estimator is asymptotically normal
How many fisher scoring iterations should you have
A small number; a large number of iterations suggests convergence problems
How do you get the variance of a beta coefficient estimate in logistic regression
Take the (k,k)th element of the inverse of the observed information matrix
Likelihood ratio test: goal and formula
G = -2 log(likelihood of nested model / likelihood of larger model) ~ X^2 with q - p degrees of freedom (the number of extra parameters). Tests whether those parameter(s) are statistically significant.
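A hedged statsmodels sketch comparing nested logistic models on simulated data (variable names illustrative; here x2 has no true effect, so G should be small):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(6)
    x1, x2 = rng.normal(size=500), rng.normal(size=500)
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + x1))))

    nested = sm.Logit(y, sm.add_constant(x1)).fit(disp=0)
    larger = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)

    G = -2 * (nested.llf - larger.llf)  # -2 log(likelihood ratio)
    print(G, stats.chi2.sf(G, df=1))    # df = difference in parameter counts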
Deviance: formula, simpler formula, and corollary in linear reg
-2 log(likelihood of fitted model / likelihood of saturated model). If the y values are 0/1 (one observation per covariate pattern), the saturated likelihood is 1, so it simplifies to -2 log(likelihood of fitted model). Plays the role of SSR in linear regression.
AIC formula for logistic regression
AIC = -2log(likelihood fitted) + 2(p+1), or D+2(p+1) for 0/1 models
Goal of exact logistic regression
Allows inferences for logistic regression in small sample sizes
Probit regression
Similar to logistic regression but with a link function using the cumulative normal distribution
Conditional logistic regression
Used in matched case-control/paired studies, where the number of pair-specific parameters would otherwise grow very large. Instead, treat each pair as the unit of observation, conditioning on the fact that each pair has one Yi=1 and one Yi=0.