Quiz 1 Flashcards
Linear model assumptions
(1) Linear in parameters
(2) Random sampling
(3) No perfect collinearity
(4) Zero conditional mean assumption: The mean of the population error is zero for all values of x
(5) Homoscedasticity: The variance of the population error is the same for all values of x
(6) Population error is independent of predictor variables and normally distributed with mean 0
What if assumption 6 does not hold?
This is OK in large samples because, by the Central Limit Theorem, the sampling distribution of the beta estimators is approximately normal anyway
What if assumption 5 does not hold?
Can use weighted least squares or robust methods, or try transformations of Y
How to identify homoscedasticity
Plot residuals against predicted or predictor values, or use the Breusch-Pagan test.
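A minimal sketch of both checks in Python, on simulated data (all variable names and values here are illustrative): fit OLS, inspect the residuals, and run the Breusch-Pagan test via statsmodels.

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 200)
    y = 1 + 2 * x + rng.normal(0, 1 + 0.5 * x)  # error spread grows with x

    X = sm.add_constant(x)
    fit = sm.OLS(y, X).fit()

    # For the visual check, scatter fit.fittedvalues against fit.resid.
    # Breusch-Pagan: a small p-value is evidence of heteroscedasticity.
    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
    print(lm_pvalue)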
Total sum of squares
sum(y_i - y bar)^2
Sum of squares explained
sum(y hat_i - y bar)^2
Sum of squares residuals
sum(y_i - y hat_i)^2
R^2 (formula and interpretation)
R^2 = SSE/SST = 1 - SSR/SST. The coefficient of determination: the proportion of the variance in y explained by the model
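As a sketch, the three sums of squares and R^2 computed directly in numpy (the data here are made up for illustration):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

    b1, b0 = np.polyfit(x, y, 1)           # OLS slope and intercept
    y_hat = b0 + b1 * x

    sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
    sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
    ssr = np.sum((y - y_hat) ** 2)         # residual sum of squares

    print(sse / sst, 1 - ssr / sst)        # both give R^2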
Goodness of fit evaluation for logistic regression
Compare observed vs expected outcomes for each covariate pattern and use Pearson’s X^2 test. If there are many covariate patterns, use deciles of risk (Hosmer-Lemeshow). You can think of the test statistic as a sum of squared residuals. You can also plot the residuals against the observations to identify outliers.
Concerns with Hosmer-Lemeshow technique
Don’t choose the number of groups G too small; large data sets are needed; and with very large data sets you may get a large C hat (the test statistic) despite good fit (so check the table)
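A hedged sketch of the deciles-of-risk version (numpy/scipy only; hosmer_lemeshow is a hypothetical helper, and y and p_hat are assumed to be arrays of 0/1 outcomes and fitted probabilities from a logistic model):

    import numpy as np
    from scipy import stats

    def hosmer_lemeshow(y, p_hat, g=10):
        order = np.argsort(p_hat)              # sort by fitted risk
        chi2 = 0.0
        for idx in np.array_split(order, g):   # g groups of roughly equal size
            obs = y[idx].sum()                 # observed events in the group
            exp = p_hat[idx].sum()             # expected events in the group
            n_g = len(idx)
            chi2 += (obs - exp) ** 2 / (exp * (1 - exp / n_g))
        return chi2, stats.chi2.sf(chi2, df=g - 2)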
What do we need to consider about B_0 hat and B_1 hat?
They may not be independent; may need to consider their covariance
What three properties should estimators have?
Unbiased, meaning the expected value of the estimator equals the population value, whatever that population value is
Consistent, meaning they converge in probability to the population value as sample size grows without bound
Efficient, meaning it has the lowest variance among all unbiased estimators
Method of moments: general concept
Replace expected values with their sample means
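A tiny illustration, assuming a normal sample: replace E[X] and E[X^2] with sample averages and solve for the parameters (simulated data; the values 5 and 2 are arbitrary).

    import numpy as np

    x = np.random.default_rng(1).normal(loc=5.0, scale=2.0, size=1000)
    m1 = x.mean()               # sample analog of E[X]
    m2 = (x ** 2).mean()        # sample analog of E[X^2]

    mu_hat = m1                 # E[X] = mu
    sigma2_hat = m2 - m1 ** 2   # Var(X) = E[X^2] - (E[X])^2
    print(mu_hat, sigma2_hat)   # close to 5 and 4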
Gauss-Markov Theorem
If assumptions 1-5 hold, the OLS estimator is BLUE (the best linear unbiased estimator)
Which assumptions are necessary for inference using the OLS estimator?
Requires assumptions 1-6
What is the method of moments estimator for the variance, and why?
We use sigma hat^2 = (1/(n-p-1)) sum(epsilon hat_i)^2 because dividing by n would be biased downward: estimating the p+1 betas uses up degrees of freedom.
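As a sketch, the corrected estimator in numpy (sigma2_hat is a hypothetical helper; p counts the slope coefficients, so n - p - 1 is the residual degrees of freedom):

    import numpy as np

    def sigma2_hat(residuals, p):
        n = len(residuals)
        return np.sum(residuals ** 2) / (n - p - 1)  # SSR over residual df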
What can you do if the zero conditional mean assumption does not hold?
Nothing within OLS: the beta estimates will be biased. You are screwed
What do you need to do if testing combinations of the beta hats?
Need to account for covariance between the beta hats
When can you use a robust method of inference, and what is the general idea?
In the presence of heteroscedasticity, when the sample size is sufficiently large. The method estimates the variance of beta hat directly from the fitted residuals (a sandwich estimator), so the standard errors remain valid without modeling the variance.
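A minimal statsmodels sketch on simulated data (names illustrative): cov_type="HC1" requests a sandwich covariance estimate built from the fitted residuals.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, 300)
    y = 1 + 0.5 * x + rng.normal(0, 0.5 + 0.3 * x)  # heteroscedastic errors

    robust_fit = sm.OLS(y, sm.add_constant(x)).fit(cov_type="HC1")
    print(robust_fit.bse)  # robust standard errors for the beta hats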
When can you use weighted least squares, and what is the general idea?
In the presence of heteroscedasticity. Weights each observation by the reciprocal of its variance. Weights can be developed by assuming a linear relationship between x and the variance, or by estimating the variance function with a regression.
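A hedged statsmodels sketch, assuming Var(y|x) is proportional to x so the weights are 1/x (in practice the variance model must itself be chosen or estimated):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x = rng.uniform(1, 10, 300)
    y = 1 + 0.5 * x + rng.normal(0, np.sqrt(x))  # error variance proportional to x

    wls_fit = sm.WLS(y, sm.add_constant(x), weights=1.0 / x).fit()
    print(wls_fit.params, wls_fit.bse)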
What increases the variance of a beta estimate?
Multicollinearity: correlation between predictors in the model
X values concentrated together (little variation in the predictors)
Larger variance of the error in the outcome
What happens to a beta estimate when you omit a variable?
The estimate from the smaller model becomes equal to the coefficient for that variable, plus the coefficient on the omitted variable times the slope from regressing the omitted variable on the included one.
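A quick simulation sketch of this formula (numpy; the coefficients 2 and 3 and the 0.8 relationship are made up): the short-regression slope lands at beta1 + beta2 * delta1, where delta1 is the slope of the omitted variable on the included one.

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100_000
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + rng.normal(size=n)       # omitted variable, related to x1
    y = 1 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

    delta1 = np.polyfit(x1, x2, 1)[0]        # slope of x2 on x1, about 0.8
    short_slope = np.polyfit(x1, y, 1)[0]    # y on x1 alone, omitting x2
    print(short_slope, 2.0 + 3.0 * delta1)   # both approximately 4.4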
Adjusted R^2: purpose and formula
Way to compare models with different numbers of covariates, since it penalizes added parameters; only a rough guide. R^2_adj = 1 - (SSR/(n-p-1))/(SST/(n-1))
Residual mean squared: purpose and formula
Deciding how many covariates to put in the model: residual mean squared = SSR/(n-p-1). Graph the average residual mean squared for the models at each possible number of covariates, looking for a plateau. (The single model with the smallest residual mean squared may not have that number of covariates, however.)
Model comparison options
Adjusted R^2
Residual mean squared
Mallow's Cp
AIC and corrected AIC
Cross-validation
Mallow’s Cp: purpose and formula
Similar to the residual mean squared technique. Specify a maximal model and compare it to a model with p parameters, using the formula Cp = SSR_p/(sigma hat^2_max) - (n - 2p). If p parameters are enough, then E(Cp) = p. Can graph Cp vs number of covariates.
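As a sketch (mallows_cp is a hypothetical helper; ssr_p is the candidate model's SSR and sigma2_max the maximal model's residual mean squared):

    def mallows_cp(ssr_p, sigma2_max, n, p):
        # E(Cp) is about p when a model with p parameters is adequate
        return ssr_p / sigma2_max - (n - 2 * p)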
Leave one out cross-validation: purpose and formula.
Determine if a model is overfitted by using it for prediction. For each model, compute the predicted residual sum of squares (PRESS): the sum of squared errors when each observation is predicted from a fit that excludes it. The model with the lowest PRESS is best. May not be feasible in very large data sets.
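For linear models PRESS has a shortcut that avoids refitting n times: PRESS = sum((e_i / (1 - h_ii))^2), with h_ii the hat-matrix diagonal. A hedged numpy sketch (press is a hypothetical helper; X must include the intercept column):

    import numpy as np

    def press(X, y):
        hat = X @ np.linalg.inv(X.T @ X) @ X.T           # hat (projection) matrix
        beta = np.linalg.lstsq(X, y, rcond=None)[0]
        resid = y - X @ beta
        return np.sum((resid / (1 - np.diag(hat))) ** 2)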
Types of linear transformations
Scaling, centering, standardization
Disadvantages of categorization of variables
Not sure if the categories are “right”, eats df, can only easily compare to the single reference group, may have residual confounding
Possible nonlinear transformations
log, polynomial
Disadvantages of polynomial approach
May not fit well at the extremes; not sensitive to local nonlinearities; can be affected by outliers
Idea behind penalized spline regression
Instead of choosing the number of knots, keep all the knots and restrict the sum of squared betas for the spline part to be at most a constant C. This penalizes roughness, controlled by the smoothing parameter lambda (lambda = 0 gives the unpenalized fit using all knots; larger lambda gives a smoother fit).
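A hedged numpy sketch of a penalized linear spline fit (penalized_spline is a hypothetical helper; the truncated-line basis and ridge-type penalty follow the idea above, with lam = 0 reproducing the unpenalized all-knots fit):

    import numpy as np

    def penalized_spline(x, y, knots, lam):
        basis = [np.ones_like(x), x] + [np.maximum(x - k, 0) for k in knots]
        X = np.column_stack(basis)
        D = np.eye(X.shape[1])
        D[0, 0] = D[1, 1] = 0.0   # leave intercept and slope unpenalized
        # ridge-type solution: (X'X + lam * D)^-1 X'y
        return np.linalg.solve(X.T @ X + lam * D, X.T @ y)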
Goal of quadratic splines
First derivatives equal at knots; gives smoother fits
Ridge and lasso regression concepts
Ridge: Constrains the sum of squared coefficients (an L2 penalty) to be at most a fixed value, shrinking them toward zero
Lasso: Constrains the sum of absolute values of the coefficients (an L1 penalty) instead, which can shrink some coefficients exactly to zero
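A minimal sklearn sketch of the contrast (simulated data; the alpha values are arbitrary): ridge shrinks every coefficient but keeps them nonzero, while lasso zeroes many of them.

    import numpy as np
    from sklearn.linear_model import Ridge, Lasso

    rng = np.random.default_rng(5)
    X = rng.normal(size=(200, 10))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)  # 2 true signals

    print(Ridge(alpha=1.0).fit(X, y).coef_)  # all shrunk, none exactly zero
    print(Lasso(alpha=0.1).fit(X, y).coef_)  # several coefficients exactly zero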
How do you get the variance in maximum likelihood estimation
In large samples, take the inverse of the observed information matrix; the parameter estimates are asymptotically normal with that variance
Inferences in logistic regression
Wald-based inference using the z statistic, since the beta estimator is asymptotically normal
How many fisher scoring iterations should you have
A small number; a large number of iterations suggests convergence problems
How do you get the variance of a beta coefficient estimate in logistic regression
Take the (k,k)th element of the inverse of the observed information matrix
Likelihood ratio test: goal and formula
G = -2 log(likelihood of nested model / likelihood of larger model) ~ X^2 with q - p degrees of freedom (the number of extra parameters). Tests whether those parameter(s) are statistically significant.
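A hedged statsmodels sketch comparing nested logistic models on simulated data (variable names illustrative; here x2 has no true effect, so G should be small):

    import numpy as np
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(6)
    x1, x2 = rng.normal(size=500), rng.normal(size=500)
    y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 + x1))))

    nested = sm.Logit(y, sm.add_constant(x1)).fit(disp=0)
    larger = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)

    G = -2 * (nested.llf - larger.llf)  # -2 log(likelihood ratio)
    print(G, stats.chi2.sf(G, df=1))    # df = difference in parameter counts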
Deviance: formula, simpler formula, and corollary in linear reg
-2 log(likelihood of fitted model / likelihood of saturated model). If the y values are 0/1 (one observation per covariate pattern), the saturated likelihood is 1, so it simplifies to -2 log(likelihood of fitted model). Plays the role of SSR in linear regression.
AIC formula for logistic regression
AIC = -2log(likelihood fitted) + 2(p+1), or D+2(p+1) for 0/1 models
Goal of exact logistic regression
Allows inferences for logistic regression in small sample sizes
Probit regression
Similar to logistic regression but with a link function using the cumulative normal distribution
Conditional logistic regression
Used in matched case-control/paired studies, where the number of pair-specific parameters would otherwise grow very large. Instead, treat each pair as the unit of observation, conditioning on the fact that each pair has one Yi=1 and one Yi=0.