Quiz 1 Flashcards

1
Q

Linear model assumptions

A

(1) Linear in parameters
(2) Random sampling
(3) No perfect collinearity
(4) Zero conditional mean assumption: The mean of the population error is zero for all values of x
(5) Homoscedasticity: The variance of the population error is the same for all values of x
(6) Population error is independent of the predictor variables and normally distributed with mean 0 and variance σ^2

2
Q

What if assumption 6 does not hold?

A

This is OK in large samples: by the Central Limit Theorem, the OLS estimators are approximately normally distributed even when the errors are not

3
Q

What if assumption 5 does not hold?

A

Can use weighted least squares or robust methods, or try transformations of Y

4
Q

How to check for homoscedasticity

A

Plot residuals against predicted or predictor values, or use the Breusch-Pagan test.
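
A minimal sketch of both checks in Python with statsmodels, using simulated data (all variable names here are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
y = 1 + 2 * x + rng.normal(0, 1 + 0.5 * x)  # error sd grows with x

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# Visual check: plot fit.resid against fit.fittedvalues;
# a funnel shape suggests heteroscedasticity.

# Breusch-Pagan regresses the squared residuals on the predictors;
# a small p-value rejects homoscedasticity.
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(fit.resid, X)
print(lm_pval)
```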

5
Q

Total sum of squares

A

SST = Σ (y_i − ȳ)^2

6
Q

Sum of squares explained

A

SSE = Σ (ŷ_i − ȳ)^2

7
Q

Sum of squares residuals

A

SSR = Σ (y_i − ŷ_i)^2

8
Q

R^2 (formula and interpretation)

A

R^2 = SSE/SST (equivalently, 1 − SSR/SST). This is the coefficient of determination: the proportion of the variance in y explained by the model
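
A short numpy sketch tying cards 5 through 8 together (simulated data; the decomposition SST = SSE + SSR holds for OLS with an intercept):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 3 + 1.5 * x + rng.normal(0, 2, 50)

b1, b0 = np.polyfit(x, y, 1)           # OLS slope and intercept
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)  # explained sum of squares
ssr = np.sum((y - y_hat) ** 2)         # residual sum of squares

assert np.isclose(sst, sse + ssr)
r2 = sse / sst                         # same as 1 - ssr / sst
print(r2)
```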

9
Q

Goodness of fit evaluation for logistic regression

A

Compare observed vs expected outcomes for each covariate pattern and use Pearson’s χ^2 test. If there are many covariate patterns, group by deciles of risk (Hosmer-Lemeshow). The test statistic can be thought of as a sum of squared residuals. You can also plot the residuals against the observations to identify outliers.
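
statsmodels has no built-in Hosmer-Lemeshow test as far as I know, so here is a hand-rolled sketch of the deciles-of-risk version; the function name and grouping scheme are my own:

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p, g=10):
    """Deciles-of-risk Hosmer-Lemeshow statistic and p-value.

    y: 0/1 outcomes; p: fitted probabilities; g: number of groups."""
    order = np.argsort(p)               # sort observations by predicted risk
    chi2 = 0.0
    for idx in np.array_split(order, g):
        n_k = len(idx)
        obs = y[idx].sum()              # observed events in the group
        exp = p[idx].sum()              # expected events = n_k * mean risk
        chi2 += (obs - exp) ** 2 / (exp * (1 - exp / n_k))
    # Reference distribution: chi-square with g - 2 degrees of freedom
    return chi2, stats.chi2.sf(chi2, g - 2)
```

Usage would be, e.g., hosmer_lemeshow(y, fitted.predict(X)) with arrays from a fitted logistic model.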

10
Q

Concerns with Hosmer-Lemeshow technique

A

Don’t choose G (the number of groups) too small; the test needs a large data set; and with a very large data set you may get a large C (statistically significant) despite good fit, so check the observed-vs-expected table

11
Q

What do we need to consider about β̂_0 and β̂_1?

A

They may not be independent; may need to consider their covariance

12
Q

What three properties should estimators have?

A

Unbiased, meaning the expected value equals the population value, whatever that value is
Consistent, meaning the estimator converges in probability to the population value as the sample size grows without bound
Efficient, meaning it has the lowest variance among all unbiased estimators

13
Q

Method of moments: general concept

A

Replace expected values with their sample means
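
As a worked example in the regression setting of these cards, the population moment conditions E[ε] = 0 and E[xε] = 0 become sample averages, which are exactly the OLS normal equations:

```latex
\frac{1}{n}\sum_{i=1}^{n}\bigl(y_i-\hat\beta_0-\hat\beta_1 x_i\bigr)=0,
\qquad
\frac{1}{n}\sum_{i=1}^{n}x_i\bigl(y_i-\hat\beta_0-\hat\beta_1 x_i\bigr)=0
```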

14
Q

Gauss-Markov Theorem

A

If assumptions 1-5 hold, the OLS estimator is BLUE: the Best Linear Unbiased Estimator, i.e., minimum variance among all linear unbiased estimators

15
Q

Which assumptions are necessary for inference using the OLS estimator?

A

Requires assumptions 1-6 (though in large samples assumption 6 can be relaxed, via the Central Limit Theorem)

16
Q

What is the method of moments estimator for the variance, and why?

A

We use σ̂^2 = (1/(n − p − 1)) Σ (ε̂_i)^2, because dividing by n would give a biased estimate: fitting the p + 1 regression coefficients uses up p + 1 degrees of freedom.

17
Q

What can you do if the zero conditional mean assumption does not hold?

A

Nothing within the model: the OLS estimates are then biased and inconsistent. You are screwed

18
Q

What do you need to do when testing combinations of the β̂s?

A

Need to account for the covariance between the β̂s
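
For example, the variance of a sum of two coefficient estimates includes their covariance:

```latex
\operatorname{Var}(\hat\beta_1+\hat\beta_2)
  =\operatorname{Var}(\hat\beta_1)+\operatorname{Var}(\hat\beta_2)
  +2\operatorname{Cov}(\hat\beta_1,\hat\beta_2)
```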

19
Q

When can you use a robust method of inference, and what is the general idea?

A

In the presence of heteroscedasticity, when the sample size is sufficiently large. The method estimates the variance of β̂ directly from the fitted residuals (the Huber-White sandwich estimator), rather than assuming a single common error variance.
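
A minimal statsmodels sketch with simulated heteroscedastic data (HC1 is one of several sandwich variants):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 500)
y = 1 + 2 * x + rng.normal(0, 0.5 + 0.3 * x)  # heteroscedastic errors

X = sm.add_constant(x)
classic = sm.OLS(y, X).fit()               # assumes constant error variance
robust = sm.OLS(y, X).fit(cov_type="HC1")  # sandwich (robust) variance

print(classic.bse)  # classical standard errors
print(robust.bse)   # heteroscedasticity-robust standard errors
```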

20
Q

When can you use weighted least squares, and what is the general idea?

A

In the presence of heteroscedasticity. Weights each observation by the reciprocal of its variance. Weights can be developed by assuming a simple relationship between x and the variance, or by estimating the variance with a regression (e.g., of the squared residuals on x).
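
A sketch with statsmodels, assuming (for illustration only) that the error standard deviation is proportional to x:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(1, 10, 500)
y = 1 + 2 * x + rng.normal(0, 0.4 * x)  # sd proportional to x, variance to x^2

X = sm.add_constant(x)
# Weight each observation by the reciprocal of its assumed variance.
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls.params, wls.bse)
```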

21
Q

What increases the variance of a beta estimate?

A

Multicollinearity: correlation between the predictors in the model
X values concentrated together (little spread in the predictor)
A larger error variance (see the formula below)
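
All three show up in the standard variance formula for a slope estimate:

```latex
\operatorname{Var}(\hat\beta_j)=\frac{\sigma^{2}}{\mathrm{SST}_j\,(1-R_j^{2})}
```

Here SST_j is the total variation in x_j (small when the x values are concentrated) and R_j^2 comes from regressing x_j on the other predictors (large under multicollinearity).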

22
Q

What happens to a beta estimate when you omit a variable?

A

The estimated coefficient on the remaining variable equals its own estimate plus the omitted variable's estimate times the slope from regressing the omitted variable on the remaining one (not their correlation): β̃_1 = β̂_1 + β̂_2·δ̂, where δ̂ is that slope.
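
A quick simulation of this with hypothetical numbers: true coefficients 2 and 3, and the omitted x2 regresses on x1 with slope 0.6, so the short-regression slope should be near 2 + 3(0.6) = 3.8:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)      # slope of x2 on x1 is 0.6
y = 1 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

slope_short = np.polyfit(x1, y, 1)[0]   # regression of y on x1 alone
print(slope_short)                      # approximately 3.8
```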

23
Q

Adjusted R^2: purpose and formula

A

A way to compare models with different numbers of covariates, though only a rough guide. R^2_adj = 1 − (SSR/(n − p − 1))/(SST/(n − 1))

24
Q

Residual mean squared: purpose and formula

A

For deciding how many covariates to put in the model. The residual mean square is SSR/(n − p − 1). Graph the average residual mean square for the models at each possible number of covariates and look for a plateau. (The single model with the smallest residual mean square may not have that number of covariates, however.)

25
Q

Model comparison options

A
Adjusted R^2
Residual mean squared
Mallows' Cp
AIC and corrected AIC
Cross-validation
26
Q

Mallows' Cp: purpose and formula

A

Similar to the residual mean squared technique. Specify a maximal model and compare it to a model with p parameters, using Cp = SSR_p/σ̂^2_max − (n − 2p), where σ̂^2_max is the residual variance estimate from the maximal model. If p parameters are enough, then E(Cp) ≈ p. Can graph Cp vs the number of covariates.

27
Q

Leave one out cross-validation: purpose and formula.

A

Determine whether the model is overfitted by using it for prediction: leave each observation out in turn, predict it from the model refitted without it, and sum the squared prediction errors (PRESS). The model with the lowest PRESS is best. May not be feasible in very large data sets.
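
For linear models you do not actually refit n times: a standard identity converts ordinary residuals to leave-one-out residuals via the leverages. A numpy sketch with illustrative data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 60)
y = 3 + 1.5 * x + rng.normal(0, 2, 60)

X = np.column_stack([np.ones_like(x), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T  # hat matrix
resid = y - H @ y                     # ordinary residuals
h = np.diag(H)                        # leverages h_ii

# Leave-one-out residual identity: e_(i) = e_i / (1 - h_ii)
press = np.sum((resid / (1 - h)) ** 2)
print(press)
```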

28
Q

Types of linear transformations

A

Scaling, centering, standardization

29
Q

Disadvantages of categorization of variables

A

Not sure the category boundaries are “right”; eats degrees of freedom; can only easily compare to the single reference group; and residual confounding may remain within categories

30
Q

Possible nonlinear transformations

A

log, polynomial

31
Q

Disadvantages of polynomial approach

A

May not fit well at the extremes; not sensitive to local nonlinearities; can be affected by outliers

32
Q

Idea behind penalized spline regression

A

Instead of choosing the number of knots, keep all of the knots but constrain the sum of squared betas for the spline terms to be at most a constant C. This penalizes roughness; the penalty is tuned by the smoothing parameter λ (λ = 0 gives the unpenalized fit using all knots).
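
A bare-bones numpy sketch of the idea, written as the equivalent ridge-type penalty; the truncated-line basis, knot placement, and λ are all arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 200)

knots = np.linspace(0.05, 0.95, 20)               # keep many knots
Z = np.maximum(x[:, None] - knots[None, :], 0.0)  # spline terms (x - k)_+
X = np.column_stack([np.ones_like(x), x, Z])

lam = 1.0                                         # smoothing parameter lambda
D = np.diag([0.0, 0.0] + [1.0] * len(knots))      # penalize spline coefs only
beta = np.linalg.solve(X.T @ X + lam * D, X.T @ y)
y_hat = X @ beta                                  # penalized spline fit
```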

33
Q

Goal of quadratic splines

A

First derivatives are equal at the knots, which gives smoother fits

34
Q

Ridge and lasso regression concepts

A

Ridge: constrains the sum of squared coefficients (an L2 penalty) to be below a fixed value, shrinking all coefficients toward zero
Lasso: same idea with an L1 penalty (sum of absolute coefficient values), which can shrink some coefficients exactly to zero
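
A scikit-learn sketch (the alpha values are arbitrary; standardizing predictors first is customary since the penalties are scale-sensitive):

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 10))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(size=200)  # only 2 real predictors

Xs = StandardScaler().fit_transform(X)

ridge = Ridge(alpha=1.0).fit(Xs, y)
lasso = Lasso(alpha=0.1).fit(Xs, y)

print(ridge.coef_)  # all shrunken, none exactly zero
print(lasso.coef_)  # several coefficients exactly zero
```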

35
Q

How do you get the variance in maximum likelihood estimation

A

In large samples the maximum likelihood estimator is asymptotically normal, with variance estimated by the inverse of the observed information matrix

36
Q

Inferences in logistic regression

A

Wald-based inference using the z statistic z = β̂/SE(β̂); the β̂ estimator is asymptotically normal

37
Q

How many Fisher scoring iterations should you have?

A

A small number; many iterations suggest convergence problems (e.g., separation in the data)

38
Q

How do you get the variance of a beta coefficient estimate in logistic regression

A

Take the (k,k)th element of the inverse of the observed information matrix

39
Q

Likelihood ratio test: goal and formula

A

G = −2 log(likelihood of nested model / likelihood of larger model) ~ χ^2 with q − p degrees of freedom, where q − p is the number of extra parameters in the larger model. Tests whether those additional parameter(s) are statistically significant.
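
A statsmodels sketch tying cards 36, 38, and 39 together (simulated data; x2 has no true effect, so the LRT should usually fail to reject):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(8)
n = 500
x1, x2 = rng.normal(size=n), rng.normal(size=n)
p = 1 / (1 + np.exp(-(-0.5 + 1.0 * x1)))  # x2 omitted from the truth
y = rng.binomial(1, p)

small = sm.Logit(y, sm.add_constant(x1)).fit(disp=0)
big = sm.Logit(y, sm.add_constant(np.column_stack([x1, x2]))).fit(disp=0)

print(np.diag(big.cov_params()))  # variances from inverse information (card 38)
print(big.params / big.bse)       # Wald z statistics (card 36)

G = -2 * (small.llf - big.llf)    # likelihood ratio statistic (card 39)
print(G, stats.chi2.sf(G, df=1))  # one extra parameter => 1 df
```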

40
Q

Deviance: formula, simpler formula, and corollary in linear reg

A

D = −2 log(likelihood of fitted model / likelihood of saturated model). If the y values are 0/1, the saturated likelihood is 1, so this simplifies to D = −2 log(likelihood of fitted model). Plays the same role as SSR in linear regression.

41
Q

AIC formula for logistic regression

A

AIC = -2log(likelihood fitted) + 2(p+1), or D+2(p+1) for 0/1 models

42
Q

Goal of exact logistic regression

A

Allows inferences for logistic regression in small sample sizes

43
Q

Probit regression

A

Similar to logistic regression, but the link function is based on the cumulative distribution function of the standard normal rather than the logit

44
Q

Conditional logistic regression

A

Used in matched case-control/paired studies, where giving each pair its own parameter would make the number of parameters very large. Instead, we treat each pair as the unit of observation, conditioning on the fact that each pair contains one Y_i = 1 and one Y_i = 0.