final Flashcards

1
Q

What happens if residuals are not normally distributed?

A

SEs are incorrect

2
Q

what happens if residuals are heteroskedastic?

A

Heteroskedasticity has no bearing on coefficient estimates, but it can produce incorrect (downwardly biased) standard errors and thus lead to incorrect confidence intervals and increased Type I errors in hypothesis tests (rejecting the null when the null is true).

3
Q

what is a strategy to remedy heteroskedasticity?

A

robust standard errors
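
A minimal sketch of this remedy using Python's statsmodels on made-up data (HC3 is one of several robust covariance options):

    import numpy as np
    import statsmodels.api as sm

    # fabricated data with heteroskedastic errors
    rng = np.random.default_rng(0)
    x = rng.normal(size=200)
    y = 1 + 2 * x + rng.normal(scale=1 + np.abs(x), size=200)

    X = sm.add_constant(x)
    classical = sm.OLS(y, X).fit()             # default (classical) standard errors
    robust = sm.OLS(y, X).fit(cov_type="HC3")  # heteroskedasticity-robust standard errors

    print(classical.bse)  # same coefficients as below, but these SEs ignore the heteroskedasticity
    print(robust.bse)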

4
Q

what is a strategy to remedy non-normality in residuals?

A

If the residuals are not normally distributed, a transformation of the dependent variable might remedy the problem (for example, taking the natural log of y).

5
Q

what happens if the relationship between x and y is non-linear?

A

Violations of linearity lead to bias in coefficient estimates. If the true relationship between x and y is curvilinear then characterizing it as linear leads to erroneous conclusions about the relationship.

6
Q

how do you remedy non-linearity?

A

To remedy nonlinearity, one can respecify the model with a polynomial, a spline, or some other nonlinear transformation of x.
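
A sketch of the polynomial option with statsmodels' formula interface; the data frame and variable names are invented for illustration:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # fabricated curvilinear data
    rng = np.random.default_rng(1)
    df = pd.DataFrame({"x": rng.uniform(-3, 3, 300)})
    df["y"] = 1 + 0.5 * df["x"] - 0.8 * df["x"] ** 2 + rng.normal(size=300)

    linear = smf.ols("y ~ x", data=df).fit()               # forces a straight line
    quadratic = smf.ols("y ~ x + I(x**2)", data=df).fit()  # respecified as a polynomial in x

    print(linear.rsquared, quadratic.rsquared)  # the quadratic fit should be much better here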

7
Q

What is a spline?

A

It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models nonlinearities and interactions between variables.

8
Q

Cook’s D is a measure of overall _____

A

influence

9
Q

what is a highly leveraged observation?

A

an observation with an x value more than about 3 standard deviations from the mean of x

10
Q

what is an outlier?

A

An observation with a y value more than about 3 standard deviations from the mean of y.

In regression, an observation whose residual (y - yhat) is more than about 3 standard deviations from zero.

11
Q

What do hat values measure?

A

leverage: how much xi's deviation from the mean of x contributes to the total sum of squares of x

hi = 1/n + (xi - xbar)^2 / TSSx, where TSSx = sum of (xj - xbar)^2 (simple regression case)

12
Q

High leverage is associated with _____ error variance

A

small

these observations pull the regression line closer to them

13
Q

what happens when there is measurement error in y?

A

b will be the same, but standard error will grow, and r squared will shrink

standard error of the model grows

14
Q

what happens when there’s measurement error in x?

A

r squared shrinks because there is poor model fit

SEb is smaller because the variance in x (denominator of the SE equation) is artificially large

b will attenuate (attenuation bias!)

This is only true for the bivariate case!
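
A small simulation sketch (invented numbers) of attenuation bias in the bivariate case:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n, true_b = 5000, 2.0
    x_true = rng.normal(size=n)
    y = 1 + true_b * x_true + rng.normal(size=n)
    x_noisy = x_true + rng.normal(size=n)  # add measurement error to x

    b_clean = sm.OLS(y, sm.add_constant(x_true)).fit().params[1]
    b_noisy = sm.OLS(y, sm.add_constant(x_noisy)).fit().params[1]

    # reliability = var(x_true) / (var(x_true) + var(error)) = 0.5 here,
    # so b_noisy should attenuate toward roughly 0.5 * true_b
    print(b_clean, b_noisy)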

15
Q

What do Cook’s D and Dfits measure?

A

both measure overall influence of a data point (leverage AND outlyingness)

DFFITS takes E*i x sqrt(hi / (1 - hi))
Cook’s D: roughly DFFITS^2 / (k + 1), which basically standardizes DFFITS by the number of coefficients (k slopes plus the intercept)

both measure how much a given observation affects b
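
A sketch of pulling both measures out of statsmodels' influence diagnostics (fabricated data with one planted influential point):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(3)
    x = rng.normal(size=100)
    y = 1 + 2 * x + rng.normal(size=100)
    x[0], y[0] = 6.0, -20.0  # plant a high-leverage outlier

    infl = sm.OLS(y, sm.add_constant(x)).fit().get_influence()
    cooks_d, _ = infl.cooks_distance  # overall influence of each observation
    dffits, _ = infl.dffits           # scaled change in the fitted value when obs i is dropped

    print(np.argmax(cooks_d), np.argmax(np.abs(dffits)))  # both should flag observation 0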

16
Q

In layman’s terms, what is DFBETA?

A

the difference between the estimate we get with a given observation included and the estimate we would get if we excluded that observation
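
A sketch of computing (scaled) DFBETAs with statsmodels; the data and coefficients are invented:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(4)
    X = sm.add_constant(rng.normal(size=(100, 2)))
    y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=100)

    res = sm.OLS(y, X).fit()
    dfbetas = res.get_influence().dfbetas  # rows = observations, columns = coefficients

    # dfbetas[i, j]: standardized change in coefficient j when observation i is excluded
    print(dfbetas.shape)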

17
Q

what is the difference between Cook’s D/dfits and dfbeta?

A

Cook’s D/dfits measure the overall influence of an observation, while dfbeta measures the influence on an individual coefficient

18
Q

If residuals are NOT normally distributed, is there an effect on b?

A

nope!

19
Q

what plot do you use to diagnose linearity?

A

component plus residual plot

20
Q

what plot do you use to diagnose heteroskedasticity?

A

residual-versus-fitted (RVF) plot
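
A sketch covering this plot and the component-plus-residual plot from the previous card, using statsmodels on invented data:

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm

    rng = np.random.default_rng(5)
    x = rng.uniform(0, 10, 200)
    y = 1 + 0.5 * x + rng.normal(scale=x / 3, size=200)  # error variance grows with x

    res = sm.OLS(y, sm.add_constant(x)).fit()

    sm.graphics.plot_ccpr(res, 1)             # component-plus-residual plot for x (linearity)
    plt.figure()
    plt.scatter(res.fittedvalues, res.resid)  # residual-versus-fitted plot (heteroskedasticity)
    plt.axhline(0)
    plt.xlabel("fitted values")
    plt.ylabel("residuals")
    plt.show()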

21
Q

what is the null hypothesis for a Breusch-Pagan test?

A

that the residual variance is constant (homoskedasticity)
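
A sketch of running the test in statsmodels on fabricated heteroskedastic data; a small p-value rejects the null:

    import numpy as np
    import statsmodels.api as sm
    from statsmodels.stats.diagnostic import het_breuschpagan

    rng = np.random.default_rng(6)
    x = rng.uniform(0, 10, 300)
    y = 1 + 0.5 * x + rng.normal(scale=0.2 * x, size=300)

    X = sm.add_constant(x)
    res = sm.OLS(y, X).fit()

    lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
    print(lm_pvalue)  # small p-value -> evidence of heteroskedasticity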

22
Q

What are some solutions to heteroskedasticity?

A

add omitted variables, or estimate robust standard errors (this increases the standard errors because it folds the heteroskedasticity into them)

23
Q

Let’s say your model violates the independence assumption: What are the consequences on your coefficient? your standard errors?

A

no impact on coefficients, but you’ve underestimated your standard errors

24
Q

how do you remedy a violation of the independence assumption?

A

estimate clustered standard errors
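
A sketch in statsmodels, assuming a hypothetical cluster identifier; the data are simulated so that errors are correlated within clusters:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n_clusters, per_cluster = 40, 10
    clusters = np.repeat(np.arange(n_clusters), per_cluster)
    shared = rng.normal(size=n_clusters)[clusters]  # error component shared within a cluster

    x = rng.normal(size=n_clusters * per_cluster)
    y = 1 + 2 * x + shared + rng.normal(size=n_clusters * per_cluster)

    X = sm.add_constant(x)
    naive = sm.OLS(y, X).fit()  # ignores the dependence: SEs too small
    clustered = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": clusters})

    print(naive.bse, clustered.bse)  # same coefficients, larger clustered SEs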

25
Q

true or false: in a simple random sample, all persons in a population have the same chance of selection into the sample

A

true! when this is true, the sample is completely unbiased

26
Q

When the sample is MORE than 5 percent of a population, do you need to correct your standard errors?

A

yes–need to take into account that you have more certainty than you did before.

27
Q

why does stratification increase efficiency?

A

over-samples small groups. It dramatically decreases confidence intervals around small groups.

Because you don’t have to take into account variance shared BETWEEN strata (strata are mutually exclusive and exhaustive), the variance is smaller.

28
Q

true/false: each stratum has its own variance?

A

true!

29
Q

what is design effect in layman’s terms?

A

loss or gain of information from not using a simple random sample. in other words, the efficiency relative to the efficiency of a SRS

30
Q

Let’s say you have two stratified samples: one has strata with very different means, and one has strata with means that are nearly identical. Which sample would you prefer?

A

the first one! you gain more efficiency (more variance is accounted for… I think)

31
Q

If you were interested in maximizing efficiency (as you should be!) would you want the error variance to be larger or smaller?

A

smaller

32
Q

weights are usually the ____ of the probability of selection

A

inverse

33
Q

true or false: weights induce homoskedasticity

A

false–they induce heteroskedasticity

34
Q

If a data point is highly leveraged, what will the effect be on the E’i? why?

A

it will increase E’i because the denominator of the equation is RMSE x sqrt(1 - hi). The denominator will be small if the hat value is large (highly leveraged), which makes the overall value grow

35
Q

when calculating error terms, what is the difference between standardized and studentized?

A

studentized: SE in the denominator does NOT include i
standardized: SE in denominator DOES include i
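
In statsmodels (a sketch on fake data) both flavors live on the influence object; the "internal" version matches this card's standardized residual and the "external" version matches the studentized one:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(8)
    x = rng.normal(size=100)
    y = 1 + 2 * x + rng.normal(size=100)

    infl = sm.OLS(y, sm.add_constant(x)).fit().get_influence()
    standardized = infl.resid_studentized_internal  # SE estimate includes observation i
    studentized = infl.resid_studentized_external   # SE estimate excludes observation i

    print(standardized[:3], studentized[:3])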

36
Q

what is the equation for dfbeta?

A

Dij = Bj - Bj(-i)

aka b with i minus b without i

37
Q

what is the equation for the effective n (neff)?

A

neff=n/DEFF

38
Q

true or false: neff is LESS THAN n when we violate the independence assumption

A

true: neff will be less than n because it reflects what we really have in terms of information compared to what we would have if we did a SRS

39
Q

let’s talk about that weird p-looking Greek letter, rho (ρ)… what does it mean if it’s large?

A

there is more correlation within clusters (less independence)

40
Q

if you compare cluster samples with SRS, how will the coefficients and standard errors differ?

A

the coefficients will be the same, but the standard errors of the cluster sample will be larger

41
Q

when you do imputation, standard errors are downwardly biased… what does that mean?

A

you’re not accounting for the uncertainty of making stuff up

42
Q

does MCAR impact standard errors?

A

yes–makes them slightly bigger

43
Q

How do you calculate DEFF?

A

the variance of the estimate under the actual (complex) design divided by the variance it would have under the SRS assumption; equivalently, (adjusted SE / SRS SE)^2

44
Q

How do you calculate the adjusted se when there is a design effect?

A

sqrt(DEFF) x SE = adjusted standard error
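
A tiny arithmetic sketch (made-up numbers) tying this to the DEFF and neff cards above:

    import math

    deff = 2.5      # hypothetical design effect
    n = 1000        # actual sample size
    se_srs = 0.04   # standard error computed under the SRS assumption

    n_eff = n / deff                        # effective sample size: 400
    se_adjusted = math.sqrt(deff) * se_srs  # design-adjusted standard error: about 0.063

    print(n_eff, se_adjusted)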

45
Q

How does a suppressor impact the b?

A

the relationship between x1 and y is much stronger WITH x2

46
Q

how does a confounder impact the b?

A

the relationship between x1 and y is much stronger WITHOUT x2

47
Q

what does a weight do to your coefficient? to SEs?

A

The main reason for the use of weights is to compensate for unequal selection probabilities and nonresponse bias. Weights influence accuracy and precision of coefficient estimates but have no influence on standard errors

48
Q

how does dummy variable adjustment impact your coefficients? SEs?

A

This generally produces biased estimates of coefficients and downward bias in standard errors.

It does this because it adds variance to the x’s that we don’t really have, which makes the standard errors smaller than they should be.
Including a “missing” dummy for a missing categorical measure also induces bias.
Ex: the fact that people don’t respond to the sex question doesn’t mean they don’t have a sex.

49
Q

What does imputation do to your standard errors?

A

Standard errors are downwardly biased
Fails to account for estimation (standard error) or fundamental uncertainty
Essentially saying it doesn’t account for the uncertainty inherent in making stuff up and it puts a lot of faith in a simple model

50
Q

how does multiple imputation overcome downward bias in Standard Errors?

A

Imputes all missing values based on observed cases, and does so multiple times.
Overcomes the downward bias by sampling from the error distribution in each draw.
Standard errors for parameter estimates from multiply imputed data add estimation error.
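
A sketch (invented numbers) of how the multiple estimates are combined, so the between-imputation spread adds estimation error to the pooled standard error:

    import numpy as np

    # hypothetical coefficient estimates and within-imputation variances from m = 5 imputations
    b = np.array([1.98, 2.05, 2.10, 1.95, 2.02])
    var_within = np.array([0.010, 0.011, 0.009, 0.010, 0.012])

    m = len(b)
    b_pooled = b.mean()
    w_bar = var_within.mean()   # average within-imputation variance
    b_var = b.var(ddof=1)       # between-imputation variance
    total_var = w_bar + (1 + 1 / m) * b_var  # Rubin's rules for the total variance

    print(b_pooled, np.sqrt(total_var))  # pooled estimate and its (larger) standard error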

51
Q

high leverage observations have low error variance… why?

A

they pull the regression line towards themselves