Final Flashcards
What happens if residuals are not normally distributed?
SEs are incorrect (coefficient estimates are still unbiased)
what happens if residuals are heteroskedastic?
Heteroskedasticity has no bearing on coefficient estimates but can produce incorrect (downwardly biased) standard errors, thus leading to incorrect confidence intervals and increased Type I errors in hypothesis tests (rejecting the null when the null is true).
what is a strategy to remedy heteroskedasticity?
robust standard errors
what is a strategy to remedy non-normality in residuals
If the residuals are not normally distributed, a transformation of the dependent variable might remedy the problem (for example, taking the natural log of y)
what happens if the relationship between x and y is non-linear?
Violations of linearity lead to bias in coefficient estimates. If the true relationship between x and y is curvilinear then characterizing it as linear leads to erroneous conclusions about the relationship.
how do you remedy non-linearity?
To remedy nonlinearities one can respecify the model as a polynomial, a spline, or some other nonlinear transformation of x.
What is a spline?
It is a non-parametric regression technique and can be seen as an extension of linear models that automatically models nonlinearities and interactions between variables.
Cook’s D is a measure of overall _____
influence
what is a highly leveraged observation?
an observation whose x value is more than about 3 standard deviations from the mean of x
what is an outlier?
An observation whose y value is more than about 3 standard deviations from the mean of y
In regression, an observation whose residual (y - yhat) is more than about 3 standard deviations from zero
What do hat values measure?
leverage: how far xi sits from the mean of x, relative to the total variation in x
In bivariate regression: hi = 1/n + (xi - xbar)^2 / sum over j of (xj - xbar)^2
High leverage associated with _____ error variance
small
these observations pull the regression line closer to them
what happens when there is measurement error in y?
b will be the same, but standard error will grow, and r squared will shrink
standard error of the model grows
when there’s measurement error in x?
r squared shrinks because there is poor model fit
SEb is smaller because the variance in x (the denominator of the SE equation) is artificially large
b will attenuate (attenuation bias!)
This is only true for the bivariate case!
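A minimal simulation of the two cards above (made-up parameters; bivariate case): measurement error in x attenuates b, while measurement error in y leaves b unchanged and only adds noise.

```python
# Sketch: attenuation bias from measurement error in x (bivariate case).
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(0, 1, n)
y = 1 + 2 * x + rng.normal(0, 1, n)

def slope(x, y):
    # OLS slope for a bivariate regression: cov(x, y) / var(x)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

x_noisy = x + rng.normal(0, 1, n)   # measurement error in x
y_noisy = y + rng.normal(0, 1, n)   # measurement error in y

print(slope(x, y))        # ~2: the true slope
print(slope(x_noisy, y))  # ~1: attenuated by var(x) / (var(x) + var(error))
print(slope(x, y_noisy))  # ~2: b unchanged, only the error variance grows
```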
What do Cook’s D and Dfits measure?
both measure overall influence of a data point (leverage AND outlyingness)
Dfits: E*i x sqrt(hi/(1-hi)), where E*i is the studentized residual
Cook's D: Dfits^2/(k+1), which basically standardizes Dfits on the number of slopes
both measure how much a given observation affects b
In layman’s terms, what is DFBETA?
difference between what we see with a given observation included, and what we would see if we exclude that observation
what is the difference between Cook’s D/dfits and dfbeta?
Cook’s D/dfits measure the overall influence of an observation, while dfbeta measures the influence on an individual coefficient
If residuals are NOT normally distributed, is there an effect on b?
nope!
what plot do you use to diagnose linearity?
component plus residual plot
what plot do you use to diagnose heteroskedasticity?
residual-versus-fitted (RVF) plot
what is the null hypothesis for a Breusch-Pagan test?
that the residuals have constant variance (homoskedasticity)
What are some solutions to heteroskedasticity?
add omitted variables, or estimate robust standard errors (this folds the heteroskedasticity into the standard errors, generally making them larger)
Let’s say your model violates the independence assumption: What are the consequences on your coefficient? your standard errors?
no impact on coefficients, but you've underestimated your standard errors
how do you remedy a violation of the independence assumption?
estimate clustered standard errors
true of false: in a simple random sample, all persons in a population have the same chance of selection into the sample
true! when this is true, the sample is completely unbiased
When the sample is MORE than 5 percent of a population, do you need to correct your standard errors?
yes: apply a finite population correction. You need to take into account that you have more certainty than the usual (infinite-population) formulas assume.
why does stratification increase efficiency?
it over-samples small groups, which dramatically narrows the confidence intervals around those groups.
Because strata are mutually exclusive and exhaustive, you don't have to account for variance shared BETWEEN strata, so the sampling variance is smaller.
true/false: each stratum has its own variance?
true!
what is design effect in layman’s terms?
loss or gain of information from not using a simple random sample. in other words, the efficiency relative to the efficiency of a SRS
Let’s say you have two sets of stratified sample: one has several different means, and one has strata with means that are nearly identical. which set would you prefer?
the first one! the more the strata means differ, the more between-strata variance is removed from the sampling variance, so the more efficiency you gain
If you were interested in maximizing efficiency (as you should be!) would you want the error variance to be larger or smaller?
smaller
weights are usually the ____ of the probability of selection
inverse
true or false: weights induce homoskedasticity
false–they induce heteroskedasticity
If a data point is highly leveraged, what will the effect be on the E’i? why?
it will increase E'i, because the denominator of the equation is RMSE x sqrt(1-hi); the denominator will be small if the hat value is large (high leverage), so the overall value grows
when calculating error terms, what is the difference between standardized and studentized?
studentized: the RMSE in the denominator does NOT include observation i (it is re-estimated with i deleted)
standardized: the RMSE in the denominator DOES include observation i
what is the equation for dfbeta?
Dij = bj - bj(-i)
aka b with observation i included minus b with observation i excluded
what is the equation for n efficient?
neff=n/DEFF
true or false: neff is LESS THAN n when we violate the independence assumption
true: neff will be less than n because it reflects what we really have in terms of information compared to what we would have if we did a SRS
let's talk about rho (ρ), that weird p-looking greek letter… what does it mean if it's large?
rho is the intraclass correlation: when it's large, there is more correlation within clusters (less independence)
if you compare cluster samples with SRS, how will the coefficients and standard errors differ?
the coefficients will be the same, but the standard errors of the cluster sample will be larger
when you do imputation, standard errors are downward biased.. what does that mean?
you’re not accounting for uncertainty of making stuff up
does MCAR impact standard errors?
yes: with listwise deletion you lose cases, which makes the standard errors slightly bigger (but introduces no bias)
How do you calculate DEFF?
Var(estimate under the actual design) / Var(estimate under the SRS assumption), i.e. SE(adj)^2 / SE(SRS)^2
How do you calculate the adjusted se when there is a design effect?
sqrt(DEFF) x SE = adjusted standard error
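The design-effect arithmetic from these cards, sketched with made-up numbers (SE, DEFF, and n are illustrative):

```python
# Sketch: adjusted SE and effective sample size from a design effect.
import math

se_srs = 0.05   # SE computed as if the sample were a simple random sample
deff = 2.5      # design effect: Var(actual design) / Var(SRS)
n = 1000

se_adjusted = math.sqrt(deff) * se_srs   # adjusted SE = sqrt(DEFF) x SE
n_eff = n / deff                         # neff = n / DEFF

print(se_adjusted, n_eff)   # ~0.079, 400.0
```

With DEFF > 1 (e.g. a cluster sample), neff < n: the design carries less information than a SRS of the same size.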
How does a suppressor impact the b?
the relationship between x1 and y is much stronger WITH x2
how does a confounder impact the b?
the relationship between x1 and y is much stronger WITHOUT x2
what does a weight do to your coefficient? to SEs?
The main reason for the use of weights is to compensate for unequal selection probabilities and nonresponse bias. Weights improve the accuracy of coefficient estimates (less bias), but they induce heteroskedasticity, so the standard errors should be adjusted (e.g., robust SEs)
how does dummy variable adjustment impact your coefficients? SEs?
This generally produces biased coefficient estimates and downwardly biased standard errors
It does this because it adds variance to the x's that we don't really have, making the standard errors smaller than reality
Including a "missing" dummy for a missing categorical measure also induces bias
Ex: people who don't respond to the sex question still have a sex
What does imputation do to your standard errors?
Standard errors are downwardly biased
Fails to account for estimation (standard error) or fundamental uncertainty
Essentially, it doesn't account for the uncertainty inherent in making stuff up, and it puts a lot of faith in a simple model
how does multiple imputation overcome downward bias in Standard Errors?
It imputes all missing values based on observed cases, and does so multiple times
It overcomes the downward bias by sampling from the error distribution in each draw
Standard errors for parameter estimates from multiply imputed data add estimation error
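A minimal sketch of how combining multiply imputed estimates adds that estimation error, using Rubin's rules with made-up numbers (the estimates and variances are hypothetical, not from any real imputation run):

```python
# Sketch: Rubin's rules for combining multiply imputed estimates.
import numpy as np

estimates = np.array([2.1, 1.9, 2.3, 2.0, 2.2])       # b from each imputed dataset
variances = np.array([0.04, 0.05, 0.04, 0.06, 0.05])  # within-imputation Var(b)

m = len(estimates)
within = variances.mean()               # average estimation uncertainty
between = estimates.var(ddof=1)         # disagreement across imputations
total = within + (1 + 1 / m) * between  # Rubin's total variance

# The total SE exceeds the naive within-imputation SE: the uncertainty of
# "making stuff up" is added in instead of being ignored.
print(np.sqrt(within), np.sqrt(total))
```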
high leverage observation have low error variance… why?
they pull the regression line towards themselves