Regression diagnostics / Logistic regression Flashcards by Nicole Geist

Why do we need to consider the assumptions for a linear regression?

So that we can rely on the resulting statistics: coefficients, standard errors (SE), etc.

Which regression assumptions are there?

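The list differs a bit between sources, but roughly: linearity, homoskedastic and normally distributed residuals, and no multicollinearity (see the cards below).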

What happens if you violate regression assumptions?

1) Coefficients become unreliable –> biased
2) SEs become unreliable –> any hypothesis test becomes unreliable (including p-values, t-statistics, etc.)

Linearity

Assumption: the average outcome is linearly related to each term in the model when holding all others fixed. Technically, the “linear” in “linear regression” refers to the outcome being linear in the parameters, the β’s.

Problem: Biased coefficients (e.g. when the true form is curvilinear)

Diagnostic: Component-plus-residual plot (a clear divergence between the residual line and the component line indicates that the predictor does not have a linear relationship with the dependent variable)

Solution: Polynomial, Spline, Collapse into categories
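
A minimal sketch of how a straight-line fit to a curvilinear relationship shows up in the residuals. The data (y = x squared) and the helper name `fit_line` are made up for illustration; a component-plus-residual plot from a statistics package would show the same pattern graphically.

```python
# Hypothetical data: the true relationship is curvilinear (y = x^2).
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


intercept, slope = fit_line(xs, ys)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(xs, ys)]

# With a correct functional form the residuals scatter randomly around zero.
# Here they are positive at both ends and negative in the middle (a U-shape),
# which signals that a straight line is the wrong form.
print([round(r, 2) for r in residuals])
```

Adding a squared term (a polynomial) would be one way to absorb this U-shaped pattern.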

Homoskedastic / Normally distributed Residuals

Assumption: residuals have constant variance and are normally distributed

Problem: Standard errors are usually incorrect (typically underestimated); influential observations may also be present, which can affect the coefficients as well

Diagnostic: Heteroskedasticity –> plot residuals; Normality –> histograms, qnorm (normal quantile plot), studentized residuals plot

Solution: Log-transformation / Power-transformation, Robust SEs, Correct coding errors

No multicollinearity

Assumption: Predictors should not be strongly correlated with each other (only an issue in multiple linear regression, MLR, not in simple linear regression, SLR)

Problem: “holding constant” is not possible with correlated variables –> 1) interpretation becomes impossible, and the model cannot tell which variable made the difference 2) Loss of precision (inflated standard errors)

Diagnostic: Look at correlations, Variance inflation factor (computed for each predictor: how much the variance of its coefficient is inflated by its correlation with the other predictors –> the higher the VIF, the more of its information is already contained in the other predictors = high multicollinearity)

Solutions: Get crafty (similar but not collinear variable), Construct an index, Get more data, Mean-center interaction variables
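
As a rough illustration of the VIF idea: for predictor j, VIF = 1 / (1 - R²), where R² comes from regressing predictor j on the other predictors. With exactly two predictors that R² is just their squared correlation, which the sketch below relies on. The data and function names are invented for the example.

```python
def pearson_r(a, b):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


def vif_two_predictors(a, b):
    """VIF = 1 / (1 - R^2); with only two predictors, R^2 equals r^2."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictors: x2_collinear nearly duplicates x1, x2_unrelated does not.
x1 = [1, 2, 3, 4, 5, 6]
x2_collinear = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
x2_unrelated = [2, 5, 1, 6, 3, 4]

vif_high = vif_two_predictors(x1, x2_collinear)
vif_low = vif_two_predictors(x1, x2_unrelated)

print(f"VIF with near-duplicate predictor: {vif_high:.1f}")
print(f"VIF with weakly related predictor: {vif_low:.2f}")
```

A VIF close to 1 means the predictor carries mostly its own information; a common (but arbitrary) rule of thumb treats values above roughly 10 as a warning sign.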

Which diagnostic is important to consider apart from regression assumptions?

Influential observations

They pull the regression fit towards themselves –> results (predictions, parameter estimates, CIs, p-values) can be quite different with and without these cases included in the analysis.
They do not necessarily violate any regression assumptions, but they can cast doubt on the conclusions drawn from your sample. If a regression model is being used to inform real-life decisions, one would hope those decisions are not overly influenced by just one or a few observations.

What is more important: Coefficient vs SE?

The coefficient estimate comes first, then the SE: a correct SE is of no use if the estimate is biased.

Which one is problematic?

B: it has an unusual x-value (leverage) and a large residual, so it pulls the regression line down; deleting it would change the regression line drastically

Are these two influential observations?

NO
1) large sample
2) no unusual x-value

When to delete an outlier?

Rule of thumb: when Cook’s D is approximately 1 or larger.
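
To make the rule of thumb concrete, here is a small sketch that computes Cook's D for a simple (one-predictor) regression, using D_i = e_i² / (p·s²) · h_i / (1 - h_i)², where h_i is the leverage and p = 2 parameters. The data set is invented: nine points lie exactly on y = 2x + 1 and the tenth has an unusual x-value and a large residual. Function names are illustrative only.

```python
def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cooks_distance(x, y):
    """Cook's D for every observation in a one-predictor regression."""
    n, p = len(x), 2  # p = number of parameters (intercept + slope)
    intercept, slope = fit_line(x, y)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - p)  # residual variance
    distances = []
    for xi, e in zip(x, resid):
        h = 1.0 / n + (xi - x_bar) ** 2 / sxx  # leverage of this point
        distances.append(e ** 2 / (p * s2) * h / (1.0 - h) ** 2)
    return distances


# Hypothetical data: the last point is far out in x and far off the line.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
ys = [3, 5, 7, 9, 11, 13, 15, 17, 19, 10]

distances = cooks_distance(xs, ys)
flagged = [i for i, d in enumerate(distances) if d > 1]

slope_all = fit_line(xs, ys)[1]
slope_without = fit_line(xs[:-1], ys[:-1])[1]
print("flagged indices:", flagged)
print(f"slope with point: {slope_all:.2f}, without: {slope_without:.2f}")
```

Only the last observation exceeds the threshold, and dropping it changes the slope substantially. Whether to actually delete such a point remains a judgment call, not something Cook's D decides on its own.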

Not a problem of bias per se but of a lack of data: very little variance plus hard-to-separate effects make it hard to be precise

a) hard decision to make

Potential problems, diagnostics, potential solutions: Influential observations

Which is the link function?

OLS vs Logistic Regression

Linear = Identity link (just the mean –> model of the mean)
Logistic = Logit link (log odds of the mean –> model of the log odds, also called the logit)

Which distribution?

OLS vs Logistic Regression

Linear = Gaussian (continuous)
Logistic = Binomial (discrete)

What are the problems with an OLS model when it comes to binary outcomes?

1) Binary outcome –> we want to model a probability, but a linear model can predict values below 0 (or above 1), which have no meaning as probabilities
2) Unrealistic assumptions about constant effects
3) The normal-residuals assumption is violated
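
The first problem can be seen in a tiny sketch: fit a straight line (OLS) to a binary outcome and look at the predictions at the edges of the predictor range. The data are invented for illustration.

```python
# Hypothetical binary outcome: 0 for low x, 1 for high x.
xs = list(range(10))
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# "Predicted probabilities" from the linear model at the two extremes of x.
pred_low = intercept + slope * xs[0]
pred_high = intercept + slope * xs[-1]
print(f"prediction at x=0: {pred_low:.2f}")   # falls below 0
print(f"prediction at x=9: {pred_high:.2f}")  # rises above 1
```

The fitted line leaves the 0 to 1 range at both ends, which is exactly what the logit link in logistic regression prevents.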

Interpret OLS model

A 1-unit increase on the trust scale decreases the mean of the AFD vote by 0.05 percentage points

What does the logistic regression model?

The logit

How does ML work?

Maximum likelihood (ML) iterates over candidate parameter values to find the ones that make the observed data most likely
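
One common way to do this iteration is Newton-Raphson; the sketch below is a bare-bones version for a logistic regression with an intercept and one predictor, not what any particular package does internally. The data are hypothetical counts chosen so that 10% of the x = 0 group and 5% of the x = 1 group have the outcome.

```python
import math


def inv_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def fit_logit(x, y, iterations=25):
    """Newton-Raphson for logit(p) = b0 + b1 * x; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0  # starting values
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # negative Hessian (information matrix)
        for xi, yi in zip(x, y):
            p = inv_logit(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Update: beta <- beta + H^{-1} * gradient (2x2 inverse written out).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: 20 observations per group, 2 and 1 positive outcomes.
x = [0] * 20 + [1] * 20
y = [1, 1] + [0] * 18 + [1] + [0] * 19

b0, b1 = fit_logit(x, y)
print(f"constant: {b0:.3f}, coefficient: {b1:.3f}")
print(f"fitted P(y=1 | x=0): {inv_logit(b0):.3f}")
print(f"fitted P(y=1 | x=1): {inv_logit(b0 + b1):.3f}")
```

Each pass moves the parameters uphill on the log-likelihood until the updates become negligible; here the fitted probabilities end up matching the group proportions.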

How to interpret the Log Odds?

Very hard: the log-odds scale is not intuitive.

How to interpret the Odds Ratio?

An odds ratio of 1 --> no difference (above 1 = higher odds, below 1 = lower odds)

Limitation of Odds

We can only say what the odds are and how the odds increase, BUT NOT how likely something actually is (probabilities)
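
A short sketch of this limitation: the same odds ratio translates into very different changes in probability depending on the baseline. The baselines and the odds ratio of 2 are arbitrary illustrative values.

```python
def prob_to_odds(p):
    """Odds = p / (1 - p)."""
    return p / (1.0 - p)


def odds_to_prob(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1.0 + odds)


odds_ratio = 2.0  # "the odds double"
baselines = [0.01, 0.50, 0.90]

# Change in probability produced by the same odds ratio at each baseline.
changes = []
for p in baselines:
    new_p = odds_to_prob(prob_to_odds(p) * odds_ratio)
    changes.append(new_p - p)
    print(f"baseline {p:.2f} -> {new_p:.3f} (change {new_p - p:+.3f})")
```

Doubling the odds barely moves a 1% baseline but adds a large amount to a 50% baseline, so an odds ratio alone does not say how likely the outcome is.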

How to get back to probabilities?

Apply the inverse logit to the linear predictor: p = e^(xb) / (1 + e^(xb)).
Example: plug in 0 for men --> constant only --> 10%; plug in 1 for women --> constant + coefficient (the coefficient is negative here) --> 5%
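
The example can be worked through numerically. The card gives only the resulting probabilities (10% and 5%), so the constant and coefficient below are back-calculated from them and should be read as illustrative values, not as estimates from the original model.

```python
import math


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Inverse logit: e^eta / (1 + e^eta)."""
    return math.exp(eta) / (1.0 + math.exp(eta))


# Back-calculated from the 10% (men) and 5% (women) in the example.
constant = logit(0.10)
coefficient = logit(0.05) - logit(0.10)  # negative: women have lower odds

p_men = inv_logit(constant + coefficient * 0)
p_women = inv_logit(constant + coefficient * 1)
odds_ratio = math.exp(coefficient)

print(f"constant = {constant:.3f}, coefficient = {coefficient:.3f}")
print(f"P(men) = {p_men:.2f}, P(women) = {p_women:.2f}")
print(f"odds ratio women vs men = {odds_ratio:.3f}")
```

Note that the odds ratio is below 1 but is not simply 0.5, even though the probability is halved; odds and probabilities only move together closely when probabilities are small.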

What is the average marginal effect?

It averages the marginal effects of the explanatory variable on the predicted probability, computed for every observation in the sample.

How do the average marginal effect and linear regression relate to each other?

Very similar: the linear regression coefficient approximates the AME.
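
A minimal sketch of the calculation for a logistic model with one continuous predictor, where the marginal effect at an observation is b1 · p · (1 - p). The coefficients and the 0 to 10 trust scale are hypothetical values chosen for illustration.

```python
import math

# Hypothetical logistic regression estimates (log-odds scale).
b0, b1 = 0.5, -0.4

# Hypothetical sample: one observation at each point of a 0-10 trust scale.
trust = list(range(11))


def inv_logit(eta):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Marginal effect at each observation: derivative of p with respect to x.
marginal_effects = []
for x in trust:
    p = inv_logit(b0 + b1 * x)
    marginal_effects.append(b1 * p * (1.0 - p))

# The AME is the plain average of those observation-level effects.
ame = sum(marginal_effects) / len(marginal_effects)
print(f"AME = {ame:.4f}")
print(f"effect at trust=1: {marginal_effects[1]:.4f}")
print(f"effect at trust=10: {marginal_effects[10]:.4f}")
```

The single AME number hides that the effect is strongest where the predicted probability is near 0.5 and much weaker near the edges, which is why it behaves like a linear-regression slope.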

How to model this relationship with a linear model?

Polynomial of Trust in Parliament would pick up the slowing of the decline

How to make this logistic regression table more interpretable?

Transform to the probability scale (margins, dydx) --> Average marginal effects (this makes the results more interpretable but also takes out the s-shape, meaning it works well in the middle but less well at the edges)