14. Linear regression models (Mediation and moderation) Flashcards

1
Q

What does multiple regression do?

A

Tells us how the mean of the DV changes as a function of the IVs; we can partial out the effect of each IV
* Ceteris paribus = other things equal
* What is the effect of education on salary, keeping gender, region, industry… equal? (see the sketch below)
* Whatever is part of the regression is controlled for and held constant
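
A minimal sketch of this in Python with statsmodels (the file name and column names are hypothetical):

```python
# Sketch: multiple regression on hypothetical salary data.
# The coefficient on education is the change in mean salary per extra year
# of education, holding gender, region, and industry constant.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("salaries.csv")  # assumed columns: salary, education, gender, region, industry
model = smf.ols("salary ~ education + C(gender) + C(region) + C(industry)", data=df).fit()
print(model.summary())
```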

2
Q

Name the 7 assumptions for multiple linear regression

A

1. Linearity
2. Random sampling
3. No perfect collinearity
4. Zero conditional mean of error
5. Homoscedasticity
6. Normality
7. Multicollinearity

3
Q

What is linearity?

A
  • States that the population model is linear in the parameters β0, β1, β2, …, βk
  • This doesn't mean that a variable itself cannot enter as a logarithm, a square, or some other function
  • For example: log(salary), or age and age²; these change the interpretation of the coefficients! (see the sketch below)
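
A sketch of how transformed variables enter while the model stays linear in the betas (reusing the hypothetical df from the first sketch, and assuming it has an age column):

```python
# Sketch: log DV plus a squared term; the model is still linear in the betas.
# On the log scale, coefficients read (approximately) as percent changes,
# and age plus age**2 together describe a curve rather than a straight line.
import numpy as np
import statsmodels.formula.api as smf

model = smf.ols("np.log(salary) ~ age + I(age**2)", data=df).fit()
print(model.params)
```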
4
Q

What is random sampling?

A
  • We minimize residuals to estimate the betas for a specific sample, but this sample should reflect the population distribution (and it will, if it was randomly selected from the population)
5
Q

What is “no perfect collinearity”?

A
  • None of the IVs is constant, and no IV is an exact linear combination of the others
  • This still allows including both age and age², but would not allow including an exact linear function of age (for example, age in different units such as decades); see the sketch below
  • Also fails if n < k + 1: we need at least as many observations as parameters we are trying to estimate
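
A small numeric illustration of the difference (the ages are made up):

```python
# Sketch: age in decades is an exact linear function of age in years,
# so the design matrix loses rank and the betas are not identified.
import numpy as np

age_years = np.array([25.0, 32.0, 41.0, 58.0])
X_bad = np.column_stack([np.ones(4), age_years, age_years / 10.0])  # intercept, age, age in decades
print(np.linalg.matrix_rank(X_bad))  # 2, not 3: perfect collinearity

# age and age**2 are not exact linear functions of each other, so both can be included:
X_ok = np.column_stack([np.ones(4), age_years, age_years ** 2])
print(np.linalg.matrix_rank(X_ok))   # 3: no perfect collinearity
```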
6
Q

What is zero conditional mean of error?

A
  • Conditional on any values of the IVs, the error term has expected value 0: E(u | x1, …, xk) = 0
  • Misspecifying the functional relationship or omitting an important variable that is correlated with the IVs violates this assumption (see the sketch below)
  • The minimization of residuals, i.e. the calculation of the betas, depends on this assumption
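
A simulated sketch of the omitted-variable case (all names and numbers invented): leaving out ability, which is correlated with education, pushes ability into the error term and biases the education coefficient.

```python
# Sketch: omitted variable bias when E(u|X) = 0 fails.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 10_000
ability = rng.normal(size=n)
education = 0.5 * ability + rng.normal(size=n)   # education correlates with ability
salary = 1.0 * education + 2.0 * ability + rng.normal(size=n)

full = sm.OLS(salary, sm.add_constant(np.column_stack([education, ability]))).fit()
short = sm.OLS(salary, sm.add_constant(education)).fit()
print(full.params[1])   # close to the true 1.0
print(short.params[1])  # close to 1.8: biased, ability now sits in the error term
```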
7
Q

What is homoscedasticity?

A
  • Conditional on any values of the IVs, the error term has the same variance: Var(u | x1, …, xk) = σ²
  • So while the values of the DV are a linear combination of the IVs, the variance of u should not depend on the values of the IVs (e.g. if errors get bigger at higher levels of X, we are more imprecise at higher X); see the sketch below
  • Also violated if the errors are autocorrelated
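
A sketch of detecting the violation with a Breusch-Pagan test from statsmodels, on simulated data where the error spread grows with x:

```python
# Sketch: heteroscedastic errors and the Breusch-Pagan test.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, size=500)
y = 2.0 + 0.5 * x + rng.normal(scale=x, size=500)  # error variance grows with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(lm_pval)  # a small p-value rejects homoscedasticity
```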
8
Q

What is normality?

A

The unobserved error is normally distributed in the population: u ~ Normal(0, σ²)
* We don't know much about the sampling distribution of our OLS estimator (β̂); in order to do statistical inference we need the distribution of the betas (to calculate their standard errors)
* The distribution of the estimated betas depends on the distribution of the errors, so we assume the distribution of the errors is normal (see the sketch below)
* Assumption 6 is stronger: it necessarily includes assumptions 4 and 5
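
A sketch of one common normality check, a Jarque-Bera test on the residuals (simulated data; a Q-Q plot is the usual visual companion):

```python
# Sketch: Jarque-Bera test of the normality assumption on OLS residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(2)
x = rng.normal(size=500)
y = 1.0 + 2.0 * x + rng.normal(size=500)   # errors drawn from a normal here
res = sm.OLS(y, sm.add_constant(x)).fit()

jb_stat, jb_pval, skew, kurt = jarque_bera(res.resid)
print(jb_pval)  # a small p-value would be evidence against normal errors
```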

9
Q

What is multicollinearity?

A

The variance of a beta (coefficient) should be as low as possible; that means our coefficient is precise
* Var(β̂j) = σ² / [SSTj (1 - R²j)], where SSTj is the total sample variation in xj and R²j is the R² from regressing xj on the other IVs (written out in the formula block below)
Elements influencing the variance of beta:
* Variance of the error: the higher σ² is, the higher the variance of beta
* The variance of beta is lower the higher the variance of the IV in the data
* The variance of beta is higher the higher the correlation of the IV with the other IVs (R²j): MULTICOLLINEARITY
* Multicollinearity does not invalidate the assumptions, but it creates a problem: it increases the variance of beta, possibly to such an extent that it becomes impossible to distinguish its separate effect (the coefficient becomes insignificant)
* Multicollinearity can be detected in a simple correlation table, but also through the variance inflation factor
* VIF = 1 / (1 - R²j): reflects how much of the variance in one IV can be explained by the other IVs
* Rule of thumb: VIF above 4 is a warning sign; above 10 is serious multicollinearity (meaning R²j is above 0.9)
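
Written out cleanly (standard textbook notation, matching the bullets above):

```latex
% Variance of the OLS coefficient on x_j:
%   sigma^2 = error variance, SST_j = total sample variation in x_j,
%   R_j^2   = R^2 from regressing x_j on all the other IVs.
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\mathrm{SST}_j \, (1 - R_j^2)}
  = \frac{\sigma^2}{\mathrm{SST}_j} \cdot \mathrm{VIF}_j,
\qquad
\mathrm{VIF}_j = \frac{1}{1 - R_j^2}
```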

10
Q

What is VIF?

A

VIF is already a factor in the variance-of-beta formula above, and statistics packages typically report it in the main coefficient table
* VIF = 1 / (1 - R²j): reflects how much of the variance in one IV can be explained by the other IVs (see the sketch below)
* Rule of thumb: VIF above 4 is a warning sign; above 10 is serious multicollinearity (meaning R²j is above 0.9)
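
A sketch of computing VIFs with statsmodels (df and the column names are hypothetical):

```python
# Sketch: variance inflation factor for each IV.
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["education", "age", "experience"]])
for i, name in enumerate(X.columns):
    if name == "const":
        continue  # the intercept's VIF is not meaningful
    print(name, variance_inflation_factor(X.values, i))
# Rule of thumb: above 4 deserves a look, above 10 signals serious multicollinearity.
```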

11
Q

What are outliers?

A

An extreme value (possibly a typo or a mistake in the data) that pulls the regression line in the wrong direction, creating large residuals; see the sketch below
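
A simulated sketch of flagging such a point with studentized residuals and Cook's distance from statsmodels:

```python
# Sketch: one extreme value drags the fitted line and stands out in diagnostics.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)
y[0] = 100.0  # one extreme value, e.g. a typo in the data

res = sm.OLS(y, sm.add_constant(x)).fit()
infl = res.get_influence()
print(np.argmax(np.abs(infl.resid_studentized_external)))  # flags observation 0
print(infl.cooks_distance[0][0])                           # its Cook's distance
```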
