Lecture 3 - Assumptions Flashcards

1
Q

What are the key assumptions in regression-based statistical analyses? (6 total)

A

1. Linear relationship
2. Normally distributed residuals
3. No overly influential observations
4. Homoskedasticity
5. Independence of observations
6. No multicollinearity

2
Q

Why are assumptions necessary in regression analyses?

A

Assumptions ensure the validity of regression-based analyses by providing certain conditions under which conclusions can be accurately drawn.

3
Q

What is meant by a linear relationship in regression?

A

It means the relationship between the dependent and independent variables should be linear. Non-linear relationships can be approximated by adding quadratic terms or transforming variables.
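
As a minimal sketch of the quadratic-term idea (simulated data and plain NumPy least squares; all variable names are illustrative), a curved relationship that a purely linear fit misses can be captured by adding an x² column to the design matrix:

```python
import numpy as np

# Simulate a curved relationship, then compare a purely linear fit
# against one that adds a quadratic term.
rng = np.random.default_rng(6)
x = rng.uniform(-2, 2, size=300)
y = 1.0 + 0.5 * x + 1.5 * x**2 + rng.normal(scale=0.5, size=300)

def r_squared(design, y):
    """R^2 from an OLS fit of y on the given design matrix."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return 1 - resid.var() / y.var()

ones = np.ones_like(x)
linear_r2 = r_squared(np.column_stack([ones, x]), y)
quadratic_r2 = r_squared(np.column_stack([ones, x, x**2]), y)
print(round(linear_r2, 2), round(quadratic_r2, 2))
```

The jump in R² when the quadratic term is added is one signal that the linearity assumption was breached in the original specification.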

4
Q

How can you check for normally distributed residuals?

A

By using histograms, P-P plots, and Q-Q plots of the residuals. This assumption becomes less critical in large samples.
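
As a numeric complement to those plots (a sketch with simulated residuals; the helper name is illustrative), sample skewness and excess kurtosis should both sit near zero for normally distributed residuals:

```python
import numpy as np

def skew_and_excess_kurtosis(resid):
    """Sample skewness and excess kurtosis of residuals;
    both are close to zero when residuals are normal."""
    z = (resid - resid.mean()) / resid.std()
    return np.mean(z ** 3), np.mean(z ** 4) - 3

rng = np.random.default_rng(7)
normal_resid = rng.normal(size=2000)
skewed_resid = rng.exponential(size=2000) - 1   # right-skewed residuals
print(skew_and_excess_kurtosis(normal_resid))
print(skew_and_excess_kurtosis(skewed_resid))
```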

5
Q

What are the consequences of having overly influential observations in your data?

A

Overly influential observations can bias regression estimates. They can be detected using Mahalanobis distance and Cook’s distance.
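
Cook's distance can be computed directly from the hat matrix; the sketch below uses plain NumPy and simulated data (names are illustrative, not from the lecture). A single high-leverage case that falls far from the line dominates the diagnostic:

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each case in an OLS fit: roughly, how much
    the fitted values shift when that single case is deleted."""
    n, k = X.shape
    H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat (projection) matrix
    h = np.diag(H)                          # leverage of each case
    resid = y - H @ y
    s2 = resid @ resid / (n - k)
    return (resid ** 2 / (k * s2)) * h / (1 - h) ** 2

rng = np.random.default_rng(8)
x = rng.normal(size=100)
y = 2.0 * x + rng.normal(size=100)
x[0], y[0] = 5.0, -10.0                     # high leverage, far off the line
X = np.column_stack([np.ones(100), x])
d = cooks_distance(X, y)
print(round(d[0], 1), round(d[1:].max(), 2))
```

A common rule of thumb flags cases with Cook's distance above 1 for closer inspection.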

6
Q

What is homoskedasticity and why is it important?

A

Homoskedasticity means the variance of residuals is constant across all levels of the independent variables. It is important because heteroskedasticity biases standard-error estimates, which can inflate Type I error rates.

7
Q

How do you check for independence of observations in regression analyses?

A

Using the Durbin-Watson statistic. Independence of observations is crucial to avoid inflated Type I errors.
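
The Durbin-Watson statistic is simple enough to sketch by hand (plain NumPy, simulated residuals): it is the sum of squared successive differences of the residuals over their total sum of squares, with values near 2 suggesting uncorrelated errors.

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of residuals over their sum of squares. Near 2 => no autocorrelation;
    well below 2 => positive autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)
e_indep = rng.normal(size=500)           # independent errors
e_auto = np.cumsum(e_indep)              # strongly autocorrelated errors
print(round(durbin_watson(e_indep), 2))
print(round(durbin_watson(e_auto), 2))
```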

8
Q

What is multicollinearity and how can it be detected?

A

Multicollinearity is excessive correlation among predictors, making regression coefficients unstable. It can be detected using correlation tables, Tolerance, and Variance Inflation Factor (VIF).
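
VIF can be computed from first principles: regress each predictor on the others and take 1 / (1 − R²). The sketch below (plain NumPy, simulated predictors; names are illustrative) shows a near-collinear pair producing large VIFs while an unrelated predictor stays near 1:

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X:
    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on the remaining columns (plus an intercept)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)               # unrelated to x1 -> VIF near 1
x3 = x1 + 0.1 * rng.normal(size=200)    # nearly collinear with x1 -> large VIF
print(vif(np.column_stack([x1, x2, x3])).round(1))
```

Tolerance is simply the reciprocal of VIF (1 − R²), so the two diagnostics carry the same information.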

9
Q

What methods can be used to address violations of regression assumptions?

(4 Total)

A

Transformations, adding variables, robust standard errors, and being transparent in reporting assumption checks.

10
Q

Why is it important to check distributions before conducting statistical analyses?

A

Checking distributions ensures the validity and robustness of analyses by confirming that variables conform to the necessary assumptions.

11
Q

How can univariate outliers be detected?

A

By converting scores to z-scores and checking for values beyond ±3.29.
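
A minimal sketch of the z-score rule (plain NumPy, simulated scores; ±3.29 corresponds to a two-tailed p < .001 under normality):

```python
import numpy as np

def univariate_outliers(x, cutoff=3.29):
    """Flag values whose z-score exceeds +/- cutoff
    (3.29 ~ two-tailed p < .001 under normality)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std(ddof=1)
    return np.abs(z) > cutoff

rng = np.random.default_rng(2)
scores = np.append(rng.normal(5.0, 0.5, size=100), 25.0)  # last value is extreme
flags = univariate_outliers(scores)
print(flags.sum(), flags[-1])
```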

12
Q

What are multivariate outliers and how are they detected?

A

Multivariate outliers are cases with an unusual combination of values across several variables, which can unduly influence regression models. They can be detected using Mahalanobis distance and Cook’s distance.
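
Mahalanobis distance measures how far each case sits from the multivariate centroid, scaled by the covariance structure. A sketch in plain NumPy (simulated data; the planted case is ordinary on each variable's scale but has an unusual combination):

```python
import numpy as np

def mahalanobis_distances(X):
    """Squared Mahalanobis distance of each row from the sample centroid,
    using the sample covariance matrix. Under multivariate normality these
    are roughly chi-square distributed with df = number of variables."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum('ij,jk,ik->i', diff, cov_inv, diff)

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
X[0] = [6.0, -6.0]                      # unusual combination of values
d2 = mahalanobis_distances(X)
print(round(d2[0], 1), round(np.median(d2), 2))
```

Distances are often compared against the chi-square critical value at p < .001 for the number of variables (13.82 for two variables).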

13
Q

What does the Durbin-Watson statistic measure?

A

It measures the independence of residuals to ensure there is no systematic relationship between errors in regression analyses.

14
Q

What does a VIF value indicate in regression analysis?

A

Variance Inflation Factor (VIF) indicates the extent to which the standard error for a predictor is inflated by multicollinearity. A VIF value of 5 or higher suggests multicollinearity.

15
Q

How can homoskedasticity be assessed visually?

A

By looking at the scatterplot of standardized predicted scores by standardized residual scores. An even rectangular band of data points suggests homoskedasticity.

16
Q

What should you do if the assumption of normality of residuals is breached?

A

Consider applying bootstrapping, check whether homoskedasticity is also breached, or potentially transform raw variables if necessary.
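
A sketch of a case-resampling bootstrap for a regression slope (plain NumPy, simulated data with deliberately heavy-tailed errors; names are illustrative). The percentile interval does not rely on normally distributed residuals:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.normal(size=n)
y = 0.5 * x + rng.standard_t(df=3, size=n)   # heavy-tailed, non-normal errors

def slope(x, y):
    """OLS slope for a single predictor (intercept handled via covariance)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Resample cases with replacement and recompute the slope each time;
# the 2.5th and 97.5th percentiles give a 95% bootstrap interval.
boot = np.array([slope(x[idx], y[idx])
                 for idx in (rng.integers(0, n, size=n) for _ in range(2000))])
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
print(round(ci_low, 2), round(ci_high, 2))
```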

17
Q

What are the implications of non-independence of observations?

A

Non-independence overestimates the amount of independent evidence, exaggerating the precision of estimates and inflating Type I error rates.

18
Q

How do transformations help in dealing with assumption violations?
(5 Total)

A

Transformations can adjust the shape of relationships, reduce skewness, stabilize variance, and address non-linearity and heteroskedasticity.
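
As a small illustration of the skew-reduction point (simulated data, plain NumPy; the variable name is hypothetical), a log transformation pulls a strongly right-skewed variable toward symmetry:

```python
import numpy as np

def skewness(x):
    """Sample skewness; near zero for symmetric distributions."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

rng = np.random.default_rng(9)
income = rng.lognormal(mean=10, sigma=0.8, size=1000)   # right-skewed variable
print(round(skewness(income), 2), round(skewness(np.log(income)), 2))
```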

19
Q

What is the impact of a breached homoskedasticity assumption?

A

It biases standard-error estimates, leading to an increased likelihood of spurious effects. Robust standard errors can help correct for this.
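
A sketch of the HC0 ("sandwich") robust standard error alongside the classical one (plain NumPy, simulated heteroskedastic data; names are illustrative). When the error variance grows with the predictor, the robust slope SE exceeds the classical one:

```python
import numpy as np

def ols_with_robust_se(X, y):
    """OLS coefficients with classical and HC0 ('sandwich')
    heteroskedasticity-robust standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    classical = np.sqrt(sigma2 * np.diag(XtX_inv))      # assumes constant variance
    meat = X.T @ (X * (resid ** 2)[:, None])            # per-case squared residuals
    robust = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv)) # sandwich estimator
    return beta, classical, robust

rng = np.random.default_rng(5)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * (1 + np.abs(x))  # variance grows with |x|
beta, se_classical, se_robust = ols_with_robust_se(X, y)
print(beta.round(2), se_classical.round(3), se_robust.round(3))
```

Note the coefficients themselves are unchanged; only the standard errors (and hence the tests) differ.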

20
Q

What should you report when dealing with assumption checks?

A

Be transparent, report assumption checks and any violations, and describe how violations were addressed or why they can be ignored.