Lecture 3 - Assumptions Flashcards
What are the key assumptions in regression-based statistical analyses? (6 total)
Linear relationship, normally distributed residuals, no overly influential observations, homoskedasticity, independence of observations, no multicollinearity.
Why are assumptions necessary in regression analyses?
Assumptions ensure the validity of regression-based analyses by providing certain conditions under which conclusions can be accurately drawn.
What is meant by a linear relationship in regression?
It means the relationship between the dependent and independent variables should be linear. Non-linear relationships can be approximated by adding quadratic terms or transforming variables.
How can you check for normally distributed residuals?
By using histograms, P-P plots, and QQ plots. This assumption is less critical in large samples.
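A rough numeric companion to a QQ plot, sketched with hypothetical residuals (not lecture data): correlate the sorted residuals with theoretical normal quantiles; a correlation near 1 is consistent with normality.

```python
from statistics import NormalDist, fmean

# Hypothetical residuals for illustration only.
residuals = [-1.2, -0.8, -0.5, -0.2, 0.0, 0.1, 0.3, 0.6, 0.9, 1.3]

srt = sorted(residuals)
n = len(srt)
# Theoretical normal quantiles at plotting positions (i + 0.5) / n.
q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

# Pearson correlation between empirical and theoretical quantiles.
mr, mq = fmean(srt), fmean(q)
num = sum((a - mr) * (b - mq) for a, b in zip(srt, q))
den = (sum((a - mr) ** 2 for a in srt) * sum((b - mq) ** 2 for b in q)) ** 0.5
r = num / den  # near 1 => points would fall close to the QQ line
```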
What are the consequences of having overly influential observations in your data?
Overly influential observations can bias regression estimates. They can be detected using Mahalanobis distance and Cook’s distance.
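A minimal sketch of Cook's distance for simple linear regression, using toy data with a deliberately aberrant final point (not lecture data): each case's distance combines its residual and its leverage.

```python
# Toy data; the last y value is planted to be influential.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 30.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx

resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
p = 2                                         # estimated coefficients
s2 = sum(e ** 2 for e in resid) / (n - p)     # residual variance (MSE)

# Leverage in simple regression: h_i = 1/n + (x_i - mean)^2 / Sxx
lev = [1 / n + (xi - mx) ** 2 / sxx for xi in x]
# Cook's D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2
cooks = [e ** 2 / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
```

The planted point gets the largest Cook's distance because it has both a large residual and high leverage.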
What is homoskedasticity and why is it important?
Homoskedasticity means the variance of residuals is constant across all levels of the independent variables. It is important because heteroskedasticity biases standard errors (coefficient estimates remain unbiased) and can inflate Type I error rates.
How do you check for independence of observations in regression analyses?
Using the Durbin-Watson statistic; values near 2 indicate independent residuals. Independence of observations is crucial to avoid inflated Type I errors.
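The Durbin-Watson statistic is simple enough to compute by hand; a sketch with hypothetical residuals:

```python
# Hypothetical residuals in time order (illustration only).
residuals = [0.5, -0.3, 0.2, -0.4, 0.1, 0.3, -0.2]

# DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); range is 0 to 4.
# Near 2 => independent; near 0 => positive autocorrelation; near 4 => negative.
num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
den = sum(e ** 2 for e in residuals)
dw = num / den
```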
What is multicollinearity and how can it be detected?
Multicollinearity is excessive correlation among predictors, making regression coefficients unstable. It can be detected using correlation tables, Tolerance, and Variance Inflation Factor (VIF).
What methods can be used to address violations of regression assumptions? (4 total)
Transformations, adding variables, robust standard errors, and being transparent in reporting assumption checks.
Why is it important to check distributions before conducting statistical analyses?
Ensures the validity and robustness of analyses by confirming that variables conform to the necessary assumptions.
How can univariate outliers be detected?
By converting scores to z-scores and checking for values beyond ±3.29.
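A minimal sketch of this check with toy scores (one planted outlier; not lecture data):

```python
# Toy scores; the final value 50 is planted as an outlier.
scores = [9, 10, 11, 10, 9, 11, 10, 10, 9, 11,
          10, 9, 11, 10, 10, 9, 11, 10, 9, 11, 50]

n = len(scores)
mean = sum(scores) / n
sd = (sum((s - mean) ** 2 for s in scores) / (n - 1)) ** 0.5  # sample SD

# Flag cases whose z-score exceeds the +/-3.29 cutoff (p < .001, two-tailed).
z = [(s - mean) / sd for s in scores]
outliers = [s for s, zi in zip(scores, z) if abs(zi) > 3.29]
```

Note that with very small samples the maximum possible |z| is bounded, so the ±3.29 cutoff is only informative once n is reasonably large.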
What are multivariate outliers and how are they detected?
Multivariate outliers are cases with an unusual combination of scores across variables, which can become influential points in regression models. They can be detected using Mahalanobis distance (distance from the centroid of the predictors) and Cook's distance (influence on the fitted model).
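A sketch of squared Mahalanobis distance for two variables, with toy data whose last case breaks the joint pattern (the 2x2 covariance matrix is inverted by hand):

```python
# Toy bivariate data; the last case has a high x paired with a low y,
# against the otherwise positive x-y relationship.
x = [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.5, 3.1, 4.2, 4.9, 6.1, 7.2, 1.0]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
# Sample variances and covariance (n - 1 denominator).
sxx = sum((a - mx) ** 2 for a in x) / (n - 1)
syy = sum((b - my) ** 2 for b in y) / (n - 1)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

# Squared Mahalanobis distance via the explicit 2x2 inverse.
det = sxx * syy - sxy ** 2
d2 = [
    (syy * (a - mx) ** 2 - 2 * sxy * (a - mx) * (b - my) + sxx * (b - my) ** 2) / det
    for a, b in zip(x, y)
]
# Large d2 values (compared to a chi-square cutoff with 2 df)
# flag multivariate outliers.
```

The last case gets the largest distance even though neither its x nor its y is extreme on its own, which is exactly what univariate z-scores would miss.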
What does the Durbin-Watson statistic measure?
It measures the independence of residuals to ensure there is no systematic relationship between errors in regression analyses.
What does a VIF value indicate in regression analysis?
Variance Inflation Factor (VIF) indicates the extent to which the standard error for a predictor is inflated by multicollinearity. A VIF value of 5 or higher suggests multicollinearity.
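With exactly two predictors, VIF reduces to 1 / (1 - r^2), where r is their correlation; a sketch with hypothetical predictor scores:

```python
# Hypothetical scores on two correlated predictors (illustration only).
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 4, 5, 4, 5, 7]

n = len(x1)
m1, m2 = sum(x1) / n, sum(x2) / n
cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
ss1 = sum((a - m1) ** 2 for a in x1)
ss2 = sum((b - m2) ** 2 for b in x2)
r = cov / (ss1 * ss2) ** 0.5   # Pearson correlation

tolerance = 1 - r ** 2         # proportion of variance NOT shared
vif = 1 / tolerance            # how much the SE's variance is inflated
```

With more than two predictors, r^2 is replaced by the R^2 from regressing each predictor on all the others.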
How can homoskedasticity be assessed visually?
By looking at the scatterplot of standardized predicted scores by standardized residual scores. An even rectangular band of data points suggests homoskedasticity.
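A crude numeric companion to that visual check, with toy values: compare the residual spread in the lower versus upper half of the predicted scores; roughly equal variances are consistent with the even band described above.

```python
# Toy predicted scores and residuals (illustration only).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
residuals = [0.5, -0.4, 0.3, -0.5, 0.4, -0.3, 0.5, -0.4]

def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)

# Split cases at the median of the predicted scores.
pairs = sorted(zip(predicted, residuals))
half = len(pairs) // 2
low = variance([r for _, r in pairs[:half]])
high = variance([r for _, r in pairs[half:]])

ratio = max(low, high) / min(low, high)  # near 1 => similar spread
```

Formal tests such as Breusch-Pagan do this comparison properly; the split here is only a quick heuristic.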