4. Multiple Linear Regression Flashcards
What is the formula for calculating the T-statistic in coefficient testing?
t = β / SE(β), where β is the estimated coefficient and SE(β) is the standard error of the coefficient.
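A minimal sketch of this calculation in Python, assuming statsmodels is available; the data and variable names below are purely illustrative:

```python
import numpy as np
import statsmodels.api as sm

# Illustrative data: y depends linearly on two predictors plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=100)

model = sm.OLS(y, sm.add_constant(X)).fit()

# t = beta / SE(beta), computed by hand from the fitted coefficients and
# their standard errors, then compared to statsmodels' reported t-values.
t_manual = model.params / model.bse
print(t_manual)
print(model.tvalues)  # should match t_manual
```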
How do you calculate R² in a regression model?
R² = SS_Reg / SS_Tot, where SS_Reg is the regression sum of squares and SS_Tot is the total sum of squares, representing the proportion of variance explained by the model.
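A short sketch with simulated data, computing the sums of squares by hand and checking them against statsmodels' built-in R²:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(100, 2)))
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=100)

fit = sm.OLS(y, X).fit()

# R^2 = SS_Reg / SS_Tot, using SS_Reg = SS_Tot - SS_Res.
ss_tot = np.sum((y - y.mean()) ** 2)   # total sum of squares
ss_res = np.sum(fit.resid ** 2)        # residual sum of squares
r2_manual = (ss_tot - ss_res) / ss_tot
print(r2_manual, fit.rsquared)  # the two values should agree
```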
What is the formula for adjusted R², and why is it used?
R²_adj = 1 - ((n - 1) / (n - p - 1)) * (SS_Res / SS_Tot), where n is the sample size, p is the number of predictors (excluding the intercept), SS_Res is the residual sum of squares, and SS_Tot is the total sum of squares. Adjusted R² accounts for the number of predictors, providing a more accurate measure of fit.
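A sketch of the adjusted-R² calculation checked against statsmodels, assuming p counts the predictors but not the intercept:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, p = 100, 3                              # sample size and number of predictors
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()

ss_tot = np.sum((y - y.mean()) ** 2)
ss_res = np.sum(fit.resid ** 2)

# R^2_adj = 1 - ((n - 1) / (n - p - 1)) * (SS_Res / SS_Tot)
r2_adj_manual = 1 - ((n - 1) / (n - p - 1)) * (ss_res / ss_tot)
print(r2_adj_manual, fit.rsquared_adj)  # should agree
```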
How is the F-statistic calculated in an ANOVA table?
F = MS_Reg / MS_Res, where MS_Reg is the mean square of the regression and MS_Res is the mean square of the residuals, used to test the overall significance of the regression model.
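A sketch reconstructing the ANOVA F-statistic from the sums of squares and their degrees of freedom, with simulated data and p predictors plus an intercept:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, p = 100, 2
X = rng.normal(size=(n, p))
y = 0.5 + 1.0 * X[:, 0] + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()

ss_tot = np.sum((y - y.mean()) ** 2)
ss_res = np.sum(fit.resid ** 2)
ss_reg = ss_tot - ss_res

# F = MS_Reg / MS_Res = (SS_Reg / p) / (SS_Res / (n - p - 1))
ms_reg = ss_reg / p
ms_res = ss_res / (n - p - 1)
print(ms_reg / ms_res, fit.fvalue)  # should agree
```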
What does homoskedasticity mean in the context of a regression model?
Homoskedasticity means that the variance of the error terms is constant across all levels of the independent variables, a key assumption for efficient estimates and valid standard errors in regression analysis.
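One common formal check for this assumption, not named on the card, is the Breusch-Pagan test; a minimal sketch with simulated (homoskedastic) data:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=200)

fit = sm.OLS(y, X).fit()

# Breusch-Pagan test: H0 = constant error variance (homoskedasticity).
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(f"Breusch-Pagan p-value: {lm_pvalue:.3f}")  # a large p-value is consistent with homoskedasticity
```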
How do you interpret a high R² in a multiple regression model?
A high R² indicates that a large proportion of the variance in the dependent variable is explained by the independent variables, suggesting a good fit for the model.
Why is adjusted R² preferred over R² in multiple regression?
Adjusted R² adjusts for the number of predictors, providing a more accurate measure of fit by penalizing additional variables that don’t improve the model significantly.
What does a T-statistic tell us about a regression coefficient?
The T-statistic tests whether a coefficient is significantly different from zero, helping to determine if an independent variable has a meaningful impact on the dependent variable.
How do you diagnose multicollinearity in a regression model?
Multicollinearity can be diagnosed by checking high Variance Inflation Factors (VIFs) for predictors, high correlations among predictors, or instability in coefficient estimates.
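A minimal VIF sketch using statsmodels, with two deliberately near-collinear predictors (names are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF for each predictor column (skipping the constant);
# values above roughly 5-10 are a common multicollinearity flag.
for i, name in enumerate(["x1", "x2", "x3"], start=1):
    print(name, variance_inflation_factor(X, i))
```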
How do you interpret a significant F-statistic in an ANOVA table for regression?
A significant F-statistic suggests that at least one of the independent variables significantly explains variance in the dependent variable, indicating an overall significant regression model.
What is omitted variable bias in multiple regression?
Omitted variable bias occurs when a relevant variable is left out of the model, leading to biased and inconsistent estimates of the included coefficients.
How does heteroskedasticity affect a regression model?
Heteroskedasticity, or non-constant variance of errors, can lead to inefficient estimates and unreliable standard errors, affecting the validity of hypothesis tests.
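One common response, not mentioned on the card, is to keep the OLS coefficients but report heteroskedasticity-consistent (robust) standard errors; a sketch with simulated non-constant error variance:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, size=300)
# The error spread grows with x, so the constant-variance assumption fails.
y = 2.0 + 0.5 * x + rng.normal(scale=0.2 * x, size=300)
X = sm.add_constant(x)

classical_fit = sm.OLS(y, X).fit()
robust_fit = sm.OLS(y, X).fit(cov_type="HC3")  # heteroskedasticity-consistent standard errors

print(classical_fit.bse)  # classical standard errors (unreliable here)
print(robust_fit.bse)     # robust standard errors
```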
How do you calculate the F-statistic for testing overall regression significance?
F = MS_Reg / MS_Res, where MS_Reg is the mean square for the regression and MS_Res is the mean square of residuals, used to test if the model is statistically significant.
How can you test for normally distributed errors in a regression model?
The normality of errors can be tested with visual tools like Q-Q plots or statistical tests such as the Shapiro-Wilk test, to validate regression assumptions.
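A minimal sketch of both checks on the residuals of a simulated fit, using scipy's Shapiro-Wilk test and statsmodels' Q-Q plot (matplotlib is assumed for the plot):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(150, 2)))
y = X @ np.array([1.0, -1.0, 0.5]) + rng.normal(size=150)

fit = sm.OLS(y, X).fit()

# Shapiro-Wilk test: H0 = residuals are normally distributed.
stat, pvalue = stats.shapiro(fit.resid)
print(f"Shapiro-Wilk p-value: {pvalue:.3f}")

# Q-Q plot of residuals against a fitted normal distribution (visual check).
sm.qqplot(fit.resid, line="45", fit=True)
plt.show()
```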
Why is the correlation of error terms important in regression?
Uncorrelated error terms are needed for valid standard errors and hypothesis tests. If errors are correlated, it suggests model misspecification or omitted variables, which may bias results.
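One standard diagnostic for serially correlated errors, not named on the card, is the Durbin-Watson statistic; a minimal sketch:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(8)
n = 200
X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([0.5, 1.0, -1.0]) + rng.normal(size=n)

fit = sm.OLS(y, X).fit()

# Durbin-Watson statistic: values near 2 suggest uncorrelated errors;
# values near 0 or 4 suggest positive or negative serial correlation.
print(durbin_watson(fit.resid))
```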
How would you interpret a T-statistic that is close to zero for a regression coefficient?
A T-statistic close to zero suggests that the coefficient is likely not statistically significant, indicating that the variable may have little or no linear impact on the dependent variable.
How can multicollinearity be addressed in a multiple regression model?
Multicollinearity can be reduced by removing highly correlated predictors, combining correlated variables, or using regularization techniques like ridge or lasso regression.
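A minimal sketch contrasting plain OLS with ridge and lasso on two near-collinear predictors (scikit-learn; the penalty strengths are arbitrary and would normally be tuned):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(9)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.05, size=200)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(size=200)

# OLS coefficients become unstable under near-collinearity;
# ridge and lasso shrink them and stabilize the fit.
print(LinearRegression().fit(X, y).coef_)
print(Ridge(alpha=1.0).fit(X, y).coef_)
print(Lasso(alpha=0.1).fit(X, y).coef_)
```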