4. Multiple Linear Regression Flashcards

1
Q

What is the formula for calculating the T-statistic in coefficient testing?

A

t = β / SE(β), where β is the estimated coefficient and SE(β) is the standard error of the coefficient.
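As a minimal numpy sketch (synthetic data; all names are illustrative), computing the coefficient t-statistics by hand:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])   # intercept + 2 predictors
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)

beta = np.linalg.solve(X.T @ X, X.T @ y)                     # OLS estimates
resid = y - X @ beta
sigma2 = resid @ resid / (n - p - 1)                         # residual variance
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))       # SE of each coefficient
t_stats = beta / se                                          # t = beta / SE(beta)
```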

2
Q

How do you calculate R² in a regression model?

A

R² = SS_Reg / SS_Tot, where SS_Reg is the regression sum of squares and SS_Tot is the total sum of squares, representing the proportion of variance explained by the model.
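A short sketch under the same hand-rolled OLS setup (function and variable names are illustrative):

```python
import numpy as np

def r_squared(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    ss_reg = ss_tot - ss_res                 # regression sum of squares
    return ss_reg / ss_tot                   # proportion of variance explained
```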

3
Q

What is the formula for adjusted R², and why is it used?

A

R²_adj = 1 - ((n - 1) / (n - p - 1)) * (SS_Res / SS_Tot), where n is the sample size, p is the number of predictors, SS_Res is the residual sum of squares, and SS_Tot is the total sum of squares. Adjusted R² penalizes additional predictors, providing a fairer measure of fit when comparing models with different numbers of variables.
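Since SS_Res / SS_Tot = 1 - R², the formula can be written directly in terms of R². A one-function sketch (names are illustrative):

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (excluding the intercept)."""
    return 1 - (n - 1) / (n - p - 1) * (1 - r2)
```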

4
Q

How is the F-statistic calculated in an ANOVA table?

A

F = MS_Reg / MS_Res, where MS_Reg is the mean square of the regression and MS_Res is the mean square of the residuals, used to test the overall significance of the regression model.
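A hedged sketch of the overall F-test (assumes the model includes an intercept; names are illustrative):

```python
import numpy as np
from scipy import stats

def overall_f_test(y, y_hat, p):
    """Overall F-test for a regression with p predictors plus an intercept."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_reg = np.sum((y_hat - y.mean()) ** 2)   # valid when an intercept is included
    ms_reg = ss_reg / p                        # df_Reg = p
    ms_res = ss_res / (n - p - 1)              # df_Res = n - p - 1
    f = ms_reg / ms_res
    return f, stats.f.sf(f, p, n - p - 1)      # statistic and p-value
```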

5
Q

What does homoskedasticity mean in the context of a regression model?

A

Homoskedasticity means that the variance of the error terms is constant across all levels of the independent variables. It is a key regression assumption for efficient OLS estimates and valid standard errors (OLS remains unbiased without it).

6
Q

How do you interpret a high R² in a multiple regression model?

A

A high R² indicates that a large proportion of the variance in the dependent variable is explained by the independent variables, suggesting a good fit, though a high R² alone does not establish correct specification or causality.

7
Q

Why is adjusted R² preferred over R² in multiple regression?

A

Adjusted R² adjusts for the number of predictors, providing a more accurate measure of fit by penalizing additional variables that don’t improve the model significantly.

8
Q

What does a T-statistic tell us about a regression coefficient?

A

The T-statistic tests whether a coefficient is statistically significantly different from zero, helping to determine if an independent variable has a meaningful impact on the dependent variable.

9
Q

How do you diagnose multicollinearity in a regression model?

A

Multicollinearity can be diagnosed by checking high Variance Inflation Factors (VIFs) for predictors, high correlations among predictors, or instability in coefficient estimates.
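As a sketch using statsmodels' VIF helper (the collinear toy data is illustrative):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
X["x3"] = 0.9 * X["x1"] + rng.normal(scale=0.1, size=100)   # induce collinearity

exog = sm.add_constant(X)   # include the intercept when computing VIFs
vifs = {col: variance_inflation_factor(exog.values, i)
        for i, col in enumerate(exog.columns) if col != "const"}
print(vifs)                 # x1 and x3 should show inflated VIFs
```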

10
Q

How do you interpret a significant F-statistic in an ANOVA table for regression?

A

A significant F-statistic suggests that at least one of the independent variables significantly explains variance in the dependent variable, indicating an overall significant regression model.

11
Q

What is omitted variable bias in multiple regression?

A

Omitted variable bias occurs when a relevant variable that is correlated with the included predictors is left out of the model, leading to biased and inconsistent estimates of the included coefficients.

12
Q

How does heteroskedasticity affect a regression model?

A

Heteroskedasticity, or non-constant variance of errors, can lead to inefficient estimates and unreliable standard errors, affecting the validity of hypothesis tests.

13
Q

How do you calculate the F-statistic for testing overall regression significance?

A

F = MS_Reg / MS_Res, where MS_Reg is the mean square for the regression and MS_Res is the mean square of the residuals, used to test whether the model as a whole is statistically significant.

14
Q

How can you test for normally distributed errors in a regression model?

A

The normality of errors can be tested with visual tools like Q-Q plots or statistical tests such as the Shapiro-Wilk test, to validate regression assumptions.
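A quick sketch of both checks (the residuals here are a synthetic stand-in):

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

resid = np.random.default_rng(0).normal(size=200)   # stand-in for model residuals

w, p_value = stats.shapiro(resid)                   # H0: residuals are normal
print(f"Shapiro-Wilk W = {w:.3f}, p = {p_value:.3f}")

sm.qqplot(resid, line="s")                          # Q-Q plot against the normal
```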

15
Q

Why is the correlation of error terms important in regression?

A

Uncorrelated error terms are a key OLS assumption: if errors are correlated (for example, autocorrelation in time series), standard errors and hypothesis tests become unreliable, and the correlation often signals model misspecification or omitted variables.

16
Q

How would you interpret a T-statistic that is close to zero for a regression coefficient?

A

A T-statistic close to zero suggests that the coefficient is likely not statistically significant, indicating that the variable may have little or no linear impact on the dependent variable.

17
Q

How can multicollinearity be addressed in a multiple regression model?

A

Multicollinearity can be reduced by removing highly correlated predictors, combining correlated variables, or using regularization techniques like ridge or lasso regression.

18
Q

How do you interpret the results when both R² and adjusted R² values are low?

A

Low values for R² and adjusted R² indicate that the model explains very little of the variance in the dependent variable, suggesting either poor predictors or potential issues with model specification.

19
Q

How does one test for joint significance of regression coefficients using the F-statistic?

A

Use the F-statistic to test joint significance by comparing MS_Reg and MS_Res. A high F-value indicates that the set of predictors together significantly explains variation in the dependent variable, beyond what would be expected by chance.

20
Q

How would you detect and address heteroskedasticity in regression residuals?

A

Heteroskedasticity can be detected with tests like the Breusch-Pagan test or visualized through residual plots. To address it, one can use robust standard errors, transform the dependent variable, or apply weighted least squares regression.
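A sketch of detection and one remedy with statsmodels (synthetic heteroskedastic data; names are illustrative):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, size=200)
y = 2 + 3 * x + rng.normal(scale=x)          # error variance grows with x
X = sm.add_constant(x)

fit = sm.OLS(y, X).fit()
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(fit.resid, fit.model.exog)
print(f"Breusch-Pagan p-value: {lm_p:.4f}")  # small p suggests heteroskedasticity

robust = sm.OLS(y, X).fit(cov_type="HC3")    # White-type robust standard errors
print(robust.bse)
```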

21
Q

How do you calculate the predicted value of the dependent variable in a multiple linear regression model?

A

Use the equation Ŷ = β₀ + β₁X₁ + β₂X₂ + … + βₚXₚ, where Ŷ is the predicted value, β₀ is the intercept, β₁, β₂, …, βₚ are the coefficients, and X₁, X₂, …, Xₚ are the predictor variables.
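In code this is just a dot product, provided the predictor vector carries a leading 1 for the intercept (values are illustrative):

```python
import numpy as np

beta = np.array([1.5, 0.8, -0.3])   # [beta0, beta1, beta2]
x_new = np.array([1.0, 2.0, 4.0])   # leading 1 multiplies the intercept
y_hat = beta @ x_new                # Y-hat = beta0 + beta1*X1 + beta2*X2
```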

22
Q

How do you calculate the sum of squared residuals (SSR) in multiple linear regression?

A

SSR = Σ(Yi - Ŷi)², where Yi is the observed value and Ŷi is the predicted value from the model, summing over all observations.

23
Q

How do you interpret the coefficient βj in a multiple linear regression model?

A

βj represents the expected change in the dependent variable for a one-unit increase in the predictor Xj, holding all other predictors constant.

24
Q

What is the formula for calculating the variance of the residuals in multiple regression?

A

Variance of residuals = SSR / (n - p - 1), where SSR is the sum of squared residuals, n is the number of observations, and p is the number of predictors.

25
Q

How do you calculate the Variance Inflation Factor (VIF) for a predictor Xj?

A

VIFj = 1 / (1 - R²j), where R²j is the R² obtained from regressing Xj on all other predictors. A VIF above roughly 5-10 is commonly taken to indicate problematic multicollinearity.

26
Q

How do you calculate the adjusted R² for a multiple regression model with p predictors?

A

Adjusted R² = 1 - [(n - 1) / (n - p - 1)] * (1 - R²), where n is the number of observations, p is the number of predictors, and R² is the unadjusted R² value.

27
Q

How do you calculate the standard error of a regression coefficient βj in multiple regression?

A

SE(βj) = √(σ̂² * [(XᵀX)⁻¹]jj), where σ̂² = SS_Res / (n - p - 1) is the residual variance and [(XᵀX)⁻¹]jj is the j-th diagonal element of (XᵀX)⁻¹. Equivalently, SE(βj) = √(σ̂² / ((1 - R²j) * Σ(Xij - X̄j)²)), which shows how multicollinearity (a high R²j, hence a high VIF) inflates the standard error.
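A sketch of the matrix form (X must include the intercept column; names are illustrative):

```python
import numpy as np

def coef_standard_errors(X, y):
    """SE of each coefficient from the diagonal of sigma2 * (X'X)^-1."""
    n, k = X.shape                            # k columns = p predictors + intercept
    beta = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)          # residual variance, SS_Res / (n - p - 1)
    return np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
```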

28
Q

How do you calculate the F-statistic to test if a subset of k coefficients is jointly equal to zero?

A

F = [(SSRr - SSRur) / k] / (SSRur / (n - p - 1)), where SSRr is the sum of squared residuals from the restricted model, SSRur is from the unrestricted model, and k is the number of restrictions.
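A direct translation of the formula (names are illustrative; the two SSRs come from fitting the restricted and unrestricted models separately):

```python
from scipy import stats

def subset_f_test(ssr_restricted, ssr_unrestricted, k, n, p):
    """F-test that k coefficients are jointly zero; p = predictors in the full model."""
    f = ((ssr_restricted - ssr_unrestricted) / k) / (ssr_unrestricted / (n - p - 1))
    return f, stats.f.sf(f, k, n - p - 1)   # statistic and p-value
```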

29
Q

How do you calculate the t-statistic for testing if a regression coefficient βj is significantly different from zero?

A

t = βj / SE(βj), where βj is the estimated coefficient and SE(βj) is its standard error.

30
Q

How do you interpret the partial correlation between a predictor Xj and the dependent variable Y?

A

The partial correlation measures the relationship between Xj and Y while controlling for the other predictors, showing how much of the variance in Y can be uniquely explained by Xj.

31
Q

How do you calculate the expected change in the dependent variable given changes in multiple independent variables?

A

ΔY = β₁ΔX₁ + β₂ΔX₂ + … + βₖΔXₖ, where each β represents the coefficient of the independent variable and ΔX represents the change in that variable.

32
Q

What is the formula for the T-statistic in coefficient testing?

A

t = β̂ / SE(β̂), where β̂ is the estimated coefficient and SE(β̂) is its standard error, used to test if the coefficient is significantly different from zero.

33
Q

How do you calculate the F-statistic for joint hypothesis testing?

A

F = MS_Reg / MS_Res, where MS_Reg is the mean square regression and MS_Res is the mean square residual, used to test whether all slope coefficients are jointly zero; for a subset of coefficients, use the restricted-versus-unrestricted form of the F-statistic (card 28).

34
Q

How is R² calculated, and what does it represent?

A

R² = SS_Reg / SS_Tot, where SS_Reg is the regression sum of squares and SS_Tot is the total sum of squares; it measures the proportion of variance in the dependent variable explained by the model.

35
Q

How is adjusted R² calculated, and why is it important?

A

R²_adj = 1 - ((n - 1) / (n - k - 1)) * (SS_Res / SS_Tot), where n is the number of observations and k is the number of predictors. Adjusted R² accounts for model complexity, penalizing additional predictors.

36
Q

How do you calculate Mean Square Regression (MS_Reg) in an ANOVA table?

A

MS_Reg = SS_Reg / df_Reg, where SS_Reg is the regression sum of squares and df_Reg is the degrees of freedom for regression (df_Reg = p, the number of predictors).

37
Q

How do you calculate Mean Square Residual (MS_Res) in an ANOVA table?

A

MS_Res = SS_Res / df_Res, where SS_Res is the residual sum of squares and df_Res is the degrees of freedom for residuals (df_Res = n - p - 1).

38
Q

How is the total sum of squares (SS_Tot) calculated in ANOVA?

A

SS_Tot = Σ(Yi - Ȳ)², the total variation of Y around its mean. It decomposes as SS_Tot = SS_Reg + SS_Res, where SS_Reg is the regression sum of squares and SS_Res is the residual sum of squares.

39
Q

What does it mean if the variance of the error term is not constant (heteroskedasticity)?

A

If Var(εi) ≠ σ², heteroskedasticity is present, meaning the variance of errors changes with levels of the independent variables, affecting standard errors and inference.

40
Q

How do you calculate the standard error of the estimate?

A

SE = √(SS_Res / (n - k - 1)), where SS_Res is the residual sum of squares, n is the sample size, and k is the number of predictors, measuring the typical distance between observed and predicted values.

41
Q

What are robust standard errors, and when are they used?

A

Robust standard errors, such as White (heteroskedasticity-consistent) and Newey-West (heteroskedasticity- and autocorrelation-consistent) standard errors, adjust for violations of the error assumptions, providing more reliable inference when those assumptions fail.

42
Q

What is the formula for the vector of estimated coefficients in multiple linear regression?

A

β̂ = (XᵀX)⁻¹ XᵀY, where β̂ is the vector of estimated coefficients, Xᵀ is the transpose of the matrix of predictors, (XᵀX)⁻¹ is the inverse of the product of Xᵀ and X, and Y is the vector of observed values.
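A minimal numpy sketch of the normal equations on synthetic data; in practice a least-squares solver is preferred over forming the inverse explicitly:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=(100, 2))])   # intercept + 2 predictors
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=100)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # solves (X'X) beta = X'Y
# Equivalent and more numerically stable:
# beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```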

43
Q

What are the dimensions of the matrices Y, X, β, and e in multiple linear regression?

A

In a regression model with n observations and p predictors:

Y is an (n x 1) vector of observed values.
X is an (n x p’) matrix of predictors (where p’ = p + 1, including the intercept).
β is a (p’ x 1) vector of coefficients.
e is an (n x 1) vector of residuals (errors).