Evaluating Regression Model Fit And Interpreting Model Results Flashcards
What is the objective of multiple linear regression?
capture the dependent variable’s relationship with the independent variables while minimizing residual errors
What is another name for R^2 and what does r-squared (R^2) tell us?
R-Squared: coefficient of determination
R-squared shows how well the regression model’s independent variables predict the observed values of the dependent variable
An R^2 value of 1 indicates that the model perfectly explains the variation in the dependent variable, while a value of 0 means the model explains none of the variation
What is the sum of squares regression (explained), what is its formula, and what does it represent graphically?
sum of squares regression (SSR): the explained variation, i.e. the difference/area between the line of best fit (y hat) and the mean of the data (y bar)
SSR = Σ(Y hat - Y mean)^2
(line of best fit - line of mean)^2, summed over all observations
What is the sum of squares residual (error), what is its formula, and what does it represent graphically?
sum of squares residual/error (SSE): the unexplained variation, i.e. the difference between the observed data points and the line of best fit, lying outside the area between the line of best fit and the mean
SSE = Σ(Yi - Y hat)^2
(data point - line of best fit)^2, summed over all observations
What is the sum of squares total, what are its 2 formulas, and what does it represent graphically?
sum of squares total (SST): the difference between the observed data points and the mean line, i.e. the unexplained and explained variation combined (SSR + SSE)
SST = SSR + SSE
SST = Σ(Yi - Y mean)^2
(observed data point - line of mean)^2, summed over all observations
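The decomposition above can be verified numerically; here is a minimal sketch in plain Python, fitting a one-variable least-squares line to made-up data:

```python
# Fit a simple least-squares line to made-up data, then verify SST = SSR + SSE.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 5.0, 6.0, 9.0]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)  # mean of the data (y bar)

# OLS slope and intercept for one independent variable
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]  # line of best fit (y hat)

ssr = sum((f - y_bar) ** 2 for f in y_hat)           # explained: (y hat - y mean)^2
sse = sum((yi - f) ** 2 for yi, f in zip(y, y_hat))  # unexplained: (yi - y hat)^2
sst = sum((yi - y_bar) ** 2 for yi in y)             # total: (yi - y mean)^2

print(round(ssr, 4), round(sse, 4), round(sst, 4))  # 24.2 0.8 25.0
```

Note that SST = SSR + SSE holds exactly only when y hat comes from a least-squares fit that includes an intercept.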
What is overfitting?
overfitting: when a model becomes unnecessarily complex due to having too many independent variables.
What is the problem with R2 and what can we use instead?
R^2: its value will always increase (or stay the same) when more independent variables are added to a model; we can instead use adjusted R^2
What is the difference between R^2 and adjusted R^2?
adjusted R-squared penalizes the inclusion of unnecessary independent variables in a model, meaning it only increases when adding a new predictor actually improves the model’s explanatory power, whereas R-squared always increases when adding more predictors, even if they are not significant
What is the formula for calculating adjusted R^2?
adjusted R^2: 1 - [(sum of square error / (n-k-1)) / (sum of square total / (n-1))]
n = number of observations
k = number of independent variables
What is the formula that shows the relationship between r squared and adjusted r-squared?
adjusted R^2 = 1 - [(n - 1)/(n - k - 1)] * (1 - R^2)
n = number of observations
k = number of independent variables
R^2 = the multiple R (correlation coefficient) taken to the 2nd power
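Both formulas give the same number; a quick check in plain Python with made-up sums of squares:

```python
# Adjusted R^2 computed two equivalent ways (all values are made up).
sse, sst = 20.0, 100.0  # sum of squares error / total
n, k = 30, 3            # observations, independent variables

r2 = 1 - sse / sst  # R^2 = 0.8

adj_from_ss = 1 - (sse / (n - k - 1)) / (sst / (n - 1))
adj_from_r2 = 1 - ((n - 1) / (n - k - 1)) * (1 - r2)

print(round(adj_from_ss, 6), round(adj_from_r2, 6))  # 0.776923 0.776923
```

Adjusted R^2 (about 0.777) comes out below R^2 (0.8), reflecting the penalty for the 3 predictors.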
What is parsimony in multiple linear regression?
“parsimony”: refers to the principle of selecting a model with the fewest possible independent variables that still adequately explains the dependent variable
What are the 2 ways to measure parsimony, and their formulas?
Akaike’s Information Criterion (AIC) = n * ln (SSE/n) + 2*(k + 1)
Schwarz’s Bayesian Information Criterion (SBC) = n *ln (SSE/n) + ln (n) * (k+1)
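The two criteria can be compared side by side; a minimal sketch in Python, with made-up SSE values for two candidate models (lower is better for both):

```python
import math

n = 50  # number of observations

def aic(sse, k):
    # Akaike's Information Criterion
    return n * math.log(sse / n) + 2 * (k + 1)

def sbc(sse, k):
    # Schwarz's Bayesian Information Criterion
    return n * math.log(sse / n) + math.log(n) * (k + 1)

# Model A: 3 predictors, SSE = 40; Model B: 5 predictors, SSE = 38 (made up)
for name, sse, k in [("A", 40.0, 3), ("B", 38.0, 5)]:
    print(name, round(aic(sse, k), 3), round(sbc(sse, k), 3))
```

Here the small drop in SSE from two extra predictors doesn't offset the penalty, so both criteria prefer model A; SBC's ln(n) penalty punishes the larger model even harder.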
What is the difference between Akaike’s Information Criterion (AIC) and Schwarz’s Bayesian Information Criterion (SBC), and what is each best used for?
Akaike’s Information Criterion (AIC): AIC tends to favor slightly more complex models with a less stringent penalty for added parameters (best for predictive purposes)
Schwarz’s Bayesian Information Criterion (SBC): imposes a larger penalty for additional parameters, making it more likely to select simpler models, especially when dealing with large datasets (best for assessing a model’s goodness of fit for descriptive purposes)
What do low and high values of Akaike’s Information Criterion (AIC) and Schwarz’s Bayesian Information Criterion (SBC) mean?
Akaike’s Information Criterion (AIC): a low AIC indicates better model fit, meaning the model explains the data well while minimizing complexity; a high AIC suggests the model might be overfitting the data with too many parameters or variables
Schwarz’s Bayesian Information Criterion (SBC): likewise, a low SBC indicates better model fit and a high SBC suggests possible overfitting, with SBC applying a heavier per-parameter penalty
What is the difference between restricted model and unrestricted model?
Unrestricted model: includes all independent variables in a model
Restricted model: excludes or omits some independent variables, assuming their coefficients are 0
What’s the difference between the null hypothesis and the alternative hypothesis?
- null hypothesis (H0): the default condition being tested, assumed true until the evidence contradicts it
- alternative hypothesis (Ha): the opposite of the condition described by the null hypothesis
What is a slope coefficient t-test and its formula? Additionally, what are the null and alternative hypotheses for the t-test?
t = (bi - Bi) / SBi
bi = estimated value of the slope coefficient from the regression
Bi = hypothesized value of the slope coefficient
SBi = standard error of the estimated slope coefficient bi
the t-test for a slope coefficient is used to determine whether the slope is significantly different from zero; if it is, the independent variable significantly predicts the dependent variable
H0: B = 0 (the independent variable has no effect on the dependent variable)
Ha: B ≠ 0 (the independent variable has a significant effect on the dependent variable)
When do you reject the null hypothesis using a t test?
reject the null hypothesis if the absolute value of the t statistic is greater than the critical value at the chosen significance level; degrees of freedom for the critical value are n - k - 1
n =# of observations
k = # of independent variables
we reject the null hypothesis when the data provide strong enough evidence to conclude that it is likely incorrect
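A worked example of the slope t-test, with made-up regression output (the critical value is taken from a standard t-table):

```python
# t-test for a slope coefficient (all numbers are made up for illustration).
b_i = 1.5     # estimated slope coefficient (bi)
B_i = 0.0     # hypothesized value under H0 (Bi)
s_bi = 0.6    # standard error of the slope (SBi)
n, k = 25, 3  # observations, independent variables

t = (b_i - B_i) / s_bi
df = n - k - 1  # degrees of freedom = 21

# Two-tailed 5% critical value for df = 21, from a t-table
t_crit = 2.080

reject_h0 = abs(t) > t_crit
print(round(t, 3), df, reject_h0)  # 2.5 21 True
```

Since |t| = 2.5 exceeds 2.080, we would conclude this slope is significantly different from zero at the 5% level.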
What is f test formula for one tailed test and what is the null hypothesis and alternative hypothesis for f test one tailed?
F (one-tailed) = ((SSEr - SSEu)/q) / (SSEu/df)
SSEr = sum of squared errors for the restricted model
SSEu = sum of squared errors for the unrestricted model
q = number of restrictions (omitted variables)
n = number of observations
k = number of independent variables (in the unrestricted model)
df = n-k-1
tests whether a model with additional independent variables (unrestricted model) is better than a simpler model that excludes them (restricted model)
- Null hypothesis (H₀): adding more variables (predictors) to the model doesn’t make it any better at explaining the data than the simpler model.
- Alternative hypothesis (H₁): adding variables to the model does make it better at explaining the data than the simpler model
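The one-tailed F statistic can be computed directly from the two models' error sums of squares; a minimal sketch with made-up numbers:

```python
# Partial (one-tailed) F-test: restricted vs. unrestricted model (made-up values).
sse_r = 120.0  # SSE of the restricted model
sse_u = 100.0  # SSE of the unrestricted model
q = 2          # number of restrictions (omitted variables)
n, k = 40, 5   # observations, predictors in the unrestricted model

df = n - k - 1  # 34
f = ((sse_r - sse_u) / q) / (sse_u / df)
print(round(f, 3), df)  # 3.4 34
```

Reject H₀ if this F exceeds the critical F value with (q, n - k - 1) degrees of freedom from an F-table.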
What’s the formula for calculating R^2?
R^2 = sum of squares regression / sum of squares total (equivalently, 1 - SSE/SST)
What is f test goodness of fit, and formula? Additionally, what is the f test null hypothesis vs alternative hypothesis?
f test goodness of fit = mean square regression (MSR) / mean square error (MSE)
MSR = sum of squares regression / # of independent variables (k)
MSE = sum of squares residual/ degrees of freedom (n-k-1)
- used to check whether the regression model as a whole explains a significant share of the variation in the dependent variable.
- Null hypothesis (H₀): all slope coefficients equal zero; the independent variables collectively have no explanatory power, and any apparent fit is due to random chance.
- Alternative hypothesis (H₁): at least one slope coefficient is different from zero.
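The goodness-of-fit F statistic follows directly from the two mean squares; a minimal sketch with made-up sums of squares:

```python
# Goodness-of-fit F-test: F = MSR / MSE (all numbers are made up).
ssr = 80.0    # sum of squares regression
sse = 60.0    # sum of squares residual
n, k = 34, 3  # observations, independent variables

msr = ssr / k            # mean square regression
mse = sse / (n - k - 1)  # mean square error (df = 30)
f = msr / mse
print(round(f, 3))  # 13.333
```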
What does a p-value less than 0.05 mean?
means the independent variables are statistically significant (reject null hypothesis in favor of alternative hypothesis)