WEEK 4 Flashcards
Post week learning reflection
What is a response variable? Give an example
The outcome variable we are interested in measuring or predicting
E.g. in a study of plant growth and sun exposure, the response is plant growth
What is a predictor variable? Give an example
The factor(s) that may impact, directly or indirectly, our response variable
E.g. in a study of plant growth and sun exposure, the predictor is sun exposure
What is the slope?
The rate at which the response variable changes for a one-unit change in the predictor
Define residuals. How are they calculated?
They are the differences between the observed values of the response variable and the values predicted by the model
ei = yi − ŷi
where ei is the residual for the ith observation, yi is the observed response, and ŷi is the predicted response.
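A minimal sketch of this calculation in Python; the data values and fitted coefficients (b0, b1) are made up for illustration:

```python
# Residuals e_i = y_i - y_hat_i for a simple linear fit y_hat = b0 + b1*x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]    # observed responses

b0, b1 = 0.1, 1.96               # assumed fitted intercept and slope
y_hat = [b0 + b1 * xi for xi in x]                    # predicted responses
residuals = [yi - yhi for yi, yhi in zip(y, y_hat)]   # e_i = y_i - y_hat_i
print(residuals)
```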
What is the coefficient of determination, and what does it tell us?
The coefficient of determination, R^2, quantifies the proportion of the variance in the dependent variable that is predictable from the independent variables
R^2 = 1 − SSres/SStot
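A quick sketch of computing R^2 directly from this definition; the observed and predicted values are invented for illustration:

```python
# R^2 = 1 - SSres/SStot, with illustrative numbers.
y = [2.0, 4.0, 6.0, 8.0]          # observed responses
y_hat = [2.2, 3.8, 6.1, 7.9]      # model predictions (assumed)

mean_y = sum(y) / len(y)
ss_res = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))  # residual SS
ss_tot = sum((yi - mean_y) ** 2 for yi in y)                # total SS
r2 = 1 - ss_res / ss_tot
print(round(r2, 4))
```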
How can we tell if the model is well-fitted?
The residuals should scatter randomly around zero, with no discernible pattern, when plotted against the predicted values or against any of the independent variables. This suggests that the model does not suffer from non-linearity, heteroscedasticity, or other issues that could affect the reliability of the predictions.
Define unexplained variation
It is the sum of squares of the residuals, known as the residual sum of squares SSres
Define the total sum of squares
SStot measures the total variance in the observed data. It is the sum of the squares of the differences between the observed values and the mean of the observed data
Define the explained variation
SSexp is the part of the total variation in the response variable that is explained by the regression model. It is the difference between the total sum of squares and the residual sum of squares:
SSexp = SStot - SSres
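The decomposition above can be sketched with the same kind of made-up observed/predicted values; note that SSexp/SStot equals R^2:

```python
# Variance decomposition SSexp = SStot - SSres (illustrative numbers).
y = [2.0, 4.0, 6.0, 8.0]          # observed responses
y_hat = [2.2, 3.8, 6.1, 7.9]      # model predictions (assumed)

mean_y = sum(y) / len(y)
ss_tot = sum((yi - mean_y) ** 2 for yi in y)                # total variation
ss_res = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))  # unexplained
ss_exp = ss_tot - ss_res                                    # explained
print(ss_exp, ss_exp / ss_tot)    # the ratio equals R^2
```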
What values can R^2 take and what do they mean?
An R^2 of 0 indicates that the model does not explain any of the variability of the response data around its mean.
An R^2 of 1 indicates that the model explains all the variability of the response data around its mean.
In other words, the closer the R^2 value is to 1, the better the model fits the data
What are some limitations of using R^2 to judge how well the model fits?
It never decreases when more predictors are added, which can lead to overfitting if we are not careful
It does not indicate whether the model is adequate or whether every predictor in the model is significant.
R^2 does not provide information on the correctness of the model structure or about the quality of the predictions for new data points.
Explain ANOVA for regression tests
It assesses whether any of the predictors in a multiple regression model contribute to explaining the variability in the response variable. It compares a model with all predictors included against a reduced model with only the intercept (response mean).
ANOVA for regression uses the F-statistic, which is the ratio of the mean square regression (MSR) to the mean square error (MSE).
Explain t-test for regression coefficients
Each regression coefficient can be tested individually using a t-test to determine if it is significantly different from zero. The null and alternative hypotheses for each predictor are:
H0: βj=0 (The predictor xj has no effect on the response variable.)
HA: βj≠0 (The predictor xj does have an effect on the response variable.)
Calculated as:
t = (βj − 0) / SE(βj)
where βj is the estimated coefficient and SE(βj) is the standard error of the coefficient. This t-statistic follows a t-distribution with n−p−1 degrees of freedom, where n is the sample size and p is the number of predictors.
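A minimal sketch of the t-statistic for one coefficient; the coefficient estimate and its standard error are assumed values (in practice they come from the fitted model):

```python
# t-statistic for testing H0: beta_j = 0 (illustrative values).
beta_j = 1.96        # estimated coefficient (assumed)
se_beta_j = 0.25     # standard error of the coefficient (assumed)

t = (beta_j - 0) / se_beta_j
print(t)   # compare against a t-distribution with n - p - 1 df
```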
What are the equations for an F-statistic test
F = MSR/MSE
Where:
MSR = SSreg/DFreg
MSE = SSres/DFres
The F-statistic follows an F-distribution with DFreg and DFres degrees of freedom
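The F-statistic calculation can be sketched as follows; the sums of squares, sample size n, and number of predictors p are made-up values, with DFreg = p and DFres = n − p − 1:

```python
# F = MSR/MSE from sums of squares and degrees of freedom (illustrative).
ss_reg, ss_res = 120.0, 30.0      # regression and residual sums of squares
n, p = 25, 2                      # sample size and number of predictors
df_reg, df_res = p, n - p - 1     # degrees of freedom: 2 and 22

msr = ss_reg / df_reg             # mean square regression
mse = ss_res / df_res             # mean square error
f_stat = msr / mse
print(round(f_stat, 2))
```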
What are the 3 tests we must do for model validation?
Linearity: The relationship between predictors and the response should be linear.
Normality: The residuals should be normally distributed.
Homoscedasticity: The variance of residuals should be constant across predicted values.