MLR Flashcards by Maggie Chen

How is the best regression model fit found?

The best-fit model is the one that minimises the sum of squared differences between the observed Y values and the values predicted by the model (the least-squares criterion).
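
A minimal sketch of the least-squares idea in plain Python, assuming a single explanatory variable. The toy data and the helper names `fit_line` and `sum_squared_errors` are hypothetical and only there for illustration.

```python
# Hypothetical toy data (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sum_squared_errors(x, y, intercept, slope):
    """Sum of squared differences between observed and predicted Y."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))


intercept, slope = fit_line(x, y)
best = sum_squared_errors(x, y, intercept, slope)
# Nudging the slope away from the fitted value makes the total larger.
worse = sum_squared_errors(x, y, intercept, slope + 0.1)
print(intercept, slope, best, worse)
```

Any other line through the same data gives a larger sum of squared errors than the fitted one, which is what "best fit" means here.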

What is simple linear regression?

A statistical technique that develops an equation that relates a dependent variable to one independent variable.

What does SLR do?

1) Enables prediction of the dependent variable
2) Estimates the line of best fit
3) Evaluates a linear relationship between the explanatory variable and the response variable.

An important tip for interpreting an SLR model?

The coefficient of X shows the increase/decrease in Y ON AVERAGE, not with certainty. Likewise, the Y-intercept is an AVERAGE value, not an exact one.

What are the lower and upper 95%?

These are the lower and upper bounds of the 95% confidence interval around each estimated coefficient. They show how much uncertainty there is in the estimates.

How to find the T-value in Excel?

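=T.INV.2T(alpha, n - k - 1), where alpha = 1 - confidence level (e.g. 0.05 for a 95% interval), n is the number of observations and k is the number of explanatory variables. T.INV.2T is already two-tailed, so alpha is not halved.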

What does the estimated slope do?

Provides information about the relationship between the response variable and the explanatory variable. It shows the estimated average change in Y for a one-unit increase in X.

What does the intercept show?

The value of the predicted response when the explanatory variable value is zero.

What does the t-test in the SLR model show?

It shows whether each coefficient value is significantly different from 0, e.g. whether floor space significantly explains the variability in house prices.

How to carry out a t-test?

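Step 1: State the hypotheses. H0: beta = 0, H1: beta ≠ 0.
Step 2: Assume H0 is true.
Step 3: Calculate the test statistic:
t = (coefficient estimate - 0) / SE(coefficient estimate)
Step 4: Interpret the result. Reject H0 if the p-value is below the significance level alpha (equivalently, if |t| exceeds the critical t-value).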
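
The steps above can be sketched in plain Python for the slope of an SLR model. This is only an illustration: the toy data are hypothetical, and the critical value 3.182 is the standard t-table entry for alpha = 0.05 (two-tailed) with 3 degrees of freedom, hard-coded here as an assumption rather than computed.

```python
import math

# Hypothetical toy data (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Fit the least-squares line.
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
intercept = y_bar - slope * x_bar

# Step 3: t = (estimate - 0) / SE(estimate).
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
ms_residual = sum(r ** 2 for r in residuals) / (n - 2)  # df = n - k - 1, k = 1
se_slope = math.sqrt(ms_residual / s_xx)
t_stat = (slope - 0) / se_slope

# Step 4: compare |t| with the two-tailed critical value.
T_CRITICAL = 3.182  # assumed t-table value: alpha = 0.05, 3 degrees of freedom
reject_h0 = abs(t_stat) > T_CRITICAL
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

With so few observations the slope is not significantly different from 0, even though the fitted value is positive.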

What does the p-value show?

In a two-tailed test, the p-value is the combined area in the two tails of the distribution beyond the observed test statistic. It is the probability of getting a t-statistic at least as extreme as the one observed, given that the null hypothesis is true.

What is the connection between the confidence interval and two-tailed t-tests?

If the t-test shows that the coefficient is significantly different from 0 (p < alpha), then the (1 - alpha) confidence interval doesn’t contain 0. If the coefficient is not significantly different from 0 (p > alpha), the confidence interval does contain 0.
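
A small sketch of this link, assuming the same critical t-value is used for both the interval and the test. The coefficient estimates, standard errors and the helper names are hypothetical; 3.182 is the t-table value for alpha = 0.05 with 3 degrees of freedom.

```python
T_CRITICAL = 3.182  # assumed t-table value: alpha = 0.05, 3 degrees of freedom


def confidence_interval(estimate, std_error, t_critical):
    """Return (lower, upper) bounds: estimate +/- t_critical * SE."""
    margin = t_critical * std_error
    return estimate - margin, estimate + margin


def is_significant(estimate, std_error, t_critical):
    """Two-tailed t-test of H0: coefficient = 0."""
    return abs(estimate / std_error) > t_critical


# Hypothetical (estimate, standard error) pairs.
coefficients = {"weak": (0.6, 0.2828), "strong": (2.0, 0.5)}

results = {}
for name, (est, se) in coefficients.items():
    low, high = confidence_interval(est, se, T_CRITICAL)
    contains_zero = low <= 0 <= high
    significant = is_significant(est, se, T_CRITICAL)
    results[name] = (low, high, contains_zero, significant)
    print(name, round(low, 3), round(high, 3), contains_zero, significant)
```

In both cases the interval contains 0 exactly when the test fails to reject H0.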

What does the t-test test for?

Whether the coefficients are significantly different from 0.

What does the SS of regression show?

This is the sum of the squared differences between the predicted values and the mean of the response data. It shows the variation in the response variable that is explained by the model (the total variation is SS total).

How to find SS Total?

SS regression + SS residual

How to find the R-squared value?

SS regression / SS total
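
As a rough illustration of how the sums of squares fit together, the sketch below computes SS regression, SS residual, SS total and R-squared for a least-squares line. The toy data are hypothetical.

```python
# Hypothetical toy data (not from the flashcards).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Least-squares fit.
x_bar = sum(x) / n
y_bar = sum(y) / n
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar
predicted = [intercept + slope * xi for xi in x]

# Explained, unexplained and total variation.
ss_regression = sum((p - y_bar) ** 2 for p in predicted)
ss_residual = sum((yi - p) ** 2 for yi, p in zip(y, predicted))
ss_total = sum((yi - y_bar) ** 2 for yi in y)

r_squared = ss_regression / ss_total
print(ss_regression, ss_residual, ss_total, r_squared)
```

For a least-squares fit with an intercept, SS regression + SS residual equals SS total up to rounding.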

What does R squared show?

It shows the proportion of the variation in the response variable that is explained by the model. It evaluates the dispersion of the data points around the fitted regression line. Also called the coefficient of determination.

Why is the R-squared value not always good?

1) It doesn’t reveal outliers.
2) It doesn’t show whether the variance is increasing.
3) It doesn’t show whether the relationship is actually curved.

What is heteroscedasticity?

The variance of the residuals is not constant across the data set.

When can you fit an SLR model?

1) The residuals have constant variance.
2) The relationship between the response and the explanatory variable is linear.
3) The residuals are uncorrelated.
4) The errors are normally distributed with a mean of 0.

What does the standard error show?

This is the average distance that the observed data points fall from the regression line. I.e. it tells you how wrong the model is on average using the units of the response variable.

How to calculate the standard error?

The square root of the mean squared residual (MS residual).
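
A minimal sketch of this calculation, assuming the predicted values come from an SLR model with one explanatory variable. The observed and predicted values are hypothetical.

```python
import math

# Hypothetical observed responses and model predictions.
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
k = 1  # number of explanatory variables
n = len(observed)

# Residual = observed value - predicted value.
residuals = [o - p for o, p in zip(observed, predicted)]

# MS residual = SS residual / (n - k - 1).
ss_residual = sum(r ** 2 for r in residuals)
ms_residual = ss_residual / (n - k - 1)

standard_error = math.sqrt(ms_residual)
print(round(standard_error, 4))
```

The result is in the units of the response variable, which is what makes it easy to interpret.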

How to calculate multiple R?

The square root of the R squared value.

What is the multiple R?

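This is the multiple correlation coefficient: the correlation between the observed and the predicted values of the response. In SLR it equals the absolute value of the correlation between the response y data and the explanatory x data, so it shows how strong the linear relationship is.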

What does F show?

1) It tests whether the model ‘taken as a whole’ is explaining a significant portion of the variability in the response variable.
2) It tests the null hypothesis that a model with no explanatory variables fits the data as well as your fitted regression model.

What is step 1 of the F-test?

H0: a model with no explanatory variables (intercept only) fits the data as well as your model.
H1: your model fits the data better than the intercept-only model.

How to find the f-test statistic value?

MS regression/ MS residual
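
A short worked sketch of the F statistic, assuming the sums of squares are read from an ANOVA table. All numbers here are hypothetical; each mean square is the sum of squares divided by its degrees of freedom.

```python
# Hypothetical ANOVA table values.
ss_regression = 3.6
ss_residual = 2.4
n = 5  # number of observations
k = 1  # number of explanatory variables

df_regression = k
df_residual = n - k - 1

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

f_stat = ms_regression / ms_residual
print(round(f_stat, 3))
```

In an SLR model the F statistic equals the square of the slope's t statistic, so the two tests agree.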

How to calculate the residuals?

Observed value for response - predicted value from the model.

How to check that the regression assumptions hold?

Using the residuals

What is a MLR model?

It includes more than one explanatory variable with the same assumptions as the SLR model.

How do we handle an explanatory variable with more than two categories?

By using dummy variables (binary variables). Always use one fewer dummy variable than the number of categories.
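
One way to build the dummy variables by hand is sketched below. The helper `make_dummies` and the category labels "A", "B" and "C" are hypothetical; the category passed as `reference` gets no column of its own.

```python
def make_dummies(values, reference):
    """Return {category: [0/1, ...]} for every category except `reference`."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical categorical variable with three categories.
house_types = ["A", "B", "C", "A", "C"]
dummies = make_dummies(house_types, reference="A")

for name, column in dummies.items():
    print(name, column)
```

Three categories give two dummy columns, and rows belonging to the reference category "A" are 0 in both.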

What is the reference category?

The category for which all the dummy variables are equal to 0. When interpreting the fitted model, comparisons are made with respect to the reference category.

Interpreting dummy variables

If at least one of the dummy variables for a categorical variable explains significant variability in the response variable, we must keep all of them, even those that are not individually significant.

What is a binary variable?

A variable that can only take two values.

How would you interpret D1 from the following equation? Sale Price = 17640.7 + 107.5 Floor Space - 46898.1 D1 -58001.8 D2

The sale price is on average $46,898.10 lower for a two-family conversion (D1 = 1) than for the reference category, holding floor space constant.
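
To see the interpretation numerically, the sketch below plugs values into the fitted equation from the question. The function name `sale_price` and the floor space of 1500 are assumptions chosen for illustration.

```python
def sale_price(floor_space, d1, d2):
    """Fitted equation from the card: price as a function of floor space and dummies."""
    return 17640.7 + 107.5 * floor_space - 46898.1 * d1 - 58001.8 * d2


floor_space = 1500  # hypothetical value, held fixed for the comparison

reference = sale_price(floor_space, d1=0, d2=0)
two_family = sale_price(floor_space, d1=1, d2=0)

difference = two_family - reference
print(round(difference, 1))
```

The difference is the D1 coefficient whatever floor space is chosen, because floor space is the same in both predictions.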

What is multicollinearity?

Where one explanatory variable in an MLR model can be predicted from one or more of the other explanatory variables in the model, e.g. X4 = aX1 + bX2 + cX3.

What is a possible effect of having multicollinearity in your model?

It affects the calculations for individual explanatory variables. This means the coefficient values cannot be relied on as an explanation of the relationship between Y and each X. It can also affect the validity of the p-values.

When does multicollinearity occur?

Occurs when two or more explanatory variables in a regression model are moderately or highly correlated with each other.

Ways to spot multicollinearity

1) Think about the definitions of the variables and whether they overlap.
2) Draw scatter plots and calculate correlations between each pair of explanatory variables.
3) Check whether the fitted coefficient values make sense.
4) Check for a contradiction between the overall ANOVA F-test and the individual coefficient t-tests.
5) Play around with the model and see if the coefficients change drastically.
6) Check whether removing a variable from the model leads to an increased adjusted R-squared.
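
The pairwise-correlation check can be sketched in plain Python as follows. The data, the variable names and the 0.8 cut-off are all assumptions for illustration; there is no single agreed threshold.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    s_ab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    s_aa = sum((ai - a_bar) ** 2 for ai in a)
    s_bb = sum((bi - b_bar) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)


# Hypothetical explanatory variables: x2 is roughly 2 * x1.
explanatory = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1, 11.9],
    "x3": [5, 1, 4, 2, 6, 3],
}

THRESHOLD = 0.8  # assumed cut-off for "highly correlated"

flagged = []
for name_a, name_b in combinations(explanatory, 2):
    r = pearson(explanatory[name_a], explanatory[name_b])
    print(name_a, name_b, round(r, 3))
    if abs(r) > THRESHOLD:
        flagged.append((name_a, name_b))

print("Possible multicollinearity:", flagged)
```

Pairwise correlations only catch relationships between two variables at a time, so the other checks in the list are still worth doing.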

How to fix multicollinearity?

1) Try not to include highly correlated explanatory variables.
2) Create a new variable which is a combination of the highly correlated variables.

MLR Flashcards

(41 cards)