Multiple Linear Regression Flashcards

1
Q

Write the regression equation.

A

y = b(0) + b(1)x + e
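
As a minimal sketch in plain Python with made-up data, the least-squares estimates of b(0) and b(1) can be computed from the sample means. The residuals are the leftover e terms, and the fitted line can then be used to predict y for a new x. The function name `fit_simple` and the data are hypothetical.

```python
def fit_simple(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1*x + e."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products and sum of squares around the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx              # slope
    b0 = mean_y - b1 * mean_x   # intercept
    return b0, b1


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple(x, y)
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # the 'e' terms

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print("residuals:", [round(e, 2) for e in residuals])
print("prediction at x = 6:", round(b0 + b1 * 6, 2))
```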

2
Q

Why will the estimated value of y not be perfect?

A

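Because there are residuals, or errors, which is what ‘e’ stands for: y is also affected by factors not included in the model and by random variation, so observed values differ from predicted ones.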

3
Q

What is a regression line used for?

A

To make predictions about the value of the dependent variable based on a value of the predictor.

4
Q

What happens to R^2 when more variables are added?

A

R^2 never decreases when variables are added; it typically increases even when the new variables have no real predictive power.
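
One way to see this is to add a predictor that is pure random noise. The plain-Python sketch below fits y on x1 alone and then on x1 plus a noise variable; the second R^2 is never lower than the first. The helper names `ols` and `r_squared`, the seed, and the simulated data are all hypothetical.

```python
import random


def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of 1s for b0
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y):
    """R^2 = 1 - SS_residual / SS_total for the least-squares fit of y on X."""
    b = ols(X, y)
    pred = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 30
x1 = [float(i) for i in range(n)]
y = [2 + 0.5 * xi + rng.gauss(0, 2) for xi in x1]
junk = [rng.gauss(0, 1) for _ in range(n)]  # unrelated to y by construction

r2_x1 = r_squared([[a] for a in x1], y)
r2_x1_junk = r_squared([[a, b] for a, b in zip(x1, junk)], y)
print(f"R^2 with x1 only:   {r2_x1:.4f}")
print(f"R^2 with x1 + junk: {r2_x1_junk:.4f}")
```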

5
Q

How do we know if two models are nested?

A

If one model contains all the terms of the other, and at least one additional term.

6
Q

What is b(0)?

A

The intercept: the predicted value of the y variable when the x variable = 0.

7
Q

What is b(1)?

A

The slope: the predicted change in the y variable for a one-unit increase in the x variable.

8
Q

Write the multiple regression equation.

A

y = b(0) + b(1)x(1) + b(2)x(2) + … + b(n)x(n) + e
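
Fitting this equation by least squares amounts to solving the normal equations (X'X)b = X'y. The sketch below does this in plain Python with Gauss-Jordan elimination; a real analysis would use a statistics library (e.g. R's `lm()`), and the function name `ols` and the data are hypothetical. Because the made-up y values contain no error term, the fit should recover b(0) = 1, b(1) = 2, b(2) = -3 up to rounding.

```python
def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of 1s for b0
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Hypothetical, noise-free data generated from y = 1 + 2*x1 - 3*x2
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [1 + 2 * a - 3 * b for a, b in zip(x1, x2)]

coefs = ols(list(zip(x1, x2)), y)
print("b(0), b(1), b(2) =", [round(c, 6) for c in coefs])
```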

9
Q

What is a fully specified model?

A

A model in which we have accounted for all factors that determine variation in the dependent variable (y).

10
Q

Why can’t we usually have a fully specified model?

A

We usually cannot identify and measure all the factors that affect y.

11
Q

What is the relationship between the t and p values?

A

As the absolute value of t increases, the p value decreases: a larger |t| is stronger evidence against the null hypothesis.
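
The relationship can be checked numerically. This sketch assumes a large sample, so it uses the standard normal distribution (via `math.erfc`) as an approximation to the t distribution; the function name `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(t):
    """Two-sided p value for statistic t, using the standard normal
    distribution as a large-sample approximation to the t distribution."""
    # 2 * (1 - Phi(|t|)) equals erfc(|t| / sqrt(2))
    return math.erfc(abs(t) / math.sqrt(2))


for t in [0.5, 1.0, 1.96, 2.5, 3.0, -3.0]:
    print(f"t = {t:5.2f}  ->  p = {two_sided_p(t):.4f}")
```

The sign of t does not matter for a two-sided test, which is why the function works with |t|.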

12
Q

What is the critical t value for significance at the 0.05 level (i.e., 95% confidence)?

A

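+/- 1.96 (for a two-tailed test with a large sample; with small samples the critical value from the t distribution is somewhat larger)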
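
The value 1.96 is the point that leaves 2.5% in each tail of the standard normal distribution. Python's `statistics.NormalDist` (Python 3.8+) can compute it. For small samples one would instead use a t-distribution quantile with the model's residual degrees of freedom (for example SciPy's `scipy.stats.t.ppf(1 - alpha / 2, df)`), which is not shown here.

```python
from statistics import NormalDist

alpha = 0.05
# Two-tailed test: put alpha / 2 in each tail of the standard normal distribution
critical = NormalDist().inv_cdf(1 - alpha / 2)
print(f"critical value: +/- {critical:.3f}")
```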

13
Q

What does rejecting the null hypothesis mean?

A
  • Our relationship is not likely to have occurred by chance
  • Our relationship is likely to be reflected in the population
14
Q

What do we do when we want to add a categorical variable such as sex to the model?

A

We create a dummy variable that takes the value “0” for one category and “1” for the other, e.g., 0 for men and 1 for women.
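
With a single 0/1 dummy as the only predictor, b(0) is the mean of y in the group coded 0 and b(1) is the difference between the two group means. A minimal sketch with made-up data; the variable names and values are hypothetical.

```python
def fit_simple(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1*x + e."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical data
sex = ["male", "female", "female", "male", "female", "male"]
y = [70, 62, 58, 75, 60, 80]

# Dummy variable: 0 = men (reference group), 1 = women
female = [1 if s == "female" else 0 for s in sex]
b0, b1 = fit_simple(female, y)
print(f"b(0) = {b0:.1f}  (mean of y for men)")
print(f"b(1) = {b1:.1f}  (mean for women minus mean for men)")
```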

15
Q

What do we do if there are multiple categories?

A

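For a variable with k categories, we create k − 1 dummy variables: one category is the reference and is left out of the model, and each dummy's coefficient is interpreted relative to that reference category.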
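
A minimal sketch of k − 1 dummy coding in plain Python; `make_dummies`, the `region` variable, and its categories are hypothetical. A row in the reference category has 0 on every dummy.

```python
def make_dummies(values, reference):
    """Return k - 1 dummy (0/1) columns, one per non-reference category."""
    categories = sorted(set(values) - {reference})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical categorical variable with k = 4 categories
region = ["north", "south", "east", "north", "east", "west"]
dummies = make_dummies(region, reference="north")
for name, column in dummies.items():
    print(f"{name:>5}: {column}")
```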

16
Q

What does R^2 mean?

A

The proportion of the variability in the dependent variable that is explained by the model. For example, an R-squared value of 0.66 means that 66% of the variance in the y variable is explained by the x variables.
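
R^2 can be computed as 1 − SS(residual) / SS(total), where SS(residual) is the sum of squared differences between observed and predicted y, and SS(total) is the sum of squared differences between observed y and its mean. A small sketch with made-up numbers; the predictions stand in for the output of some fitted model, and the function name is hypothetical.

```python
def r_squared(observed, predicted):
    """R^2 = 1 - SS_residual / SS_total."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_y) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical observed values and illustrative model predictions
observed = [3, 5, 7, 9]
predicted = [2.5, 5.5, 7, 9]
print(f"R^2 = {r_squared(observed, predicted):.3f}")
```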

17
Q

How do we interpret the p value?

A

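If the p value for the model as a whole (the F test) is less than the alpha value of 0.05, we conclude that the model has at least one significant independent variable. For an individual coefficient, p < 0.05 means that coefficient is significantly different from zero.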

18
Q

How does the forward stepwise selection method work?

A
  • Begins with no variables and introduces variables one by one
  • At each step, adds the variable that increases R^2 the most
  • Continues until none of the remaining variables explains a significant amount of the additional variability in y
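
The procedure can be sketched in plain Python as below. As a simplifying assumption, the stopping rule is a fixed minimum gain in R^2 (`min_gain`, a hypothetical threshold of 0.01) rather than a formal significance test; the helper names and the simulated data are also hypothetical.

```python
import random


def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of 1s for b0
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y):
    """R^2 of the least-squares fit of y on the columns of X."""
    b = ols(X, y)
    pred = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    return 1 - ss_res / sum((yi - mean_y) ** 2 for yi in y)


def design(data, cols):
    """Rows of the chosen predictor columns (empty rows if no columns)."""
    n = len(next(iter(data.values())))
    return [[data[c][i] for c in cols] for i in range(n)]


def forward_selection(data, y, min_gain=0.01):
    selected, remaining, current = [], list(data), 0.0
    while remaining:
        # R^2 gain from adding each remaining variable to the current model
        gains = {v: r_squared(design(data, selected + [v]), y) - current
                 for v in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:  # simplified stopping rule
            break
        selected.append(best)
        remaining.remove(best)
        current += gains[best]
    return selected


rng = random.Random(42)
n = 50
data = {name: [rng.uniform(0, 10) for _ in range(n)] for name in ("x1", "x2", "x3")}
# y depends on x1 and x2 only; x3 is pure noise
y = [3 + 2 * a - 1.5 * b + rng.gauss(0, 1) for a, b in zip(data["x1"], data["x2"])]
print(forward_selection(data, y))
```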
19
Q

How does the backwards stepwise selection method work?

A
  • Starts with all variables in the model
  • At each step, drops the variable that contributes least to R^2
  • Continues until all remaining variables explain a significant proportion of the variability in y
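
A matching plain-Python sketch of backward elimination. As above, the stopping rule is a simplifying assumption: a fixed minimum loss in R^2 (`min_loss`, a hypothetical threshold of 0.01) stands in for a significance test, and the helper names and simulated data are hypothetical.

```python
import random


def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of 1s for b0
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y):
    """R^2 of the least-squares fit of y on the columns of X."""
    b = ols(X, y)
    pred = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    return 1 - ss_res / sum((yi - mean_y) ** 2 for yi in y)


def design(data, cols):
    """Rows of the chosen predictor columns (empty rows if no columns)."""
    n = len(next(iter(data.values())))
    return [[data[c][i] for c in cols] for i in range(n)]


def backward_elimination(data, y, min_loss=0.01):
    selected = list(data)
    current = r_squared(design(data, selected), y)
    while selected:
        # R^2 lost by dropping each variable from the current model
        losses = {v: current - r_squared(design(data, [c for c in selected if c != v]), y)
                  for v in selected}
        weakest = min(losses, key=losses.get)
        if losses[weakest] >= min_loss:  # every remaining variable matters
            break
        selected.remove(weakest)
        current -= losses[weakest]
    return selected


rng = random.Random(42)
n = 50
data = {name: [rng.uniform(0, 10) for _ in range(n)] for name in ("x1", "x2", "x3")}
# y depends on x1 and x2 only; x3 is pure noise
y = [3 + 2 * a - 1.5 * b + rng.gauss(0, 1) for a, b in zip(data["x1"], data["x2"])]
print(backward_elimination(data, y))
```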
20
Q

How do we compare coefficients that are measured in different units?

A

We standardise the coefficients into beta coefficients.
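
For a simple regression, the beta coefficient is the slope rescaled by sd(x) / sd(y), which gives the same value as the slope obtained after converting x and y to z-scores. In multiple regression each slope is rescaled the same way: beta(j) = b(j) × sd(x(j)) / sd(y). A sketch with made-up data; the function and variable names are hypothetical.

```python
from statistics import mean, stdev


def fit_simple(x, y):
    """Least-squares estimates (b0, b1) for y = b0 + b1*x + e."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def zscores(v):
    """Convert values to standard deviation units."""
    m, s = mean(v), stdev(v)
    return [(vi - m) / s for vi in v]


# Hypothetical data: hours of study and exam score
hours = [2, 4, 6, 8, 10]
score = [55, 60, 70, 72, 85]

_, b1 = fit_simple(hours, score)
beta = b1 * stdev(hours) / stdev(score)                      # rescale the raw slope
_, beta_from_z = fit_simple(zscores(hours), zscores(score))  # same value
print(f"b(1) = {b1:.2f} points per hour; beta = {beta:.3f}")
```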

21
Q

How do we interpret beta coefficients?

A

As “a one standard deviation increase in x is associated with a ___ standard deviation increase/decrease in y.”
This way, we can compare continuous independent variables (IVs) to see which has the strongest association with y.

22
Q

Why should we not standardise categorical variables?

A

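Because a 0/1 dummy variable can only change from 0 to 1, a “one standard deviation increase” in it has no meaningful interpretation; its unstandardised coefficient (the difference between the two groups) is easier to interpret.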

23
Q

What does the relaimpo package do?

A

Provides measures of relative importance for each of the predictors in the model by entering the regression variables in all possible orders and then averaging the changes in R^2.
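
relaimpo is an R package (its `calc.relimp()` function with `type = "lmg"` implements this averaging). The plain-Python sketch below only illustrates the idea and is not the package's API. For every ordering of the predictors it records how much R^2 rises when each variable enters, then averages those increments. Within each ordering the increments add up to the full model's R^2, so the averaged shares do too. Helper names and simulated data are hypothetical.

```python
import random
from itertools import permutations


def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of 1s for b0
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y):
    """R^2 of the least-squares fit of y on the columns of X."""
    b = ols(X, y)
    pred = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    return 1 - ss_res / sum((yi - mean_y) ** 2 for yi in y)


def design(data, cols):
    """Rows of the chosen predictor columns."""
    n = len(next(iter(data.values())))
    return [[data[c][i] for c in cols] for i in range(n)]


def average_r2_increments(data, y):
    """Average R^2 increase for each predictor over all entry orders."""
    names = list(data)
    totals = dict.fromkeys(names, 0.0)
    orders = list(permutations(names))
    for order in orders:
        previous = 0.0  # R^2 of the intercept-only model
        for k in range(1, len(order) + 1):
            current = r_squared(design(data, list(order[:k])), y)
            totals[order[k - 1]] += current - previous
            previous = current
    return {name: total / len(orders) for name, total in totals.items()}


rng = random.Random(7)
n = 60
x1 = [rng.uniform(0, 10) for _ in range(n)]
x2 = [0.5 * a + rng.uniform(0, 10) for a in x1]  # correlated with x1
x3 = [rng.uniform(0, 10) for _ in range(n)]      # unrelated to y
data = {"x1": x1, "x2": x2, "x3": x3}
y = [2 + a + b + rng.gauss(0, 2) for a, b in zip(x1, x2)]

shares = average_r2_increments(data, y)
full = r_squared(design(data, list(data)), y)
print({k: round(v, 3) for k, v in shares.items()}, "total:", round(full, 3))
```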

24
Q

What does adjusted R^2 do?

A

The adjusted R^2 penalises R^2 for the number of predictors included in the model, so, unlike R^2, it can decrease when a new variable adds little explanatory power. This avoids the problem of R^2 increasing whenever more variables are added.
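
A common form of the adjustment is adjusted R^2 = 1 − (1 − R^2)(n − 1) / (n − k − 1), for n observations and k predictors. The sketch below uses hypothetical numbers to show how a tiny rise in R^2 from an extra predictor can still lower the adjusted value; the function name is hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 for a model with k predictors fitted to n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Hypothetical: adding a 4th predictor nudges R^2 from 0.660 to 0.661
print(round(adjusted_r_squared(0.660, n=100, k=3), 4))
print(round(adjusted_r_squared(0.661, n=100, k=4), 4))
```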

25
Q

How do we check for multicollinearity using plots?

A
  • Plot the predictors in the model against each other and observe the correlation between them.
  • If the correlation between any two predictors is strong, one of them may need to be removed from the model.
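
The correlations that such plots show can also be listed directly. A plain-Python sketch; the 0.8 cut-off for "strong", the function names, and the data are hypothetical choices, not a fixed rule.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def strongly_correlated_pairs(predictors, threshold=0.8):
    """Return (name1, name2, r) for every predictor pair with |r| >= threshold."""
    flagged = []
    for (n1, c1), (n2, c2) in combinations(predictors.items(), 2):
        r = pearson(c1, c2)
        if abs(r) >= threshold:
            flagged.append((n1, n2, round(r, 3)))
    return flagged


predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],  # roughly 2 * x1
    "x3": [5, 1, 4, 2, 6, 3],
}
print(strongly_correlated_pairs(predictors))
```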
26
Q

What is Variance Inflation Factor (VIF)?

A

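A measure of how much the variance of an estimated regression coefficient is inflated because that predictor is correlated with the other predictors. For predictor j, VIF = 1 / (1 − R(j)^2), where R(j)^2 is the R^2 from regressing x(j) on all the other predictors. A VIF larger than 5 or 10 indicates serious problems with collinearity.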
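
A plain-Python sketch of the VIF formula: regress each predictor on all the others and take 1 / (1 − R(j)^2). The helper names and the data (x2 is built to be roughly 2 × x1) are hypothetical; libraries such as statsmodels provide a ready-made `variance_inflation_factor` function.

```python
def ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a column of 1s for b0
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def r_squared(X, y):
    """R^2 of the least-squares fit of y on the columns of X."""
    b = ols(X, y)
    pred = [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    return 1 - ss_res / sum((yi - mean_y) ** 2 for yi in y)


def vif(predictors):
    """VIF(j) = 1 / (1 - R(j)^2), regressing each predictor on all the others."""
    names = list(predictors)
    n = len(predictors[names[0]])
    result = {}
    for target in names:
        others = [c for c in names if c != target]
        X = [[predictors[c][i] for c in others] for i in range(n)]
        result[target] = 1 / (1 - r_squared(X, predictors[target]))
    return result


predictors = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2.1, 3.9, 6.2, 8.1, 9.8, 12.2],  # roughly 2 * x1
    "x3": [5, 1, 4, 2, 6, 3],
}
for name, value in vif(predictors).items():
    print(f"{name}: VIF = {value:.1f}")
```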

27
Q

Why is multicollinearity a problem?

A
  • If a predictor is strongly related to another predictor, then including it adds largely redundant information to the model
  • It can be difficult to separate the effects of the multicollinear predictors