LEC 11b Multiple Linear Regression Flashcards

1
Q

Assumptions of multiple linear regression (5)

A
  1. The observations are independent of one another
  2. For any specified values of x, the distribution of the y values is normal
  3. For any set of values of x, the variance is equal
  4. There is little or no multicollinearity among the independent variables
    eg weight and BMI are highly correlated
  5. The relationship among the variables is represented by the equation Y = alpha + beta1(x1) + … + betak(xk)
2
Q

alpha

A
  • y intercept

- mean value of y when all independent variables = 0

3
Q

beta

A
  • slope
  • mean change in y that corresponds to a one-unit increase in x(i)
  • after controlling for all other independent variables (holding their values constant)
4
Q

Multiple linear regression model dimension

A
  • multidimensional (no longer straight line)
5
Q

How to find the best-fitting model?

A

Method of least squares

- the model with the smallest residual sum of squares
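The method of least squares can be sketched with NumPy (the data here is hypothetical, made up for illustration):

```python
import numpy as np

# Hypothetical data: y depends on two predictors x1 and x2
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=50)

# Design matrix: a column of ones for the intercept (alpha), then x1, x2
X = np.column_stack([np.ones(50), x1, x2])

# Least squares picks the coefficients that minimise the
# residual sum of squares ||y - X b||^2
coef, rss, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # approximately [2.0, 1.5, -0.5]
```

The fitted coefficients recover the alpha and betas that generated the data, up to noise.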

6
Q

Can nominal variables be incorporated into a regression model?

A

Yes, using dummy variables

7
Q

Dummy variables

A
  • categories of the nominal variables are identified using numbers
  • numerical values that do not have any quantitative meaning
  • coded as 0 or 1
8
Q

For nominal variable, if there are k categories, what is the number of dummy variables needed?

A

k-1
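A quick sketch of k-1 dummy coding with pandas (the nominal variable and its categories here are hypothetical):

```python
import pandas as pd

# Hypothetical nominal variable with k = 3 categories
df = pd.DataFrame({"blood_type": ["A", "B", "O", "A", "O"]})

# drop_first=True yields k - 1 = 2 dummy columns;
# the dropped category ("A") becomes the reference level
dummies = pd.get_dummies(df["blood_type"], drop_first=True)
print(dummies.columns.tolist())  # ['B', 'O']
```

A row with 0 in every dummy column represents the reference category, which is why only k-1 dummies are needed.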

9
Q

How to evaluate goodness-of-fit of regression model

A
  • coefficient of determination (R^2)

- use adjusted R^2 if models contain different numbers of independent variables

10
Q

Coefficient of determination (R^2) (3)

A
  • can be interpreted as the proportion of variability among the observed values of y that is explained by the linear regression model containing the set of independent variables
  • range from 0 to 1
  • always increases with the inclusion of more independent variables
11
Q

Adjusted R^2 (3)

A
  • increases when the inclusion of an independent variable improves the ability to predict y
  • decreases when the inclusion of an independent variable does not improve the ability to predict y
  • cannot be directly interpreted as the proportion of variability among the observed values of y that is explained by the linear regression model
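The two measures above can be written directly from their definitions (the k-predictor penalty form of adjusted R^2 is the standard one; the sample values below are made up):

```python
import numpy as np

def r_squared(y, y_hat):
    """Proportion of variability in y explained by the model."""
    ss_res = np.sum((y - y_hat) ** 2)          # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(y, y_hat, k):
    """R^2 penalised for the number of predictors k (n observations)."""
    n = len(y)
    r2 = r_squared(y, y_hat)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.0, 4.0])
print(r_squared(y, y_hat))                 # 0.996
print(adjusted_r_squared(y, y_hat, k=1))   # 0.994
```

Note that the penalty term (n - 1)/(n - k - 1) grows with k, which is what makes adjusted R^2 fall when a new variable adds no predictive value.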
12
Q

Multiple linear regression

A
  • describes the linear relationship between the dependent variable (Y) and more than 1 independent variable (continuous, ordinal or nominal)

Y = alpha + beta1(x1) + … + betak(xk)

13
Q

Assumptions of Simple linear regression vs Multiple linear regression

A

Simple linear regression
1. There is linear relationship between the variables
Y = alpha + beta(x)
2. The observations are independent of one another
3. For any specified values of X, the distribution of the Y values is normal
4. For any set of values of X, the variance is constant (equal variance)

Multiple linear regression
1. The relationship among the variables is represented by the equation
Y = alpha + beta1(x1) + … + betak(xk)
2. The observations are independent of one another
3. For any specified values of x, the distribution of the y values is normal
4. For any set of values of x, the variance is equal
5. There is little or no multicollinearity among the independent variables (not highly correlated)
eg weight and BMI are highly correlated

14
Q

Why use adjusted R^2 > normal R^2 when assessing for best fit linear regression model for multiple variables (2)

A
  • accounts for the added complexity of a model

- an additional independent variable will always increase R^2, so it is more meaningful to look at adjusted R^2

15
Q

Model selection types (2)

A
  1. Forward selection
    - independent variables are added one at a time, starting with the predictor that has the highest correlation with the dependent variable
  2. Backward selection
    - all independent variables are entered into the equation at once, then each independent variable is deleted one at a time
    - often preferred
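A minimal sketch of backward selection, not from the lecture: it uses adjusted R^2 as the drop criterion (the card doesn't specify one; p-value-based elimination is also common). The data at the bottom is hypothetical.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of a least-squares fit (X includes the intercept column)."""
    n, p = X.shape                          # p - 1 predictors plus the intercept
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1 - (1 - r2) * (n - 1) / (n - p)

def backward_select(X, y):
    """Start with every predictor; repeatedly drop the one whose removal
    improves adjusted R^2, stopping when no removal helps."""
    cols = list(range(1, X.shape[1]))       # column 0 is the intercept
    best = adjusted_r2(X, y)
    improved = True
    while improved and len(cols) > 1:
        improved = False
        for c in cols:
            trial = [0] + [j for j in cols if j != c]
            score = adjusted_r2(X[:, trial], y)
            if score > best:
                best, cols, improved = score, [j for j in cols if j != c], True
                break
    return cols, best

# Hypothetical data: y depends on x1 only; noise_var is an unrelated predictor
rng = np.random.default_rng(1)
x1 = rng.normal(size=40)
noise_var = rng.normal(size=40)
y = 3.0 * x1 + rng.normal(scale=0.5, size=40)
X = np.column_stack([np.ones(40), x1, noise_var])
kept, score = backward_select(X, y)
```

Forward selection would be the mirror image: start from the intercept-only model and add the candidate that most improves the criterion.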
16
Q

Independent variables included in Multiple linear regression

A
  • each independent variable is independently associated with the dependent variable