L12: Linear regression Flashcards

1
Q

How do we estimate the effect of an independent variable on a dependent variable?

A

Using regression analysis.

If the dependent variable (y) is continuous –> linear regression
If y is ordinal –> ordinal regression
If y is nominal (dichotomous) –> logistic regression

2
Q

What is simple vs multiple regression?

A

Simple –> only 1 independent x variable

Multiple/multivariable –> more than 1 independent x variable

3
Q

What is the difference between correlation and simple linear regression?

A

Correlation:

  • quantifies the degree to which two variables are related, provided that the relationship is linear
  • makes no distinction between the two variables (treated symmetrically)

Simple linear regression:

  • determines the best fitting straight line to investigate the change in dependent variable y (continuous) that corresponds to a given change in independent variable x (continuous, ordinal or nominal), provided that there is significant correlation.
  • the two variables are treated asymmetrically (x is used to explain y)
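
The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below is only an illustration: the data are made up, and the helper names (`pearson_r`, `slope`) are hypothetical, not taken from the flashcards.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation: the same value whichever variable comes first."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def slope(x, y):
    """Least-squares slope for regressing y (dependent) on x (independent)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy = pearson_r(x, y)        # correlation of x with y
r_yx = pearson_r(y, x)        # identical: the variables are interchangeable
slope_y_on_x = slope(x, y)    # regression of y on x
slope_x_on_y = slope(y, x)    # a different line: the roles are not interchangeable
```

Swapping the arguments leaves the correlation unchanged but gives a different regression slope, which is why the choice of dependent variable matters in regression.
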
4
Q

Can we extrapolate beyond the observed range of values?

A

No, do not extrapolate the regression line beyond the observed range.

5
Q

What is the general equation for the simple linear regression model?

A

y = alpha + beta (x)

alpha = y intercept = mean value of y when x = 0 
beta = slope = change in the MEAN value of y when there is a one-unit change in x
6
Q

What does the simple linear regression model use?

A

Method of least squares (smallest residual sum of squares)
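
As a minimal sketch of the least-squares idea, the closed-form estimates for alpha and beta can be computed directly. The data are made up and the function names are hypothetical; any other line through the same data should give a larger residual sum of squares.

```python
def fit_simple_linear_regression(x, y):
    """Return (alpha, beta) that minimise the residual sum of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx                  # slope
    alpha = mean_y - beta * mean_x    # intercept
    return alpha, beta


def residual_sum_of_squares(x, y, alpha, beta):
    """Sum of squared vertical distances between observed y and the line."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data, made up for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha, beta = fit_simple_linear_regression(x, y)
rss_best = residual_sum_of_squares(x, y, alpha, beta)
# Nudging the slope away from the fitted value makes the fit worse.
rss_other = residual_sum_of_squares(x, y, alpha, beta + 0.1)
```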

7
Q

How do we evaluate the goodness of fit of the simple linear regression model?

A
  • using the coefficient of determination (R2)
  • in simple linear regression, R2 = r2, the square of r (the Pearson product-moment correlation coefficient)
  • R2: proportion of variability among the observed values of y that is explained by the linear regression model
  • R2 ranges from 0 to 1 (because r ranges from -1 to 1)
  • if R2 = 1 –> all data points lie exactly on the best fitting line
  • if R2 = 0 –> there is no linear relationship between x and y
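
The points above can be checked with a small sketch that computes R2 as 1 minus (residual sum of squares / total sum of squares) and compares it with the squared Pearson correlation. The data and function names are hypothetical.

```python
from math import sqrt


def r_squared_and_r(x, y):
    """Return (R2 from the fitted line, Pearson r) for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares

    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

    r2 = 1 - rss / syy          # proportion of variability explained
    r = sxy / sqrt(sxx * syy)   # Pearson correlation coefficient
    return r2, r


# Hypothetical noisy data: R2 should equal r squared.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r2_noisy, r_noisy = r_squared_and_r(x, y)

# Hypothetical data lying exactly on y = 1 + 2x: R2 should be 1.
y_exact = [1 + 2 * xi for xi in x]
r2_exact, r_exact = r_squared_and_r(x, y_exact)
```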
8
Q

How do we check the statistical significance of a linear regression model?

A

Look at the p-value for beta (it tests the null hypothesis that beta = 0, i.e. no linear relationship between x and y)
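
For reference, the p-value for beta in simple linear regression usually comes from a t-test. The notation below (hats for estimates, RSS for the residual sum of squares, n for the number of observations) is assumed here and does not appear in the flashcards.

```latex
H_0:\ \beta = 0, \qquad
t = \frac{\hat{\beta}}{\mathrm{SE}(\hat{\beta})}, \qquad
\mathrm{SE}(\hat{\beta}) = \sqrt{\frac{\mathrm{RSS}/(n-2)}{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}}
```

Under the null hypothesis, t follows a t distribution with n - 2 degrees of freedom, and the p-value is the two-sided tail probability of the observed t.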

9
Q

What is multiple linear regression?

A
  • linear regression with more than 1 independent variable –> the model has multiple betas (one per independent variable)
10
Q

What does multiple linear regression use?

A

Method of least squares

11
Q

When are dummy variables used?

A

Used when we have nominal independent variables

12
Q

How many dummy variables do we need for a nominal variable with k categories?

A

k-1
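
A minimal sketch of k-1 dummy coding, assuming a made-up blood group variable with k = 4 categories and "O" chosen as the reference category. The function name `dummy_code` is hypothetical.

```python
def dummy_code(values, reference):
    """Encode a nominal variable as k-1 dummy (0/1) columns.

    The reference category gets no column of its own; it is the row
    where every dummy is 0.
    """
    categories = sorted(set(values))
    if reference not in categories:
        raise ValueError("reference must be one of the observed categories")
    non_reference = [c for c in categories if c != reference]
    rows = [[1 if v == c else 0 for c in non_reference] for v in values]
    return non_reference, rows


# Hypothetical nominal variable with k = 4 categories.
blood_group = ["A", "B", "O", "AB", "O", "A"]
columns, rows = dummy_code(blood_group, reference="O")
# columns lists the 3 non-reference categories; each row holds 3 dummies.
```

Each beta for a dummy variable is then read as the difference in mean y between that category and the reference category.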

13
Q

How do we evaluate the goodness of fit of the multiple linear regression model?

A
  • inspect the coefficient of determination (R2)
  • R2 is the proportion of variability among the observed values of y that is explained by the linear regression model containing the set of independent variables
  • range of values is 0 to 1
14
Q

How do we compare between models that contain different numbers of independent variables?

A

Compare the adjusted R2
(adjusted R2 increases when inclusion of independent variable improves the ability to predict y, and decreases when it does not)

However, adjusted R2 cannot be directly interpreted as the proportion of variability among the observed values of y that is explained by the linear regression model.
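
A short sketch using the standard adjusted R2 formula, 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of independent variables. The formula is not stated in the flashcards and the numbers below are made up.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R2: penalises R2 for the number of independent variables."""
    return 1 - (1 - r_squared) * (n_observations - 1) / (
        n_observations - n_predictors - 1
    )


# Hypothetical models fitted to the same 50 observations.
adj_small = adjusted_r_squared(0.600, n_observations=50, n_predictors=2)
# A third variable raises R2 only slightly, so adjusted R2 goes down.
adj_large = adjusted_r_squared(0.605, n_observations=50, n_predictors=3)

# Adjusted R2 can even be negative, which is one reason it cannot be
# read as a proportion of explained variability.
adj_negative = adjusted_r_squared(0.01, n_observations=10, n_predictors=3)
```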

15
Q

What does beta represent in a multiple linear regression model?

A

e.g.
For every 1 unit increase in x, the MEAN y will increase/decrease by ___, after controlling for other x variables (keeping them constant).
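
The "keeping other x variables constant" reading can be illustrated with a small sketch that fits two independent variables by least squares (through the normal equations) and then changes x1 by one unit while holding x2 fixed. The data are generated from a made-up equation and all function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple_linear_regression(rows, y):
    """Least-squares coefficients [alpha, beta_1, ..., beta_p]."""
    design = [[1.0] + list(row) for row in rows]   # leading 1 for the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(1, 0), (2, 1), (3, 0), (4, 1), (5, 3), (6, 2)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
alpha, beta_1, beta_2 = fit_multiple_linear_regression(rows, y)


def predict(x1, x2):
    return alpha + beta_1 * x1 + beta_2 * x2


# Increase x1 by one unit while keeping x2 constant at 2.
change_in_mean_y = predict(4, 2) - predict(3, 2)   # equals beta_1
```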

16
Q

What can we conclude from the p-values of a multiple linear regression model?

A

If statistically significant:

x variables are independently associated with y.