Linear Regression Flashcards

1
Q

What is correlation used for?

A

measuring the strength and direction of a linear relationship
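As a rough illustration of the answer above, the sketch below computes the Pearson correlation coefficient r by hand. The data and the function name `pearson_r` are made up for this example; the sign of r gives the direction of the relationship and its size gives the strength.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(round(r, 3))  # positive value: y tends to increase with x
```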

2
Q

What is regression used for?

A

describing the linear relationship with an equation that can be used to make predictions for other groups or other situations

3
Q

When do you use regression?

A

when only the value of the independent variable is known and you want to predict the dependent variable

4
Q

what variable is being predicted in simple linear regression?

A

the dependent variable, y

5
Q

what variable is used to make predictions in simple linear regression?

A

the independent variable or the predictor, X

6
Q

How do you determine the equation of the best fitting line?

A

a technique called least squares regression
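A minimal sketch of least squares for one predictor, using the standard closed-form formulas b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x). The data and the function name `fit_line` are invented for illustration.

```python
def fit_line(x, y):
    """Return (b0, b1) of the least squares line y_hat = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
print(round(b0, 2), round(b1, 2))
```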

7
Q

what are residuals?

A

the difference between the observed value of y and the predicted value of y (point on the line)

8
Q

what do the residuals look like when the line fits the data well?

A

they are small

9
Q

what regression equation is the best?

A

the one with the smallest sum of squared residuals
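To make the "smallest sum of squared residuals" criterion concrete, the sketch below compares the least squares line for a small made-up data set with two other candidate lines. The helper name `sum_sq_residuals` and the alternative lines are assumptions chosen only for illustration.

```python
def sum_sq_residuals(b0, b1, x, y):
    """Sum of squared residuals (observed y minus predicted y) for a line."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return sum(e ** 2 for e in residuals)

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least squares line for this data: y_hat = 2.2 + 0.6 * x
best = sum_sq_residuals(2.2, 0.6, x, y)

# Two other plausible-looking lines, picked by eye
other_a = sum_sq_residuals(2.0, 0.7, x, y)
other_b = sum_sq_residuals(2.5, 0.5, x, y)

print(round(best, 2), round(other_a, 2), round(other_b, 2))
```

Any other choice of intercept and slope gives a sum of squared residuals at least as large as the least squares line.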

10
Q

how do you measure the accuracy of the predictions?

A

standard error of the estimate AKA root mean squared error (RMSE)

11
Q

what is the RMSE?

A

root mean squared error, the typical (roughly average) size of the error we make when using the regression equation to make predictions
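A small sketch of how this error measure can be computed. The cards treat the standard error of the estimate and the RMSE as one quantity; in practice two conventions are common, dividing the sum of squared residuals by n - 2 (standard error of the estimate in simple regression) or by n. Both are shown, on made-up data.

```python
import math

# Hypothetical example data and its least squares line y_hat = 2.2 + 0.6 * x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

n = len(x)
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Standard error of the estimate: divide by n - 2 (two estimated coefficients)
se_estimate = math.sqrt(ss_res / (n - 2))

# Alternative convention: divide by n
rmse_n = math.sqrt(ss_res / n)

print(round(se_estimate, 3), round(rmse_n, 3))
```

Both values are in the units of y, which is what makes them readable as a typical prediction error.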

12
Q

what is the regression equation?

A

ŷ = b0 + b1x, where ŷ is the predicted value of y, b0 is the y-intercept, and b1 is the slope

13
Q

what is the coefficient of determination?

A

R^2 = the proportion (percentage) of the variance in Y that is explained by X; a measure of how well the line represents the data
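One common way to compute R^2 is 1 - SS_residual / SS_total. The sketch below applies it to a small made-up data set; the variable names are illustrative.

```python
# Hypothetical example data and its least squares line y_hat = 2.2 + 0.6 * x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

mean_y = sum(y) / len(y)

# Variation left unexplained by the line
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
# Total variation of y around its mean
ss_total = sum((yi - mean_y) ** 2 for yi in y)

r_squared = 1 - ss_res / ss_total
print(round(r_squared, 2))  # share of the variance in y explained by x
```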

14
Q

what can you do to test if the linear relationship is a significant relationship?

A

test for the slope (t-test) and test for explained variance (F-test)

15
Q

what does the t-test test?

A

test for the slope: whether the slope differs significantly from 0

16
Q

what are the hypotheses for the t-test?

A

H0: β1 = 0
HA: β1 ≠ 0
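A minimal sketch of the test statistic behind these hypotheses in simple linear regression: t = b1 / SE(b1), with n - 2 degrees of freedom. The data are made up, and only the statistic is computed; the p-value would come from a t table or a statistics package.

```python
import math

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

# Standard error of the estimate, then standard error of the slope
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(ss_res / (n - 2))
se_b1 = s / math.sqrt(sxx)

t_stat = b1 / se_b1
df = n - 2
print(round(t_stat, 3), df)
```

For df = 3 the two-sided critical value at the 5% level is about 3.182, so with this tiny sample the slope would not be judged significantly different from 0.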

17
Q

What does the F-test (test for explained variance) test for?

A

if the proportion of the variation that is explained by the model is significantly greater than 0

18
Q

What are the hypotheses for the F-test?

A

H0: ρ^2 = 0
HA: ρ^2 > 0
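A sketch of the F statistic for explained variance in simple linear regression, F = (SS_regression / 1) / (SS_residual / (n - 2)), on made-up data. With a single predictor this F equals the square of the slope's t statistic, which the code checks numerically.

```python
import math

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = mean_y - b1 * mean_x

ss_total = sum((yi - mean_y) ** 2 for yi in y)
ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
ss_reg = ss_total - ss_res   # variation explained by the model

ms_reg = ss_reg / 1          # one predictor, so 1 degree of freedom
ms_res = ss_res / (n - 2)
f_stat = ms_reg / ms_res

# Slope t statistic, for comparison with F
t_stat = b1 / (math.sqrt(ms_res) / math.sqrt(sxx))
print(round(f_stat, 3), round(t_stat ** 2, 3))
```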

19
Q

what are the unstandardized regression coefficients?

A

b0 and b1; b0 is expressed in the units of the dependent variable, and b1 in units of the dependent variable per one-unit change in X

20
Q

what is the standardized regression coefficient?

A

coefficient beta; in simple linear regression it is equal to the correlation coefficient r

21
Q

What does the coefficient beta measure?

A

the change in Y, in SDs, when X increases by 1 SD
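A short numerical check, on made-up data, that the standardized coefficient can be obtained as beta = b1 * (SD of X / SD of Y), and that with a single predictor it matches the correlation coefficient r.

```python
import math

# Hypothetical example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

b1 = sxy / sxx                      # unstandardized slope
sd_x = math.sqrt(sxx / (n - 1))     # sample standard deviations
sd_y = math.sqrt(syy / (n - 1))

beta = b1 * sd_x / sd_y             # change in Y (in SDs) per 1 SD of X
r = sxy / math.sqrt(sxx * syy)      # Pearson correlation

print(round(beta, 4), round(r, 4))
```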

22
Q

when is the coefficient beta useful?

A
  1. when the units of measurement of X and Y differ dramatically
  2. when we use more than 1 independent variable to predict Y and we wish to compare the importance of them
23
Q

What does multiple linear regression allow for?

A

adding multiple independent variables to predict the dependent variable

24
Q

what is the equation for MLR?

A

ŷ = b0 + b1x1 + b2x2 + … + bkxk
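A minimal, dependency-free sketch of fitting such an equation by least squares, solving the normal equations (X'X)b = X'y with Gaussian elimination. The function names and the data are hypothetical; the data are generated from y = 1 + 2*x1 + 3*x2 so the fit should recover those coefficients. Real analyses would normally use a statistics library instead.

```python
def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        known = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - known) / m[i][i]
    return coef

def fit_mlr(predictors, y):
    """Least squares coefficients [b0, b1, ..., bk] via the normal equations."""
    rows = [[1.0] + list(values) for values in zip(*predictors)]  # add intercept
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * target for row, target in zip(rows, y))
           for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_mlr([x1, x2], y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```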

25
Q

what does adding more independent variables to a model (MLR) do?

A
  1. explain more of the variation of the dependent variable (R^2 will increase, or at least not decrease)
  2. usually reduce the average prediction error (the SE decreases, so the accuracy of prediction increases)
26
Q

how can you find the significance of individual predictors?

A

using a t-test on each predictor's regression coefficient

27
Q

which predictor in MLR has the most impact on changes in the Y-variable?

A

the one with the largest standardized coefficient (in absolute value)

28
Q

what are the assumptions for linear regression?

A
  1. normality of the residuals (check with a histogram)
  2. homoscedasticity (the residuals have constant variance)
  3. linearity
  4. no outliers (standardized residuals between -3 and 3)
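The "between -3 and 3" rule is commonly applied to standardized residuals. The sketch below assumes the simplest version, dividing each residual by the standard error of the estimate; statistical packages may use a slightly different standardization. The data are made up.

```python
import math

# Hypothetical example data and its least squares line y_hat = 2.2 + 0.6 * x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

n = len(x)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Standard error of the estimate (n - 2 degrees of freedom)
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Simple standardization (an assumption): residual divided by s
standardized = [e / s for e in residuals]

# Flag observations whose standardized residual falls outside [-3, 3]
outliers = [(xi, yi) for xi, yi, z in zip(x, y, standardized) if abs(z) > 3]

print([round(z, 2) for z in standardized])
print(outliers)
```

A histogram of `standardized` is one way to eyeball the normality assumption, and plotting the residuals against the predicted values helps with homoscedasticity and linearity.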