Chapter 8 Linear Regression Flashcards

1
Q

Linear regression

A

Statistical method for fitting a line to data where the relationship between two variables can be modeled by a straight line with some error.
y = β0 + β1x + ε
ε is the error term and is usually dropped when writing the model formula.

Only used for linear trends. Nonlinear trends (e.g., curved patterns) call for other methods.
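The model above can be sketched numerically; this is a minimal illustration with made-up values for β0, β1, and the noise scale (numpy assumed available):

```python
import numpy as np

# Simulate data from the linear model y = b0 + b1*x + e
# (b0, b1, and the noise scale are arbitrary illustration values)
rng = np.random.default_rng(0)
b0, b1 = 2.0, 0.5
x = np.linspace(0, 10, 50)
e = rng.normal(0, 1, size=x.size)   # the error term epsilon
y = b0 + b1 * x + e                 # observed responses

# Dropping the error term gives the deterministic part of the model:
# the straight line itself
y_line = b0 + b1 * x
```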

2
Q

Residuals

A

Residuals are the leftover variation in the data after accounting for the model fit.
Data = Fit + Residual

Each observation has a residual.

Residual for an observation (x, y): e = y − ŷ

A positive residual means the predicted value is lower than the observed value, so the model underestimates that observation.
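The residual computation can be sketched with toy data and a hand-picked line (illustration values, not a real fit):

```python
import numpy as np

# Residual = observed minus predicted: e_i = y_i - yhat_i
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])
b0, b1 = 0.0, 2.0            # hand-picked line, not fitted
y_hat = b0 + b1 * x          # predicted values
residuals = y - y_hat        # one residual per observation

# residuals[0] is positive: the line underestimates that observation
```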

3
Q

Residual plots

A

Plots of the residuals: the regression line is displayed as a horizontal line at zero, and each residual is plotted at its original horizontal location, with the vertical coordinate equal to the residual.

Residual plots identify characteristics or patterns still apparent in the data after fitting a model, e.g., whether the fit is better at one end of the line than the other.

4
Q

Correlation

A

Describes the strength of a linear relationship. Always takes values between −1 and 1. Denoted R.

R = 1/(n−1) × Σ ((x_i − x̄)/s_x) × ((y_i − ȳ)/s_y)

A negative value means a negative trend.
The stronger the trend, the closer |R| is to 1. If there is no apparent linear relationship, R will be closer to zero.

Note: nonlinear trends, even strong ones, sometimes produce correlations that don't reflect their strength.
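The correlation formula can be implemented directly; this sketch uses made-up data and checks the result against numpy's built-in correlation coefficient:

```python
import numpy as np

def correlation(x, y):
    # R = 1/(n-1) * sum of standardized x times standardized y
    n = len(x)
    zx = (x - x.mean()) / x.std(ddof=1)   # (x_i - xbar) / s_x
    zy = (y - y.mean()) / y.std(ddof=1)   # (y_i - ybar) / s_y
    return (zx * zy).sum() / (n - 1)

# Toy data with a strong positive linear trend (illustration values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 10.0])
r = correlation(x, y)
```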

5
Q

Least squares regression

A

Line that minimizes the sum of the squared residuals. The conditions for a least squares line generally require:
- Linearity
- Nearly normal residuals (watch for outliers far from the line)
- Constant variability (variability is roughly consistent; the variability of y should not be higher when x is larger)
- Independent observations (be cautious applying regression to time series data, as there may be underlying structure that should be considered)

ŷ = b0 + b1x
b1 = (s_y/s_x) × R (slope)
b0 = ȳ − b1x̄ (intercept; the line passes through the point of means (x̄, ȳ))
(s_y and s_x are the sample standard deviations, R is the correlation)
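The slope and intercept formulas can be checked against numpy's least squares fit; the data here are arbitrary illustration values:

```python
import numpy as np

# Least squares slope and intercept from the formulas:
#   b1 = (s_y / s_x) * R,   b0 = ybar - b1 * xbar
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 2.9, 4.1, 4.8, 6.0])

r = np.corrcoef(x, y)[0, 1]
b1 = (y.std(ddof=1) / x.std(ddof=1)) * r   # slope
b0 = y.mean() - b1 * x.mean()              # intercept

# np.polyfit(degree 1) solves the same least squares problem
slope, intercept = np.polyfit(x, y, 1)
```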

6
Q

R squared

A

Correlation squared; describes the strength of the fit.

Measures the fraction of the variation in y that is explained by the regression ŷ = b0 + b1x.

Rule of thumb: use R², not R, to comment on the strength of an association.

R² = 0.49 means the regression explains about half of the variation in y.
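The "fraction of variation explained" reading can be verified numerically; this sketch (toy data) computes R² as 1 − SS_residual/SS_total and checks it equals the squared correlation:

```python
import numpy as np

# R^2 = 1 - SS_residual / SS_total
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.8, 4.2, 5.9, 8.1, 9.6])
b1, b0 = np.polyfit(x, y, 1)   # slope, intercept
y_hat = b0 + b1 * x

ss_res = ((y - y_hat) ** 2).sum()    # variation left in the residuals
ss_tot = ((y - y.mean()) ** 2).sum() # total variation in y
r_squared = 1 - ss_res / ss_tot

# For simple linear regression this equals the squared correlation
r = np.corrcoef(x, y)[0, 1]
```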

7
Q

Leverage

A

Points that fall horizontally far from the center of the cloud tend to pull harder on the line, so they are called points with high leverage.
If one of these points does appear to noticeably influence the slope of the line, it is called an influential point.

8
Q

What happens if we switch explanatory and response variables

A

The correlation, R, stays the same, but the regression line doesn't.

If we switched them, the fitted line would minimize the residuals' horizontal distances rather than the vertical ones.
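Both facts can be checked on toy data (illustration values): the correlation is symmetric in its arguments, while regressing x on y gives a genuinely different line:

```python
import numpy as np

# Swapping explanatory and response: R is symmetric, the fitted line is not
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 4.2, 5.1, 8.3, 9.9])

r_xy = np.corrcoef(x, y)[0, 1]
r_yx = np.corrcoef(y, x)[0, 1]          # identical: correlation is symmetric

slope_y_on_x, _ = np.polyfit(x, y, 1)   # minimizes vertical distances
slope_x_on_y, _ = np.polyfit(y, x, 1)   # minimizes the other direction
# slope_x_on_y is NOT simply 1/slope_y_on_x unless |R| = 1
```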

9
Q

Extrapolation

A

Extrapolation means using a regression line to predict y values for x values outside the range of the observed data. Don't do it.

10
Q

In regression analysis, what 3 numbers are connected with every individual?

A

x = the value of the explanatory variable
y = the observed value of the response variable
ŷ = the predicted value of the response variable

y and ŷ are almost never equal. The difference y − ŷ is the prediction error (residual).
