19 - Fitting Lines to Data Flashcards

1
Q

Least Squares Line

A

the “best line” minimizes the sum of the squares of the vertical distances from the points to the line

2
Q

Least Squares estimates of the slope and intercept

A

fitted values written as y^, using the line y^ = b0 + b1x

residual (e) = the difference, y - y^

least squares estimates:

b1 = r * (sy)/(sx)

b0 = ybar - b1*xbar

(r = correlation between y & x)
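
A minimal Python sketch of the formulas on this card, using only the standard library. The data values and the function name `least_squares` are made up for illustration and are not part of the original card.

```python
from statistics import mean, stdev

def least_squares(x, y):
    """Return (b0, b1) using b1 = r * sy / sx and b0 = ybar - b1 * xbar."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)
    # Sample correlation r between y and x
    r = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ((n - 1) * sx * sy)
    b1 = r * sy / sx
    b0 = ybar - b1 * xbar
    return b0, b1

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(round(b0, 3), round(b1, 3))  # prints: 2.2 0.6
```

For this toy data set the fitted line is y^ = 2.2 + 0.6x.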

3
Q

fitted model

slope

intercept

A

fitted model → y^ = b0 + b1*x

slope → b1

*understand the units of b1: they are the units of y divided by the units of x*

intercept → b0

*has units of y*
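
One way to see the units point is to refit the same data after changing the units of x. The sketch below uses invented height and weight values, and the helper name `fit_line` is hypothetical. It computes the slope as Sxy / Sxx, which is algebraically the same as r * sy / sx.

```python
from statistics import mean

def fit_line(x, y):
    """Least squares intercept and slope (slope computed as Sxy / Sxx)."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Invented data: x = height, y = weight
height_m = [1.5, 1.6, 1.7, 1.8, 1.9]
weight_kg = [50, 58, 63, 72, 80]

b0_m, b1_m = fit_line(height_m, weight_kg)    # b1_m is in kg per meter
height_cm = [100 * h for h in height_m]
b0_cm, b1_cm = fit_line(height_cm, weight_kg)  # b1_cm is in kg per centimeter

print(b1_m, b1_cm)  # the slopes differ by a factor of 100
print(b0_m, b0_cm)  # the intercept stays in kg and does not change
```

Rescaling x changes the slope but not the intercept, which matches the units stated on the card.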

4
Q

Residual

A

e; the vertical distance from the point to the least squares line

always look at a plot of e against x → the residual plot

*the residuals should have no structure at all, should look like a random swarm of points*

5
Q

numerical summaries of the residuals

A
  • sample mean of the residuals always = 0
  • sample standard deviation of the residuals:

se = sqrt[(e1² + … + en²)/(n-2)]

  • (n-2) → because we have estimated both the slope and the intercept in the regression
  • se → measures unexplained variation in y
  • low values of se are good
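
A short Python sketch of these two summaries, assuming a small made-up data set (the numbers are illustrative only). It fits the least squares line, then checks that the residuals average to zero and computes se with the n - 2 divisor.

```python
from math import sqrt
from statistics import mean

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

# Least squares fit
xbar, ybar = mean(x), mean(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar

# Residuals e = y - y^
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

mean_e = mean(residuals)                          # zero up to rounding error
se = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(round(mean_e, 10), round(se, 4))
```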
6
Q

Root Mean Squared Error (RMSE)

A

se, the sample standard deviation of the residuals

7
Q

Data = Signal + Noise paradigm

A

y = y^ + e

the model splits the observed data, y, into 2 parts: a systematic part, y^, and a random component, e

8
Q

R²

A

r² = sample correlation squared

the proportion of variability in y explained by the regression model

  • 0 <= R² <= 1
  • R² = 1 → perfect linear association
  • R² = 0 → no linear association
  • R² has no measurement units
  • we prefer models with a higher R²
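
The two descriptions on this card (squared correlation, and proportion of variability explained) can be checked against each other numerically. The sketch below uses made-up data and takes "proportion explained" to mean 1 - SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares of y about its mean. The names `sse` and `sst` are introduced here and do not appear on the card.

```python
from statistics import mean

# Made-up data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

xbar, ybar = mean(x), mean(y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
sst = sum((yi - ybar) ** 2 for yi in y)   # total variation in y

# Squared sample correlation
r_squared = sxy ** 2 / (sxx * sst)

# Proportion of variation explained by the fitted line
b1 = sxy / sxx
b0 = ybar - b1 * xbar
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
explained = 1 - sse / sst

print(round(r_squared, 4), round(explained, 4))  # both values agree
```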
9
Q

R² ≈ ____

A

R² ≈ 1 - (se²/sy²)

  • if the variance of the residuals is small compared to the variance of the raw data y, that is good: the model has explained a lot of the variation in y
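
The relation is only approximate because the residual variance divides by n - 2 while the variance of y divides by n - 1. A rough Python sketch on made-up data (a linear trend plus a small deterministic wiggle standing in for noise) compares the approximation with the exact value 1 - SSE/SST. All variable names here are illustrative.

```python
from statistics import mean, variance

# Made-up data: linear trend plus a small deterministic "noise" pattern
x = list(range(1, 51))
y = [3 + 2 * xi + ((7 * xi) % 11 - 5) for xi in x]
n = len(x)

# Least squares fit and residuals
xbar, ybar = mean(x), mean(y)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sse = sum(e ** 2 for e in residuals)
se2 = sse / (n - 2)      # residual variance, n - 2 divisor
sy2 = variance(y)        # sample variance of y, n - 1 divisor
sst = sy2 * (n - 1)      # total sum of squares of y

r2_exact = 1 - sse / sst
r2_approx = 1 - se2 / sy2
print(round(r2_exact, 4), round(r2_approx, 4))
```

The two values differ by (1 - R²)/(n - 2), so the gap shrinks as n grows or as the fit improves.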
10
Q

Spurious association

A

an apparent association between x and y that is actually driven by an omitted variable

11
Q

Regression only identifies _____ and not _______

A

association, not causation
