19 - Fitting Lines to Data Flashcards
Least Squares Line
the “best line” minimizes the sum of the squares of the vertical distances from the points to the line
Least Squares estimates of the slope and intercept
fitted values written as y^, using the line y^ = b0 + b1*x
residual (e) = the difference, y - y^
least squares estimates:
b1 = r * (sy)/(sx)
b0 = ybar - b1*xbar
(r = correlation between y & x)
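*a minimal NumPy sketch of these formulas, with made-up example data (the numbers are purely illustrative):*
```python
import numpy as np

# hypothetical example data (illustrative only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

r = np.corrcoef(x, y)[0, 1]              # sample correlation between y and x
b1 = r * y.std(ddof=1) / x.std(ddof=1)   # slope: b1 = r * sy/sx
b0 = y.mean() - b1 * x.mean()            # intercept: b0 = ybar - b1*xbar
print(b0, b1)                            # fitted line: y^ = b0 + b1*x
```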
fitted model
slope
intercept
fitted model → y^ = b0 + b1*x
slope → b1
*understand the units on b1. They are the units of y over the units of x*
intercept → b0
*has units of y*
Residual
e; the vertical distance from the point to the least squares line
always look at a plot of e against x → the residual plot
*the residuals should have no structure at all, should look like a random swarm of points*
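*a sketch of a residual plot (matplotlib; same illustrative data as above — np.polyfit gives the same least squares fit):*
```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, 1)   # least squares slope and intercept
e = y - (b0 + b1 * x)          # residuals: e = y - y^

plt.scatter(x, e)              # plot e against x
plt.axhline(0, linestyle="--")
plt.xlabel("x")
plt.ylabel("residual e")
plt.show()                     # a good fit: a structureless swarm around 0
```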
numerical summaries of the residuals
- sample mean of the residuals always = 0
- sample standard deviation of the residuals:
se = sqrt[(e1^2 + … + en^2)/(n-2)]
- (n-2) → because we have estimated both slope and intercept in the regression
- se → measures unexplained variation in y
- low values of se are good
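*a sketch of these summaries, using the same illustrative setup:*
```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, 1)   # least squares fit
e = y - (b0 + b1 * x)          # residuals

n = len(x)
print(e.mean())                       # always ~0 (up to rounding)
se = np.sqrt(np.sum(e**2) / (n - 2))  # n-2: slope and intercept both estimated
print(se)                             # measures unexplained variation in y
```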
Root Mean Squared Error (RMSE)
se
Data = Signal + Noise paradigm
y = y^ + e
the model splits the observed data, y, into 2 parts: a systematic part, y^, and a random component, e
R2
R2 = r^2, the sample correlation squared
the proportion of variability in y explained by the regression model
- 0 <= R2 <= 1
- R2 = 1 → perfect linear association
- R2 = 0 → no linear association
- R2 has no measurement units
- we prefer models with a higher R2
R2 ≈ ____
R2 ≈ 1 - (se^2/sy^2)
- if the variance of the residuals is small compared to the variance of the raw data y, that is good: the model has explained a lot of the variation in y
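*a sketch checking both versions: R2 = r^2 exactly, and R2 ≈ 1 - se^2/sy^2 (approximate, since se divides by n-2 while sy divides by n-1); same illustrative data:*
```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # hypothetical data
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)

r = np.corrcoef(x, y)[0, 1]
R2 = r**2                                  # proportion of variability explained
se = np.sqrt(np.sum(e**2) / (len(x) - 2))
sy = y.std(ddof=1)
print(R2, 1 - se**2 / sy**2)               # close, but not identical
```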
Spurious association
driven by an omitted variable
Regression only identifies _____ and not _______
association, not causation