Lecture 7 Flashcards
What will we need to build regression models with multiple inputs?
linear algebra
What is r?
the correlation coefficient
What does r measure?
the strength of the linear association of two variables, x and y
What is the intuition behind r?
it measures how tightly the points of a scatter plot cluster around a straight line
What is the range for r?
between -1 and 1, inclusive
What can we tell about a scatter plot if r is negative?
x and y have a negative association: as x increases, y tends to decrease
What can we tell about a scatter plot if r is positive?
x and y have a positive association: as x increases, y tends to increase
What happens as r gets closer to ± 1?
the linear association gets stronger; the points cluster more tightly around a straight line
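The range and sign of r can be checked numerically. The following is a minimal sketch using made-up data; the helper name `correlation` and all data values are assumptions for illustration, and r is computed as the average product of x and y in standard units.

```python
from statistics import fmean, pstdev

def correlation(x, y):
    """Average of the products of x and y in standard units."""
    x_bar, y_bar = fmean(x), fmean(y)
    sd_x, sd_y = pstdev(x), pstdev(y)  # population SDs (divide by n)
    products = [((a - x_bar) / sd_x) * ((b - y_bar) / sd_y)
                for a, b in zip(x, y)]
    return fmean(products)

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y_up = [2 * v + 1 for v in x]      # points exactly on an increasing line
y_down = [-3 * v + 10 for v in x]  # points exactly on a decreasing line
y_noisy = [2, 4, 5, 4, 5]          # increasing trend with scatter

r_up = correlation(x, y_up)        # approximately +1
r_down = correlation(x, y_down)    # approximately -1
r_noisy = correlation(x, y_noisy)  # strictly between 0 and 1
print(r_up, r_down, r_noisy)
```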
How is the correlation coefficient, r, defined?
the average of the products of x_i and y_i, when both are in standard units
What is x_i in standard units?
(x_i - xbar) / σ_x
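The definition of r via standard units can be sketched in a few lines. This is an illustrative sketch, not the lecture's own code: the function names `standard_units` and `correlation` and the data are assumptions, and σ is taken to be the population SD (dividing by n).

```python
from statistics import fmean, pstdev

def standard_units(values):
    """Convert each value to standard units: (value - mean) / SD."""
    mean = fmean(values)
    sd = pstdev(values)  # population SD (divide by n), an assumption here
    return [(v - mean) / sd for v in values]

def correlation(x, y):
    """r = average of the products of x and y, both in standard units."""
    x_su = standard_units(x)
    y_su = standard_units(y)
    return fmean([a * b for a, b in zip(x_su, y_su)])

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(round(r, 4))
```

Note the parentheses in the conversion: the mean is subtracted first, and only then is the difference divided by the SD.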
How do we express w_1* when using squared loss?
in terms of r: w_1* = r * σ_y / σ_x
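As a sanity check, the slope r * σ_y / σ_x should agree with the usual least-squares slope written as a ratio of sums. The sketch below assumes made-up data and hypothetical helper names (`slope_from_r`, `slope_direct`); it is not taken from the lecture.

```python
from statistics import fmean, pstdev

def slope_from_r(x, y):
    """Slope under squared loss, written as r * sd_y / sd_x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sd_x, sd_y = pstdev(x), pstdev(y)  # population SDs
    r = fmean([((a - x_bar) / sd_x) * ((b - y_bar) / sd_y)
               for a, b in zip(x, y)])
    return r * sd_y / sd_x

def slope_direct(x, y):
    """Same slope as sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    x_bar, y_bar = fmean(x), fmean(y)
    numerator = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    denominator = sum((a - x_bar) ** 2 for a in x)
    return numerator / denominator

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
w1_from_r = slope_from_r(x, y)
w1_direct = slope_direct(x, y)
print(w1_from_r, w1_direct)  # the two values agree up to rounding
```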
What is the datasaurus dozen?
a collection of datasets that have nearly the same means, SDs, and correlation, but look very different when plotted
What are the units of the slope?
units of y per unit of x
What happens to our slope as the y values get more spread out?
σ_y increases, so (for the same r) the slope gets steeper
What happens to our slope as the x values get more spread out?
σ_x increases, so (for the same r) the slope gets shallower
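The effect of spread on the slope can be illustrated by rescaling one variable at a time. In this sketch (made-up data, hypothetical helper `slope`), multiplying every y by 2 doubles σ_y and multiplying every x by 2 doubles σ_x, while r stays the same in both cases.

```python
from statistics import fmean, pstdev

def slope(x, y):
    """Slope under squared loss: r * sd_y / sd_x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sd_x, sd_y = pstdev(x), pstdev(y)  # population SDs
    r = fmean([((a - x_bar) / sd_x) * ((b - y_bar) / sd_y)
               for a, b in zip(x, y)])
    return r * sd_y / sd_x

# Made-up illustrative data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

base = slope(x, y)
steeper = slope(x, [2 * v for v in y])    # y twice as spread out
shallower = slope([2 * v for v in x], y)  # x twice as spread out
print(base, steeper, shallower)
```

Under these assumptions the slope doubles in the first case and halves in the second, matching the units of the slope (units of y per unit of x).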