W2 Correlations and Predictions Flashcards
What does no correlation look like?

What does positive correlation look like?

What does negative correlation look like?

What does covariance tell us?
Covariance is a measure of how much two random variables vary together. The magnitude of their relationship. Their directionality ( + or -)
What is the formula for covariance?
SUM[i-n] (x[i] - μ(x))(y[i] - μ(y)) / n

What are the problems with covariance? σXY
The variables are centred, but not to scale. If Cov(X,Y) = 3.9 and Cov(Z,Q) = 5.2, we know both pairs are positively correlated, but we don’t know which one has the stronger correlation, because they could be in different scales/units.
How to scale covariance with Z scores?
z = (x – μ) / σ Divide it by the standard deviation. Standardized scores are called z-scores. SUM[i-n] ZxZy / n #for each x and y in data.
How to scale covariance from raw scores? ρ
Start with covariance. Replace (x - μ)(y - μ) with the ((x – μ) / σ) ((y – μ) / σ) Simplify and it becomes: σxy / Sqrt(σx^2 σy^2)
How do we determine if correlation means causation?
Run an experiment, explicitly manipulate independent variable, one at a time
What is the linear regression formula?
Y(hat) = b[0] + b[1]X
What’s the difference between Y and Y[hat]?
Y is the actual real life value on plotted on the graph, Y[hat] is the predicted value
What are Residuals?
Vertical deviations from a point (dot) to the line
What is the formula for SSresidual/SSerror?
SUM[i-n] (Y[i] - Y[i hat])^2
How do we calculate INTERCEPT & SLOPE from SSerror?
Start with formula, then sub in b[0] + b[1]X in place of Y[i.hat]. SUM[i-n] (Y[i] - Y[i hat])^2 SUM[i-n] (Y[i] - b[0] + b[1]Xi)^2 Then rearrange to make b1 or b0 the subject. b0 = Y[mean] - b1X[mean]
What are the key assumptions of linear regression?
Linear relationship (straight, not curve)
Homoscedasticity (not a cone, equal distrubution)
Normality of residuals (On both extremes on ends cancel out/match)
What does Heteroscedasticity look like vs Homoscedasticity?
Homo is even

What is confidence?
A shaded part of around the line. How sure you are that the values fall within the shaded part. Usually uses a confidence interval of 95%.
What is overfitting?
When the line fits the graph too strictly and bends for noise.
What is multiple regression?
When you use many X to predict one Y
What is the formula for multiple regression?
̂ Y = b0 + b1X1 + b2X2 + .. + bnXn
What is ρxy? How do you get there?
A centred scaled measure for correlation.
Start with the Covariance. For the z score of x, divide the Xi - Mx by the standard deviation.
Then simplify.

How can we calculate intercept and slope using the SSerror?

What is the formula for the SSerror? (Or SSresidual)
It’s the sum of all squared residuals
