W2 Correlations and Predictions Flashcards

Question 1

Q

What does no correlation look like?

Question 2

Q

What does positive correlation look like?

Question 3

Q

What does negative correlation look like?

Question 4

Q

What does covariance tell us?

Answer

A

Covariance measures how much two random variables vary together. Its sign gives the direction of the relationship (+ or -); its magnitude reflects the strength of the relationship, but only in the variables' own units.

Question 5

Q

What is the formula for covariance?

Answer

A

σxy = SUM[i=1..n] (x[i] - μ(x))(y[i] - μ(y)) / n
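
A minimal Python sketch of this formula, assuming the population version (dividing by n, as on the card). The function name `covariance` and the sample data are illustrative and not taken from the course material.

```python
def covariance(xs, ys):
    """Population covariance: mean product of paired deviations from the means."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical data: y tends to rise with x, so the covariance is positive.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
print(covariance(hours, score))
```

A positive result means the variables move together; a negative one means they move in opposite directions.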

Question 6

Q

What are the problems with covariance? σXY

Answer

A

The variables are centred but not scaled, so the size of a covariance depends on the units. If Cov(X,Y) = 3.9 and Cov(Z,Q) = 5.2, we know both pairs are positively correlated, but we don’t know which pair has the stronger correlation, because they could be on different scales or in different units.
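
To see the scale problem concretely, here is a small sketch in which the same relationship is measured in two different units. The helper `covariance` and the height/weight numbers are made up for illustration.

```python
def covariance(xs, ys):
    """Population covariance (divides by n)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


# Hypothetical data: the same four people, height in cm and in metres.
height_cm = [150, 160, 170, 180]
height_m = [h / 100 for h in height_cm]
weight_kg = [50, 60, 65, 80]

cov_cm = covariance(height_cm, weight_kg)
cov_m = covariance(height_m, weight_kg)

# Same relationship, but the covariance in cm is about 100 times larger.
print(cov_cm, cov_m)
```

The strength of the relationship has not changed, only the unit, which is why covariances from different pairs of variables cannot be compared directly.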

Question 7

Q

How to scale covariance with Z scores?

Answer

A

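Standardize each score: subtract the mean and divide by the standard deviation, z = (x - μ) / σ. Standardized scores are called z-scores. The scaled covariance is then the mean product of the paired z-scores: SUM[i=1..n] Zx[i]Zy[i] / n, taken over each (x, y) pair in the data.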
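
The z-score route can be sketched in Python as follows. This assumes the population standard deviation (dividing by n), to match the covariance formula on the earlier card; the function names and data are illustrative.

```python
from math import sqrt


def z_scores(values):
    """Standardize: subtract the mean, divide by the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0:
        raise ValueError("standard deviation is zero; z-scores are undefined")
    return [(v - mean) / sd for v in values]


def correlation_from_z(xs, ys):
    """Mean product of paired z-scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_from_z(xs, ys)
print(round(r, 4))
```

Because both variables are standardized first, the result no longer depends on their units and always lies between -1 and +1.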

Question 8

Q

How to scale covariance from raw scores? ρ

Answer

A

Start with the covariance formula. Replace (x - μ(x))(y - μ(y)) with ((x - μ(x)) / σx)((y - μ(y)) / σy). Simplify and it becomes: ρxy = σxy / Sqrt(σx^2 σy^2) = σxy / (σx σy)
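
Written out step by step, the simplification might look like this (a sketch in LaTeX notation, using population formulas that divide by n, as on the covariance card):

```latex
\begin{aligned}
\rho_{xy}
  &= \frac{1}{n}\sum_{i=1}^{n}
     \left(\frac{x_i-\mu_x}{\sigma_x}\right)
     \left(\frac{y_i-\mu_y}{\sigma_y}\right) \\
  &= \frac{1}{\sigma_x\sigma_y}\cdot
     \frac{1}{n}\sum_{i=1}^{n}(x_i-\mu_x)(y_i-\mu_y) \\
  &= \frac{\sigma_{xy}}{\sigma_x\sigma_y}
   = \frac{\sigma_{xy}}{\sqrt{\sigma_x^{2}\,\sigma_y^{2}}}
\end{aligned}
```

The standard deviations are constants, so they can be pulled out of the sum, leaving the covariance in the numerator.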

Question 9

Q

How do we determine if correlation means causation?

Answer

A

Correlation alone cannot show causation. Run an experiment: explicitly manipulate the independent variables, one at a time.

Question 10

Q

What is the linear regression formula?

Answer

A

Y[hat] = b[0] + b[1]X, where b[0] is the intercept and b[1] is the slope

Question 11

Q

What’s the difference between Y and Y[hat]?

Answer

A

Y is the actual observed value plotted on the graph; Y[hat] is the value predicted by the regression line

Question 12

Q

What are Residuals?

Answer

A

The vertical deviations from each data point (dot) to the regression line: Y[i] - Y[i hat]
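
As a small illustration, residuals can be computed as observed minus predicted values. The line used here (intercept 1, slope 2) and the data are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def predict(x, b0, b1):
    """Predicted value on the line: Y[hat] = b0 + b1 * X."""
    return b0 + b1 * x


def residuals(xs, ys, b0, b1):
    """Observed minus predicted, one residual per data point."""
    return [y - predict(x, b0, b1) for x, y in zip(xs, ys)]


# Hypothetical data and a hypothetical line with intercept 1 and slope 2.
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 10]
print(residuals(xs, ys, b0=1, b1=2))  # [0, 0, -1, 1]
```

A positive residual means the point sits above the line; a negative one means it sits below.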

Question 13

Q

What is the formula for SSresidual/SSerror?

Answer

A

SUM[i=1..n] (Y[i] - Y[i hat])^2
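
A minimal sketch of this sum in Python, used to compare two candidate lines on the same data. The function name `ss_error`, the data, and both candidate lines are assumptions for illustration.

```python
def ss_error(xs, ys, b0, b1):
    """Sum of squared residuals for the line Y[hat] = b0 + b1 * X."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data and two hypothetical candidate lines.
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 10]
line_a = ss_error(xs, ys, b0=1, b1=2)  # residuals 0, 0, -1, 1
line_b = ss_error(xs, ys, b0=0, b1=2)  # residuals 1, 1, 0, 2
print(line_a, line_b)  # 2 6
```

The line with the smaller sum fits the data better, which is the idea behind choosing the intercept and slope by least squares.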

Question 14

Q

How do we calculate INTERCEPT & SLOPE from SSerror?

Answer

A

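Start with the SSerror formula, then substitute b[0] + b[1]X[i] in place of Y[i hat]:

SUM[i=1..n] (Y[i] - Y[i hat])^2 = SUM[i=1..n] (Y[i] - (b[0] + b[1]X[i]))^2

Then find the b[0] and b[1] that minimise this sum (set its partial derivatives with respect to b[0] and b[1] to zero and solve):

b[1] = σxy / σx^2

b[0] = Y[mean] - b[1]X[mean]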
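
The least-squares solution can be sketched in Python. This assumes the standard closed-form result that minimising SSerror gives a slope of b1 = σxy / σx^2, together with the intercept b0 = Y[mean] - b1X[mean]. The function name `fit_line` and the data are illustrative.

```python
def fit_line(xs, ys):
    """Return (b0, b1) minimising the sum of squared residuals."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Hypothetical data.
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 10]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

Because the intercept is defined from the two means, the fitted line always passes through the point (X[mean], Y[mean]).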

Question 15

Q

What are the key assumptions of linear regression?

Answer

A

Linear relationship (a straight line, not a curve)

Homoscedasticity (not a cone; the residuals are spread equally along the line)

Normality of residuals (symmetric around zero, so the extremes at both ends balance out)

Question 16

Q

What does Heteroscedasticity look like vs Homoscedasticity?

Answer

A

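Homoscedasticity: the residuals have an even spread around the line across all values of X. Heteroscedasticity: the spread changes with X, typically fanning out into a cone shape.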

Question 17

Q

What is confidence?

Answer

A

The shaded band around the regression line. It shows how sure you are that the true line falls within the shaded region. A 95% confidence interval is the usual choice.

Question 18

Q

What is overfitting?

Answer

A

When the fitted line follows the data too closely and bends to fit the noise, so it predicts new data poorly.

Question 19

Q

What is multiple regression?

Answer

A

When you use several predictors (X1, X2, ...) to predict one Y

Question 20

Q

What is the formula for multiple regression?

Answer

A

Y[hat] = b0 + b1X1 + b2X2 + ... + bnXn
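
A prediction from this formula is just the intercept plus a weighted sum of the predictors. The sketch below assumes the coefficients are already known; the function name `predict_multiple` and all numbers are hypothetical.

```python
def predict_multiple(intercept, coefficients, features):
    """Y[hat] = b0 + b1*X1 + b2*X2 + ... + bn*Xn."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per predictor")
    return intercept + sum(b * x for b, x in zip(coefficients, features))


# Hypothetical coefficients b0..b3 and one observation with three predictors.
b0 = 10
bs = [2, -1, 0.5]
observation = [3, 4, 6]
print(predict_multiple(b0, bs, observation))  # 10 + 6 - 4 + 3 = 15.0
```

With a single predictor this reduces to the simple regression formula b0 + b1X.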

Question 21

Q

What is ρxy? How do you get there?

Answer

A

A centred, scaled measure of correlation (the correlation coefficient), ranging from -1 to +1.

Start with the covariance. Turn each deviation into a z-score: divide x[i] - μ(x) by σx, and y[i] - μ(y) by σy.

Then simplify: ρxy = σxy / (σx σy)

Question 22

Q

How can we calculate intercept and slope using the SSerror?

Question 23

Q

What is the formula for the SSerror? (Or SSresidual)

Answer

A

It’s the sum of all squared residuals: SUM[i=1..n] (Y[i] - Y[i hat])^2