Lecture 36 - Correlation Flashcards

Question 1

Q

What does the correlation coefficient (r) summarize?

Answer

A

The strength of a linear relationship between variables as well as the direction of this relationship

Question 2

Q

How do you interpret r, i.e. what do different values mean?

Answer

A

r is always between -1 and +1
A positive r value means Y and X increase together
A negative r value means that as Y increases, X decreases (and vice versa: whatever one variable does, the other does the opposite).

Question 3

Q

What does r=0 mean?

Answer

A

There is no linear relationship between the variables
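Note that r = 0 rules out only a linear relationship; the variables can still be related in some other way. A minimal sketch in R, using made-up data where y is an exact (but curved) function of x:

```r
# Made-up data: y depends perfectly on x, but through a curve, not a line
x <- c(-2, -1, 0, 1, 2)
y <- x^2

# cor() computes the Pearson correlation by default
r <- cor(x, y)
print(r)  # r is 0 (up to floating-point rounding) despite the exact relationship
```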

Question 4

Q

What does a strong/weak relationship look like visually?

Answer

A

Weak = more scatter around the line of best fit
Strong = points clustered tightly around the line of best fit

Question 5

Q

Calculate the correlation coefficient using the data on slide 694 and the equation found here (you don't need to memorize the equation).

Answer

A

Answers on slide

Question 6

Q

What function in R calculates the correlation coefficient?

Answer

A

cor(x, y)
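In base R, `cor()` returns the Pearson correlation coefficient by default. A minimal sketch, assuming two made-up numeric vectors of equal length (the values are illustrative, not the lecture data):

```r
# Made-up, roughly linear data
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Pearson correlation between x and y
r <- cor(x, y)
print(r)  # close to +1 because the points lie almost on an increasing line
```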

Question 7

Q

How do you set up data in R?

Answer

A

x=c(data)
y=c(data)

Note: you can assign with = or with the left arrow <-

Question 8

Q

What is S_xy (S with subscript xy)?

Answer

A

The sample covariance between x and y
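For reference, the usual definition of the sample covariance can be written as below. The symbols n (number of paired observations) and the bars for sample means are standard notation assumed here, not taken from the slides:

```latex
S_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
```

A positive S_xy means x and y tend to be above (or below) their means together; a negative S_xy means they tend to move in opposite directions.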

Question 9

Q

What can the correlation coefficient ‘r’ be rewritten as?

Answer

A

r = S_xy / (S_x × S_y)

Note: S_x and S_y are the sample standard deviations of the x and y variables
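This identity can be checked numerically. A minimal sketch in R with made-up data, using the base functions `cov()` and `sd()` (both use the n - 1 denominator):

```r
# Made-up data for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

s_xy <- cov(x, y)  # sample covariance
s_x  <- sd(x)      # sample standard deviation of x
s_y  <- sd(y)      # sample standard deviation of y

# Covariance scaled by both standard deviations
r_manual <- s_xy / (s_x * s_y)

# Compare with the built-in correlation, allowing for rounding error
print(all.equal(r_manual, cor(x, y)))
```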

Question 10

Q

Can a correlation coefficient be used for prediction? Why or why not?

Answer

A

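No, because it's not a model: it summarizes the strength and direction of a linear relationship but gives no equation for predicting one variable from the other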

Question 11

Q

What is meant by the statement that the correlation coefficient is symmetric in variables?

Answer

A

Correlation between x and y is the same as correlation between y and x
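A quick way to see the symmetry is to swap the arguments to `cor()`. A minimal sketch with made-up data:

```r
# Made-up data for illustration
x <- c(3, 1, 4, 1, 5, 9)
y <- c(2, 7, 1, 8, 2, 8)

r_xy <- cor(x, y)  # correlation of x with y
r_yx <- cor(y, x)  # correlation of y with x

# The two values agree regardless of argument order
print(isTRUE(all.equal(r_xy, r_yx)))
```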

Question 12

Q

What is R^2?

Answer

A

The coefficient of determination: a measure of how well our regression model describes the data
It is the squared correlation between the observed and predicted responses
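The "squared correlation" description can be checked against the R^2 that R reports for a fitted model. A minimal sketch, assuming a simple linear regression fitted with `lm()` on made-up data:

```r
# Made-up data for illustration
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 2.8, 2.9, 4.6, 4.9, 6.5)

# Fit a simple linear regression of y on x
fit <- lm(y ~ x)

# R^2 as reported by the model summary
r2_model <- summary(fit)$r.squared

# R^2 as the squared correlation between observed and predicted responses
r2_corr <- cor(y, fitted(fit))^2

print(all.equal(r2_model, r2_corr))
```

With a single predictor, this value also equals `cor(x, y)^2`.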

Question 13

Q

How do you interpret R^2, i.e. what do the numbers mean?

Answer

A

Close to 1 = the regression model describes the data well
Close to 0 = the regression model describes the data poorly

(R^2 can only be between 0 and 1; there is no such thing as a negative R^2 here, because squaring a correlation removes the negative sign)

Question 14

Q

What does the total sum of squares describe in contrast to R^2?

Answer

A

Total sum of squares (TSS)= overall variation in the response variable
R^2 is instead the proportion of variation in the response that is explained by the predictor variable i.e. how good our model is

Question 15

Q

What is the residual sum of squares (RSS)?

Answer

A

The total variation of the data points about the regression line, i.e. how far our measured y values are from the predictions of our fitted model
In other words, RSS is the variation not explained by the regression model
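For a least-squares line with an intercept, the two sums of squares are tied to R^2 by R^2 = 1 - RSS/TSS. A minimal sketch in R on made-up data (the variable names `tss`, `rss` and `r2` are illustrative):

```r
# Made-up data for illustration
x <- c(1, 2, 3, 4, 5, 6)
y <- c(1.2, 2.8, 2.9, 4.6, 4.9, 6.5)

# Fit a simple linear regression of y on x
fit <- lm(y ~ x)

# Total sum of squares: overall variation of y about its mean
tss <- sum((y - mean(y))^2)

# Residual sum of squares: variation of y about the fitted line
rss <- sum(residuals(fit)^2)

# Proportion of the variation in y explained by the model
r2 <- 1 - rss / tss

print(c(TSS = tss, RSS = rss, R2 = r2))
```

The value of `r2` should match `summary(fit)$r.squared`.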