Correlation, Linear Regression Flashcards

1
Q

What is the first thing you should do to look for associations between continuous variables?

A

Produce a scatter plot

2
Q

When do you perform a linear regression?

A

When it is clear that Y may be affected by X, but X cannot possibly be affected by Y

3
Q

When do you perform a correlation analysis?

A

When it is unclear which variable affects the other

4
Q

What do both types of analysis look for?

A

A linear relationship

If the relationship appears to be non-linear = the data must be transformed (e.g. log-transformed) first
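The following minimal Python sketch illustrates why a transformation can help. It assumes a hypothetical data set that follows an exponential curve, and the `pearson_r` helper is written here for illustration rather than taken from a library. After taking the natural log of y, the relationship becomes exactly linear.

```python
import math


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data following y = 2 * e^(0.8x): a curved, non-linear relationship
xs = [1, 2, 3, 4, 5, 6]
ys = [2 * math.exp(0.8 * x) for x in xs]

r_raw = pearson_r(xs, ys)                         # a straight line fits the curve poorly
r_log = pearson_r(xs, [math.log(y) for y in ys])  # ln(y) = ln(2) + 0.8x is exactly linear

print(f"r using y:     {r_raw:.3f}")
print(f"r using ln(y): {r_log:.3f}")
```

On these data, r computed with ln(y) is 1 up to rounding, while r computed with the raw y is noticeably lower.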

5
Q

What are the assumptions of Pearson’s Correlation Test?

A

Correlation analysis:

Parametric test = assumes the data are normally distributed

Assumptions:

1) The relationship (if there is one) is linear
2) No clear causation between X and Y= Want to see if they are associated
3) The two samples come from a bivariate normal distribution
4) The data is continuous

If X and Y are independent = no association = correlation is 0

6
Q

What is the test statistic for Pearson’s Correlation Coefficient? What are the null and alternative hypotheses?

A

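Test statistic: r, the sample correlation coefficient, which estimates the population correlation ρ
r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² × Σ(y − ȳ)²]

To test for significance, r is converted to t = r√(n − 2) / √(1 − r²)

Null hypothesis: ρ = 0 (no association between X and Y)
Alternative hypothesis: ρ is different from 0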
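The following is a minimal Python sketch of computing r and its t statistic. It assumes hypothetical paired data, and `pearson_test` is an illustrative helper rather than a library function.

```python
import math


def pearson_test(xs, ys):
    """Return (r, t, df) for testing H0: rho = 0 against H1: rho != 0."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 (x, y) pairs")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    r = sxy / math.sqrt(sxx * syy)
    df = n - 2  # n = number of (x, y) pairs
    # t is undefined when r = +/-1 (all points exactly on a line)
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return r, t, df


# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, t, df = pearson_test(xs, ys)
print(f"r = {r:.3f}, t = {t:.3f}, df = {df}")  # r = 0.775, t = 2.121, df = 3
```

The t value is then compared with the critical value from t tables on n − 2 degrees of freedom. Here df = 3, and the two-tailed 5% critical value is about 3.18, so H0 would not be rejected for these data.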

7
Q

What are the degrees of freedom for Pearson’s Correlation Test?

A

n − 2, where n is the number of (x, y) pairs

8
Q

What is the non-parametric version of the Pearson Correlation Test?

How do you carry out this test?

A

Spearman Rank Correlation Test: uses the ranks of the observations rather than the observations themselves

Null hypothesis: no association between X and Y
Test statistic: rs = 1 − 6Σdi² / (n(n² − 1))

where n is the number of pairs and di is the difference between the ranks of x and y for each (x, y) pair

The stronger the association between X and Y, the more closely the ranks of X and Y agree, so the di are small and rs is close to 1 (a strong negative association gives rs close to −1)

The degrees of freedom = n − 2
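The following minimal Python sketch applies the rs formula. It assumes hypothetical data, and the helper names are illustrative. Tied values receive the average of their ranks, which is a common convention. The shortcut formula is exact only when there are no ties.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upwards; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j (0-based) -> ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rs(xs, ys):
    """rs = 1 - 6 * sum(d_i^2) / (n(n^2 - 1)); exact when there are no ties."""
    n = len(xs)
    rx, ry = average_ranks(xs), average_ranks(ys)
    d_squared = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data: the rank of y differs from the rank of x by 1 for every pair
print(spearman_rs([1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5]))  # 1 - 36/210, about 0.829
# A monotonic but non-linear relationship still gives rs = 1
print(spearman_rs([1, 2, 3, 4], [1, 8, 27, 64]))  # 1.0
```

Because only ranks are used, any relationship where Y always increases with X gives rs = 1, even when the relationship is curved.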

9
Q

What are the assumptions for linear regression?

A

1) Variation in Y about the regression line (the residuals) is normally distributed
2) Variation in Y is equal for all values of X
3) X-values are measured without error

10
Q

What are the null and alternative hypotheses for linear regression?

A

Null: No association between X and Y (the true slope of the line is 0)

Alternative: Knowing X tells us something about Y (the true slope is different from 0)
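One standard way to test these hypotheses is an F-test built from the sums of squares, although this card does not name a specific test. The following minimal Python sketch assumes that F-test and uses hypothetical data. `regression_f_test` is an illustrative helper rather than a library function.

```python
def regression_f_test(xs, ys):
    """F statistic for H0: slope = 0 in simple linear regression y = a + bx."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    ss_total = sum((y - my) ** 2 for y in ys)
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_regression = ss_total - ss_residual
    # regression has 1 degree of freedom, residual has n - 2
    f = (ss_regression / 1) / (ss_residual / (n - 2))
    return f, (1, n - 2)


# Hypothetical paired data
f, dfs = regression_f_test([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"F = {f:.2f} on {dfs} degrees of freedom")  # F = 4.50 on (1, 3) degrees of freedom
```

A large F, compared with F tables on 1 and n − 2 degrees of freedom, leads to rejecting the null hypothesis. For simple linear regression, this test is equivalent to testing whether the Pearson correlation is 0.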

11
Q

How do you carry out linear regression?

A

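Sum of squares:

1) Find the mean of Y (ȳ)
2) Find the difference between each Y value and the mean
3) Square each difference and add them all together = SStotal
4) Plot a straight line through the data, y = a + bx, where a = y-intercept and b = slope of the line
5) Estimate a and b, then use the line to calculate y for each x value = estimated points (ŷ). The least-squares line is the one that makes SSresidual as small as possible: b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², a = ȳ − b x̄
6) Find the difference between each estimated point and the actual value, square each difference, and add them all together = SSresidual

SSresidual = Σ(y − ŷ)²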
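The following minimal Python sketch carries out the sum-of-squares steps above with least-squares estimates of a and b. It assumes hypothetical data, and `sums_of_squares` is an illustrative helper.

```python
def sums_of_squares(xs, ys):
    """Return (a, b, SStotal, SSresidual) for the least-squares line y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n  # mean of Y
    # squared differences between each Y value and the mean, summed
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    # least-squares slope and intercept
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    fitted = [a + b * x for x in xs]  # estimated points
    # squared differences between actual and estimated values, summed
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, fitted))
    return a, b, ss_total, ss_residual


a, b, ss_total, ss_residual = sums_of_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {a:.2f} + {b:.2f}x, SStotal = {ss_total:.2f}, SSresidual = {ss_residual:.2f}")
# y = 2.20 + 0.60x, SStotal = 6.00, SSresidual = 2.40
```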

12
Q

What is R squared (R²)?

A

It is the coefficient of determination

Indicates how much of the variation in Y (measured as SS) can be explained by assuming a linear relationship with X

SSregression = SStotal − SSresidual
R² = SSregression / SStotal

Larger values of R² = the line better describes the data = stronger linear relationship (R² ranges from 0 to 1)

R² = 1 implies all the data lie on a line with non-zero slope
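The following minimal Python sketch computes R² from the sums of squares. It assumes hypothetical data, and `r_squared` is an illustrative helper. The function divides by SStotal, so it fails if every y value is the same.

```python
def r_squared(xs, ys):
    """Coefficient of determination R^2 = SSregression / SStotal for y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    a = mean_y - b * mean_x
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_regression = ss_total - ss_residual
    return ss_regression / ss_total


print(round(r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 3))  # 0.6
print(round(r_squared([1, 2, 3, 4], [3, 5, 7, 9]), 3))        # 1.0: all points on y = 1 + 2x
```

For simple linear regression, R² equals the square of Pearson's r for the same data.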
