Week 4 - Correlations and linear regression Flashcards

1
Q

If a research question has the words “relationship between X and Y” or “after controlling for Z, is there an association between X and Y?”
What kind of statistical analysis would be appropriate for these questions?

A

Correlations and regression

2
Q

What is a correlation?

A

A statistical technique for measuring the extent to which two variables are associated/ related.

Measures the pattern of responses across variables.

Assumes linear association between variables

Changes in one variable are accompanied by predictable changes in the other variable

Usually a bivariate association

3
Q

How is an association/ relationship determined?

A

When changes in one variable are accompanied by consistent and predictable changes in another variable

4
Q

Does correlation always mean causation?

A

No, because a third variable might be causing the observed associations

5
Q

What is the range of values for a correlation?

A

-1 (perfect negative) to +1 (perfect positive)

0 = no association therefore represents the null hypothesis

6
Q

How is the significance of a correlation determined?

A
  • sample size (n)
  • alpha level (e.g. 0.05) and whether the test is one-tailed or two-tailed
  • –> e.g. predicting one direction (positive or negative) = one-tailed; predicting either direction = two-tailed
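
To see why sample size matters, the sketch below converts a correlation into the t statistic used to test it against zero (df = n - 2). The function name `t_from_r` and the example values (r = .30, n = 20 vs n = 200) are illustrative assumptions, not taken from the cards.

```python
import math

def t_from_r(r, n):
    """t statistic for testing H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical example: the same correlation in a small and a large sample.
t_small = t_from_r(0.30, 20)
t_large = t_from_r(0.30, 200)

print(round(t_small, 2), round(t_large, 2))
```

The same r gives a much larger t in the bigger sample, so it is more likely to pass the critical value for the chosen alpha.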
7
Q

Which type of test has greater statistical power: one-tailed or two-tailed?

A

A one-tailed test, because all of alpha is placed in one direction. Only use it when you are very confident the effect lies in that direction (back this up with theory).

8
Q

What does variance measure?

A

How much the scores deviate from the mean of the distribution (one-variable)

variance = average squared distance from the mean

9
Q

What does covariance measure?

A

How much TWO variables vary together: whether deviations from the mean on one variable are matched by deviations from the mean on the other

instead of the sum of squares, the sum of cross products is used
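
The difference between the two calculations can be sketched as follows. The data are made up, and the sketch assumes the sample versions (dividing by n - 1 rather than n).

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance: sum of squared deviations from the mean / (n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def covariance(xs, ys):
    """Sample covariance: sum of cross products of deviations / (n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical scores for five people on two variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

var_x = variance(x)       # 2.5
var_y = variance(y)       # 1.5
cov_xy = covariance(x, y) # 1.5
print(var_x, var_y, cov_xy)
```

Note that the covariance of a variable with itself is just its variance.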

10
Q

What are the problems with covariance?

How are they fixed?

A

UNIT OF MEASUREMENT - covariance depends on the scale of the variables, e.g. the covariance of two variables measured in miles might be 4.25, but if both are converted to km the covariance becomes 11
–> standardise it (divide by the product of the SDs of the two variables)
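
A small sketch of the unit problem and its fix, using made-up distances (not the 4.25 / 11 figures from the card). Converting both variables from miles to km multiplies the covariance by the conversion factor squared, while the standardised value is unchanged.

```python
import math

KM_PER_MILE = 1.609344

def covariance(xs, ys):
    """Sample covariance (n - 1 in the denominator)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = math.sqrt(covariance(xs, xs))
    sd_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical distances in miles.
x_miles = [1.0, 2.0, 3.0, 4.0, 5.0]
y_miles = [2.0, 4.0, 5.0, 4.0, 5.0]
x_km = [v * KM_PER_MILE for v in x_miles]
y_km = [v * KM_PER_MILE for v in y_miles]

cov_miles = covariance(x_miles, y_miles)
cov_km = covariance(x_km, y_km)
r_miles = pearson_r(x_miles, y_miles)
r_km = pearson_r(x_km, y_km)

print(cov_miles, cov_km)  # covariance changes with the units
print(r_miles, r_km)      # correlation does not
```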

11
Q

The standardised version of a covariance is known as the ____

A

correlation coefficient

  • unaffected by units of measurement
  • puts both variables on the same scale (each standardised variable has a variance of 1)
12
Q

covariance = standardised/unstandardised

whereas Pearson correlation = standardised/unstandardised

A

covariance = unstandardised

Pearson correlation = standardised

13
Q

What does Pearson Correlation tell us?

A

Direction + strength of linear relationship between two interval/ratio variables (continuous data)

14
Q

What symbol denotes Pearson Correlation?

A

r

r = strength and direction

15
Q

What does the size/magnitude of ‘r’ denote?

A

The degree to which the points fit on a straight line. The closer r is to +1 or -1, the closer the points lie to a straight line, indicating a stronger linear relationship

+1 = perfect positive relationship
-1 = perfect negative relationship
0 = no linear relationship between the two variables (a non-linear relationship could still exist)

16
Q

What is a correlation matrix?

A

A table showing the correlation for every pairwise combination of variables.

Can be used for descriptive statistics/ exploratory analysis

17
Q

In a correlation matrix, each correlation is a separate test. What is the issue with multiple testing?

How can we fix this?

A

More tests (without separate justifiable hypotheses) increase the risk of a false positive

–> label them as post hoc analysis = reporting associations found after data collection
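
The inflation of false positives can be sketched numerically. This assumes the tests are independent (correlations in one matrix usually are not, so treat it as a rough guide). The function names are illustrative, and the Bonferroni adjustment shown is one common correction, not something stated on the card.

```python
def n_pairwise(k):
    """Number of distinct correlations in a matrix of k variables."""
    return k * (k - 1) // 2

def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

k = 6                       # hypothetical number of variables
m = n_pairwise(k)           # 15 separate correlations
fwer = familywise_error(0.05, m)
bonferroni_alpha = 0.05 / m # stricter per-test alpha (assumed correction)

print(m, round(fwer, 3), round(bonferroni_alpha, 4))
```

With six variables there are already 15 tests, and the chance of at least one false positive at alpha = .05 is a little over 50%.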

18
Q

What are the assumptions of a Pearson Correlation?

A
  • Parametric test, therefore it assumes variables are normally distributed
  • linear association
  • variables measured on an interval or ratio scale
19
Q

How to deal with violation to the assumption of normality in a Pearson Correlation?

A
  • if N > 30, use the central limit theorem (CLT) to justify proceeding despite the violation
  • otherwise, use Spearman correlation

20
Q

How to deal with violation to the assumption of linearity in a Pearson Correlation?

A
  • If the relationship is monotonic, use Spearman correlation
  • Otherwise, transform the data to achieve linearity

21
Q

What are the two situations where you can use Spearman Correlation (r_s or rho)?

A
  1. to find the association between two ordinal variables (X & Y consist of ranks)
  2. to measure the consistency of direction of the association between two interval/ratio variables
    –> variables are converted to ranks first before Spearman is used
  • Measures the degree of monotonic relationship between variables
22
Q

Do the Spearman Correlation Coefficient and Pearson Correlation Coefficient use the same formula?

Does this make the analysis more or less powerful?

A

Yes, but in Spearman the calculations are performed on ranked data instead

Less powerful because information is lost during its conversion into ordinal data
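
A minimal sketch of "same formula, ranked data": convert each variable to ranks, then apply the Pearson formula to the ranks. The helper names and data are made up, and tied scores are given the average of their ranks (a common convention, assumed here).

```python
import math

def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Pearson correlation computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical scores.
x = [10, 20, 30, 40, 50]
y = [1, 3, 2, 5, 4]
rho = spearman_rho(x, y)
print(round(rho, 3))
```

Ranking discards the distances between scores (10 vs 20 becomes 1 vs 2), which is the information loss behind the lower power.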

23
Q

What is a monotonic relationship?

A

A relationship where, even though the data do not fit on a straight line, the data points consistently move in the same direction (as one variable increases, the other only increases or only decreases).

As Pearson correlation assumes linearity, you can use Spearman if the data are non-linear but monotonic
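
To illustrate, here is a sketch with made-up data where Y always increases with X but along a curve (Y = e^X). Pearson is below 1 because the points are not on a straight line; Spearman is 1 because the ordering is perfectly consistent. The simple ranking helper assumes there are no tied scores.

```python
import math

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def simple_ranks(xs):
    """1-based ranks; assumes all values are distinct (no ties)."""
    position = {v: i + 1 for i, v in enumerate(sorted(xs))}
    return [position[v] for v in xs]

# Hypothetical monotonic but non-linear data.
x = [1, 2, 3, 4, 5, 6]
y = [math.exp(v) for v in x]

r = pearson_r(x, y)
rho = pearson_r(simple_ranks(x), simple_ranks(y))
print(round(r, 3), round(rho, 3))
```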

24
Q

What do you use to find:

The proportion of variability in Y variable that can be attributed to variability in X

A

Coefficient of determination (r² / r squared)

Shows how accurately one variable predicts the other

25
Q

For Spearman’s Correlation, what does the coefficient of determination tell us?

A

The proportion of variance in the RANKS that the two variables share

26
Q
r = .411
r² = .169 (16.9%)

“16.9% of the variability in exam performance can be explained by (overlaps with) variability in revision time.”

What is the X variable/ Predictor?
Y variable/ Outcome ?
What type of variance does this represent?

A

X = revision time
Y = exam performance
Shared variance
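
The arithmetic behind the 16.9% figure, as a quick check (variable names are illustrative):

```python
r = 0.411                   # correlation between revision time and exam performance
r_squared = r ** 2          # coefficient of determination

shared_pct = round(100 * r_squared, 1)          # shared variance, in percent
unshared_pct = round(100 * (1 - r_squared), 1)  # variance not shared with X

print(shared_pct, unshared_pct)
```

So roughly 83% of the variability in exam performance is not shared with revision time.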

27
Q

What is a partial correlation?

A

Measures association between two variables, controlling for the effect a third variable has on both
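
For one control variable, the partial correlation can be computed from the three zero-order correlations using the standard formula. The sketch below uses made-up correlations; the function name is illustrative.

```python
import math

def partial_r(r_xy, r_xz, r_yz):
    """Correlation between X and Y with Z removed from BOTH X and Y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical zero-order correlations.
r_xy, r_xz, r_yz = 0.5, 0.6, 0.4

pr = partial_r(r_xy, r_xz, r_yz)
print(round(pr, 3))
```

Here the X-Y association shrinks from .50 to about .35 once Z is controlled for, because part of the original correlation was shared with Z.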

28
Q

What is a semi-partial/ part correlation?

A

Measures the relationship between two variables, controlling for the effect that a third variable has on ONE of the other variables
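
The semi-partial version uses the same numerator as the partial correlation, but only removes the third variable from one of the two variables. A sketch with made-up correlations, removing Z from X only (function name is illustrative):

```python
import math

def semipartial_r(r_xy, r_xz, r_yz):
    """Correlation between Y and the part of X that is unrelated to Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz ** 2)

# Hypothetical zero-order correlations.
r_xy, r_xz, r_yz = 0.5, 0.6, 0.4

sr = semipartial_r(r_xy, r_xz, r_yz)
print(round(sr, 3))
```

Because Y keeps all of its variance (including the part related to Z), the semi-partial correlation is never larger in size than the corresponding partial correlation.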

29
Q

What is a zero order correlation?

A

correlation between two variables when you do not control for anything

30
Q

What is a first order correlation?

A

partial correlation that controls for one variable

31
Q

What is a 2nd order correlation?

A

partial correlation that controls for two variables

32
Q

What is the directionality problem for correlations?

A

correlation =/= causation
third variable

it is not possible to determine which variable is the cause and which is the effect

33
Q

Can you compare correlation coefficients?

A

The best way is to use multiple regression to test whether the association between two variables differs by group

34
Q

Can partial correlations be performed on non-parametric data?

A

Yes, you can do Spearman partial correlations.

Spearman’s partial rank order correlation

35
Q

What is linear regression?

A

A model used to predict the value of one variable from another; it describes the relationship using the equation of a straight line

36
Q

Straight line equation

A

Y_i = b_0 + b_1 X_i + e_i

Y_i = the ith person’s score on the outcome variable

X_i = the ith person’s score on the predictor variable

b_0 = Y-intercept (value of Y when X = 0); the point at which the regression line crosses the y-axis

b_1 = regression coefficient for the predictor

  • gradient (slope) of the regression line
  • direction/ strength of relationship

e_i = the difference between the actual and predicted value of Y for the ith person (the residual)
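
A minimal sketch of fitting this line by ordinary least squares, assuming a single predictor and made-up data. The slope is the sum of cross products divided by the sum of squares of X, and the intercept makes the line pass through the two means.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # cross products
    sxx = sum((x - mx) ** 2 for x in xs)                    # sum of squares of X
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical predictor (X) and outcome (Y) scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]  # actual minus predicted

print(round(b0, 2), round(b1, 2))
print([round(e, 2) for e in residuals])
```

For these data the fitted line is Y = 2.2 + 0.6X, and the residuals sum to zero, as they always do for a least-squares line with an intercept.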