Correlation and Linear Regression Flashcards

1
Q

What type of graph would you use to visualise the relationship between two continuous variables?

A

Scatterplot

2
Q

What are the two main uses of scatterplots?

A

▪️Investigate the empirical relationship between X (independent) and Y (dependent)
▪️Attempt to predict Y from X

3
Q

What is correlation?

A

How close two variables are to having a linear relationship

The correlation coefficient ‘r’ quantifies the direction and magnitude of that relationship

4
Q

What are the two main types of correlation coefficient?

A

▪️Pearson’s
▪️Spearman’s

5
Q

What is the posh way of saying there is a correlation?

A

There is a linear association

6
Q

What can you determine from the correlation coefficient?

A

▪️The direction of the effect
▪️The magnitude of the effect

7
Q

When do you use Pearson’s correlation coefficient ‘r’?

A

To check the magnitude and direction of a linear relationship between two continuous variables
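As a rough illustration of what r measures, the sketch below computes Pearson's r straight from its definition: the sum of cross-products about the means divided by the square root of the product of the two sums of squares. The function name `pearson_r` and the example data are hypothetical, chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson's r for paired observations x[i], y[i] (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and have at least two observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied vs exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
r = pearson_r(hours, score)
print(round(r, 3))
```

The sign of r gives the direction and its absolute value gives the magnitude; a perfectly straight increasing line gives r = 1.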

8
Q

What assumptions are needed for Pearson’s correlation coefficient?

A

▪️Variables are approx. normally distributed
▪️Variables are continuous
▪️Each observation should have a pair of values
▪️No significant outliers
▪️A straight line relationship should be formed (linearity)

9
Q

When should we use Spearman’s correlation coefficient (‘rs’/‘ρ’)?

A

When one or both of the variables are NOT normally distributed

Or if the data are ordinal

(It is also less sensitive to extreme, influential points)

10
Q

What does Spearman’s Correlation coefficient measure?

A

▪️The strength and direction of a MONOTONIC relationship between two ranked variables
▪️The variables increase or decrease together, but not necessarily at a constant rate as they would in a linear relationship
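To see the difference between a monotonic and a linear relationship, the sketch below compares Spearman's ρ (computed as Pearson's r on the ranks, assuming no tied values) with Pearson's r for y = x³, which always increases but not at a constant rate. The helper names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ranks(values):
    """Rank 1 = smallest value; this sketch assumes there are no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def spearman_rho(x, y):
    # Spearman's rho is Pearson's r applied to the ranks
    return pearson_r(simple_ranks(x), simple_ranks(y))


x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # monotonic, but not linear
print(round(spearman_rho(x, y), 3), round(pearson_r(x, y), 3))
```

Because the ranks of x and y line up exactly, ρ is 1, while Pearson's r falls below 1 because the points do not lie on a straight line.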

11
Q

What is the non-parametric version of the Pearson’s correlation coefficient?

A

Spearman’s

12
Q

How Spearman’s Correlation coefficient is calculated depends on whether the data…

A

▪️Does not have tied ranks
▪️Does have tied ranks
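The two cases can be sketched as follows: without ties, the shortcut formula ρ = 1 − 6Σd² / (n(n² − 1)) applies; with ties, tied values get the average of the ranks they occupy and Pearson's r is computed on those ranks. The function names and data are hypothetical illustrations.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of equal values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_no_ties(x, y):
    """Shortcut formula 1 - 6*sum(d^2) / (n(n^2 - 1)); only exact when there are no ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(x), average_ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def spearman_with_ties(x, y):
    """General method: Pearson's r computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Without ties the two methods agree; with ties the shortcut is only approximate
print(spearman_no_ties([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), spearman_with_ties([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))
print(spearman_no_ties([1, 2, 2, 3], [1, 2, 3, 4]), spearman_with_ties([1, 2, 2, 3], [1, 2, 3, 4]))
```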

13
Q

What are the regression coefficients?

A

β0 (intercept) and β1 (slope)
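For reference, the simple linear regression model these two coefficients belong to is usually written as below, with ε the random error term and hats marking the estimated (fitted) values. β0 is the predicted Y when X = 0, and β1 is the change in Y for a one-unit increase in X.

```latex
Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \dots, n
\qquad\qquad
\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X
```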

14
Q

What is the Y variable?

A

The dependent variable (outcome/response)

15
Q

What is the X variable?

A

The independent variable (predictor/explanatory/covariate)

16
Q

What is the line of best fit in linear regression?

A

The line closest to all data points (residual ε is as small as possible)

17
Q

How might we estimate the linear regression line?

A

Ordinary Least Squares (OLS): minimises the sum of squared residuals to estimate β0 and β1
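For one predictor, OLS has a closed-form solution: b1 = Sxy / Sxx and b0 = ȳ − b1·x̄. The sketch below implements that and checks that nudging the estimates increases the sum of squared residuals. The helper names and data are hypothetical.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1) for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx             # slope
    b0 = mean_y - b1 * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    # The quantity OLS minimises
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical data: hours studied vs exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
b0, b1 = ols_fit(hours, score)
print(b0, b1)
```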

18
Q

When do we use the Simple Linear Regression Model?

A

To measure to what extent there is a linear relationship between two variables

19
Q

What is β1 in the null hypothesis?

A

β1 = 0

(the slope is zero, i.e. no linear relationship between X and Y)
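One common way to test H0: β1 = 0 is a t statistic, t = b1 / SE(b1), where SE(b1) = s / √Sxx and s is the residual standard error on n − 2 degrees of freedom. This sketch stops at the statistic; the p-value would come from a t distribution with n − 2 df, which is left out to keep to the standard library. Names and data are hypothetical.

```python
import math


def slope_t_test(x, y):
    """Return (t, df) for H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    se_b1 = s / math.sqrt(sxx)    # standard error of the slope
    return (b1 - 0) / se_b1, n - 2  # compare b1 against the null value 0


t, df = slope_t_test([1, 2, 3, 4, 5], [52, 55, 61, 64, 70])
print(t, df)
```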

20
Q

What assumptions are needed for the simple linear regression model?

A

▪️There’s a linear relationship
▪️Residuals are independent of one another
▪️Residuals follow a normal distribution with mean 0
▪️Homogeneity of variance (homoscedasticity): the size of the error does not change systematically across values of the IV

21
Q

What is R?

A

The simple correlation coefficient (in simple linear regression, R equals the absolute value of Pearson’s r)

22
Q

What is R squared?

A

The proportion of the total variation in the DV that is explained by the IV

E.g. R² = 0.270 means 27% of the variation is explained
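R² can be computed as 1 − SS_residual / SS_total, and with a single predictor it equals the square of Pearson's r. The sketch below computes both to show they agree. The function name and data are hypothetical.

```python
import math


def r_squared(x, y):
    """Return (R^2 from sums of squares, r^2 from Pearson's r) for a simple regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return 1 - ss_res / syy, r ** 2


from_ss, from_r = r_squared([1, 2, 3, 4, 5], [52, 55, 61, 64, 70])
print(f"{from_ss:.1%} of the variation in the DV is explained")
```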

23
Q

How do you interpret a significant p-value in a simple linear regression ANOVA?

A

The regression model statistically significantly predicts the outcome variable (good fit)
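The ANOVA table splits the total variation into a regression part (1 df) and a residual part (n − 2 df); F is the ratio of their mean squares. In simple linear regression this F test is equivalent to the t test on the slope (F = t²). The sketch computes F only; the p-value would come from an F distribution with 1 and n − 2 df. Names and data are hypothetical.

```python
def regression_anova(x, y):
    """Return (F, df_regression, df_residual) for the simple linear regression ANOVA."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_reg = sxy ** 2 / sxx   # variation explained by the regression line
    ss_res = syy - ss_reg     # unexplained (residual) variation
    ms_reg = ss_reg / 1
    ms_res = ss_res / (n - 2)
    return ms_reg / ms_res, 1, n - 2


F, df_reg, df_res = regression_anova([1, 2, 3, 4, 5], [52, 55, 61, 64, 70])
print(F, df_reg, df_res)
```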

24
Q

What do you use to predict the AVERAGE Y at a specific value of X?

A

Confidence interval of the MEAN

25
Q

What do you use to predict the specific Y of an individual with a specific value of X?

A

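Prediction interval for the INDIVIDUAL (wider than the confidence interval for the mean, because it also includes the scatter of individuals around the line)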
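The difference between the two intervals is the extra "1 +" under the square root: the mean interval is ŷ0 ± t*·s·√(1/n + (x0 − x̄)²/Sxx), while the individual interval is ŷ0 ± t*·s·√(1 + 1/n + (x0 − x̄)²/Sxx). In this sketch the critical value t* is passed in by hand (3.182 is approximately the two-sided 95% value for 3 df); the function name and data are hypothetical.

```python
import math


def intervals_at(x, y, x0, t_crit):
    """Return (y_hat, mean_interval, individual_interval) at X = x0.

    t_crit is the two-sided critical t value on n - 2 df, supplied by the user.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error
    y_hat = b0 + b1 * x0
    leverage = 1 / n + (x0 - mean_x) ** 2 / sxx
    half_mean = t_crit * s * math.sqrt(leverage)     # uncertainty in the mean line only
    half_ind = t_crit * s * math.sqrt(1 + leverage)  # plus scatter of one individual
    return (y_hat,
            (y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_ind, y_hat + half_ind))


y_hat, ci_mean, pi_ind = intervals_at([1, 2, 3, 4, 5], [52, 55, 61, 64, 70], 4, 3.182)
print(y_hat, ci_mean, pi_ind)
```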

26
Q

What is the slope coefficient if X is a categorical binary variable?

A

A measure of the difference in group means

(the regression line connects the mean response of one group to the mean response of the other)
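A quick check of this card: with a predictor coded 0/1, the OLS intercept equals the mean of the group coded 0 and the slope equals the difference between the two group means. The data below are hypothetical.

```python
def ols_fit(x, y):
    """Closed-form OLS estimates (b0, b1) for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1


# Hypothetical data: binary group coded 0/1 and an outcome y
group = [0, 0, 0, 1, 1, 1]
y = [5, 6, 7, 9, 10, 11]
b0, b1 = ols_fit(group, y)
mean_0 = sum(v for g, v in zip(group, y) if g == 0) / group.count(0)
mean_1 = sum(v for g, v in zip(group, y) if g == 1) / group.count(1)
print(b0, mean_0)           # intercept = mean of the group coded 0
print(b1, mean_1 - mean_0)  # slope = difference in group means
```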

27
Q

How do you fit a regression model with a non-binary categorical predictor?

A

First recode it into dummy variables

28
Q

A predictor with K levels can be coded as ___ dummy variables, but only _______ are necessary to fully represent the predictor.

A

▪️K
▪️K-1

29
Q

What do you call the category whose dummy variable is NOT included in the analysis?

A

The reference category

E.g. with three levels and d3 as the reference:
β1 = d1 vs d3
β2 = d2 vs d3
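K − 1 dummy coding can be sketched as follows: one 0/1 column per non-reference level, so a row of all zeros identifies the reference category. The function `dummy_code` and the level names are hypothetical, mirroring the d1/d2/d3 example above.

```python
def dummy_code(values, reference):
    """Create K-1 dummy (0/1) columns, leaving out the reference category."""
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference {reference!r} is not one of the levels {levels}")
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Hypothetical three-level predictor, with d3 as the reference category
group = ["d1", "d2", "d3", "d1", "d3"]
dummies = dummy_code(group, reference="d3")
print(dummies)  # {'d1': [1, 0, 0, 1, 0], 'd2': [0, 1, 0, 0, 0]}
```

A K-th column for d3 would be redundant, since it is fully determined by the other two (it is 1 exactly when both are 0).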

30
Q

How would you interpret a correlation coefficient between 0.6 and 1?

A

Strong positive

(0.8-1 = very strong!)

31
Q

How would you interpret a correlation coefficient between -0.4 and -0.59?

A

Moderate negative

32
Q

How would you interpret a correlation coefficient between 0.2 and 0.39?

A

Weak positive

33
Q

How would you interpret a correlation coefficient between 0.0 and 0.19?

A

Very weak positive/no correlation
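The bands on these cards can be collected into one lookup. This is a sketch under stated assumptions: the 0.6 to 0.79 "strong" band is inferred from the 0.6 to 1 card, the same bands are applied to negative values, and the small gaps between bands (e.g. 0.19 to 0.2) are closed by treating each lower bound as inclusive. The function name is hypothetical, and such bands are rules of thumb rather than fixed thresholds.

```python
def interpret_r(r):
    """Label a correlation coefficient using the bands on these flashcards."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    size = abs(r)
    direction = "positive" if r >= 0 else "negative"
    if size >= 0.8:
        return f"very strong {direction}"
    if size >= 0.6:
        return f"strong {direction}"
    if size >= 0.4:
        return f"moderate {direction}"
    if size >= 0.2:
        return f"weak {direction}"
    return f"very weak {direction}/no correlation"


for value in (0.85, 0.7, -0.5, 0.3, 0.1):
    print(value, interpret_r(value))
```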