Correlation and Linear Regression Flashcards
What type of graph would you use to visualise the relationship between two continuous variables?
Scatterplot
What are the two main uses of scatterplots?
▪️Investigate the empirical relationship between X (independent) and Y (dependent)
▪️Attempt to predict Y from X
What is correlation?
How close two variables are to having a linear relationship
The correlation coefficient ‘r’ quantifies its direction and magnitude
What are the two types of correlation coefficient?
▪️Pearson’s
▪️Spearman’s
What is the posh way of saying there is a correlation?
There is a linear association
What can you determine from the correlation coefficient?
▪️The direction of the effect
▪️The magnitude of the effect
When do you use Pearson’s correlation coefficient ‘r’?
To check the magnitude and direction of a linear relationship between two variables
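As a sketch of what ‘r’ measures: the standard formula divides the sum of cross-products of deviations from the means by the square root of the product of each variable's sum of squared deviations. The helper name `pearson_r` and the data are hypothetical, for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson's r for paired lists x and y (hypothetical helper)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means (sign gives direction)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Sum of squared deviations for each variable (scales r into [-1, 1])
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Invented example data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 3))  # → 0.775
```

If either variable is constant, its sum of squares is 0 and r is undefined (this sketch raises `ZeroDivisionError`).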
What assumptions are needed for Pearson’s correlation coefficient?
▪️Variables are approx. normally distributed
▪️Variables are continuous
▪️Each observation should have a pair of values
▪️No significant outliers
▪️A straight line relationship should be formed (linearity)
When should we use Spearman’s Correlation coefficient ‘rs’/‘ρ’?
When one or both of the variables are NOT normally distributed
Or if the data is ordinal
(less sensitive to extreme influential points)
What does Spearman’s Correlation coefficient measure?
▪️Strength and direction of MONOTONIC relationship between two ranked variables
▪️The variables increase or decrease together, but not necessarily at a constant rate as they would in a linear relationship
What is the non-parametric version of the Pearson’s correlation coefficient?
Spearman’s
How Spearman’s Correlation coefficient is calculated depends on whether the data…
▪️Does not have tied ranks
▪️Does have tied ranks
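A minimal sketch of both cases, assuming the usual textbook approach: with no ties, the shortcut rs = 1 − 6Σd² / (n(n² − 1)) on the rank differences d; with ties, tied values get the average of the ranks they span and Pearson's r is computed on those ranks. All function names and data here are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j correspond to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson's r, used here on ranks."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    s_xx = sum((a - mx) ** 2 for a in x)
    s_yy = sum((b - my) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def spearman_no_ties(x, y):
    """Shortcut formula: only valid when neither variable has tied values."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def spearman(x, y):
    """General form: Pearson's r on (average) ranks, which also handles ties."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Invented data: y = x**3 is monotonic but not linear
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_no_ties(x, y))           # → 1.0
print(spearman(x, y))                   # → 1.0
print(pearson_r(x, y) < 1)              # → True
print(average_ranks([10, 20, 20, 30]))  # → [1.0, 2.5, 2.5, 4.0]
```

The cubic example shows the monotonic-versus-linear distinction: the ranks agree perfectly, so rs = 1, while Pearson's r on the raw values is below 1.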
What are the regression coefficients?
β0 (intercept) and β1 (slope)
What is the Y variable?
The dependent variable (outcome/response)
What is the X variable?
The independent variable (predictor/explanatory/covariate)
What is the best-fitting linear regression line?
The line closest to the data points overall (the residuals ε are collectively as small as possible)
How might we estimate the linear regression line?
Ordinary Least Squares (OLS) - minimises the sum of squared residuals to estimate β0 and β1
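For one predictor the OLS estimates have a closed form: β1 = Sxy / Sxx and β0 = ȳ − β1x̄. A sketch with hypothetical helper names and invented data, which also checks that moving away from the estimates increases the sum of squared residuals:

```python
def ols_fit(x, y):
    """Return (b0, b1), the OLS intercept and slope for paired lists x, y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx           # slope
    b0 = mean_y - b1 * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """The quantity OLS minimises: sum of (observed - fitted)^2."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(x, y)
print(round(b0, 2), round(b1, 2))  # → 2.2 0.6
# Nudging the slope away from the OLS estimate makes the fit worse
print(sum_squared_residuals(x, y, b0, b1) < sum_squared_residuals(x, y, b0, b1 + 0.1))  # → True
```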
When do we use the Simple Linear Regression Model?
To measure to what extent there is a linear relationship between two variables
What is the value of β1 (the slope) under the null hypothesis?
0
(i.e. no linear relationship between X and Y)
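H0: β1 = 0 is usually tested with t = b1 / SE(b1), where SE(b1) = s / √Sxx, s² = SSR / (n − 2), and the test has n − 2 degrees of freedom. In simple regression the same t also follows from r as t = r√(n − 2) / √(1 − r²). A standard-library sketch with hypothetical names and invented data:

```python
import math


def slope_t_statistic(x, y):
    """t statistic for H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    ssr = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ssr / (n - 2))  # residual standard error
    se_b1 = s / math.sqrt(s_xx)   # standard error of the slope
    return b1 / se_b1


def t_from_r(r, n):
    """Equivalent t computed from Pearson's r (simple regression only)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(slope_t_statistic(x, y), 3))  # → 2.121, compare with t on n - 2 = 3 df
```

The p-value is not computed here; with SciPy available, a two-sided p-value would be `2 * scipy.stats.t.sf(abs(t), n - 2)`.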
What assumptions are needed for the simple linear regression model?
▪️There’s a linear relationship
▪️Residuals are independent of one another
▪️Residuals follow normal distribution with mean 0
▪️Homogeneity of variance (homoscedasticity) - the spread of the residuals doesn’t change substantially across values of the IV
What is R?
The simple correlation coefficient
What is R squared?
The proportion of the total variation in the DV that is explained by the IV
E.g. 0.270 = 27%
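R squared can be computed as 1 − SSresidual / SStotal. A sketch with a hypothetical helper and invented data; for these data Pearson's r is about 0.775, and R squared comes out as 0.6 (60%), matching r² as expected when there is a single predictor.

```python
def r_squared(x, y):
    """R^2 = 1 - SS_residual / SS_total for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    ss_total = sum((yi - mean_y) ** 2 for yi in y)                            # total variation in Y
    ss_residual = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))     # unexplained part
    return 1 - ss_residual / ss_total


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 3))  # → 0.6
```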
How do you interpret a significant p-value in a simple linear regression ANOVA?
The regression model statistically significantly predicts the outcome variable (good fit)
What do you use to predict the AVERAGE Y for a specific value of X?
Confidence interval of the MEAN
What do you use to predict the Y value of a specific individual with a given value of X?
Prediction interval for the INDIVIDUAL
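For reference, the usual textbook formulas in simple linear regression, where x0 is the chosen X value, ŷ0 = β̂0 + β̂1x0, s = √(SSR / (n − 2)) is the residual standard error and Sxx = Σ(xi − x̄)². The only difference is the extra 1 under the square root, which adds the scatter of individuals around their mean, so the interval for an individual (often called a prediction interval) is always the wider one.

```latex
\begin{aligned}
\text{CI of the mean:} &\quad \hat{y}_0 \pm t_{n-2,\,\alpha/2}\, s\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}} \\
\text{Interval for an individual:} &\quad \hat{y}_0 \pm t_{n-2,\,\alpha/2}\, s\sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{S_{xx}}}
\end{aligned}
```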
What is the slope coefficient if X is a categorical binary variable?
A measure of the group difference in means
(the regression line connects the mean response of one group to the mean response of the other)
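A quick check of this card, assuming the binary predictor is coded 0/1: fitting OLS gives an intercept equal to the mean of the group coded 0 and a slope equal to the difference in group means. Helper name and data are hypothetical.

```python
def ols_fit(x, y):
    """Return (b0, b1), the OLS intercept and slope."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return mean_y - b1 * mean_x, b1


# Invented data: x = 0 for group A (first three), 1 for group B (last three)
x = [0, 0, 0, 1, 1, 1]
y = [5, 6, 7, 9, 10, 11]
b0, b1 = ols_fit(x, y)
mean_a = sum(y[:3]) / 3
mean_b = sum(y[3:]) / 3
print(b0, mean_a)           # → 6.0 6.0  (intercept = mean of group coded 0)
print(b1, mean_b - mean_a)  # → 4.0 4.0  (slope = difference in group means)
```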
How do you fit a regression model with a non-binary categorical predictor?
First recode it into dummy variables
A predictor with K levels can be coded as ___ dummy variables but only _______ are necessary to fully represent the predictor.
▪️K
▪️K-1
What do you call the category whose dummy variable is NOT included in the analysis?
The reference category
E.g. three categories with d3 as the reference:
β1 = d1 vs d3
β2 = d2 vs d3
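A sketch of K − 1 dummy coding with a chosen reference category. It assumes levels are ordered by sorting (a hypothetical choice), and uses the fact that with this categorical predictor alone, OLS gives an intercept equal to the reference-group mean and each dummy coefficient equal to that group's mean minus the reference mean. Names and data are hypothetical.

```python
def dummy_code(values, reference):
    """Code a K-level variable as K-1 dummy (0/1) columns.

    The reference level gets no column; its rows are all zeros.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found")
    kept = [lvl for lvl in levels if lvl != reference]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


def dummy_coefficients(values, y, reference):
    """OLS coefficients for this predictor alone, via group means:
    intercept = reference mean, beta_j = mean of group j - reference mean."""
    groups = {}
    for v, yi in zip(values, y):
        groups.setdefault(v, []).append(yi)
    means = {lvl: sum(ys) / len(ys) for lvl, ys in groups.items()}
    b0 = means[reference]
    kept, _ = dummy_code(values, reference)
    return b0, {lvl: means[lvl] - b0 for lvl in kept}


group = ["d1", "d2", "d3", "d1", "d2", "d3"]
score = [10, 14, 8, 12, 16, 6]
kept, rows = dummy_code(group, reference="d3")
print(kept)     # → ['d1', 'd2']
print(rows[2])  # → [0, 0]  (d3 is the reference)
print(dummy_coefficients(group, score, reference="d3"))  # → (7.0, {'d1': 4.0, 'd2': 8.0})
```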
How would you interpret a correlation coefficient between 0.6 and 1?
Strong positive
(0.8-1 = very strong!)
How would you interpret a correlation coefficient between -0.4 and -0.59?
Moderate negative
How would you interpret a correlation coefficient between 0.2 and 0.39?
Weak positive
How would you interpret a correlation coefficient between 0.0 and 0.19?
Very weak positive/no correlation
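These bands can be written as a small lookup. A sketch assuming a 0.6 to 0.79 "strong" band (with 0.8 to 1 as very strong, as the card above notes), that the sign only sets the direction, and that values falling in the small gaps between bands (e.g. 0.395) go to the lower band. `describe_r` is a hypothetical helper.

```python
def describe_r(r):
    """Label a correlation coefficient using the bands on these cards."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r == 0:
        return "no correlation"
    size = abs(r)
    # Strength depends only on magnitude; gaps between bands go to the lower band
    if size < 0.2:
        strength = "very weak"
    elif size < 0.4:
        strength = "weak"
    elif size < 0.6:
        strength = "moderate"
    elif size < 0.8:
        strength = "strong"
    else:
        strength = "very strong"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_r(-0.45))  # → moderate negative
print(describe_r(0.85))   # → very strong positive
```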