Week 10 - Correlation and Linear Regression Flashcards
What is the point of a correlation test?
Allows you to examine an association between two scale variables
What are the different symbols for the Pearson correlation coefficient and when are they used?
r - when measured from a study sample
rho (𝜌) - when discussing the statistical population
What do the values of the Pearson correlation coefficient (r) represent?
Values close to 0 mean there is little or no linear association
Values near 1 (positive slope, direct) mean x and y increase together and the points lie very close to a straight line (r describes how tightly the points follow the line, not how steep the line is)
Values between -1 and 0 (negative slope, inverse) mean that as x increases, y decreases
Values of exactly 1 or -1 indicate a perfect positive or negative linear correlation
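As a minimal sketch of how r is computed from a sample, the function below uses the standard deviation-based formula. The name `pearson_r` and the example data are illustrative assumptions, not part of the course material. Note that the first example has a slope of 2 yet still gives r = 1, because r reflects how closely the points follow a line rather than the steepness of that line.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient r for paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [3, 5, 7, 9, 11])     # y = 2x + 1, perfect positive line
r_down = pearson_r(x, [10, 8, 6, 4, 2])   # y = -2x + 12, perfect negative line
```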
What values determine the strength of correlation with the Pearson correlation coefficient?
|r| < 0.3 = weak
0.3 ≤ |r| < 0.5 = moderate
|r| ≥ 0.5 = large
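The cutoffs above can be sketched as a small lookup. This assumes the thresholds apply to the absolute value of r, so that negative correlations are graded the same way as positive ones; the function name `correlation_strength` is a hypothetical helper.

```python
def correlation_strength(r):
    """Label the strength of a Pearson r using the 0.3 and 0.5 cutoffs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)  # assumption: strength ignores the sign of r
    if size < 0.3:
        return "weak"
    if size < 0.5:
        return "moderate"
    return "large"

labels = [correlation_strength(r) for r in (0.1, 0.3, -0.62)]
```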
What is the notation for the null and alternative hypotheses for correlation tests?
𝐻0: 𝜌 = 0
𝐻𝐴: 𝜌 ≠ 0
What is the df notation for correlation tests?
df = n - 2
What sampling distribution do t-tests use?
t-distribution
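To connect df = n - 2 with the t-distribution, here is a rough sketch of the test statistic usually used for 𝐻0: 𝜌 = 0, namely t = r * sqrt((n - 2) / (1 - r^2)). The formula is the standard one but is not stated in these cards, and the function name and example values are assumptions. Turning t into a p-value needs a t-distribution table or a statistics package, which is left out here.

```python
import math

def correlation_t_statistic(r, n):
    """Return (t, df) for testing H0: rho = 0 from a sample r and sample size n."""
    if n < 3:
        raise ValueError("need at least 3 pairs so that df = n - 2 is positive")
    if abs(r) >= 1:
        raise ValueError("t is undefined when |r| = 1")
    df = n - 2
    t = r * math.sqrt(df / (1 - r ** 2))
    return t, df

# Illustrative values: r = 0.6 from n = 27 pairs
t, df = correlation_t_statistic(0.6, 27)
```

The resulting t is compared against a t-distribution with the returned df.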
What are the assumptions for correlation tests?
Variables X and Y should be scale
Variable X has a linear relationship with variable Y
Both variables should approximate a normal distribution
When do you reject the null hypothesis for correlation tests?
p < alpha
Rejecting 𝐻0 means concluding that 𝜌 ≠ 0
What is the point of simple linear regression?
To examine whether changes in variable X can predict changes in variable Y when X and Y are numerical
Which variables in linear regression are on the x and y axes?
X axis: predictor/exposure/independent variable
Y axis: outcome/dependent variable
What is the equation for simple linear regression?
Y = mx + b + error
Y: outcome
m: slope
x: predictor
b: y-intercept
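A minimal sketch of estimating m and b from data, assuming the usual least-squares method (the cards do not say how the line is fitted). The function name `fit_line` and the data are made up for illustration. The error term shows up as the residuals, the gaps between each observed Y and the fitted line.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope m and intercept b for Y = mx + b + error."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor must vary to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx              # slope
    b = mean_y - m * mean_x    # y-intercept
    return m, b

# Made-up data for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.0]
m, b = fit_line(x, y)

# Residuals: observed Y minus the Y predicted by the fitted line
residuals = [yi - (m * xi + b) for xi, yi in zip(x, y)]
```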
What are the parameters for simple linear regression and what are they used for?
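Intercept (b, also written 𝛽0)
Slope (m, also written 𝛽1)
Hypothesis testing is done on the parameters of the systematic component (the mx + b part of the equation, as opposed to the error term)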
What is the notation for the null and alternative hypothesis for linear regression?
𝐻0: 𝛽1 = 0
𝐻𝐴: 𝛽1 ≠ 0
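The slope test can be sketched as t = (estimated slope) / (standard error of the slope), with df = n - 2. This is the standard formula for simple linear regression with a least-squares fit, not something spelled out in these cards; `slope_t_statistic` and the data are hypothetical.

```python
import math

def slope_t_statistic(xs, ys):
    """Return (t, df) for testing H0: beta1 = 0 in simple linear regression."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 pairs so that df = n - 2 is positive")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor must vary to estimate a slope")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Sum of squared residuals around the fitted line
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    if sse == 0:
        raise ValueError("perfect fit: the standard error is zero, so t is undefined")
    df = n - 2
    se_slope = math.sqrt((sse / df) / sxx)
    return slope / se_slope, df

# Made-up data for illustration only
t_slope, df_slope = slope_t_statistic([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.0])
```

A large |t| relative to the t-distribution with the returned df is evidence against 𝐻0: 𝛽1 = 0.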
What are the assumptions for simple linear regression?
Variable type: outcome must be numeric and predictor can be numeric or categorical
Independence: Y values are independent of one another
Linearity: X and Y have a linear relationship
Normal distribution: Residuals are normally distributed (at each value of X, Y is normally distributed around its predicted mean)
Homoscedasticity: The variance of Y (the spread of the residuals) is the same at every X value
Consider potential outliers
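One rough way to screen for potential outliers is to fit the line and flag points whose residual is large relative to the residual standard deviation. The cutoff of 2 standard deviations below is an assumption chosen for illustration, not a rule from these cards, and `flag_possible_outliers` is a hypothetical helper. A flagged point is a prompt to look more closely, not proof of an error.

```python
import math

def flag_possible_outliers(xs, ys, threshold=2.0):
    """Return indices of points whose residual exceeds `threshold` residual SDs."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least 3 pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor must vary to fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Residual standard deviation uses n - 2 degrees of freedom
    resid_sd = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    if resid_sd == 0:
        return []  # every point lies exactly on the line
    return [i for i, e in enumerate(residuals) if abs(e) > threshold * resid_sd]

# Made-up data: y = 2x everywhere except one unusual value at x = 5
xs = list(range(1, 11))
ys = [2 * x for x in xs]
ys[4] = 30
flagged = flag_possible_outliers(xs, ys)
```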