Statistical Modelling: Correlation + Regression Flashcards
When is correlation used?
When there is no distinction between the two variables (neither is treated as a response to the other), i.e. no causation is implied
Measures the strength of the linear association between two continuous variables
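As a minimal illustration in plain Python (no libraries assumed), Pearson's r is the sum of cross-products of deviations from the means, divided by the square root of the product of the two sums of squares. The function name `pearson_r` and the height/weight numbers are hypothetical, made up for this example.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: height (cm) and weight (kg) of five people
heights = [150, 160, 165, 170, 180]
weights = [52, 58, 63, 66, 75]
print(round(pearson_r(heights, weights), 3))
```

Neither variable plays a special role here: swapping the arguments gives the same r.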
When is regression used?
When one variable is a response to another variable: the value of the X (explanatory) variable can be used to predict the value of the Y (response) variable
What does a correlation coefficient (r) of 0 imply?
No linear relationship between the two variables (a non-linear relationship may still exist)
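A quick sketch of why r = 0 rules out only a linear relation: below, y is an exact function of x (y = x²), yet r comes out as 0 because the x values are symmetric about their mean. The data and the helper `pearson_r` are hypothetical, for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient (hypothetical helper)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]  # y is completely determined by x, but not linearly
print(pearson_r(x, y))
```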
What is the range for a correlation coefficient?
-1 to 1
What does a Pearson’s correlation coefficient of +1 imply?
Perfect positive linear association
Assumptions for hypothesis testing and confidence intervals for the population correlation (ρ)
Both variables are plausibly normally distributed
There is a linear relationship between them
The null hypothesis is that there is no linear association (ρ = 0)
Scatter diagram should show a roughly elliptical pattern
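A sketch of the usual calculations, assuming the conditions above hold. The test statistic for H0: ρ = 0 is t = r√(n − 2) / √(1 − r²), compared with a t distribution on n − 2 degrees of freedom. An approximate 95% confidence interval for ρ can be built with Fisher's z-transformation, z = atanh(r), whose standard error is about 1/√(n − 3); the interval is then transformed back with tanh. The values r = 0.6 and n = 30 and the function name `correlation_inference` are hypothetical. Getting a p-value would additionally need a t-distribution function (from a statistics library or printed tables), which is not shown.

```python
import math


def correlation_inference(r, n, z_crit=1.96):
    """t statistic for H0: rho = 0 and an approximate CI for rho (Fisher z).

    z_crit = 1.96 gives an approximate 95% interval.
    """
    if n < 4:
        raise ValueError("need n >= 4 for the Fisher z interval")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    df = n - 2
    # Test statistic; compare with a t distribution on n - 2 degrees of freedom
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    # Fisher's z makes the sampling distribution approximately normal
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    lower = math.tanh(z - z_crit * se)
    upper = math.tanh(z + z_crit * se)
    return t, df, (lower, upper)


t, df, ci = correlation_inference(r=0.6, n=30)
print(f"t = {t:.2f} on {df} df, 95% CI for rho: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The interval is not symmetric about r, because the back-transformation with tanh squeezes it towards the ±1 limits.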
What is r^2?
The proportion of the variance of one variable that is explained by its linear relationship with the other variable, often expressed as a percentage
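To make the "variance explained" reading concrete, the sketch below (hypothetical data and function name, plain Python) computes r² directly from the correlation formula and, separately, as 1 − (residual sum of squares) / (total sum of squares of y) for the least-squares line. For a simple linear regression the two numbers agree.

```python
def r_squared_two_ways(x, y):
    """Return (r squared, 1 - SS_res / SS_tot) for a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)    # total sum of squares of y
    r_sq = sxy ** 2 / (sxx * syy)          # square of Pearson's r
    slope = sxy / sxx                      # least-squares slope
    intercept = my - slope * mx            # least-squares intercept
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    explained = 1 - ss_res / syy           # proportion of variance in y explained
    return r_sq, explained


heights = [150, 160, 165, 170, 180]
weights = [52, 58, 63, 66, 75]
r_sq, explained = r_squared_two_ways(heights, weights)
print(f"r^2 = {r_sq:.3f}; variance explained = {explained:.1%}")
```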
How can the best-fitting regression line be estimated?
By the method of least squares, fitting y = a + bx, where a = intercept and b = slope
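A minimal least-squares sketch: the slope is b = Sxy / Sxx (sum of cross-products of deviations divided by the sum of squared x deviations) and the intercept is a = ȳ − b·x̄, so the line passes through the point of means. The function `fit_line` and the data are hypothetical; once a and b are known, predicting Y for a new X is just a + bX.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for y = a + bx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx      # slope
    a = my - b * mx    # intercept: the line passes through (mean x, mean y)
    return a, b


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(x, y)
print(f"y = {a:.2f} + {b:.2f}x")
print("predicted y at x = 6:", round(a + b * 6, 2))
```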
When can multiple linear regression be used?
To investigate the simultaneous influence of several explanatory variables on the outcome
Why use multiple linear regression analysis?
- To identify any explanatory variables that may be associated with the y variable
- To investigate the extent to which one or more explanatory variables are linearly related to the y variable after adjusting for other related variables
- To predict the value of the y variable from the explanatory x variables
How is the estimated multiple regression equation written?
Y = b0 + b1x1 + b2x2 + ... + bpxp
b0 = intercept, the value of Y when all x variables are 0
b1 = amount by which Y increases on average if x1 increases by one unit while all other x variables are kept constant (i.e. adjusted for); b2, ..., bp are interpreted in the same way
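One common way to obtain the estimates is ordinary least squares, which solves the normal equations (XᵀX)b = Xᵀy. The plain-Python sketch below does this with Gauss-Jordan elimination; `fit_multiple` and the data are hypothetical, and y is generated exactly from Y = 1 + 2x1 + 3x2 so the recovered coefficients can be checked. In practice a numerical library would normally be used. Comparing b1 with the slope from regressing y on x1 alone shows what "adjusting for" x2 means: because x1 and x2 are correlated in this data, ignoring x2 gives a slope of about 4.49 instead of 2.

```python
def fit_multiple(rows, y):
    """Least-squares estimates [b0, b1, ..., bp] for Y = b0 + b1*x1 + ... + bp*xp.

    rows: one sequence [x1, ..., xp] of explanatory values per observation.
    """
    # Design matrix: a leading 1 in each row carries the intercept b0
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Normal equations (X^T X) b = X^T y as an augmented k x (k + 1) matrix
    aug = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("explanatory variables are (nearly) collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for i in range(k):
            if i != col:
                factor = aug[i][col]
                aug[i] = [vi - factor * vc for vi, vc in zip(aug[i], aug[col])]
    return [aug[i][k] for i in range(k)]


x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]  # correlated with x1
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]

coeffs = fit_multiple(list(zip(x1, x2)), y)
unadjusted = fit_multiple([[v] for v in x1], y)[1]  # y on x1 alone
print("b0, b1, b2 =", [round(c, 6) for c in coeffs])
print("slope of x1 ignoring x2:", round(unadjusted, 3))
```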