Week 1 - from association to modelling causality Flashcards
1
Q
what are the benefits of regression
A
- lets you test hypotheses within the null hypothesis significance testing (NHST) framework, because each model comes with associated p-values
- flexibility
- can model secondary variables with ease (extraneous/nuisance)
- no need for post-hoc testing, which gives higher statistical power
- generates predictions
- easily extends to other forms of model
- can mix categorical predictors with continuous predictors
2
Q
how do you get from correlation to simple regression
A
- it is a decision to make one of the variables predict the other variable
- we now talk about predictor and outcome variables
- the line of best fit is now called a regression line
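
As a rough illustration of how closely the two ideas are linked: in simple regression the slope equals the correlation r rescaled by the ratio of the standard deviations (slope = r * sd_y / sd_x). The sketch below uses made-up data; the variable names `hours` and `score` are hypothetical and not from the course material.

```python
from math import sqrt

# Made-up illustrative data (an assumption, not from the course).
hours = [1, 2, 3, 4, 5]        # predictor
score = [52, 55, 61, 64, 68]   # outcome

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(score) / n

# Sums of squares and cross-products around the means.
sxx = sum((x - mean_x) ** 2 for x in hours)
syy = sum((y - mean_y) ** 2 for y in score)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, score))

r = sxy / sqrt(sxx * syy)            # Pearson correlation
slope = sxy / sxx                    # least-squares regression slope
slope_from_r = r * sqrt(syy / sxx)   # r rescaled by sd_y / sd_x

print(round(r, 3), round(slope, 3), round(slope_from_r, 3))
```

Correlation is symmetric in the two variables, whereas the slope depends on which variable was chosen as the predictor.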
3
Q
what are the parts of a regression line
A
- intercept - average of the outcome variable when the predictor is 0
- slope - gradient of the line of best fit
- m (as in y = mx + c) = beta weight or coefficient
- slope tells you the rate of change in y for a one-unit change in x
- simple regression is very similar to correlation when in venn diagram form
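
A minimal sketch of how the intercept and slope can be obtained by ordinary least squares, assuming made-up data. The function name `fit_simple_regression` and the variables `hours` and `score` are hypothetical.

```python
def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return intercept, slope


# Made-up illustrative data (an assumption, not from the course).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

b0, b1 = fit_simple_regression(hours, score)

# A one-unit increase in x changes the predicted y by exactly the slope.
change_per_unit = (b0 + b1 * 4) - (b0 + b1 * 3)
print(round(b0, 2), round(b1, 2), round(change_per_unit, 2))
```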
4
Q
what is the equation for the regression line
A
y = b0 + b1 * x1 + e
* y - outcome variable
* b0 - intercept
* b1 - beta coefficient (slope term)
* x1 - predictor variable
* e - residual error
5
Q
what are the predictions from simple regression
A
y = b0 + b1 * x1
* predictions can be made for predictor values in the data set (observed values) or for values not in the sample
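
A small sketch of generating predictions from a fitted line. The coefficients and the helper `predict` are hypothetical values chosen only for illustration.

```python
# Hypothetical coefficients (assumed, not estimated from real data).
b0 = 47.7  # intercept
b1 = 4.1   # slope


def predict(x1):
    """Predicted outcome for a predictor value; there is no error term here."""
    return b0 + b1 * x1


observed_x = [1, 2, 3, 4, 5]

in_sample = [predict(x) for x in observed_x]  # values that are in the data set
new_value = predict(3.5)                      # not in the sample, inside the observed range
extrapolated = predict(10)                    # not in the sample, outside the observed range

print([round(p, 1) for p in in_sample], round(new_value, 2), round(extrapolated, 1))
```

Predictions far outside the observed range rely on the linear trend continuing, so they deserve more caution than predictions inside it.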
6
Q
what are the assumptions of simple regression
A
- linearity
- independence of observations
- homoscedasticity of residuals
- normally distributed residuals
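
Most of these assumptions are judged from the residuals (observed minus predicted values). The sketch below computes residuals for made-up data and makes a very crude comparison of residual spread between low and high fitted values, as a stand-in for checking homoscedasticity. In practice these checks are usually done with plots; the split-half ratio here is an assumption for illustration, not a formal test.

```python
def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_square(values):
    """Average squared value, used here as a simple measure of spread."""
    return sum(v * v for v in values) / len(values)


# Made-up illustrative data (an assumption, not from the course).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9, 15.2, 16.8]

b0, b1 = fit_simple_regression(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Split the residuals into those with low and those with high fitted values.
pairs = sorted(zip(fitted, residuals))
half = len(pairs) // 2
low = [res for _, res in pairs[:half]]
high = [res for _, res in pairs[half:]]

# A ratio far from 1 would hint that the spread changes along the line.
spread_ratio = mean_square(high) / mean_square(low)
print([round(res, 3) for res in residuals], round(spread_ratio, 2))
```

Independence of observations cannot be read off the residuals in this way; it depends mainly on how the data were collected.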
7
Q
how do you interpret the output of simple regression
A
- intercept term - average of Y when all continuous predictors are at 0 and all categorical predictors are at their reference level
- continuous coefficients - one unit increase in X gives a change in Y by the amount of B
- categorical coefficients - change to another level or group within X gives a change in Y by the amount of B
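
To make the categorical case concrete, the sketch below codes a two-group predictor as 0 (reference level) and 1. With a single binary predictor, the intercept equals the mean of the reference group and the coefficient equals the difference between the group means. The group names and data are made up for illustration.

```python
def fit_simple_regression(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up illustrative data (an assumption, not from the course).
control = [10, 12, 14]     # reference level, coded 0
treatment = [15, 17, 19]   # other level, coded 1

group = [0] * len(control) + [1] * len(treatment)
outcome = control + treatment

b0, b1 = fit_simple_regression(group, outcome)

mean_control = sum(control) / len(control)
mean_treatment = sum(treatment) / len(treatment)

# b0 matches the reference-group mean; b1 matches the difference in means.
print(b0, mean_control, b1, mean_treatment - mean_control)
```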
8
Q
what is multiple regression
A
- more than one predictor variable
- still one intercept
- still one error term
- can still have predictors that show no or only a weak relationship with the outcome
9
Q
what is the equation for multiple regression
A
Y = b0 + b1 * x1 + b2 * x2 + e
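
A minimal sketch of fitting a model of the form Y = b0 + b1 * x1 + b2 * x2 + e by least squares, using the normal equations and plain Gaussian elimination. The function names and data are hypothetical. The outcome is generated without noise from b0 = 1, b1 = 2, b2 = 3, so the fit should recover those values up to rounding.

```python
def solve_linear_system(a, b):
    """Solve a * coeffs = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (m[r][n] - known) / m[r][r]
    return solution


def fit_multiple_regression(predictors, y):
    """Return [b0, b1, ..., bk] for a list of predictor columns."""
    # Each row is [1, x1, x2, ...]; the leading 1 carries the single intercept.
    rows = [[1.0] + list(values) for values in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up illustrative data (an assumption, not from the course).
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]  # noise-free outcome

b0, b1, b2 = fit_multiple_regression([x1, x2], y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Each predictor gets its own coefficient, while the model keeps a single intercept and a single error term.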