Week 9 Lecture 9 - regressions Flashcards
When is linear regression used?
when looking at the relationship between two variables
best described with a straight line
How does linear regression differ from correlation?
linear regression proposes a model from which you can make estimates (predictions)
What are x and y in a linear regression?
y = variable being predicted (outcome variable)
x = variable used to predict (predictor variable)
What does a linear regression assume?
y is dependent on x (to some extent)
Does dependency reflect causal dependency?
no, it only provides evidence for it
What are the 3 stages of a linear regression?
- analyse relationship between variables
- propose a model to explain relationship
- evaluate model
What does stage 1 of a linear regression involve?
view data in scatterplot, view r value
What does stage 2 of a linear regression involve?
the regression line (line of best fit) –> the line where the total squared deviation from the data points is smallest
What are the properties of a regression line?
a.) the intercept
b.) the gradient (slope)
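Worked example (not from the lecture): a minimal sketch of how the intercept and gradient of a least-squares line can be calculated. The function name fit_line and the toy data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = b*x + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x from its mean
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # gradient (slope)
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical toy data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = fit_line(xs, ys)
print(f"intercept a = {a:.2f}, gradient b = {b:.2f}")
```

For this toy data the gradient comes out at about 0.6 and the intercept at about 2.2.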
What does stage 3 of a linear regression involve?
assessing goodness-of-fit
how much better is the regression model than the simplest model (the mean of y)
What does SSt - SSr equal?
SSm
What does a larger SSm indicate?
the bigger the improvement in prediction using the regression model over the simplest model
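Worked example (not from the lecture): a sketch of the three sums of squares, assuming SSt is measured around the mean of y (the simplest model) and SSr around the regression line. The data and the values a = 2.2, b = 0.6 are a hypothetical toy example; they are the least-squares fit for these particular numbers.

```python
def sums_of_squares(xs, ys, a, b):
    """Return (SSt, SSr, SSm) for the line y = b*x + a."""
    mean_y = sum(ys) / len(ys)
    predicted = [b * x + a for x in xs]
    # SSt: error when predicting every y with the mean (simplest model)
    ss_t = sum((y - mean_y) ** 2 for y in ys)
    # SSr: error left over when predicting with the regression line
    ss_r = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    # SSm: improvement gained by using the regression model
    ss_m = ss_t - ss_r
    return ss_t, ss_r, ss_m

# Hypothetical toy data and its least-squares line
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ss_t, ss_r, ss_m = sums_of_squares(xs, ys, a=2.2, b=0.6)
print(f"SSt = {ss_t:.2f}, SSr = {ss_r:.2f}, SSm = {ss_m:.2f}")
```

Here SSt is about 6.0, SSr about 2.4 and SSm about 3.6.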
What do we use to evaluate SSm relative to SSr?
F-test (ANOVA)
What values does an F-test work with?
mean squares (sums of squares divided by their degrees of freedom)
What does the F-ratio provide?
a measure of how much the model has improved the prediction of y (outcome) relative to the level of inaccuracy of the model
If the regression model is a good fit what will we see?
MSm will be large, MSr will be small
F value will be further from 0
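Worked example (not from the lecture): a sketch of the F-ratio, assuming the usual degrees of freedom, k for the model and n - k - 1 for the residuals, where k is the number of predictors. The input values are hypothetical.

```python
def f_ratio(ss_m, ss_r, n, k=1):
    """Return F = MSm / MSr for n cases and k predictors."""
    ms_m = ss_m / k            # mean square for the model
    ms_r = ss_r / (n - k - 1)  # mean square for the residuals
    return ms_m / ms_r

# Hypothetical values: SSm = 3.6, SSr = 2.4, 5 cases, 1 predictor
f_value = f_ratio(ss_m=3.6, ss_r=2.4, n=5)
print(f"F = {f_value:.2f}")
```

With these numbers MSm = 3.6 and MSr is about 0.8, so F is about 4.5. Whether that is significant still has to be checked against the F distribution for the same degrees of freedom.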
What is the null hypothesis in regressions?
the regression model predicts y no better than the simplest model
What are the assumptions for simple linear regression?
- linearity
- absence of outliers
- normality, linearity, homoscedasticity, independence of residuals
How do you assess normality, linearity, homoscedasticity, independence of residuals?
consider normal p-p plot –> looking for a straight diagonal
consider residual scatterplot –> looking for a rectangle with clusters towards the centre
How can you check whether a data point is an outlier?
check residuals scatterplot
outlier if the standardised residual falls >3.3 or <-3.3
Is there a non-parametric equivalent for linear regressions?
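Worked example (not from the lecture): a sketch of the 3.3 cut-off, assuming it applies to standardised residuals and that a residual is standardised by dividing it by the standard error of the estimate. Statistics packages may standardise slightly differently. The residuals below are made up.

```python
import math

def standardise(residuals, n_predictors=1):
    """Divide each residual by sqrt(SSr / (n - k - 1))."""
    n = len(residuals)
    ss_r = sum(r ** 2 for r in residuals)
    s = math.sqrt(ss_r / (n - n_predictors - 1))
    return [r / s for r in residuals]

def flag_outliers(std_residuals, cutoff=3.3):
    """Return the positions of cases beyond +/- cutoff."""
    return [i for i, z in enumerate(std_residuals) if abs(z) > cutoff]

# Hypothetical residuals: 20 small ones and one large one
residuals = [0.5, -0.5] * 10 + [6.0]
outliers = flag_outliers(standardise(residuals))
print("outlier positions:", outliers)
```

Only the last case (position 20, counting from 0) is flagged in this example.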
no
What is r^2 in regressions?
the amount of variance in y explained by the model relative to the total variance in y
can be expressed as a percentage
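Worked example (not from the lecture): a short sketch of r^2, assuming it is calculated as SSm divided by SSt. The input values are hypothetical.

```python
def r_squared(ss_m, ss_t):
    """Proportion of the total variance in y explained by the model."""
    return ss_m / ss_t

# Hypothetical values: SSm = 3.6, SSt = 6.0
r2 = r_squared(ss_m=3.6, ss_t=6.0)
print(f"r^2 = {r2:.2f} ({r2 * 100:.0f}% of the variance in y)")
```

With these numbers the model explains about 60% of the variance in y.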
What is the equation for the regression line in simple linear regression?
y = bx + a
What are multiple regressions?
- assess the influence of several predictors on y
- obtain a measure of how much variance in y the predictor variables combined explain
- obtain measures of how much variance in y the predictor variables explain when considered separately
What is the regression equation for multiple linear regression?
y = b1x1 + b2x2 + b3x3 … + a
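Worked example (not from the lecture): a sketch of using the multiple regression equation to predict y once the coefficients are known. The coefficients and predictor scores are hypothetical; estimating them from data is normally left to statistics software.

```python
def predict(x_values, b_values, a):
    """Return y = b1*x1 + b2*x2 + ... + a."""
    if len(x_values) != len(b_values):
        raise ValueError("need one coefficient per predictor")
    return sum(b * x for b, x in zip(b_values, x_values)) + a

# Hypothetical model: b1 = 0.5, b2 = 1.5, intercept a = 1.0
# Hypothetical case: x1 = 2, x2 = 3
y_hat = predict([2, 3], [0.5, 1.5], a=1.0)
print("predicted y =", y_hat)
```

This gives 0.5*2 + 1.5*3 + 1.0 = 6.5.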
What are the 3 stages of a multiple regression?
1.) analyse relationships
2.) propose a model (a plane of best fit)
3.) evaluate the model
What are the assumptions for multiple regressions?
- sufficient sample size
- linearity
- absence of outliers
- absence of multicollinearity
- normality, linearity, homoscedasticity, independence of residuals
What are the formulas for determining sufficient sample size?
if considering combined effects only:
N >= 50 + 8 m
if also considering separate effects:
N >= 104 + m
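Worked example (not from the lecture): a sketch of the two sample size formulas, assuming m is the number of predictors. The function names are made up.

```python
def min_n_combined(m):
    """Minimum N when considering combined effects only."""
    return 50 + 8 * m

def min_n_separate(m):
    """Minimum N when also considering separate effects."""
    return 104 + m

# Hypothetical study with 5 predictors
m = 5
print("combined effects: N >=", min_n_combined(m))
print("separate effects: N >=", min_n_separate(m))
```

For 5 predictors the formulas give 90 and 109.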
What is multicollinearity?
- when x’s are highly correlated with one another (ideally x’s are correlated with y but not with one another)
- check using correlation matrix
- highly correlated x’s (r>.9) can either be combined or eliminated as you are effectively measuring the same thing
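Worked example (not from the lecture): a sketch of checking pairs of predictors against the .9 cut-off using Pearson's r. Using the absolute value of r, so that strong negative correlations are also flagged, is an assumption here. The predictor names and scores are hypothetical.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def flag_collinear(predictors, cutoff=0.9):
    """Return the pairs of predictor names with |r| above cutoff."""
    return [
        (name_1, name_2)
        for name_1, name_2 in combinations(predictors, 2)
        if abs(pearson_r(predictors[name_1], predictors[name_2])) > cutoff
    ]

# Hypothetical predictors: x1 and x2 measure almost the same thing
predictors = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2.1, 3.9, 6.2, 7.8, 10.1],
    "x3": [5, 1, 4, 2, 3],
}
flagged = flag_collinear(predictors)
print("highly correlated pairs:", flagged)
```

Only the x1 and x2 pair is flagged here, so those two would be candidates for combining or dropping one.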