Non-linear regression Flashcards
Linear regression. Describe the relationship among variables
- There is a linear relationship between the dependent and independent variables
- A unit increase in the IV results in a constant change in the DV
- You can draw a straight line through the data
- If you have multiple IVs in your model, then each of these will have a linear relationship with the DV. The effects should add up, so you have an additive model (see the sketch below)
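Since the deck only describes SPSS menus, here is a minimal Python sketch of what "additive" means; the variable names (iv1, iv2, dv) and the simulated data are made up for illustration.

```python
# A minimal sketch of an additive linear model with two IVs (illustrative data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
iv1 = rng.normal(size=n)
iv2 = rng.normal(size=n)
# Each IV has its own linear effect on the DV, and the effects simply add up.
dv = 2.0 + 1.5 * iv1 + 0.8 * iv2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([iv1, iv2]))  # intercept + the two predictors
model = sm.OLS(dv, X).fit()
print(model.params)     # estimates close to [2.0, 1.5, 0.8]
print(model.rsquared)
```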
Non-linear regression. Describe the relationship between variables
- The "unit increase in IV → constant change in DV" rule doesn't apply here
- You can't draw a straight line through the data
- Your goal is to find the most appropriate non-linear relationship
When should you consider using non-linear regression?
- (1) Look at the data - if the relationship between the IV and DV looks non-linear, then entertain non-linear regression
- (2) Look at residual plots - more efficient than looking at raw data if you have more than one variable
- Residuals should be scattered randomly
- If the residuals form a cone shape, or show a non-linear distribution,
- then you might want to consider a non-linear model (see the sketch below)
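A hedged sketch of the residual-plot check in Python (not the SPSS route described in the deck); the data are simulated so the residuals show an obvious curved pattern, whereas in practice you would plot the residuals from your own model.

```python
# Fit a straight line to data that is actually quadratic, then look at the residuals.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 200)
y = 3 + 0.5 * x + 0.4 * x**2 + rng.normal(scale=2.0, size=x.size)  # truly non-linear DV

linear_fit = sm.OLS(y, sm.add_constant(x)).fit()   # fit a straight line anyway

plt.scatter(linear_fit.fittedvalues, linear_fit.resid)
plt.axhline(0, color="grey")
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.title("Curved band of residuals -> consider a non-linear model")
plt.show()
```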
What kind of shape is in this residual plot?
cone shape
systematic increase in the spread of residuals the higher/lower the value of the predictor.
an indicator of heteroscedasticity
What kind of shape is in this residual plot?
non-linear distribution of residuals
- lower levels of the predicted value = over-prediction
- middle levels = under-prediction
- higher levels = over-prediction
- a systematic pattern in the predictions
What different ways can you run non-linear regression in SPSS?
- could calculate non-linear predictors by hand
- use non-linear regression routines provided by SPSS
- use curve estimation routines provided by SPSS
these methods can be combined
What does R2 tell us? If it's significant, what does this tell us?
R2 tells us how much of the variance in Y we can explain using a linear model of X (where X is our linear predictor).
If it's significant, then using the linear model of X we explain more variance than we would have using the mean prediction (see the sketch below).
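A small Python sketch of what R2 means: compare the error of the linear model against the error of simply predicting the mean of Y for everyone. The data are simulated for illustration.

```python
# Manual R2 = 1 - SS_residual / SS_total, compared against the library's R2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)

fit = sm.OLS(y, sm.add_constant(x)).fit()

ss_total = np.sum((y - y.mean()) ** 2)   # error of the mean prediction
ss_resid = np.sum(fit.resid ** 2)        # error of the linear model
print(1 - ss_resid / ss_total)           # manual R2
print(fit.rsquared)                      # matches statsmodels' R2
```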
Well, I ran the linear regression and the p values and stats for the predictor variable looked fine! Should I be reassured I picked the right method?
Nope!! While it can look fine at the statistics level, it might not be at the visual level. Check the residual plots!! That will tell you all you need to know.
Linear regression equation (if we did choose to fit this). What would it be using this data?
Y = c + bX
DV = intercept + (IV * coefficient)
Y = -13.456 + (X * 11.105)
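A tiny sketch of using that fitted equation to make a prediction; the intercept and coefficient are the ones from the card above, but the value of X plugged in is hypothetical.

```python
# Predict Y from the fitted line Y = -13.456 + 11.105 * X (X value is illustrative).
intercept = -13.456
coefficient = 11.105

x = 10.0                          # hypothetical value of the IV
y_pred = intercept + coefficient * x
print(y_pred)                     # -13.456 + 11.105 * 10 = 97.594
```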
When trying to fit a non-linear model, what do you need?
- an idea of what the relationship should be
- you should know some of the equations we may want to fit
- bear in mind that many models could work - they could have a reasonable R2 with a reasonable distribution of residuals. There is not always a definitive CORRECT model
If multiple non-linear models work fine, how can we decide which is best?
- go for the simplest model - the model with the fewest predictors/easiest terms
- use cross-validation - check how well the model you fit to one data set fits data in a novel data set
How would you change the equation from linear to non-linear?
linear regression equation
Y = c + bX
To change it to a non-linear equation, add an X2 term
Y = c + b1X + b2X2
can do this using transform → compute
Non-linear regression equation
We could try this function on our non-linear dataset:
Y = c - b1X + b2X2
- b1X = the linear term (b1 is its coefficient)
- b2X2 = the non-linear term, because we are squaring our IV
How do you run non-linear regression in SPSS by hand?
Transform → Compute
- create a new predictor, X2 (the IV squared)
- then run the (normal linear) regression including both the linear (X) and non-linear (X2) terms (see the sketch below)
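The same "by hand" idea sketched in Python rather than the SPSS menus: compute X2 yourself, then run an ordinary regression with both X and X2 as predictors. The data and names are illustrative.

```python
# Hand-made squared predictor, then a normal linear regression with X and X2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 150)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(size=x.size)

x2 = x ** 2                                  # the new predictor ("Transform -> Compute")
design = sm.add_constant(np.column_stack([x, x2]))

quadratic_fit = sm.OLS(y, design).fit()
print(quadratic_fit.params)                  # c, b1, b2
```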
What do you check after adding the non-linear predictor into the model?
First, check R2: does the non-linear model explain more of the variance in the DV than the linear model did?
Secondly, check whether X2 is significant in the coefficients table. This tells us that the non-linear term explains a significant amount of variance in the DV. Depending on the p values of the linear and non-linear terms, the non-linear one may even explain MORE of the variance.
Thirdly, check the residual plot (see the sketch below).
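A sketch of those three checks in Python: compare R2 across the two models, look at the p value of the X2 term, and inspect the residual plot of the improved model. The simulated data are illustrative.

```python
# (1) R2 comparison, (2) significance of X2, (3) residual plot of the quadratic model.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = np.linspace(-3, 3, 150)
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(size=x.size)

linear = sm.OLS(y, sm.add_constant(x)).fit()
quadratic = sm.OLS(y, sm.add_constant(np.column_stack([x, x**2]))).fit()

print(linear.rsquared, quadratic.rsquared)   # (1) does R2 improve?
print(quadratic.pvalues[2])                  # (2) is the X2 coefficient significant?

plt.scatter(quadratic.fittedvalues, quadratic.resid)   # (3) should now look random
plt.axhline(0, color="grey")
plt.xlabel("Predicted value")
plt.ylabel("Residual")
plt.show()
```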
Why is it better to have the residuals random?
Because of the assumptions of the general linear model:
- you fit the equation by minimizing the sum of squares of the residuals
- that minimization assumes the residuals are normally distributed and there is no bias
- if that assumption is violated, then any values generated are biased
- in GLM regression you want the residuals to be random and ideally normally distributed
- remember what the p value actually means - the probability of having observed this data under the assumption that the null is true
- that null hypothesis includes assumptions about the residuals - that when you fit the model the residuals follow a normal distribution, with no heteroscedasticity and no bias
- whenever that assumption is violated, your p values, even though you have them, are biased (see the sketch below)
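One hedged way of checking that assumption, sketched in Python: a Shapiro-Wilk test on the residuals plus a quick scatter of residuals against predicted values. This is only an illustration, not the deck's SPSS procedure or the only valid check.

```python
# Check residual normality (Shapiro-Wilk) and eyeball the residual cloud.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(size=120)
y = 0.5 + 1.2 * x + rng.normal(size=120)

fit = sm.OLS(y, sm.add_constant(x)).fit()

print(stats.shapiro(fit.resid))              # non-significant p -> no evidence against normality

plt.scatter(fit.fittedvalues, fit.resid)     # should look like a random, even cloud
plt.axhline(0, color="grey")
plt.show()
```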
polynomial - general form of this equation
Y = c + b1X + b2X2 + b3X3 + ...
- y is the sum of
- intercept/constant
- plus some coefficient times X
- plus some coefficient times X2
- plus some coefficient times X3
- etc
This is a polynomial. This is the general form of the equation we would use to fit this dataset (see the sketch below).
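A short Python sketch of fitting that general polynomial form, here with degree 3 (Y = c + b1X + b2X2 + b3X3); the data are simulated for illustration.

```python
# Fit a cubic polynomial to simulated data with numpy.
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(-2, 2, 100)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 0.8 * x**3 + rng.normal(scale=0.5, size=x.size)

coeffs = np.polyfit(x, y, deg=3)   # returns [b3, b2, b1, c] (highest power first)
print(coeffs)
```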
quadratic equation.
two positive coefficients.
what would the curve look like?
Y = 2X + X2
Then we get a curve like this:
- U-shaped/parabolic
quadratic equation.
if one coefficient positive (B1) and other negative (b2)
what would the curve look like?
Y = 2X - X2
Then we get a curve like this:
- inverted U shape
General form of quadratic equation.
if one coefficient negative (B1) and other positive (b2)
what would the curve look like?
Y = -2X + X2
Then we get a curve like this:
- U-shaped, but tipped to the other side
- different from the U shape we get when both coefficients were positive
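A quick Python sketch plotting the three quadratic shapes from the cards above (both coefficients positive, b2 negative, and b1 negative), so you can see the U, the inverted U, and the tipped U side by side.

```python
# Plot the three quadratic curves discussed above.
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-4, 4, 200)

plt.plot(x, 2 * x + x**2, label="Y = 2X + X2 (U-shaped)")
plt.plot(x, 2 * x - x**2, label="Y = 2X - X2 (inverted U)")
plt.plot(x, -2 * x + x**2, label="Y = -2X + X2 (U tipped the other way)")
plt.axhline(0, color="grey", linewidth=0.5)
plt.legend()
plt.show()
```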