Non-linear regression Flashcards

1
Q

Linear regression: describe the relationship among the variables

A
  • There is a linear relationship between the dependent variable (DV) and the independent variables (IVs)
  • A unit increase in an IV produces a constant change in the DV
  • You can draw a straight line through the data
  • If you have multiple IVs in your model, each of them has a linear relationship with the DV. Their effects add up, so you have an additive model
2
Q

Non-linear regression: describe the relationship between the variables

A
  • The rule that a unit increase in the IV produces a constant change in the DV doesn’t apply here
  • You can’t draw a straight line through the data
  • Your goal is to find the most appropriate non-linear relationship
3
Q

When should you consider using non-linear regression?

A
  • (1) Look at the data: if the relationship between the IV and DV looks non-linear, entertain non-linear regression
  • (2) Look at residual plots: more efficient than looking at the raw data if you have more than one variable
  • Residuals should be scattered randomly
  • If the residual plot shows a cone shape or a non-linear distribution of residuals, you might want to consider a non-linear model
4
Q

what kinda shape is in this residual plot?

A

cone shape

A systematic increase in the spread of the residuals as the value of the predictor gets higher (or lower).

An indicator of heteroscedasticity.

5
Q

what kinda shape is in this residual plot

A

non linear distribution of residuals

left

  • lower levels of the predicted value = overprediction
  • middle levels = under prediction
  • higher levels = over prediction
  • systematic pattern in prediction
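
The over/under/over pattern above can be reproduced with a small sketch. It assumes made-up, noise-free inverted-U data and uses NumPy's `polyfit` to force a straight line through it; the numbers are illustrative only and are not taken from the plot on the card.

```python
import numpy as np

# Made-up, noise-free inverted-U data (illustrative only).
x = np.arange(0, 11, dtype=float)      # 0, 1, ..., 10
y = -(x - 5.0) ** 2 + 25.0             # peaks at x = 5

# Fit a straight line; for deg=1 np.polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, deg=1)
predicted = intercept + slope * x

# Residual = observed - predicted.
# Negative residual: the line over-predicts. Positive: it under-predicts.
residuals = y - predicted

for xi, ri in zip(x, residuals):
    label = "over-prediction" if ri < 0 else "under-prediction"
    print(f"x={xi:4.1f}  residual={ri:7.2f}  {label}")
```

The residuals are negative at both ends and positive in the middle, which is the kind of systematic pattern that suggests a curved model.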
6
Q

what different ways can you run non linear regression in SPSS

A
  • could calculate non-linear predictors by hand
  • use non-linear regression routines provided by SPSS
  • use curve estimation routines provided by SPSS

these methods can be combined

7
Q

What does $R^2$ tell us? If it’s significant, what does this tell us?

A

How much of the variance in Y we can explain using a linear model of X (where X is our linear predictor).

If it is significant, the linear model of X explains significantly more variance than predicting with the mean of Y alone.
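
As a rough sketch of that comparison, R-squared can be computed as one minus the ratio of the error left by the fitted line to the error left by predicting the mean. The function name `r_squared` and the two small data sets are invented for illustration.

```python
def r_squared(xs, ys):
    """R^2 of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares slope and intercept.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Error left by the fitted line vs. error left by predicting the mean.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_residual / ss_total

xs = [1, 2, 3, 4, 5]
ys_line = [3, 5, 7, 9, 11]      # exactly linear (made up)
ys_curve = [1, 4, 9, 16, 25]    # curved: x squared (made up)

print(r_squared(xs, ys_line))
print(r_squared(xs, ys_curve))
```

Note that the curved data still produces a high R-squared under a straight-line fit, so a good R-squared on its own does not show that the linear model is the right one.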

8
Q

well, I ran the linear regression and the p values and stats for the predictor variable looked fine! should I be reassured I picked the right method?

A

Nope!! While it can look fine at the statistics level, it might not at the visual level. Check the residual plots, man!! They will show you whether a straight line really suits the data.

9
Q

Linear regression equation (if we did choose to fit this). What would it be using this data?

A

Y = c + bX

DV = intercept + (IV * coefficient)

Y = -13.456 + (X * 11.105)
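
A minimal sketch of using the fitted equation for prediction. The intercept and coefficient come from the card; the X values passed in are hypothetical.

```python
INTERCEPT = -13.456   # c, from the card
SLOPE = 11.105        # b, from the card

def predict(x):
    """Predicted DV for a given IV value under Y = c + bX."""
    return INTERCEPT + SLOPE * x

for x in (0, 1, 2):   # hypothetical IV values
    print(x, round(predict(x), 3))
```

Each one-unit step in X changes the prediction by the same amount, 11.105, which is what makes the model linear.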

10
Q

When trying to fit a non-linear model, what do you need?

A
  • An idea of what the relationship should be
  • You should know some of the equations you may want to fit
  • Bear in mind that many models could work, each with a reasonable $R^2$ and a reasonable distribution of residuals. There is not always a single definitive CORRECT model
11
Q

if multiple non linear models work fine how can we decide which is best?

A
  • Go for the simplest model: the one with the fewest predictors/easiest terms
  • Use cross-validation: check how well the model you fitted to one data set fits the data in a novel data set
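
One simple form of cross-validation is a hold-out check, sketched below. It assumes simulated data with a truly quadratic relationship; the seed, sample sizes and coefficients are arbitrary choices for illustration. Each polynomial is fitted on one data set and scored on a second, novel one.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    """Simulated data: quadratic relationship plus random noise (made up)."""
    x = rng.uniform(-3, 3, size=n)
    y = 1.0 + 2.0 * x + 0.5 * x ** 2 + rng.normal(0, 0.5, size=n)
    return x, y

x_train, y_train = make_data(60)   # data set used for fitting
x_new, y_new = make_data(60)       # novel data set used only for checking

def holdout_mse(degree):
    """Fit a polynomial of this degree on the training set, score on the new set."""
    coefs = np.polyfit(x_train, y_train, deg=degree)
    predictions = np.polyval(coefs, x_new)
    return float(np.mean((y_new - predictions) ** 2))

mse_by_degree = {d: holdout_mse(d) for d in (1, 2, 3)}
for degree, mse in mse_by_degree.items():
    print(f"degree {degree}: hold-out MSE = {mse:.3f}")
```

A model that only fits its own data set well will show a noticeably larger error on the novel set.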
12
Q

how would you change the equation from linear to non linear

A

Linear regression equation:

$Y = c + bX$

To change it to a non-linear equation, add $X^2$:

$Y = c + b_1 X + b_2 X^2$

You can create the $X^2$ variable using Transform → Compute.

13
Q

non linear regression equation

A

Could try this function on our non-linear dataset:

$Y = c - b_1 X + b_2 X^2$

  • $b_1 X$ = the linear term, where $b_1$ is the first coefficient
  • $b_2 X^2$ = the non-linear term, because we are squaring our IV
14
Q

how do you run non linear regression in SPSS by hand?

A

Transform → Compute

  • Create a new predictor, $X^2$
  • Then run the (normal linear) regression including both the linear ($X$) and non-linear ($X^2$) terms
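
The same two steps can be sketched outside SPSS. This is an illustration in Python with NumPy, not SPSS syntax, and the data are made up and noise-free so that the recovered coefficients are easy to check.

```python
import numpy as np

# Made-up, noise-free data generated from Y = 2 - 3X + 0.5X^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 - 3.0 * x + 0.5 * x ** 2

# Step 1: compute the new predictor by hand.
x_squared = x ** 2

# Step 2: ordinary linear regression on both X and X^2.
# Design matrix columns: intercept, X, X^2.
design = np.column_stack([np.ones_like(x), x, x_squared])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
c, b1, b2 = coefs

print(f"c={c:.3f}, b1={b1:.3f}, b2={b2:.3f}")
```

The model is non-linear in X but still linear in its coefficients, which is why the ordinary linear regression routine can fit it.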
15
Q

what do you check after adding the nonlinear predictor into the model?

A

$R^2$: does the non-linear model explain more of the variance in the DV than the linear model did?

Secondly, check whether $X^2$ is significant in the coefficients table. If it is, the non-linear term explains a significant amount of variance in the DV. Comparing the p values of the linear and non-linear terms may also suggest that the non-linear one explains MORE of the variance.

Check the residual plot.
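
A small sketch of the first check, comparing R-squared before and after adding the squared term. The simulated data, seed and helper name `r_squared` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 30)
# Simulated curved relationship plus noise (made up).
y = 5 + 1.5 * x - 0.3 * x ** 2 + rng.normal(0, 0.5, size=x.size)

def r_squared(observed, predicted):
    """Proportion of variance in the observed values explained by the predictions."""
    ss_residual = np.sum((observed - predicted) ** 2)
    ss_total = np.sum((observed - observed.mean()) ** 2)
    return float(1 - ss_residual / ss_total)

linear_fit = np.polyval(np.polyfit(x, y, deg=1), x)
quadratic_fit = np.polyval(np.polyfit(x, y, deg=2), x)

r2_linear = r_squared(y, linear_fit)
r2_quadratic = r_squared(y, quadratic_fit)
print(f"linear R^2 = {r2_linear:.3f}, quadratic R^2 = {r2_quadratic:.3f}")
```

Because the quadratic model contains the linear one, its R-squared on the same data can never be lower, so the size of the gain and the residual plot matter more than the mere fact of an increase.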

16
Q

why is it better to have the residuals random?

A

Because of the assumptions of the general linear model (GLM):

  • You fit the equation by minimizing the sum of squares of the residuals
  • Such minimization assumes the residuals are normally distributed and there’s no bias
  • If that assumption is violated, then any values generated are biased
  • In GLM regression you want the residuals to be random and ideally normally distributed
  • Remember what the p value actually means: the probability of having observed this data (or data more extreme) under the assumption that the null is true
  • This null hypothesis includes assumptions about what the data are like: that when you fit the model, the residuals follow a normal distribution, with no heteroscedasticity and no bias in the residuals
  • Whenever that assumption is violated, your p values, even though you still get them, are biased
17
Q

polynomial - general form of this equation

A

$Y = c + b_1 X + b_2 X^2 + b_3 X^3 + \dots$

  • Y is the sum of
  • the intercept/constant
  • plus some coefficient times $X$
  • plus some coefficient times $X^2$
  • plus some coefficient times $X^3$
  • etc.

This is a polynomial. It is the general form of the equation we fit to this dataset.

18
Q

quadratic equation.

two positive coefficients.

what would the curve look like?

A

$Y = 2x + x^2$

Then we get a curve like this:

  • u shaped/parabolic
19
Q

quadratic equation.

If one coefficient is positive ($b_1$) and the other negative ($b_2$)

what would the curve look like?

A

$Y = 2x - x^2$

Then we get a curve like this:

  • inverted U shape
20
Q

General form of quadratic equation.

If one coefficient is negative ($b_1$) and the other positive ($b_2$)

what would the curve look like?

A

$Y = -2x + x^2$

Then we get a curve like this:

  • U-shaped, but shifted to the other side
  • Different from the U shape we get when both were positive: the minimum now sits at $x = 1$ rather than $x = -1$
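
The three quadratic cards can be summarised with a short sketch. The helper `describe_quadratic` is a made-up name; it relies on two facts: the sign of the squared coefficient decides U versus inverted U, and the turning point sits where the slope $b_1 + 2 b_2 X$ equals zero.

```python
def describe_quadratic(b1, b2):
    """Shape and turning point of Y = b1*X + b2*X**2 (b2 must be non-zero)."""
    turning_point_x = -b1 / (2 * b2)   # where the slope b1 + 2*b2*X is zero
    shape = "U" if b2 > 0 else "inverted U"
    return shape, turning_point_x

print(describe_quadratic(2, 1))    # Y = 2x + x^2
print(describe_quadratic(2, -1))   # Y = 2x - x^2
print(describe_quadratic(-2, 1))   # Y = -2x + x^2
```

So the sign of $b_2$ sets the direction of the curve, while $b_1$ (together with $b_2$) sets which side of zero the turning point falls on.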
21
Q

what do we mean by a quadratic equation?

A

where the maximum power of X is 2

22
Q

general form of cubic equation

b1, b2, b3 all positive

A

Add a cubic term, $X^3$, to the previously quadratic equation:

$Y = c + b_1 x + b_2 x^2 + b_3 x^3$

This adds another bend to the shape: instead of a U, you now have an S shape.

23
Q

Cubic equation

b1 positive, b2 positive, b3 negative

A

$Y = c + b_1 x + b_2 x^2 - b_3 x^3$

24
Q

where to start when considering non linear regression

A

First plot the data and look at how many bends there are.

  • 1 bend → consider a quadratic equation, $x^2$
  • 2 bends → consider a cubic equation, $x^3$
  • 3 bends → there will be an $x^4$ term in the equation
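
The bend-counting rule of thumb can be illustrated on smooth, made-up curves. The helper `count_bends` is hypothetical: it counts how often the direction of the curve flips between rising and falling. Real, noisy data would need smoothing or a visual judgement first.

```python
def count_bends(values):
    """Count direction changes (rising <-> falling) along a sequence of values."""
    bends = 0
    previous_sign = 0
    for earlier, later in zip(values, values[1:]):
        diff = later - earlier
        if diff == 0:
            continue                      # flat step: no direction information
        sign = 1 if diff > 0 else -1
        if previous_sign != 0 and sign != previous_sign:
            bends += 1
        previous_sign = sign
    return bends

grid = [i / 10 for i in range(-20, 21)]       # -2.0, -1.9, ..., 2.0
line = [2 * x for x in grid]                  # straight line
quadratic = [x ** 2 for x in grid]            # one bend
cubic = [x ** 3 - 3 * x for x in grid]        # two bends

print(count_bends(line), count_bends(quadratic), count_bends(cubic))
```

The count suggests the lowest polynomial degree worth trying first: one more than the number of bends.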
25
Q

what are polynomials

A

Equations of the form $Y = b_1 x + b_2 x^2$,

extended to the full range, so they could also have $x^3$, $x^4$, etc.

26
Q

What’s the point of the equations?

A

we want to generate an equation that would help us predict something.

e.g., predict reading scores at age 8 based on linguistic abilities at age 5

27
Q

okay i have my data set and i think i will use non linear regression. how can i test a non linear model was better than linear?

A

Run the linear model and look at $R^2$. Then run the non-linear model and see which explains more of the variance.

Look at the coefficients table: see whether the quadratic contribution (as well as the linear contribution) is significant. Does it explain a significant amount of the variance in the DV?

28
Q

If we run a linear regression model and the coefficient of X is significant, what does this tell us?

A

That the level of prediction is significantly better than just using the mean score to, e.g., predict how well children perform at age 8

29
Q

lets say the quadratic equation fits the data nicely. Should we just stop there?

A

No, run a cubic equation against the data to see if it fits better, looking at the $R^2$, residual plots, etc.

but in general, the simpler the equation the better

30
Q

the higher you go on your polynomial the…

A

The better the fit, and the more random/normal the residuals become.

But there is the danger of overfitting.

Every predictor you add takes away a degree of freedom for testing significance, so you’re also losing power.

Best to keep it simple: stop when the residuals meet the assumption (randomly distributed).
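
The trade-off can be sketched with simulated data. It assumes a small sample of 12 points whose true relationship is quadratic; the seed and coefficients are arbitrary. For each degree the sketch reports R-squared on the fitted data and the residual degrees of freedom, taken as n minus the number of estimated coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 12
x = np.linspace(-1, 1, n)
# Simulated data: truly quadratic plus noise (made up).
y = 1 + 2 * x - 3 * x ** 2 + rng.normal(0, 0.3, size=n)

def fit_summary(degree):
    """Return (R^2 on the fitted data, residual degrees of freedom)."""
    coefs = np.polyfit(x, y, deg=degree)
    ss_residual = np.sum((y - np.polyval(coefs, x)) ** 2)
    ss_total = np.sum((y - y.mean()) ** 2)
    r2 = float(1 - ss_residual / ss_total)
    residual_df = n - (degree + 1)    # one coefficient per power, plus the intercept
    return r2, residual_df

summaries = {d: fit_summary(d) for d in range(1, 7)}
for degree, (r2, df) in summaries.items():
    print(f"degree {degree}: R^2 = {r2:.4f}, residual df = {df}")
```

R-squared keeps creeping up after degree 2 even though the extra terms are only fitting noise, while the degrees of freedom left for significance testing keep shrinking.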

31
Q

How does the non-linear regression procedure in SPSS fit the function differently from the standard linear route?

A

When you use the standard route, you minimise the sum of squares of the residuals in one go.

When you use the procedure, SPSS uses the maximum likelihood procedure. It still works with the sum of squares of the residuals, so in the end you get the same results, but it does this in an iterative procedure.

32
Q

When is an iterative procedure for minimising the sum of squares of the residuals better?

A

When you have large data sets, multiple predictors or very complex non-linear models

33
Q

how do we run a non linear regression by hand

A

Transform → Compute: add the quadratic, cubic and quartic terms as new predictors.

Then run a standard linear regression model including them.

34
Q

How do we run a non-linear regression equation using the non-linear regression procedure in SPSS?

A

Set the parameters: a name and a starting value for each.

Enter the model expression.

This tells SPSS which model we want to fit, e.g., a quadratic model.
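
The idea of named parameters, starting values and a model expression can be illustrated with a simple iterative least-squares loop. This is a Gauss-Newton-style sketch in Python, not a reproduction of the algorithm SPSS uses, and the data are made up. Each pass adjusts the parameters to reduce the sum of squared residuals, and the result is compared with a one-step polynomial fit.

```python
import numpy as np

# Made-up data with a roughly quadratic shape.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.2, 2.9, 7.1, 12.8, 21.2, 30.9, 43.1])

def model(params, x):
    """Model expression: c + b1*x + b2*x**2."""
    c, b1, b2 = params
    return c + b1 * x + b2 * x ** 2

def jacobian(x):
    """Derivative of the model with respect to c, b1 and b2."""
    return np.column_stack([np.ones_like(x), x, x ** 2])

params = np.array([0.0, 0.0, 0.0])   # starting values for c, b1, b2 (arbitrary)
for iteration in range(20):
    residuals = y - model(params, x)
    # Find the parameter change that best reduces the squared residuals.
    step, *_ = np.linalg.lstsq(jacobian(x), residuals, rcond=None)
    params = params + step
    if np.max(np.abs(step)) < 1e-10:  # stop once the estimates no longer move
        break

direct = np.polyfit(x, y, deg=2)[::-1]   # one-step fit, reordered to [c, b1, b2]
print("iterative:", params)
print("direct:   ", direct)
```

For a polynomial the loop settles almost immediately and matches the direct fit. For models that are non-linear in their parameters, several passes are needed and the starting values can affect where the loop ends up.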

35
Q

What do double asterisks followed by 2 (**2) mean in SPSS?

A

Squared

36
Q

Loads of output is generated with non linear regression procedure in SPSS – what of this is relevant

A
  • (1) ANOVA output: specifically, look at the sum of squares for the residual and the $R^2$ value below it
  • Note this doesn’t give us p values. It gives just the mean square for the different elements (regression MS and residual MS)
  • (2) Parameter estimates
37
Q

Non-linear regression procedure in SPSS: we don’t get p values, so how can we determine significance?

A

Parameter estimates table

  • Gives you confidence intervals
  • Check that these don’t include 0
  • If none include 0, keep the parameters in the regression equation
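
The check itself is a one-line rule, sketched below. The interval bounds are hypothetical numbers invented for illustration, not output from any real analysis.

```python
# Hypothetical 95% confidence intervals: parameter name -> (lower, upper).
confidence_intervals = {
    "c": (-15.2, -4.1),
    "b1": (0.8, 3.4),
    "b2": (-0.6, 0.9),
}

def interval_excludes_zero(lower, upper):
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0

keep = [
    name
    for name, (lower, upper) in confidence_intervals.items()
    if interval_excludes_zero(lower, upper)
]
print(keep)
```

In this made-up example the interval for `b2` spans zero, so that term would be the one to question.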
38
Q

In the parameters estimate table of the non linear regression procedure in SPSS what are the “estimates”

A
  • The coefficients for your parameters: for the constant/intercept, $b_1$ and $b_2$
  • $b_1$ is basically the linear parameter and $b_2$ the non-linear one (squared in this case)
39
Q

What do you need to know before doing non-linear regression?

A
  • You need to have some idea of the equation you want to fit, whether it’s calculated by hand or through the SPSS procedure
  • The non-linear regression procedure needs starting values, but where you start might determine where you end up
  • Adding higher-order components to the equation increases $R^2$ but will produce some “curious” results: overfitting
40
Q

If we add a cubic term and it turns all the predictors non-significant, what does this tell us?

A

That adding this third term changed how the variance is accounted for by the different components.

41
Q

How do we know which model is best in non linear regression?

A
  • Depends on your theory/hypothesis: you might do research in an area where people have used non-linear models (NLMs) before. If the NLM you explore agrees with those, just use it
  • Model fit: $R^2$, coefficients, data plots, residual plots

Find the model that best explains the relationship between your IVs and DV. Typically this is a process of refinement or comparison across models till you hit the jackpot.

42
Q

what if you had multiple independent variables – how would this affect say the quadratic equation

A

It can get quite complex: we can have power terms not only for each of the IVs but also for their interaction terms.
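
As an illustration only, one possible quadratic model with two IVs, here called $X_1$ and $X_2$ (names assumed for the example), includes a linear and a squared term for each IV plus their product as the interaction:

```latex
Y = c + b_1 X_1 + b_2 X_2 + b_3 X_1^2 + b_4 X_2^2 + b_5 X_1 X_2
```

Even this simplest case needs five coefficients besides the intercept, and powers of the interaction term would add more.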

43
Q

What’s useful when determining which equation to use?

A

It might be difficult to know what to pick. Previous research is very helpful here.

  • Even if there isn’t a functional form you can grab from previous research,
  • you can use a model/theory that makes quite specific claims about the functional form, which will help you limit your options as well
44
Q

what are polynomials

A

powers of X or combinations (interactions) of them.

45
Q

what are some other functions (not polynomial)

A
46
Q

How would you choose appropriate starting values?

A
  • Based on research hypothesis and/or previous research
  • You can run the analyses multiple times with different starting values if you’re worried about this
47
Q

When would you use curve estimation

A

More explorative approach

  • If we don’t know how two variables are related, curve estimation is a good procedure to explore the data
  • You can just select lots of different functions and run it
  • All automated, don’t have to give it any starting values – useful to just explore the relationship
  • As well as fitting the functions, it gives you the option to plot the data. You can plot each of the functions and evaluate how it looks in comparison to the raw data
48
Q

With curve estimation let’s say we pick the two best functions – how then can we decide which is best

A
  • Could pick the highest R2
  • Look at the residual plots, regression coefficients etc

You can just ask it to give you all of these; remember the approach is quite automated.

If both look okay, then the question is no longer statistical but conceptual: which is more relevant to your experiment, the theory behind it and the background research?

Cross-validation may also be useful here: if you have nothing conceptual to rely on (e.g., the theory/background research), then use this.

49
Q

Write-up of results: things to comment on

A

Multicollinearity

  • If you have multiple IVs, you might want to comment on the relationship between these

Any evidence of outliers

  • Good way to spot this – look at residual plots

If evaluating multiple models – compare them and make a selection

Then conclude final model and summary

50
Q

What are the three ways to run non-linear regression in SPSS?

A
  • Create your non-linear predictors by hand and run them through the linear regression menu
  • Use the non-linear regression procedure: specify the model and pick parameters
  • Use curve estimation: a more explorative method