Chapters 36 and 37-Beyond Simple Linear Models Flashcards
What does nonlinear regression not mean?
Nonlinear regression does not simply mean that the graph of X vs Y will be curved.
What does nonlinear regression mean?
Nonlinear regression means that Y is not linear with respect to the parameters.
What is an example of nonlinear regression?
The Michaelis-Menten equation for enzyme activity
On what basis should nonlinear models usually be chosen?
Nonlinear models should usually be chosen for theoretical reasons, not just by trying to fit a curved line equation to the data.
What kind of equation is rarely appropriate for nonlinear regression?
Polynomial equations are rarely appropriate
How would polynomial equations possibly be useful?
They might be useful for limited interpolation. They are also easier to fit in programs like MS Excel than truly nonlinear models.
Why are polynomial equations still considered linear?
Because Y is linear with respect to the parameters.
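A minimal sketch of why a polynomial counts as linear regression: each power of X is just another column in the design matrix, so ordinary least squares (here solved through the normal equations) finds the coefficients directly. The function names and the noise-free data are made up for illustration and are not from the chapters.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (X'X) b = X'y."""
    # Columns are 1, x, x^2, ...; the model is linear in the coefficients b.
    design = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data generated from Y = 1 + 2X + 3X^2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]
coefficients = fit_polynomial(xs, ys, degree=2)
print(coefficients)  # intercept, X coefficient, X^2 coefficient
```

No iterative search is needed here, which is the practical difference from a truly nonlinear model.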
Finding the best fit parameters for a nonlinear equation is usually done how?
By computer; many programs are available for this.
Like linear regression, what does nonlinear regression find?
It finds the parameter values that minimize the sum of squares of the differences between actual and predicted Y values.
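As a rough illustration of minimizing the sum of squares for a nonlinear model, the sketch below fits the Michaelis-Menten equation, V = Vmax * S / (Km + S), by a crude grid search over Km. This is an assumption-laden toy: real programs typically use iterative algorithms instead of a grid, and the data, grid, and function names are invented for the example.

```python
def michaelis_menten(s, vmax, km):
    """Predicted velocity at substrate concentration s."""
    return vmax * s / (km + s)


def sum_of_squares(ss, vs, vmax, km):
    """Sum of squared differences between observed and predicted velocities."""
    return sum((v - michaelis_menten(s, vmax, km)) ** 2 for s, v in zip(ss, vs))


def fit_by_grid_search(ss, vs, km_candidates):
    """Return (sse, vmax, km) with the smallest sum of squares on the grid."""
    best = None
    for km in km_candidates:
        # For a fixed Km the model is linear in Vmax,
        # so the least-squares Vmax has a closed form.
        fs = [s / (km + s) for s in ss]
        vmax = sum(v * f for v, f in zip(vs, fs)) / sum(f * f for f in fs)
        sse = sum_of_squares(ss, vs, vmax, km)
        if best is None or sse < best[0]:
            best = (sse, vmax, km)
    return best


# Made-up, noise-free data generated with Vmax = 10 and Km = 2.
substrate = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
velocity = [michaelis_menten(s, 10.0, 2.0) for s in substrate]

km_grid = [i / 10 for i in range(1, 101)]  # candidate Km values 0.1 .. 10.0
sse, vmax_hat, km_hat = fit_by_grid_search(substrate, velocity, km_grid)
print(vmax_hat, km_hat, sse)
```

Note that Km sits in the denominator, so Y is not linear in Km; that is what makes the search necessary.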
What does multiple regression do?
It extends linear regression to allow multiple independent (X) variables.
What is an example of multiple regression?
Distinguish between effects of lead exposure vs age on kidney function
Usually, multiple regression is considered distinct from _______.
multivariate analyses
Multivariate usually means…
that there are multiple Y variables
Example of multivariate?
How does soil texture affect quantities of all plant species in a community?
What is a Regression Coefficient?
A parameter explaining the relationship of an independent variable to the dependent variable
In simple linear regression, _______ is a regression coefficient.
slope
In multiple linear regression, how many regression coefficients are there?
Two or more
What do regression coefficients quantify?
They quantify the relative impact of each independent variable on the dependent variable, i.e. the amount of change in Y for each unit of change in X.
How are the regression coefficients chosen?
To minimize the sum of squares
Dummy variable
Allows a discrete variable to be used: presence = 1, absence = 0.
What does the P value mean for regression coefficient?
P values test the null hypothesis that the coefficient is 0.
What is a problem of finding the P value for a regression coefficient?
R² goes up automatically when the number of independent variables increases.
Adjusted R²
The adjusted R² is always smaller than R². It decreases as the number of X variables increases and increases as the number of samples increases.
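The card does not give a formula; the sketch below uses the usual textbook definition, adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is the number of samples and k the number of X variables (not counting the intercept). The function name and the example numbers are made up.

```python
def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R^2 from R^2, sample count n, and number of X variables k."""
    residual_df = n_samples - n_predictors - 1
    if residual_df <= 0:
        raise ValueError("need more samples than X variables plus one")
    return 1.0 - (1.0 - r2) * (n_samples - 1) / residual_df


# Same R^2, but more X variables lowers the adjusted value...
print(adjusted_r2(0.5, n_samples=21, n_predictors=2))
print(adjusted_r2(0.5, n_samples=21, n_predictors=5))
# ...while more samples raises it.
print(adjusted_r2(0.5, n_samples=101, n_predictors=2))
```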
Assumptions of multiple linear regression
• Independent observations from one population
• The random component is Gaussian
• Linear relationships only (the model doesn’t guarantee a linear relationship, but doesn’t detect other relationships)
• No interaction beyond that specified in the model (e.g. assuming that the effect of lead is not greater when age is greater), but interaction terms can be created
What if you don’t know which factors are important for causing your dependent variable observations?
You could use an automatic selection process to choose the independent variables (from the many for which you have observations) that significantly improve your ability to predict the Y values.
e.g. forward-stepwise selection: start with a very simple model and use a computer to choose the X variable that most improves the prediction.
• If it is significant, keep it and choose another X variable that best improves the prediction further, until you reach a point where the best additional variable cannot provide significant improvement.
• Note: you just performed multiple comparisons
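The forward-stepwise procedure above can be sketched as follows. This is only an illustration under stated assumptions: the "significance" check is replaced by an arbitrary cutoff on the F statistic for the added variable (`f_to_enter`), the data are invented (lead, age, and an irrelevant `shoe_size` column), and none of the names come from the chapters.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def residual_ss(columns, ys):
    """Residual sum of squares for a least-squares fit: intercept + columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(ys))]
    k = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((y - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, y in zip(design, ys))


def forward_stepwise(candidates, ys, f_to_enter=4.0):
    """Add one X variable at a time while the best addition passes the cutoff.

    f_to_enter is an arbitrary illustrative threshold, not a proper P value.
    """
    selected = []
    current_rss = residual_ss([], ys)  # intercept-only model
    while len(selected) < len(candidates):
        trials = [(residual_ss([candidates[s] for s in selected]
                               + [candidates[name]], ys), name)
                  for name in candidates if name not in selected]
        best_rss, best_name = min(trials)
        residual_df = len(ys) - (len(selected) + 2)  # intercept + old + new
        if residual_df <= 0:
            break
        if best_rss == 0:
            f_stat = float("inf")
        else:
            f_stat = (current_rss - best_rss) / (best_rss / residual_df)
        if f_stat < f_to_enter:
            break  # best remaining variable does not improve the fit enough
        selected.append(best_name)
        current_rss = best_rss
    return selected


# Invented data: kidney function depends on lead and age, not on shoe size.
candidates = {
    "lead": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
    "age": [30.0, 45.0, 25.0, 60.0, 35.0, 50.0, 40.0, 55.0, 20.0, 65.0, 28.0, 48.0],
    "shoe_size": [7.0, 9.0, 8.0, 10.0, 7.0, 9.0, 8.0, 10.0, 7.0, 9.0, 8.0, 10.0],
}
wobble = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.1, -0.2, 0.1]
kidney = [100.0 - 2.0 * l - 0.5 * a + w
          for l, a, w in zip(candidates["lead"], candidates["age"], wobble)]

selected = forward_stepwise(candidates, kidney)
print(selected)
```

Every pass through the loop compares all remaining variables, which is where the multiple comparisons come from.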
How is automatic selection useful?
It may detect relationships that were not already obvious
How should you approach using information from automatic selection?
Because this involves multiple comparisons, the R², CIs, and P values cannot be trusted (if reported).
Therefore, automatic selection must be…
considered an exploratory analysis to generate the model (an inductive process).
• If the model that you generate results in X variables that make sense scientifically, then you could do a completely independent analysis to test whether those variables predict Y significantly.
Adding X variables will automatically
increase the R²
• Sample size needs to be much larger than the number of X variables (10-40 times greater).
Multicollinearity:
Many X variables are strongly correlated; adding correlated X variables will not help predict Y better, so avoid entering correlated Xs.
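One simple way to screen for this before fitting is to compute the pairwise correlation between X variables and flag strongly correlated pairs. The sketch below does that with Pearson's r; the 0.8 cutoff, the function names, and the data are arbitrary choices for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(columns)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson_r(columns[a], columns[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


# Made-up X variables: height is recorded twice in different units.
columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [150.0 / 2.54, 160.0 / 2.54, 170.0 / 2.54, 180.0 / 2.54, 190.0 / 2.54],
    "age": [40.0, 25.0, 60.0, 30.0, 50.0],
}
flagged = correlated_pairs(columns)
print(flagged)
```

Here only the two height columns are flagged, so one of them would be left out of the model.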
Interaction:
You may need to account for instances when the effect of a particular X depends on the value of another X. Use an interaction term.
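A common way to build an interaction term is to add a column that is the product of the two X variables. In a model Y = b0 + b1*lead + b2*age + b3*(lead*age), the change in Y per unit of lead is then b1 + b3*age, so it depends on age. The sketch below shows both pieces; the function names, column names, and coefficient values are hypothetical.

```python
def add_interaction(rows, a, b):
    """Return copies of rows with an extra 'a:b' column equal to a * b."""
    return [{**row, f"{a}:{b}": row[a] * row[b]} for row in rows]


def effect_of_lead(b_lead, b_interaction, age):
    """Change in Y per unit of lead when the model has a lead*age term."""
    return b_lead + b_interaction * age


# Made-up observations.
rows = [
    {"lead": 2.0, "age": 30.0},
    {"lead": 5.0, "age": 60.0},
]
with_interaction = add_interaction(rows, "lead", "age")
print(with_interaction)

# With hypothetical coefficients, lead has a larger effect at greater age.
print(effect_of_lead(-2.0, -0.05, age=30.0))
print(effect_of_lead(-2.0, -0.05, age=60.0))
```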