8. Sept 12 Flashcards

1
Q

Cottingham et al. article

A

Should have read it

Why read it?
She does a really good job of laying out why ANOVA, t-tests, and regressions are all the same mathematically.

2
Q

Today: x as a continuous variable

A

Special case where the x variable truly is continuous.

  • Normally we have categories like wetland vs. agriculture as the X (environment)
  • But here data are collected at specific x values
  • – AND replicated at those values (perhaps the most important part)

Ex. number of cows in a field
Interested in biomass as the number of cows increases
- Some fields have no cows, some 2 cows, some 4 cows, some 5 cows, etc.

This also could be analyzed as an ANOVA (treating number of cows as categorical)

COMPARE THAT with the case where y values are collected across a true continuum of x values (essentially no repeated x values)
- That cannot/should not be analyzed as an ANOVA
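
A quick sketch (my own, hypothetical numbers) of what the two kinds of x data look like in R:

# Replicated at specific values: several fields share each cow count
cows_replicated = c(0, 0, 0, 2, 2, 2, 4, 4, 4, 5, 5, 5)
# A true continuum: essentially every field has its own x value
cows_continuum = c(0.4, 1.1, 1.9, 2.6, 3.3, 4.2, 4.8, 5.5)
# Only the replicated version could also sensibly be analyzed as an ANOVA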

3
Q

Replicated regression

A

Even though the x truly is continuous, the x values fall into distinct groups, and those groups are replicated - that is a replicated regression.

Pros and Cons of analyzing as categorical x (ANOVA) or continuous x (regression)

  • Cat: I want Y at specific values
  • Cat: Want to know specific differences (e.g., male vs. female)
  • Cont: No post-hoc tests needed
  • Cont: Can interpolate (cannot interpolate with ANOVA)
  • Cont: More power (one of the biggest pros)
  • – Better ability to detect a significant relationship
  • —- Because it eats up fewer degrees of freedom
  • —- Every parameter in your linear model eats up a degree of freedom, so you lose power with more complicated models
  • —- Can use either the continuous model (only 2 parameters, intercept and slope) or the categorical model (with 5 distinct cow counts, 5 parameters). More complicated = less powerful. (See the sketch after this list.)
  • Cat: one of the biggest: it makes no assumption of linearity (though all other 4 assumptions ARE made) - it makes no assumption about the shape of the relationship AT ALL
  • – A distinct advantage
  • – And a correction to MORE POWER above
  • —- I said regression will always have more power
  • —- This is not technically true
  • —- ANOVA compares differences between groups
  • —- ANOVA will have more power IF AND ONLY IF the assumption of linearity is violated
  • ——– In that case you WOULD have more power modeling x as categorical
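
A hedged sketch of the parameter-count point, assuming a data frame named datum with Biomass and Cows columns (as used later in the deck):

results_cont = lm(Biomass ~ Cows, data = datum)             # intercept + slope
results_cat  = lm(Biomass ~ as.factor(Cows), data = datum)  # one parameter per cow level
length(coef(results_cont))  # 2 parameters
length(coef(results_cat))   # k parameters, where k = number of distinct cow counts
# Each extra parameter costs a residual degree of freedom, hence less power
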
4
Q

Which one would you want to report your results on for your lab today?

A

Continuous!!!

Because if we choose categorical, we have WAY more parameters.

  • And then we'd have to run post-hoc tests if we're being REALLY thorough.
  • The "stock sentence" we mentioned earlier would need SEVERAL things added to it.

In many ways, when you treat x as continuous it’s easier to explain the results.

5
Q

Cows & Biomass Illustration of ANOVA being more powerful

A

Only true because the assumption of linearity is violated

Generally I argue that if you have a continuous x, it is almost always best to treat it as a continuous x.

  • – Treating x as continuous is basically the simpler model
  • —- So, always treat x as continuous UNLESS there is statistically significant evidence that the assumption of linearity has been violated

Since you never REALLY know if it's linear (because reality and all that), you can TREAT it as linear until there is statistically significant evidence of non-linearity

6
Q

How do you know if there’s statistically significant evidence of non-linearity, and what does that mean?

A

If treating x as categorical significantly improves the fit of the model to the data
- And this only happens when the relationship gets VERY non-linear

An important thing about models
Axiom/Rule: Every parameter added to a linear model (every Beta, every X) improves the fit, decreases the error/noise, and increases R-squared.
— And that's true even if the parameter has no ability to explain anything
—– Back to basic geometry: two points? Easy, a line fits them exactly.
——- All you ever have to do to fit more points exactly is add parameters (even if they have no meaning whatsoever)
- So we ONLY want to add parameters to a model if they SIGNIFICANTLY improve the fit
— What do you mean by parameters? Another X (another Beta).

How does continuous make sense? You can't have half a cow.
— It doesn't MATTER. You can TREAT cows as continuous, never calculate a half-cow value, and it still works.
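
A minimal sketch of the "every parameter improves the fit" axiom, using made-up data (all names hypothetical):

set.seed(1)
y = rnorm(20)
x = rnorm(20)
junk = rnorm(20)  # a predictor with no real relationship to y
summary(lm(y ~ x))$r.squared         # some baseline R-squared
summary(lm(y ~ x + junk))$r.squared  # never lower, even though junk explains nothing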

7
Q

Next question, how do we determine if a more complicated model is a significant improvement in the fit of the data?

A

We don’t JUST want to look at the significance of the extra parameters.

We do a test, an f-drop test.

  • A couple of years ago someone told me they couldn't find it under that name, but I can't remember the other name for it.
  • – It's a marginal p-value; some call it a Type III sum of squares p-value
  • – My stats professor called it an f-drop test because it measures the change (drop) in the sum of squared error using an F test statistic
  • —- This measures HOW MUCH the sum of squares decreases when you add the parameter

The math of it

  • You have the f-distribution [sketched on the board]: it starts at zero and is skewed right.
  • – We only call the improvement significant when the F statistic falls out in the far 5% of the tail.
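
A hedged sketch of the f-drop arithmetic, assuming `results` (the simpler continuous fit) and `results2` (the categorical fit) defined in the later cards; the hand calculation should match anova():

sse_simple  = sum(resid(results)^2)    # sum of squared error, simpler model
sse_complex = sum(resid(results2)^2)   # sum of squared error, more complicated model
df_simple   = df.residual(results)
df_complex  = df.residual(results2)
f_drop = ((sse_simple - sse_complex) / (df_simple - df_complex)) /
  (sse_complex / df_complex)
1 - pf(f_drop, df_simple - df_complex, df_complex)  # p-value from the upper tail
# Should match the F and Pr(>F) from anova(results2, results)
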
8
Q

Back to our experimental data

A

Number of cows

  • True error = 0.5
  • Biomass is linear
  • Beta zero -6
  • Beta 1 -0.8

Get this data into R
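
One way to get comparable data, if you don't have the class file, is to simulate it; a sketch taking the noted parameter values at face value (the replication pattern below is made up):

set.seed(42)
b0 = -6      # Beta zero as written in the notes
b1 = -0.8    # Beta 1 as written in the notes
sigma = 0.5  # true error SD
Cows = rep(c(0, 2, 4, 5), each = 6)  # replicated at specific cow counts (hypothetical)
Biomass = b0 + b1 * Cows + rnorm(length(Cows), mean = 0, sd = sigma)
datum = data.frame(Cows, Biomass)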

9
Q

First thing I always plot it

A

plot(Biomass~Cows, data=datum)
results=lm(Biomass~Cows,data=datum)
summary(results)

Why isn't it drawing box-and-whisker plots?
— Because R is treating Cows as continuous/numeric (even if I don't want it to)

10
Q

Wait a minute, there’s an assumption of linearity. How do I know this is truly linear? Maybe I want to treat this as categorical as that would account for non-linearity.

A

How do we treat cows as categorical?

  • as.factor(Cows)
  • results2=lm(Biomass~as.factor(Cows),data=datum)
  • – Forces R to treat Cows as categorical (a factor)

REMEMBER

  • Compare the categorical vs. continuous fits
  • – You'll ALWAYS see R-squared go up when you add more parameters, so do NOT treat that increase by itself as a significant difference
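
A quick sketch of seeing that R-squared increase without over-reading it (using the fits above):

summary(results)$r.squared   # continuous fit
summary(results2)$r.squared  # categorical fit: always at least as high
# The increase alone is NOT evidence the categorical model is better
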
11
Q

So we improved the fit. Is this a SIGNIFICANT improvement of the fit

A

The f-drop test:

anova(results2,results)

  • Always list the more complicated results FIRST
  • Gives you an ANOVA table
  • – The p-value is a test of the null hypothesis that the simpler model is an adequate fit
  • —- So if p > 0.05, the null hypothesis stands (i.e., the simpler continuous model is fine)
  • —- And if p < 0.05, the more complicated (categorical) model fits significantly better, i.e., the relationship is significantly non-linear

Generally, it’s a comparison between a more complex model and a simpler model.
- This one just happens to be non-linear vs linear
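
A sketch of pulling the decision out of that table programmatically (Pr(>F) is the standard column name in R's anova output):

ftest = anova(results2, results)
p = ftest$`Pr(>F)`[2]
if (p < 0.05) {
  print("Significant non-linearity: the categorical model fits significantly better")
} else {
  print("The simpler continuous (linear) model is an adequate fit")
}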

12
Q

There is a third option (I’m surprised no one brought it up)

A
  • x as continuous
  • x as categorical
  • ** x as continuous but non-linear
  • There’s a really good, easy way to capture non-linearity
  • The quadratic!
  • – Most things in ecology probably don’t follow quadratics.
  • – They probably do follow weird non-linear shapes (e.g., Michaelis-Menten)
13
Q

Doing the quadratic in R

A

results3=lm(Biomass~Cows+I(Cows^2),data=datum)

  • The I() means "actually do the math inside the parentheses".
  • It computes Cows^2 and returns it as its own X variable.

COMPARE
anova(results3,results)
- Comparing quadratic to linear
- Quadratic is significant improvement

anova(results2,results3)

  • Comparing categorical to quadratic
  • – The categorical model is NOT a significant improvement over the quadratic (the quadratic is an adequate fit)
  • What would it mean for the relationship between biomass and cows if this HAD been significant?
  • – It would mean the relationship is non-linear, but not quadratic (the quadratic doesn't capture its shape)
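
A sketch of overlaying the fitted quadratic on the scatterplot (object and column names as above):

plot(Biomass ~ Cows, data = datum)
cow_seq = seq(min(datum$Cows), max(datum$Cows), length.out = 100)
lines(cow_seq, predict(results3, newdata = data.frame(Cows = cow_seq)))
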
14
Q

Big points from today

A
  • Oftentimes you can treat X as either categorical or continuous
  • – He argues you should always treat it as continuous because of the advantages (more power, simpler model) UNLESS treating it as categorical significantly improves the fit
  • – Test that with an f-drop test