2. Aug 22nd Flashcards
The Insignificance of Statistical Significance Testing
Most stats classes focus on how you calculate a p-value in different contexts.
Johnson first points out the limitations of p-values.
Larger issue: How we really SHOULD be doing statistics
What is a p-value?
The probability of getting the observed data/results or something MORE extreme given the null hypothesis is TRUE.
- In ecology, we’re frequently studying relationships (ex. do supplements affect antler size, does adding vegetation to a stream bank reduce pollution in a river)
- Looking at a natural variable in the environment (x) and measuring a response to that change (y)
- We’re almost ALWAYS looking at relationships
- Null hypothesis is that there ISN’T a relationship
- Ex: Density of predators vs density of prey: even if there ISN’T really a relationship, inherent noise (from process & sampling error) will still make it look like there is SOME sort of relationship (a quick simulation is sketched below)
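A quick way to see the noise point above (my own sketch, not from the lecture; the variable names and numbers are invented):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

predator_density = rng.uniform(0, 10, size=25)       # x: has no real link to y below
prey_density = rng.normal(loc=50, scale=8, size=25)  # y: pure noise, true slope is 0

fit = stats.linregress(predator_density, prey_density)
print(f"estimated slope = {fit.slope:.2f}, p = {fit.pvalue:.2f}")
# The estimated slope is essentially never exactly 0: noise alone produces SOME
# apparent relationship, which is why "is there a relationship?" is the wrong
# question and "how strong is it?" is the useful one.
```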
Johnson’s beef with p-values
- In most ecology cases, we KNOW there is SOME relationship, so it’s useless to ask if there is one (p-value)
- The chances of a STRONG apparent relationship arising from noise alone are definitely slim
- So if your observed relationship would be really unlikely under the null, you might conclude that there is a real relationship
What p-values DON’T mean
- The probability that your observed results were due to chance
- – Why? Your results are either due to chance or not. There’s no probability associated with it.
- – Flipping a coin: once the outcome has happened, the truth exists; there’s no probability attached to it anymore.
- The probability that the null hypothesis is true
- Technically speaking, p-values say nothing about your data
- – They’re really talking about data you haven’t collected
- – – What is the probability of getting results like yours (or more extreme) in data you haven’t collected, if the null hypothesis were true?
What 2 things are p-values always based upon (if not more)?
The probability of getting your results is ALWAYS the function of AT LEAST two things
- – 1. Effect size - your results (slope, difference between groups). The STRENGTH of the relationship between x & y
- – 2. Sample size - for a given effect size, more samples means smaller p-values (see the sketch below)
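An illustrative sketch of those two ingredients (not from the lecture; the 0.5 SD effect and the sample sizes are made up): the effect stays the same while the sample size grows, and the p-value shrinks.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def pvalue_for(n, effect=0.5):
    """Two-sample t-test where the true mean difference is `effect` (in SD units)."""
    control = rng.normal(0.0, 1.0, size=n)
    treated = rng.normal(effect, 1.0, size=n)
    return stats.ttest_ind(control, treated).pvalue

for n in (10, 50, 500):
    print(f"n = {n:4d}  p = {pvalue_for(n):.4f}")
# The underlying effect never changes, but the p-value typically drops from
# "non-significant" at n = 10 to far below 0.05 at n = 500.
```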
What-ly significant vs what-ly significant?
Statistically significant vs biologically significant
Must it be possible that the null hypothesis actually be true?
YES! Otherwise it’s irrelevant, statistically speaking.
ALWAYS REPORT what in your study?
- Effect size
- Confidence interval (or standard error) - some measure of precision
These tell you
- Was the effect large or small?
- How much precision do we have in that estimate? (A small code sketch of reporting both follows.)
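A minimal sketch (my example, not Johnson's; the supplement/antler numbers are invented) of reporting an effect size with a confidence interval rather than leaning on the p-value alone:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=30)                # e.g. supplement level (invented)
y = 2.0 + 0.4 * x + rng.normal(0, 2, size=30)  # e.g. antler size, made-up true slope of 0.4

fit = stats.linregress(x, y)
t_crit = stats.t.ppf(0.975, df=len(x) - 2)     # critical t for a 95% CI on the slope
ci_low = fit.slope - t_crit * fit.stderr
ci_high = fit.slope + t_crit * fit.stderr

print(f"effect size (slope) = {fit.slope:.2f}")
print(f"95% CI = ({ci_low:.2f}, {ci_high:.2f}), p = {fit.pvalue:.4f}")
# Report the slope and its CI regardless of the p-value: together they say how
# big the effect is and how precisely it was estimated.
```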
Reasons for a large (non-significant) p-value (2 of them)
Let’s say you get a large p-value, 0.8. We’d want to be careful about what we conclude about the relationship between X & Y.
- You have a small sample size
- Small effect size
You can then say “My large P value seems to be due to small sample size.”
(I think power analysis is bogus. Only good for grant-writing purposes. After you’ve collected data, power analysis is worthless)
EXAMPLE
Rabbits have horrendous survival rates, squirrels are the opposite.
- We studied if method of hunting (sneak attack vs coursing predator) had an effect
- The journal loved it, said it was the best one all year
- We had a non-significant p-value, but large effect size
- Just because it’s not statistically significant doesn’t mean it’s not biologically significant
What if you get a very small p-value (0.0001)? Two reasons
- Large sample size
- Large effect size
It is possible to get a statistically significant result that, biologically speaking, no one cares about.
EXAMPLE
A real drug that’s on the market, given to patients after a 2nd heart attack (he demonstrated it with a contingency table and a chi-squared test). They tracked who had another heart attack in the coming year.
- With the drug: 63%; without: 67%
- Chi-squared = 42.2
- P = 9.02 × 10^-11 (a 9 with ten zeroes in front of it after the decimal point)
- That’s only a 4% difference. And the drug isn’t even being compared against walking every day or other approaches that don’t cost exorbitant amounts daily. (A rough reconstruction is sketched below.)
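A hedged reconstruction of this example: the lecture didn’t give the trial’s sample size, so the group size below is an assumption chosen only so the chi-squared lands near the quoted value.

```python
from scipy import stats

n_per_group = 12_000   # ASSUMED group size, not from the lecture
drug    = [int(0.63 * n_per_group), n_per_group - int(0.63 * n_per_group)]
no_drug = [int(0.67 * n_per_group), n_per_group - int(0.67 * n_per_group)]

chi2, p, dof, _ = stats.chi2_contingency([drug, no_drug], correction=False)
print(f"chi-squared = {chi2:.1f}, p = {p:.1e}")   # roughly 42, p near 1e-10
# "Highly significant" statistically, yet the effect is only a 4-percentage-point
# difference; the effect size (and the drug's cost) matter more than the tiny p.
```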
Statistics is NOT objective. It cannot be.
Purpose of statistics is NOT…
To tell you what’s going on in a system. That’s not science. It is not meant to tell you the answer.
The purpose IS to validate what you THINK is happening in a system. It is for validating hypotheses.
- Every time you test, you should have a hypothesis.
- You should have scientific and biological reasons for including each variable in your analysis.
- DO NOT use data JUST because you have it/collected it.
- We collect TONS of data, but only use a small fraction for published studies/results.
- We don’t use tons of data because we didn’t have a BIOLOGICAL reason to use it.
Gotten in arguments with fellow professors over…
Some will say “You CAN’T/SHOULDN’T report the effect size if you got an insignificant p-value”. Wrong. You report the effect size, confidence interval, and then say “The effect we discovered was not statistically significant.”
The worst thing you can do in a scientific paper is to report nothing but “We found no significant effect, p-value = 0.1.”
Is a p-value of 0.051 really non-significant while a p-value of .049 is?
Where did 0.05 come from? Sir Ronald Fisher, inventor of the ANOVA (this was before formal hypothesis testing). Someone came up to him after a talk on the ANOVA and asked, “At what p-value would you conclude there was a significant effect?” His answer was “Dunno, 1/20?”
It has been studied since, and 0.05 turns out to be a reasonable cutoff, but it is not a magic line.
I ask this at PhD dissertation defenses. Brainteaser
2 scenarios
- P-value of .05 with sample size of 30
- P-value of .05 with sample size of 150
Which result would you rather have?
- Most people want the latter result (the sample size is bigger).
- But he prefers the first.
- Since the p-values are the same, the study with the smaller sample size must have the larger effect size (see the sketch after this list).
- “I found a significant result, but I can’t report it because the sample size is small.” No!!! Just because you have a small sample size doesn’t mean you don’t have good science/results.
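One way to see the answer concretely (my own illustration using a simple correlation; none of these numbers are from the lecture): what strength of relationship does it take to land exactly at p = 0.05?

```python
from math import sqrt
from scipy import stats

def r_needed_for_p05(n):
    """Correlation that lands exactly on a two-tailed p of 0.05 at sample size n."""
    df = n - 2
    t_crit = stats.t.ppf(0.975, df)   # t-value at the p = 0.05 boundary
    return t_crit / sqrt(df + t_crit**2)

for n in (30, 150):
    print(f"n = {n:3d}  ->  r of about {r_needed_for_p05(n):.2f}")
# n = 30 requires r of roughly 0.36, while n = 150 only requires roughly 0.16:
# same p-value, but the small-sample study is reporting a much stronger effect.
```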
Is this specific to biology or more broad?
Broad.
If I flip the coin 10 times and get 6 heads, is that an unlikely outcome? No, it’s pretty common.
If I flip the coin 100 times and get 60 heads, that is far less likely (p-value is smaller).
Both cases are 60% heads, but because you flipped more times, the probability of getting a result that extreme by chance is much smaller in the second case. Sample size affects the p-value! (A quick check is sketched below.)
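A quick check of the coin-flip comparison, sketched with SciPy’s exact binomial test (requires a reasonably recent SciPy):

```python
from scipy import stats

# Same 60% heads in both cases, very different p-values.
for heads, flips in [(6, 10), (60, 100)]:
    result = stats.binomtest(heads, flips, p=0.5, alternative="two-sided")
    print(f"{heads}/{flips} heads  ->  p = {result.pvalue:.3f}")
# Roughly: 6/10 gives p near 0.75 (completely unremarkable), while 60/100 gives
# p near 0.06; the same proportion looks far less like chance with more flips.
```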