14 | DW-4 | Power Flashcards
Background / p-value interpretation – example with drugs A and B, A seems more efficacious.
What is p-value for?
- P-values are numbers between 0 and 1 that quantify how confident we should be that Drug A is different from Drug B
- The closer the p-value is to 0, the more confident we are that the drugs are different
- It helps us to decide whether to reject the H_0 or not.
- A small p-value does not imply that there is a big difference between A and B; it only tells us how certain we can be that there is a difference at all (small or big!).
Background / p-value interpretation – example with drugs A and B, A seems more efficacious.
What is a commonly used threshold for the p-value? Explain in words what it means for the example.
- 0.05
- If there is no difference between Drug A and Drug B, and if we did this exact experiment a bunch of times, then only 5% of these experiments would result in the wrong decision (a false positive: rejecting H_0 even though it is true)
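The meaning of the 0.05 threshold can be checked with a small simulation. This is only a sketch: the function name, group size and seed are made up for illustration, and it uses a z-test with a known standard deviation as a simplification rather than whatever test the real drug study would use.

```python
import random
from statistics import NormalDist, mean

def false_positive_rate(n=10, sd=1.0, alpha=0.05, repeats=4000, seed=1):
    """Fraction of experiments with p < alpha when H_0 is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    wrong_decisions = 0
    for _ in range(repeats):
        # Both groups come from the same distribution: no real difference.
        drug_a = [rng.gauss(0.0, sd) for _ in range(n)]
        drug_b = [rng.gauss(0.0, sd) for _ in range(n)]
        # z-test with known sd (simplifying assumption, not a t-test)
        z = (mean(drug_a) - mean(drug_b)) / (sd * (2 / n) ** 0.5)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            wrong_decisions += 1
    return wrong_decisions / repeats

rate = false_positive_rate()
print(f"share of wrong decisions under H_0: {rate:.3f}")
```

With these settings the printed share should land close to 0.05, varying slightly with the seed.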
Background / p-value interpretation – example with drugs A and B, A seems more efficacious.
What would a large p-value mean, e.g. 0.9?
- We fail to see a difference between the two groups
Background / p-value calculation – example with coin that lands on heads twice in a row.
What would the null hypothesis be?
- The coin is not special, it’s no different from a regular coin.
- If we reject H_0, we conclude that the coin is special.
Background / p-value calculation – example with coin that lands on heads twice in a row.
How are p-values determined? What does that mean for the coin example?
- By adding up the probabilities of the observed outcome and of all outcomes that are equally rare or rarer
- Determine the probabilities of the different outcomes if H_0 is true: heads, heads / heads, tails / tails, heads / tails, tails → each one is 0.25 (which means 1 tails and 1 heads is 0.5, because it can happen in two different orders)
Background / p-value calculation – example with coin that lands on heads twice in a row.
What 3 parts is a p-value composed of? (What would they be for the coin example?)
- The probability random chance would result in the observation. (.25)
- The probability of observing something else that is equally rare. (.25)
- The probability of observing something rarer or more extreme. (0)
- → p-value = 0.5
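The three parts can be added up by listing every outcome explicitly. A minimal sketch, assuming a fair coin under H_0; the variable names are illustrative only.

```python
from collections import Counter
from itertools import product

flips = 2
outcomes = list(product("HT", repeat=flips))  # HH, HT, TH, TT
prob_each = 1 / len(outcomes)                 # 0.25 per ordered outcome under H_0

# Group the ordered outcomes by their number of heads.
prob_by_heads = Counter()
for outcome in outcomes:
    prob_by_heads[outcome.count("H")] += prob_each

observed_heads = 2
p_observed = prob_by_heads[observed_heads]
# Other results that are exactly as rare as the observation (here: 2 tails).
p_equally_rare = sum(
    p for heads, p in prob_by_heads.items()
    if heads != observed_heads and p == p_observed
)
# Results that are rarer than the observation (none for two flips).
p_rarer = sum(p for p in prob_by_heads.values() if p < p_observed)

p_value = p_observed + p_equally_rare + p_rarer
print(p_observed, p_equally_rare, p_rarer, p_value)  # 0.25 0.25 0 0.5
```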
Background / p-value calculation. You throw a coin 5 times and get 4 heads and 1 tails. Is the coin special? How would you calculate a p-value?
- There are 32 outcomes:
- All heads – 1 way
- 4 heads, 1 tail – 5 ways
- 3 heads, 2 tails – 10 ways
- 2 heads, 3 tails – 10 ways
- 1 heads, 4 tails – 5 ways
- All tails – 1 way
- The probability random chance would result in the observation: 4 heads, 1 tail = 5/32
- The probability of observing something else that is equally rare: 1 heads, 4 tails = 5/32
- The probability of observing something rarer or more extreme: all heads or all tails = 1/32 + 1/32
- → p-value = (5+5+1+1)/32 = 12/32 = 3/8 = 0.375 → not a special coin
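The same counting can be done without listing all 32 sequences by using binomial coefficients. This is a sketch for a fair-coin null hypothesis; `two_sided_p_value` is a hypothetical helper name, not something from the source.

```python
from math import comb

def two_sided_p_value(heads: int, flips: int) -> float:
    """Sum the probabilities of every result that is as rare as,
    or rarer than, the observed number of heads (fair coin under H_0)."""
    ways_observed = comb(flips, heads)
    # Under H_0 all 2**flips sequences are equally likely, so comparing
    # the number of ways is the same as comparing probabilities.
    ways_as_rare_or_rarer = sum(
        comb(flips, k)
        for k in range(flips + 1)
        if comb(flips, k) <= ways_observed
    )
    return ways_as_rare_or_rarer / 2**flips

print(two_sided_p_value(heads=4, flips=5))  # 0.375
print(two_sided_p_value(heads=2, flips=2))  # 0.5
```

Since 0.375 is above the usual 0.05 threshold, the result gives no grounds to reject H_0.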
Background / normal distribution.
How is the width of the curve defined?
- Standard deviation
Background / normal distribution.
How is the standard deviation useful?
- About 95% of the measurements fall within +/- 2 standard deviations of the mean
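The 2-standard-deviation rule can be checked directly with Python's standard library. A short sketch using a standard normal distribution:

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of measurements within +/- 2 standard deviations of the mean.
within_2sd = std_normal.cdf(2) - std_normal.cdf(-2)

# Number of standard deviations that covers exactly 95%.
z_95 = std_normal.inv_cdf(0.975)

print(f"{within_2sd:.4f}")  # 0.9545
print(f"{z_95:.2f}")        # 1.96
```

So "2 standard deviations" is a rounded version of 1.96, and it covers slightly more than 95%.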
Background / central limit theorem
What is the central limit theorem all about?
- Means obtained from a distribution (whatever type it is) through random sampling will be approximately normally distributed, provided the sample size is large enough
- This holds no matter what distribution you sample from, as long as it has a finite mean and variance (exceptions such as the Cauchy distribution exist but are rarely used)
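A small simulation can illustrate this. This sketch draws from an exponential distribution, which is an arbitrary choice of a clearly non-normal distribution; the sample size of 30, the number of repetitions and the seed are assumptions for illustration.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)
sample_size = 30
n_means = 5000

# Exponential distribution with rate 1: mean 1, sd 1, strongly right-skewed.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(n_means)
]

center = mean(sample_means)
spread = stdev(sample_means)
expected_spread = 1 / sample_size ** 0.5  # population sd / sqrt(sample size)

# If the means are roughly normal, about 95% lie within 2 sd of the true mean.
share_within_2sd = sum(
    abs(m - 1.0) <= 2 * expected_spread for m in sample_means
) / n_means

print(f"mean of means: {center:.3f}")
print(f"sd of means:   {spread:.3f} (theory: {expected_spread:.3f})")
print(f"share within 2 sd: {share_within_2sd:.3f}")
```

Although the individual measurements are skewed, the sample means cluster around 1 with roughly the spread and coverage a normal distribution would predict.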
Background / central limit theorem
What are the practical implications?
- We don’t know what distribution our data comes from, but we know the sample means will be approximately normally distributed
- → We can use the normal distribution of the means to make confidence intervals, do t-tests, do ANOVA, and pretty much any statistical test that uses the sample mean
- Rule of thumb: the sample size should be at least 30
Statistical Power (SQ) |
Example: two sets of mice, some on a normal diet and some on a special diet. Their weights have different distributions (though both normal and both same height/width): special diet has a lower mean, there is not much overlap between the distributions. We collect a small sample from each population.
If we collect a small sample of both populations, we would get a ____ p value which would cause us to ….
- Small p-value < 0.05 which would cause us to correctly reject the null hypothesis that both sets of data come from the same distribution.
Statistical Power (SQ) |
Example: two sets of mice, some on a normal diet and some on a special diet. Their weights have different distributions (though both normal and both same height/width): special diet has a lower mean, there is not much overlap between the distributions. We collect a small sample from each population.
What would a large p-value mean? Is this likely to happen?
- If we repeat the experiment a bunch of times, most of the repetitions should correctly give us a small p-value.
- But every now and then we will get a result that does not make it clear that the populations have different distributions because we will have sampled mice from the overlap.
- → we will get a large p-value
- → we can’t reject the H_0, even though H_0 is false
Statistical Power (SQ) |
Example: two sets of mice, some on a normal diet and some on a special diet. Their weights have different distributions (though both normal and both same height/width): special diet has a lower mean, there is not much overlap between the distributions. We collect a small sample from each population.
What is power? What can you say about the power in this experiment?
- Power is the probability that we will correctly reject the null hypothesis
- Power is the probability that we will correctly get a small p-value
- We have a large amount of power, because we have a high probability of correctly getting a small p-value and being able to (correctly) reject the null hypothesis
Statistical Power (SQ) |
Example: two sets of mice, some on a normal diet and some on a special diet. Their weights have different distributions (though both normal and both same height/width): special diet has a lower mean, there is not much overlap between the distributions. We collect a small sample from each population.
Does the concept of power apply here? When would the concept of power (not) apply?
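- Yes, it applies here, because the two sets of mice come from different distributions (H_0 is false)
- It would not apply if the mice were all from one distribution: then there is no such thing as ‘correctly rejecting’ the null hypothesis, because the null hypothesis is true.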
Statistical Power (SQ) |
Example: two sets of mice, some on a normal diet and some on a special diet. Their weights have different distributions (though both normal and both same height/width): special diet has a lower mean, there IS much overlap between the distributions, they’re almost the same but not quite. We collect a small sample from each population.
Does power apply here? What can you say about the power?
- Yes, power applies, because the distributions are different (H_0 is false), but we are more likely to get a high p-value and fail to reject the null hypothesis
- Even if we repeat the experiment many times, most of the time we will get a high p-value
- → when there is a lot of overlap between the distributions and we have a small sample size, we have relatively low power.
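Power in the two mouse scenarios can be estimated by repeating the experiment many times in a simulation. This is a sketch under stated assumptions: the weights (in grams), the standard deviation of 2 and the sample size of 3 per group are invented numbers, and the test is a z-test with known standard deviation rather than a t-test.

```python
import random
from statistics import NormalDist, mean

def estimate_power(mean_a, mean_b, sd, n, alpha=0.05, repeats=2000, seed=0):
    """Share of simulated experiments that give p < alpha
    when the two populations really differ."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(repeats):
        normal_diet = [rng.gauss(mean_a, sd) for _ in range(n)]
        special_diet = [rng.gauss(mean_b, sd) for _ in range(n)]
        # z-test with known sd (simplifying assumption)
        z = (mean(normal_diet) - mean(special_diet)) / (sd * (2 / n) ** 0.5)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / repeats

# Little overlap: means far apart relative to the spread.
power_little_overlap = estimate_power(mean_a=30, mean_b=20, sd=2, n=3)
# Much overlap: means almost the same.
power_much_overlap = estimate_power(mean_a=30, mean_b=29, sd=2, n=3)

print(f"little overlap: {power_little_overlap:.2f}")
print(f"much overlap:   {power_much_overlap:.2f}")
```

With the same small sample size, the well-separated populations are detected almost every time, while the heavily overlapping ones are detected only rarely.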
Statistical Power (SQ) | How can we increase power?
- By increasing the number of measurements we collect
- A power analysis can tell us how many measurements to collect to have a good amount of power.
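As a rough illustration of what a power analysis computes, the sketch below uses the standard normal-approximation formula for comparing two group means. It assumes a known, shared standard deviation and a two-sided test; the function name and the example numbers are hypothetical, and dedicated software (which usually works with the t-distribution) can give slightly larger sample sizes.

```python
from math import ceil
from statistics import NormalDist

def samples_per_group(difference, sd, alpha=0.05, power=0.8):
    """Approximate measurements needed per group to detect `difference`
    between two means with the requested power."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # threshold for significance
    z_power = std_normal.inv_cdf(power)          # margin needed for the power
    return ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)

# Much overlap (small difference relative to sd): many measurements needed.
print(samples_per_group(difference=1.0, sd=2.0))  # 63
# Little overlap (large difference relative to sd): few measurements needed.
print(samples_per_group(difference=4.0, sd=2.0))  # 4
```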
Background
What problem does power analysis solve? Examples?
- How many samples do we need to test to arrive at a statistically sound conclusion?
- E.g.: perform an experiment to test whether or not there is a difference between samples (e.g. drug has an effect, temperature influences gene expression etc.)
- E.g.: Hypothesis: there is a significant difference in height between men and women in the German population. How many men and women do we need to sample? → Answering this question is said to “power an analysis”
What is Power?
What is power?
Basic formula?
- Power = 1 - β, where β is the probability of a Type II error
What is power?
Power = the probability of rejecting __________ if it is ______.
- The null hypothesis if it is false
What is power?
Power = probability of __________________ when the null hypothesis is false.
- Making a correct decision
What is power?
Power = probability that a test of significance will __________________ that is present.
- Pick up an effect
What is power?
Power = probability that a test of significance will …
- detect a deviation from the null hypothesis, should such a deviation exist.
What is power?
Power = probability of avoiding …
- a Type II error.
Power analysis
Power analysis (SQ) |
Example: 2 drugs A and B. We sample and see that people who take drug A seem to recover faster, but the p-value is 0.06 → we can’t reject the null hypothesis that the populations come from the same distribution. How could we use power analysis here? What would the alternative be?
- We could do a power analysis to determine what sample size will ensure a high probability that we correctly reject the null hypothesis that there is no difference between the two groups → we’ll know that regardless of the p-value, we used enough data to make a good decision.
- The alternative would be to keep sampling until we get a lower p-value → this is wrong! P-hacking.
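Why "keep sampling until the p-value is low" is wrong can be shown by simulating it when there is truly no difference. This is a sketch with made-up settings: the batch sizes, the maximum of 100 per group and the seed are assumptions, and a z-test with known standard deviation stands in for the real test.

```python
import random
from statistics import NormalDist

def peeking_false_positive_rate(start_n=10, step=5, max_n=100,
                                alpha=0.05, repeats=2000, seed=7):
    """Share of experiments that ever reach p < alpha when H_0 is true
    and we re-test after every extra batch of measurements."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    false_positives = 0
    for _experiment in range(repeats):
        sum_a = sum_b = 0.0
        n = 0
        while n < max_n:
            batch = start_n if n == 0 else step
            for _measurement in range(batch):
                # Both drugs come from the same distribution (sd = 1).
                sum_a += rng.gauss(0.0, 1.0)
                sum_b += rng.gauss(0.0, 1.0)
            n += batch
            z = (sum_a / n - sum_b / n) / (2 / n) ** 0.5
            if abs(z) > z_crit:  # same as p < alpha: stop and "publish"
                false_positives += 1
                break
    return false_positives / repeats

peeking_rate = peeking_false_positive_rate()
print(f"false positive rate with repeated peeking: {peeking_rate:.2f}")
```

The printed rate should come out well above the nominal 5%, which is why the sample size should be fixed in advance by a power analysis.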
Power analysis (SQ) | What are the two main factors that affect power?
- How much overlap there is between the two distributions we want to tell apart with our study.
- The sample size, the number of samples we collect from each group
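Both factors can be seen in a small table of computed power values. This sketch uses the closed-form power of a two-sided z-test with a known, shared standard deviation, which is a simplification; the differences, standard deviation and sample sizes are invented for illustration.

```python
from statistics import NormalDist

def z_test_power(difference, sd, n, alpha=0.05):
    """Power of a two-sided, two-sample z-test with n measurements per group."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # How many standard errors apart the two group means are expected to be.
    shift = abs(difference) / (sd * (2 / n) ** 0.5)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

sd = 2.0
for difference in (1.0, 4.0):  # much overlap vs. little overlap
    for n in (3, 10, 30, 100):  # measurements per group
        power = z_test_power(difference, sd, n)
        print(f"difference={difference}, n={n:3d}: power={power:.2f}")
```

Power rises when the distributions overlap less (larger difference relative to the standard deviation) and when more measurements are collected per group.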