Data Analysis I Flashcards
Give 3 factors that determine how much weight we should place on the results of a particular study.
- How well characterised are the reagents/equipment/methods?
- How well is the experiment designed - good controls, possible biases?
- How many times has the result been successfully reproduced? (statistical confidence)
What is the main reason for calculating a mean of multiple observations?
- Random errors tend to cancel each other out when a mean is taken
- Any effect we observe is less likely to be random error and more likely to be real
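The cancelling of random errors can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the cards: the true value, the noise level, the normally distributed noise and the function name `spread_of_means` are all made up for illustration.

```python
import random
import statistics


def spread_of_means(n_obs, n_repeats=2000, true_value=10.0, noise_sd=1.0, seed=0):
    """Repeat a hypothetical experiment many times and report how much
    the mean of n_obs noisy observations varies between repeats."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_repeats):
        # Each observation is the true value plus random (Gaussian) error.
        sample = [rng.gauss(true_value, noise_sd) for _ in range(n_obs)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)


single = spread_of_means(1)     # spread when each "mean" is one observation
averaged = spread_of_means(25)  # spread when each mean uses 25 observations
print(f"spread of single observations: {single:.2f}")
print(f"spread of means of 25:         {averaged:.2f}")
```

The means of 25 observations scatter far less around the true value than single observations do, which is why an effect seen in a mean is less likely to be random error.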
Describe 3 approaches used to assess the reliability of experimental results.
- Visual approach - s.e.m. error bars
- Numerical approach - p values and statistical significance
- Quantitative approach - confidence intervals
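The s.e.m. and a 95% confidence interval for a mean can be computed as in the sketch below. The readings are made up, `sem_and_ci` is a hypothetical helper name, and the critical value 2.776 is the standard two-sided 95% t value for 4 degrees of freedom (5 observations); a different sample size needs a different value from a t table.

```python
from math import sqrt
from statistics import mean, stdev


def sem_and_ci(values, t_crit):
    """Return the mean, the standard error of the mean (s.e.m.) and a
    confidence interval of mean +/- t_crit * s.e.m."""
    n = len(values)
    m = mean(values)
    sem = stdev(values) / sqrt(n)  # sample standard deviation / sqrt(n)
    half_width = t_crit * sem
    return m, sem, (m - half_width, m + half_width)


readings = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data, 5 observations
m, sem, ci = sem_and_ci(readings, t_crit=2.776)  # t value for df = 4
print(f"mean = {m:.2f}, s.e.m. = {sem:.3f}")
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

Error bars of one s.e.m. are narrower than the 95% confidence interval, so the two should not be read the same way on a graph.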
In clinical trials, what is the current preferred approach for assessing whether an effect seen in a study is real or due to chance?
Confidence intervals
Explain the meaning of the 0-1 p-value scale.
- Close to 1 - the data look like what random error alone would produce
- Close to 0 - the data would be very unlikely if only random error were at work
Briefly outline how we should interpret p-values of above and below 0.05.
- P > 0.05 - result is statistically unreliable
- P = 0.01-0.05 - effect is worth considering but may still be due to random chance
- P = 0.01 or less - fairly convincing effect - it seems unlikely that this is due to random error
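The three bands above can be written as a small lookup. This is only a sketch of the rule of thumb on this card; `interpret_p` is a hypothetical helper name, and treating exactly 0.05 as "worth considering" is an assumption about how the boundary is handled.

```python
def interpret_p(p):
    """Map a p-value onto the rule-of-thumb bands from the card."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p <= 0.01:
        return "fairly convincing"
    if p <= 0.05:
        return "worth considering"
    return "statistically unreliable"


for p in (0.2, 0.03, 0.004):
    print(p, "->", interpret_p(p))
```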
What is the disadvantage of using p-values to assess the significance of an effect?
- Doesn’t tell you effect size
- Doesn’t tell you confidence intervals
P-values should only be used in combination with visual data and/or confidence intervals
Give functional definitions for “statistically significant” and “not statistically significant”.
- Statistically significant - worth considering; more likely to be a real effect than random error
- Not statistically significant - result is statistically unreliable; it may be real, but may well just be due to random error
Statistically significant does not mean biologically significant!
Biological significance is determined by what?
The size of the effect, i.e. the difference between the experiment and the control.
This is quantified by confidence intervals and can be seen in error bars.
Give the advantages and disadvantages of using statistical significance.
Advantages:
- Useful in quality control applications
Disadvantages:
- Very misleading
- Statistically significant results may still be random noise, i.e. the null hypothesis may still be correct
- Effects that are not statistically significant may still be real
- Statistical significance does not prove biological significance
Which 2 tests are most commonly used to determine whether the difference between means is due to random error?
Student’s t-test and the Mann-Whitney U test
When is it appropriate to use the Student’s t-test?
When the data is normally distributed. Slight variations from normal distribution are not a problem, but highly skewed data should be assessed by a different test.
When should the Mann-Whitney U test be used?
If the data definitely isn’t normally distributed. This test is “non-parametric”: it compares ranks rather than the raw values, so it does not assume a normal distribution.
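To show what a rank-based test does, here is a minimal sketch of the Mann-Whitney U statistic using only the Python standard library. The name `mann_whitney_u` and the example data are made up. The p-value uses the large-sample normal approximation with no tie or continuity correction, which is an assumption and is rough for small samples. For real analyses a library routine such as `scipy.stats.mannwhitneyu` (or `scipy.stats.ttest_ind` for the t-test) is the usual choice.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Return (U, approximate two-sided p-value) for two samples."""
    # Count the pairs in which the x value beats the y value; ties count 0.5.
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    n1, n2 = len(x), len(y)
    u_y = n1 * n2 - u_x
    u = min(u_x, u_y)  # the smaller U is the conventional test statistic

    # Normal approximation (assumption: no tie or continuity correction).
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma  # z <= 0 because u is the smaller of the two
    p = min(1.0, 2 * NormalDist().cdf(z))
    return u, p


control = [1.1, 1.4, 1.2, 1.3, 9.5]    # made-up, skewed by one outlier
treated = [2.1, 2.6, 2.4, 2.2, 2.8]    # made-up
u, p = mann_whitney_u(control, treated)
print(f"U = {u}, approximate p = {p:.3f}")
```

Because only the ordering of the values matters, the single extreme reading in the control group has no more influence than any other value ranked above the treated group.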