DATA ANALYSIS 1 Flashcards
What should a hypothesis include?
o What key endpoint, dependent variable, or parameter is being measured and compared between groups?
o What specific groups or experimental conditions are being compared to assess whether a difference exists?
Way to test if the results have happened by chance
They won’t be repeatable
Null hypothesis
states that there is no relationship between the two variables being studied and that any apparent effects are just due to random variation
Alternative hypothesis
states there is a relationship between the two variables being studied and that any apparent effects are NOT due to random variation
If the null hypothesis is true…
there is no real effect
If the null hypothesis is false…
the alternative hypothesis is true
- there is a relationship
- could be that experimental hypothesis is true
- could be biases in experimental design
- may be another reason
E.g. the question: “what effect does treatment with drug A have on the probability that an individual suffers a stroke?”,
Create a null and alternative hypothesis
H0: There is no statistically significant difference between the percentage of individuals
who suffer a stroke following treatment with drug A compared to treatment with
placebo.
H1: There is a statistically significant difference between the percentage of individuals
who suffer a stroke following treatment with drug A compared to treatment with
placebo.
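One way such a pair of hypotheses could be tested is sketched below with a two-sided z-test for two proportions. This is only an illustration: the stroke counts and group sizes are invented, the function name `two_proportion_z_test` is hypothetical, and the flashcards do not say which test the course would use.

```python
import math

def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled proportion: the best estimate of the shared rate if H0 is true.
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_a, p_b, z, p_value

# Hypothetical counts, invented for illustration only.
strokes_drug, n_drug = 30, 1000
strokes_placebo, n_placebo = 50, 1000

p_drug, p_placebo, z, p_value = two_proportion_z_test(
    strokes_drug, n_drug, strokes_placebo, n_placebo
)
print(f"drug A: {p_drug:.1%} strokes, placebo: {p_placebo:.1%} strokes")
print(f"z = {z:.2f}, two-sided p = {p_value:.4f}")
```

The endpoint (percentage of individuals suffering a stroke) and the groups (drug A vs. placebo) map directly onto the two questions a hypothesis should answer.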
To generate an appropriate hypothesis for a study, the researcher therefore needs to consider what meaningful difference would provide evidence relevant to the question they are investigating. Specifically:
• What endpoint(s) and dependent variables are the researchers interested in comparing between groups (e.g. blood pressure, height, rate of reaction)?
• How will the endpoint(s) be expressed (e.g. mean, median, maximum, % etc.)?
• Comparison of which groups’ data is the hypothesis based on? (e.g. drug vs. placebo, males vs. females, treated vs. untreated)
For the question to be investigated and a testable hypothesis generated, specific
details are required. For example:
• What research approach or setting are they using (e.g. clinical study, in vivo,
cells)?
• What specifically is meant by, or constitutes, “an impact on inflammation”? When? In
what circumstances? How could it be measured? How would the data be
expressed?
• Between which groups or experimental conditions would the researchers need to demonstrate a meaningful difference in order to answer their
question (e.g. between cells treated with PGD2 vs. cells treated with vehicle)?
However, in order to draw meaningful conclusions, the researcher needs to evaluate whether the difference observed between the groups is:
(a) Meaningful – i.e. does the difference constitute an effect-size that is biologically or clinically important?
(b) Genuine – i.e. was the difference observed because the groups are genuinely different, or was it merely the result of natural variation within the data?
What do significance tests do?
compare the variation with the effect size
calculate an estimate of how often the effect size observed between the groups would occur due to natural/random variation within the data.
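The idea of estimating how often an effect size would arise from random variation alone can be sketched with a permutation test. This is one simple illustration, not necessarily the test the course has in mind; the measurements, the function name `permutation_test`, and the number of shuffles are all assumptions made for the example.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often random relabelling of the data gives a difference
    in means at least as large as the one actually observed."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    combined = list(group_a) + list(group_b)
    n_a = len(group_a)
    as_extreme = 0
    for _ in range(n_shuffles):
        # Shuffling breaks any real link between group label and value,
        # so differences seen here come from random variation alone.
        rng.shuffle(combined)
        diff = abs(mean(combined[:n_a]) - mean(combined[n_a:]))
        if diff >= observed:
            as_extreme += 1
    return observed, as_extreme / n_shuffles

# Hypothetical measurements, invented for illustration only.
treated = [5.1, 6.3, 5.8, 6.9, 6.1, 5.7, 6.4, 6.0]
control = [4.8, 5.2, 5.0, 5.9, 4.6, 5.3, 5.1, 4.9]

observed_diff, proportion = permutation_test(treated, control)
print(f"observed difference in means: {observed_diff:.4f}")
print(f"proportion of shuffles at least as extreme: {proportion:.4f}")
```

The returned proportion compares the observed effect size with the variation in the data: the smaller it is, the less often random variation alone produces a difference that large.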
What is the p-value?
probability of observing an effect size at least as large as the one seen if the null hypothesis is true, i.e. if only random variation is at work and there is no real effect
*tells you about the strength of your data
p=0.001
statistically very strong data
p=0.04-0.02
statistically weak to moderately weak data
p=0.01
moderately strong data
p<0.01
statistically moderate to strong
p=0.05
statistically weak data
p>0.05
weak to very weak data
If p=0.05, the probability that the null hypothesis is true is… and the probability that the alternative hypothesis is true is…
50% the null hypothesis is true
50% the alternative hypothesis is true
Type 1 Error
rejecting the null hypothesis when in fact it is true and there is no real effect
Type 2 Error
accepting (failing to reject) the null hypothesis when in fact it is false and there is a real effect
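Both kinds of error can be made concrete with a small simulation. The sketch below assumes normally distributed data with a known standard deviation of 1, 20 observations per group, a 0.05 significance threshold, and a true difference of 0.5 for the "real effect" case. All of these values, and the function names, are hypothetical choices for illustration.

```python
import math
import random

def z_test_p_value(sample_a, sample_b, sd=1.0):
    """Two-sided z-test for a difference in means, assuming a known SD."""
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    standard_error = sd * math.sqrt(1 / n_a + 1 / n_b)
    return math.erfc(abs(diff / standard_error) / math.sqrt(2))

def rejection_rate(true_difference, n=20, alpha=0.05,
                   n_experiments=5_000, seed=1):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        group_a = [rng.gauss(true_difference, 1.0) for _ in range(n)]
        group_b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(group_a, group_b) < alpha:
            rejections += 1
    return rejections / n_experiments

# H0 is true (no real effect): every rejection is a Type 1 error.
type_1_rate = rejection_rate(true_difference=0.0)
# H0 is false (real effect of 0.5): every non-rejection is a Type 2 error.
type_2_rate = 1 - rejection_rate(true_difference=0.5)

print(f"Type 1 error rate: {type_1_rate:.3f}")
print(f"Type 2 error rate: {type_2_rate:.3f}")
```

Under these assumptions the Type 1 rate stays close to the chosen threshold, while the Type 2 rate depends on the size of the real effect and the sample size.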
Why is null hypothesis significance testing so popular?
People don’t like uncertainty and the NHST appears to give a definite answer (although that answer is often wrong because of Type 1 and Type 2 errors)
People don’t like making decisions, so the computer makes the decision for them based on the p-value (however, this bases the decision on only one piece of evidence, the p-value, when really the decision should be based on all the evidence)
People are often lazy:
- tells you a result is significant, implying that you don’t need to do more experiments (when there is still only weak evidence)
- tells you a result is not significant, implying that you don’t need to do more experiments (when really the p-value by itself gives no direct evidence that the null hypothesis is true)
- possible to use it without really understanding it
People are ambitious, and the NHST allows you to publish more papers for the minimum work (even if some of the conclusions are actually wrong)