Statistical hypothesis testing Flashcards
What is the purpose of using comparison of groups in research?
- can establish effectiveness of a new Rx
- make Dx
- risk factors for disease
- characteristics for disease
What is null hypothesis significance testing (NHST)?
= statistical hypothesis testing
comparison of the null and alternative hypotheses, which generates a P-value
What is the function of a P-value?
used to ascertain whether sample data provides evidence for difference between the groups in the population
What are the key features of an RCT study design?
Samples taken from a given population (ideal: that this sample is representative)
Parallel study design
Sample groups are RANDOMLY allocated to either intervention/exposure group or to placebo/non-intervention group
Compare treatment efficacy at the end time point between 2 groups
May also do another leg of RCT study where the groups/interventions are swapped
How is the ‘sample mean difference’ in an RCT calculated?
Sample mean difference is an ESTIMATE for POPULATION PARAMETER
calculated either by:
a) intervention - control
or
b) control - intervention
the magnitude is the same for both; only the sign (+ or -) differs
What does statistical hypothesis testing measure?
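A minimal sketch of the two ways of taking the difference. The data values and variable names are hypothetical, chosen only for illustration.

```python
from statistics import mean

# Hypothetical outcome values for each arm of a trial.
intervention = [5.1, 4.8, 6.0, 5.5, 4.9]
control = [4.2, 4.5, 4.0, 4.8, 4.1]

diff_a = mean(intervention) - mean(control)  # a) intervention - control
diff_b = mean(control) - mean(intervention)  # b) control - intervention

# Same magnitude, opposite sign.
print(f"a) {diff_a:+.2f}   b) {diff_b:+.2f}")
```

Whichever direction is chosen should be stated before analysis, so that the sign of the estimate can be read correctly.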
the extent to which the study sample estimate (i.e. the data) reflects the difference for the relevant (wider) population
measured by the P-value
What questions are we trying to answer by using statistical hypothesis testing?
Does the ‘sample estimate’ support the assumption?
Does ‘sample estimate’ reflect a difference? (Data is used as evidence here)
What is meant by ‘equipoise’ as the starting point for statistical hypothesis testing?
equipoise = null hypothesis
there is no difference between the 2 sample groups
this is the traditional starting point for statistical hypothesis testing
the question we’re trying to answer through stats: does the data/sample estimate favour the null hypothesis (H0), or does it favour an alternative hypothesis (where there is a difference)?
What are the main formal steps in performing statistical hypothesis testing?
LEARN **
- Define statistical null hypothesis and alternative hypothesis
- obtain (representative) sample from the population
- Calculate the value of the test statistic (specific to the null hypothesis) using the sample data
- Use the test statistic to derive a probability (P-value) that quantifies whether the null hypothesis should be supported or rejected
- Interpret probability (P-value) and sample data
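The steps above can be sketched in code. This is an illustrative sketch only: the data are simulated (hypothetical), and the P-value uses a normal approximation to the t distribution so that the example needs only the Python standard library. That approximation is an assumption that is reasonable for large samples, not a substitute for a proper t-test.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

random.seed(0)  # fixed seed so the simulated data are reproducible

# Step 1: H0: no difference in population means; HA: a difference in either direction.
# Step 2: simulated (hypothetical) samples standing in for real trial data.
intervention = [random.gauss(52.0, 10.0) for _ in range(150)]
control = [random.gauss(50.0, 10.0) for _ in range(150)]

# Step 3: test statistic = mean difference / standard error of that difference.
diff = mean(intervention) - mean(control)
se = sqrt(variance(intervention) / len(intervention)
          + variance(control) / len(control))
stat = diff / se

# Step 4: two-sided P-value via a normal approximation (assumption: large n).
p_value = 2 * (1 - NormalDist().cdf(abs(stat)))

# Step 5: interpret the P-value alongside the sample estimate itself.
print(f"mean difference = {diff:.2f}, statistic = {stat:.2f}, P = {p_value:.3f}")
```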
When are the null and alternative hypotheses defined relative to the study?
Should be defined prior to collecting sample data
How do statistical and research hypotheses differ?
Statistical hypotheses are very formal
Research hypotheses may result from a pre-conceived idea of a direction or association
(usually informed by prior data). These are usually postulated first and inform the experimental design
What is a null hypothesis?
= H(0)
in population where sample taken, there is no difference between the intervention and control groups
in terms of mean data values
What does statistical hypothesis testing usually include?
- extent to which the sample estimate supports the null hypothesis
- measured by probability (= P-value)
- evidence of a difference in mean effectiveness (intervention vs. control) in the population
What is the alternative hypothesis?
= H(A) In population where sample taken, there is a difference between intervention and control groups such that either Int > Control OR Int < Ctrl
Why is the alternative hypothesis considered to be a 2-sided concept?
there are 2 potential outcomes:
Int > Ctrl
OR
Ctrl > Int
How does P-value inform the decision on whether sample estimate lends support to Null or alternative hypotheses?
P-value is a probability
ranges from 0 to 1.0 (0 ≤ P-value ≤ 1.0)
Strength of supporting null hypothesis:
- LIKELY: P-value = 1.0
- UNLIKELY: P-value = 0.0
What is the P-value?
probability of OBTAINING a sample difference in means at least as large as the one observed, assuming the null hypothesis is true
i.e. assuming the difference in means between groups in the population = 0.00
What is the relationship between the P-value and supporting the null hypothesis?
large P-value (near 1.0) MORE LIKELY to support null hypothesis
small P-value (near 0.0) LESS LIKELY to support null hypothesis
(^ more likely to support alternative hypothesis)
Why is the P-value conceptually difficult in most study designs?
we usually only have one event, but P-value is in its truest form based on repeated sampling
What is the (widely used) ‘critical level of significance’?
P=0.05 cut-off
i.e. 5% significance
What is the practical caveat of using P=0.05 as the critical level of significance?
It is treated as a dichotomy
e.g. if P<0.05, the null hypothesis is automatically rejected and the difference in sample data is taken to be important clinically as well as statistically
this is not always the case
What is the main issue with using statistical hypothesis testing?
People often take statistical significance as a proxy for clinical significance
(one can rarely substitute for the other; the P-value should not be used to assess clinical significance)
What inference can be made using the P-value?
Statistical significance ONLY
not clinical significance
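A small worked sketch of why the two kinds of significance differ: with a very large sample, a tiny mean difference can give a very small P-value. All numbers here are hypothetical (including the threshold for a clinically meaningful difference), and the P-value uses a normal approximation.

```python
from math import sqrt
from statistics import NormalDist

mean_diff = 0.5              # hypothetical mean difference (e.g. 0.5 mmHg)
sd = 10.0                    # hypothetical SD within each group
n_per_group = 100_000        # very large trial
clinically_meaningful = 5.0  # hypothetical smallest difference that matters

se = sd * sqrt(2 / n_per_group)  # SE of the difference for equal group sizes
stat = mean_diff / se
p = 2 * (1 - NormalDist().cdf(abs(stat)))  # two-sided, normal approximation

statistically_significant = p < 0.05
clinically_significant = abs(mean_diff) >= clinically_meaningful

print(f"P = {p:.3g}, statistically significant: {statistically_significant}")
print(f"clinically significant: {clinically_significant}")
```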
What is the P-value based on?
- theoretical concept
- using repeated samples from the population
- using the same sample size and conditions for each repeat
- each sample should have a unique mean difference
- there should be variation between sample estimates
What is hypothesis testing based on?
Theoretical concept of REPEATED SAMPLING
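The repeated-sampling idea can be illustrated by simulation: draw many studies from a population in which the null hypothesis is true, then count how often the sample mean difference is at least as large as the one observed. This is a sketch under stated assumptions (normally distributed data, a hypothetical observed difference of 0.5, 30 per group), not how a P-value is computed in practice.

```python
import random
from statistics import mean

random.seed(1)  # reproducible simulation

def null_mean_difference(n):
    """One simulated study: both groups come from the SAME population (H0 true)."""
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(0.0, 1.0) for _ in range(n)]
    return mean(a) - mean(b)

n = 30          # same sample size for every repeat
observed = 0.5  # hypothetical observed mean difference
repeats = 5000

# Each repeat gives its own mean difference, so the estimates vary.
diffs = [null_mean_difference(n) for _ in range(repeats)]

# Two-sided: proportion of repeats at least as extreme as the observed value.
p_value = sum(abs(d) >= abs(observed) for d in diffs) / repeats
print(f"simulated P-value = {p_value:.3f}")
```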
How is P-value calculated?
- using statistical test/formula
- the specific test/formula is chosen based on data type and number of variables/groups
- MAGNITUDE of the test statistic determines the P-value
What happens to the t-statistic and P-value when the mean difference between groups increases?
t-statistic increases
P-value decreases
(MORE SIGNIFICANT DIFFERENCE)
How is the test statistic determined?
= t statistic
= (mean difference between groups) / SE of that difference
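A sketch of the relationship between mean difference, t-statistic and P-value. The SD, group size and mean differences are hypothetical, the SE formula assumes two independent groups of equal size and equal SD, and the P-value uses a normal approximation (reasonable for large n).

```python
from math import sqrt
from statistics import NormalDist

def t_statistic(mean_diff, sd, n_per_group):
    """t = (mean difference between groups) / SE of that difference."""
    se = sd * sqrt(2 / n_per_group)
    return mean_diff / se

def approx_two_sided_p(t):
    """Two-sided P-value from a normal approximation to the t distribution."""
    return 2 * (1 - NormalDist().cdf(abs(t)))

results = []
for mean_diff in (0.5, 1.0, 2.0, 4.0):
    t = t_statistic(mean_diff, sd=10.0, n_per_group=150)
    results.append((mean_diff, t, approx_two_sided_p(t)))

# As the mean difference grows, t grows and the P-value shrinks.
for mean_diff, t, p in results:
    print(f"difference = {mean_diff:4.1f}   t = {t:5.2f}   P = {p:.4f}")
```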
When can a (Student’s or paired) t-test be used?
when measuring a continuous variable between groups
What is the distribution of t test statistic?
similar to ‘normal distribution’
BUT with HEAVIER TAILS (and a lower peak), especially at small n
At what value of n does the distribution become a ‘normal distribution?’
As n approaches infinity
or, in practice, from n > 120
According to the P<0.05 cut-off, what proportion of samples lead to rejection of the null hypothesis?
if P<0.05 for critical significance
then, when the null hypothesis is true, 5% of samples lead to its rejection
i.e. this is 2.5% at each tail extreme of the distribution curve (2x2.5%)
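The two 2.5% tails can be located numerically. This sketch uses the standard normal distribution as a stand-in for the t distribution at large n (an assumption); at small n the t cut-offs are further from zero.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal: mean 0, SD 1

lower = z.inv_cdf(alpha / 2)      # cut-off for the lower 2.5% tail
upper = z.inv_cdf(1 - alpha / 2)  # cut-off for the upper 2.5% tail

# Area in each tail, and the total rejection region.
lower_tail = z.cdf(lower)
upper_tail = 1 - z.cdf(upper)

print(f"reject H0 if statistic < {lower:.2f} or > {upper:.2f}")
print(f"total area in tails = {lower_tail + upper_tail:.3f}")
```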
What is the explanation behind rejecting null if sample estimate is in (5%) tails of distribution?
sample estimates found in these extreme value ranges are of a large effect size.
An estimate this extreme is unlikely to arise from random sampling alone under the null hypothesis, so it is more plausibly down to an actual effect
Therefore the null is rejected in favour of the alternative
Can a hypothesis ever be true or false?
no, we can never say for sure. Hypothesis is a theoretical concept - cannot be definitively measured
Statistical significance DOES NOT EQUAL clinical significance
We can only comment on whether there is evidence of a difference between groups in our sample estimates, and whether statistically this favours the null or the alternative hypothesis
What kind of errors can be made in statistical hypothesis testing?
Type 1 error
Type 2 error
Errors are conceptual, so can never be certain that either error has been made
What is a type 1 error?
Null hypothesis is rejected incorrectly
(i.e. saying that there is a difference but there is not)
Decision is based on sample data
What is a type 2 error?
Null hypothesis is accepted/favoured incorrectly
(i.e. saying that there is no difference but there is)
Decision is based on sample data
What can be done to reduce the probability of making a type I or type II error?
- increase sample size
- improve sampling method
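A simulation sketch of the sample-size point, focused on type II errors: when a real difference exists, larger studies miss it less often. The effect size, sample sizes and number of simulated studies are all hypothetical, and the P-value uses a normal approximation.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

random.seed(7)  # reproducible simulation
ALPHA = 0.05
TRUE_DIFFERENCE = 0.3  # hypothetical real effect, in SD units

def study_misses_effect(n):
    """Simulate one study where HA is true; True if H0 is NOT rejected."""
    a = [random.gauss(TRUE_DIFFERENCE, 1.0) for _ in range(n)]
    b = [random.gauss(0.0, 1.0) for _ in range(n)]
    se = sqrt(variance(a) / n + variance(b) / n)
    stat = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(stat)))
    return p >= ALPHA

def type_2_rate(n, studies=500):
    """Proportion of simulated studies that fail to detect the real effect."""
    return sum(study_misses_effect(n) for _ in range(studies)) / studies

rate_small = type_2_rate(20)
rate_large = type_2_rate(200)
print(f"type II rate, n=20 per group:  {rate_small:.2f}")
print(f"type II rate, n=200 per group: {rate_large:.2f}")
```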
What are the maximum chances of making a type 1 error in a single NHST study?
NHST = null hypothesis significance testing
5% (if critical significance is set to P=0.05)
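The 5% figure can be checked by simulation: run many studies in which the null hypothesis is true and count how often it is rejected anyway. This is a sketch with hypothetical settings (100 per group, 2000 simulated studies) and a normal approximation for the P-value.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance

random.seed(42)  # reproducible simulation
ALPHA = 0.05

def null_study_rejects(n):
    """Simulate one study where H0 is true; True if H0 is (wrongly) rejected."""
    a = [random.gauss(0.0, 1.0) for _ in range(n)]
    b = [random.gauss(0.0, 1.0) for _ in range(n)]
    se = sqrt(variance(a) / n + variance(b) / n)
    stat = (mean(a) - mean(b)) / se
    p = 2 * (1 - NormalDist().cdf(abs(stat)))
    return p < ALPHA

studies = 2000
type_1_rate = sum(null_study_rejects(100) for _ in range(studies)) / studies

# Should land close to the chosen significance level of 0.05.
print(f"proportion of null studies rejected = {type_1_rate:.3f}")
```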
Why may a type I or type II error be made in a NHST study?
sample may not be representative or appropriate
this creates undue variation not relating to the causal effect
therefore dilutes the statistical effect
What is the risk when making an inference about the population using a sample estimate?
making a sampling error
this causes INTRINSIC VARIABILITY
Which inferences based on statistical hypothesis testing are made incorrectly?
- statistical inference being taken as biological or clinical significance
- assuming that ‘not significant’ means ‘not important’
Which type of probability errors are prevalent in research?
type 1 errors (highly prevalent)
due to inappropriate interpretation of stats etc
What are the 3 concepts in the triad of significance?
- statistical (often based on sample AVERAGE, not individual)
- patient (need to apply to the INDIVIDUAL patient)
- clinical
(types of significance)
What does P-value inherently quantify?
Is a measure of probability
= Strength of evidence to support Null hypothesis
when significance is set at 5%
How should NHSTs be interpreted?
only in context of interpreting the statistics
SHOULD NOT be used in interpreting the biological or clinical significance