Type 1 and 2 error Flashcards

1
Q

Statistical error

A

We can never be completely certain that we are right when we reject or fail to reject the null hypothesis

Type 1 error = rejecting the null hypothesis when it is CORRECT
Saying that the means are different when they are the same (FALSE POSITIVE)

Type 2 error = failing to reject the null hypothesis when it is INCORRECT
Saying that the means are the same when they are different (FALSE NEGATIVE)

2
Q

Type 1 error = FALSE POSITIVE

problems

A
  • Overcall positive results
  • Identify a treatment effect when one doesn’t exist
  • Waste of time and effort on further development of an ineffective drug
  • Consequences for patients
3
Q

Type 2 error = FALSE NEGATIVE

problems

A
  • Overcall negative results
  • Fail to identify a treatment effect when one does exist
  • Reject/lose a potentially effective treatment
  • Waste resources used so far on drug development
4
Q

Type 1 error rate

what makes it more likely and how to reduce it

A

Rejecting the null hypothesis when it is correct

  • Designated by α (alpha), usually set to 0.05
  • Implying that it is okay to have a 5% probability of incorrectly rejecting a true null hypothesis

Type 1 errors more likely with:
• Multiple tests – if we do 20 tests at α = 0.05 and every null hypothesis is true, we expect on average one of them to falsely reject the null hypothesis
• Higher alpha values, which raise the type 1 error rate directly

Reduced by:
• Pre-study analysis design – avoid multiple testing
• Setting a lower α, e.g. 0.01
• Reporting p values to 3 decimal places to give accurate probability estimate
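
The link between alpha and the type 1 error rate can be checked with a small simulation. This is a minimal sketch, not part of the original card: it assumes two groups drawn from the same normal distribution (so the null hypothesis is true) and a z-test with a known standard deviation of 1. The function names, group size, number of trials, and seed are illustrative choices.

```python
import math
import random


def two_sided_p_from_z(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_type1_rate(alpha, n_per_group=20, n_trials=5000, seed=1):
    """Fraction of simulated trials that reject a TRUE null hypothesis."""
    rng = random.Random(seed)
    # Standard error of the difference in means when sd = 1 is known.
    se = math.sqrt(2 / n_per_group)
    rejections = 0
    for _ in range(n_trials):
        # Both groups come from the same distribution: no real difference.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        diff = sum(group_a) / n_per_group - sum(group_b) / n_per_group
        if two_sided_p_from_z(diff / se) < alpha:
            rejections += 1  # a false positive
    return rejections / n_trials


for alpha in (0.05, 0.01):
    print(alpha, simulate_type1_rate(alpha))
```

Under these assumptions the simulated false positive rate should land close to the chosen alpha, which is why lowering alpha lowers the type 1 error rate.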

5
Q

Type 2 error rate

what makes it more likely and how to reduce it

A

FAILING to reject the null hypothesis when it is incorrect
Denoted by the Greek letter β (beta)

‘POWER’ of a test is 1 − β
• Power = likelihood of a statistical test detecting an effect when there is one
• Greater power = less likely to be a false negative result

Type 2 errors more likely with:
• Small samples
• Small effect size - hard to detect

Prevented by:
• Large sample size
• Larger effect size (choosing an outcome where you can measure better effect size)
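
How sample size and effect size drive power can be sketched with a normal approximation. This is an illustrative calculation, not the card author's method: it assumes a two-sided z-test comparing two equal-sized groups, with the effect size expressed as a standardised mean difference. The function name `approx_power` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-group z-test.

    effect_size is the standardised mean difference (difference / sd).
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real.
    shift = effect_size * math.sqrt(n_per_group / 2)
    # Probability that the statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Power rises with sample size and with effect size.
for n in (10, 30, 64, 100):
    print("n per group:", n, "power:", round(approx_power(0.5, n), 3))
for d in (0.2, 0.5, 0.8):
    print("effect size:", d, "power:", round(approx_power(d, 30), 3))
```

The type 2 error rate is then `1 - approx_power(...)`, so anything that raises power lowers the chance of a false negative.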

6
Q

Multiplicity

A

Performing many statistical tests on one clinical trial

Increases the risk of type 1 error (alpha)
• False positive result
• Rejecting null hypothesis when it is actually true
• Alpha is set for a single comparison at p < 0.05

7
Q

Risk of type 1 error calculation

A

Calculated by:
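1 − (1 − α)^n, where α is the type 1 error rate for a single test and n is the number of (independent) tests
This gives the probability of at least one false positive result across all n tests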

Type 1 error rate of <0.05 is accepted for a single test
Inappropriate for multiple tests
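
The formula above can be evaluated directly. This is a minimal sketch that assumes the tests are independent; the function name `familywise_error_rate` is an illustrative choice.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


# How the overall risk grows as more tests are run at alpha = 0.05.
for n in (1, 5, 10, 20, 30):
    print(n, "tests:", round(familywise_error_rate(0.05, n), 2))
```

With a single test the risk equals alpha itself, and it climbs quickly as tests are added, which is why 0.05 per test is inappropriate for multiple tests.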

8
Q

Multiplicity in clinical trials

5 examples

A

Multiple treatment
• More than 2 groups (drugs, doses, combinations)

Multiple endpoints
• Several outcomes of interest

Repeated measurements
• Measurements at multiple time points

Subgroup analyses
• Tests whether individuals with certain characteristics benefit more than those without (e.g. demographics, lifestyle)

Interim analyses
• Analysis of data conducted during the trial, before data collection has been completed, e.g. for ethical and economic reasons

9
Q

Dealing with multiplicity

A
  • Make fewer comparisons
  • Pre-define/ prioritize the comparisons
  • Adjusting the p value
10
Q

Make fewer comparisons - dealing with multiplicity

A
  1. MULTIPLE TREATMENTS
    use analysis of variance (a single omnibus test compares all treatments at once rather than making multiple comparisons)
  2. MULTIPLE ENDPOINTS
    use a single summary statistic e.g. a questionnaire with many questions but one score
  3. MULTIPLE ENDPOINTS
    use a composite endpoint
    e.g. MACE (major adverse cardiovascular events) - occurrence of any major cardiovascular event such as stroke, heart attack etc. all counts as 1
  4. REPEATED MEASUREMENTS
    do the analysis at a predefined timepoint
    or use a summary measure - area under the curve for example
    or use a statistical mixed model
11
Q

Pre-define - dealing with multiplicity

A

Multiple treatments
– Pre-define the most important comparison

Multiple endpoints
– Specify primary and secondary endpoints in advance
– Study is powered to detect the primary endpoint and the outcome is judged on the significance of the primary endpoint

Subgroup analyses
– Predefine a limited number of subgroups to be analyzed

12
Q

adjust the p value - dealing with multiplicity

A

Type 1 error rate is inflated by multiple tests
– Reduce the p value threshold for individual tests
– Overall level of significance can be kept at 0.05 for entire series of tests

e.g. Bonferroni correction
– divide 0.05 by number of tests done to set significance level for each subtest
– e.g. for 5 related tests set α (risk of false positive result) at 0.05/5 = 0.01 for each test
– Very conservative, tends to overcorrect and increase the risk of a false negative result
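
The Bonferroni correction described above can be sketched in a few lines. The function names and the p values in the usage example are made up for illustration and do not come from any trial.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the overall level at alpha."""
    return alpha / n_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Flag which p values remain significant after Bonferroni correction."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Hypothetical p values from 5 related tests (illustrative only).
p_values = [0.003, 0.02, 0.04, 0.2, 0.009]
print("per-test threshold:", bonferroni_threshold(0.05, len(p_values)))
print("significant:", bonferroni_significant(p_values))
```

In this made-up example, the tests with p = 0.02 and p = 0.04 would pass an unadjusted 0.05 threshold but not the corrected one, which shows how the correction trades fewer false positives for a higher risk of false negatives.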

13
Q

No significance testing for baseline data

can be avoided to reduce multiplicity

A

Multiple tests will generate false positive results
• e.g. 30 comparisons; 79% chance of false positive result

Differences may be clinically important but not statistically significant
• negative tests may be falsely reassuring

Comparisons not testing a useful scientific hypothesis

14
Q

repeated measurements

A

Outcome variable measured two or more times for each participant over a period of time
e.g. before, during and after treatment

15
Q

How to compare repeated measures between groups?

A

Compare final measurement?
• Wastes a lot of valuable information

Compare every timepoint?
• Multiple comparisons – risk type 1 error

Some kind of regression?
• Correlation structure leads to bias

Summary measure approach?
• Each measure has limitations

WE CAN USE REPEATED MEASURES MODEL (ANOVA) or summary measures

16
Q

Tracking

A
  • Baseline characteristics influence PK/PD so that measurement values vary from low to high between individuals
  • Values tend to track for an individual e.g. start high, stay high; start low, stay low
  • There is strong correlation between repeated measures (‘correlation structure’), hence we can’t use ordinary regression, which assumes independent observations
17
Q

• ANOVA (analysis of variance test) is an OMNIBUS TEST

A
  • OMNIBUS test – tests everything at once (variance across all groups or time points) – avoids the risk of multiplicity
  • However, the output just tells us there is a difference – it doesn’t tell us what is different (e.g. which time points are different)
18
Q

• A post hoc test

A

Can tell us what is different - you can do this by comparing estimated marginal means

  • Multiplicity is considered acceptable for a post hoc test because the omnibus test has already shown there is a difference between the time points as a whole
  • Post hoc testing is exploratory analysis, not your primary outcome
19
Q

• Estimated marginal means

A

an estimate of the means rather than the actual calculation of them
• We can compare them and compare the main effects
• The means are estimated from the regression model rather than calculated from data
• These are inferential stats not descriptive
• Means for groups adjusted for means of other factors in the model
• Also referred to as least squares means

20
Q

Descriptive stats vs least squares means

A
  • Least squares means just means that the means have been estimated from the model
  • Primary outcome is NOT significant
  • Why are there no p-values for secondary outcomes? Because the primary outcome is not significant, you don’t explore stats on the secondary outcomes, so you don’t give a p value
21
Q

Summary measure approach pros

A
  • Summarises all the information as a single statistic
  • Reduced multiplicity
  • Avoids the problem of correlation structure
  • Makes interpretation easier
22
Q

summary measure for repeated time measurements approach examples and limitations

A
  1. mean (central level of efficacy of outcome variable)
    limitation - sensitive to missing info
  2. maximum - (describe max drug concentration)
  3. time to maximum (describe speed of drug)
    limitation - sensitive to missing info
  4. area under curve (assess overall conc of drug)
    limitation - ignores within-subject variations
  5. percentage of time above or below certain value (assess time that drug is effective)
  6. number of occasions above or below certain value (assess frequency of fluctuations)
    limitation - many time points needed for stable estimate
  7. rate of change (rate of change in outcome variable)
    limitation - coefficients are measured with varying levels of precision