3 - Interpreting the Results Flashcards

1
Q

check out the error chart

A
  • “truth” means the true state of the universe
  • we don’t know this for sure
2
Q

what is power (1-Beta)?

A
  • power = 1-Beta (ie 80%) - lower right in chart
  • the probability of correctly deciding there is a difference when in fact a difference occurred
3
Q

what is the p-value?

A
  • a low p-value (generally p < 0.05) means there is a statistical difference, reject null - NOT whether this difference is important
  • OR the probability of the observed result arising by chance (if the null were true)
  • people think this means there is an important finding - wrong
  • a p value less than 0.05 is good; basically the 0.05 threshold is the probability that you will incorrectly conclude that the null hypothesis is false (make a T1 error - ie alpha)
  • tells you nothing about the probability of replication (reproducibility) or the magnitude of an effect
  • do not put too much weight on the p-value (highly dependent on n-size - bc random sampling error can occur with small samples!)
4
Q

what is the worst type of error?

A

Type 1 - false positive

5
Q

what is a type 1 error vs type 2 error

A

1 - alpha - false positive (reject null even though null is true)

2 - beta - false negative (fail to reject null even though null is false)
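The two error types can be sketched as a lookup on the true state of the null vs the test decision (a minimal Python illustration, not from the slides):

```python
# Classify a test outcome by the true state of the null hypothesis
# and the decision made. Type 1 (alpha) = false positive;
# Type 2 (beta) = false negative.
def classify(null_is_true: bool, reject_null: bool) -> str:
    if null_is_true and reject_null:
        return "Type 1 error (false positive)"
    if not null_is_true and not reject_null:
        return "Type 2 error (false negative)"
    return "correct decision"

print(classify(null_is_true=True, reject_null=True))    # Type 1 error
print(classify(null_is_true=False, reject_null=False))  # Type 2 error
```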

6
Q

what is null-hypothesis significance testing?

A
  • the idea of presenting us with a p-value in the article
7
Q

what is confirmation bias?

A
  • we see a result just bc we are looking for it
8
Q

name/describe the 4 types of data, most often seen = *

A
  1. nominal: discrete categories w no order, dichotomous and categorical (y/n, dead/alive)
  2. ordinal: ordered categories w difference btw categories not assumed to be equal, categorical (mild, moderate, severe)
  3. interval: equal distances btw values and 0 is arbitrary, continuous (IQ)
  4. *ratio: equal intervals and a meaningful zero, continuous (height, ROM, weight)
    - ratio treated the same as interval
    - don’t use ordinal much
9
Q

what does it mean if out of 10 patients, every patient showed improvement but it is still not statistically significant?

A
  • the p value is not low enough to indicate a stat. sig. diff.
  • just means that the MAGNITUDE of change is not S.S. (if based on a dichotomous improvement/no improvement outcome it would be)

* note that the state of reality cannot be changed, only the results of the null hypothesis sig. test can be changed

10
Q

what should you be wary of when looking at null hypothesis significance testing (NHST)?

A
  • look at effect sizes, not p values, for results (how diff are the groups and what is our confidence that they are different)
  • pay attention to alpha and power
  • use MCID when possible (ie are the results/differences meaningful to patients/clinicians)
  • power can be low bc of small n-size
11
Q

what are the 2 important questions you should ask when interpreting the results?

A

1) what is the single value most likely to represent the truth? (effect size/summary measures)
2) what is the plausible range of values within which our true value may lie? (C.I. - how confident we are about the summary measure)
- note we almost always find summary measures (best guess at validity)

12
Q

what are some common summary measures? (3)

A

1) measures of central tendency (mean, median, mode)
2) measures of dispersion (SD, SE, variance, range)
3) statistical tests (t-test, anova, ancova, regression, etc)

13
Q

when do we use mean vs median vs statistical tests?

A

mean = with normally distributed data

median = not normally distributed data, small n size, interquartile ranges

statistical tests - normally distributed data, larger n size
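A quick sketch of why the median suits skewed data (invented numbers, Python's stdlib `statistics` module):

```python
import statistics

# Skewed sample (one outlier): the mean is pulled toward the outlier,
# while the median stays near the bulk of the data -- one reason the
# median is preferred for non-normal or small samples.
scores = [10, 11, 12, 13, 90]
print(statistics.mean(scores))    # 27.2
print(statistics.median(scores))  # 12
```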

14
Q

what is the difference btw anova and ancova?

A
  • the only difference is that ANCOVA can adjust for certain covariates and ANOVA can’t; other than that both just compare btw groups (ie if we want to adjust the score based on how someone was doing at baseline)
15
Q

what does a t-test compare?

A
  • it compares btw 2 groups on continuous data, typically using the means and standard deviations
16
Q

what is a common standardized effect size?

A
  • cohen’s d
  • difference btw the 2 means / pooled SD
  • see yellow (control) vs purple (treatment)
  • small = 0.2 SD, medium = 0.5, large = 0.8
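Cohen's d can be computed as below (a sketch with made-up means/SDs/ns; `cohens_d` is a hypothetical helper name):

```python
import math

# Cohen's d = (mean_treatment - mean_control) / pooled SD.
# All values below are invented for illustration.
def cohens_d(m1, s1, n1, m2, s2, n2):
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (m1 - m2) / pooled_sd

d = cohens_d(m1=55, s1=10, n1=30, m2=50, s2=10, n2=30)
print(round(d, 2))  # 0.5 -> a "medium" effect by the 0.2/0.5/0.8 convention
```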
17
Q

what are non-standardized effect sizes?

A
  • mean (SD,SE) - for normal dist
  • and median (25 and 75th quartile) - for non-normal dist
  • t-test, anova, ancova, regression (normal dist)
  • mann-whitney (non-normal dist)
18
Q

What is incidence vs prevalence? - statistical tests? *DI*

A

incidence: proportion of NEW events (AKA absolute risk) - (# of new events/number exposed) - for prospective studies!
prevalence: the proportion of existing events (# of events/number exposed) - for retrospective studies!
- tests = chi-square, regression
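A minimal arithmetic sketch (invented counts):

```python
# Incidence = new events / number exposed (prospective);
# prevalence = existing events / number exposed (retrospective).
# Numbers are made up for illustration.
new_cases, at_risk = 8, 200
incidence = new_cases / at_risk           # 0.04 -> 4% absolute risk

existing_cases, population = 30, 200
prevalence = existing_cases / population  # 0.15 -> 15%
print(incidence, prevalence)
```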

19
Q

what is a case control study?

A
  • follow people with event, don’t know incidence, see who was exposed to treatment or control
20
Q

what are summary measures/effect sizes? (6) *DI*

A
  • absolute risk reduction (ARR)
  • number needed to treat (NNT)
  • relative risk (RR)
  • relative risk reduction (RRR)
  • odds ratio (OR)
  • survival
21
Q

what is absolute risk? *DI*

A
  • AKA risk - the event rate in the control group (baseline risk - risk in original group, incidence)
  • incidence in group does not tell us anything about comparing btw groups!
22
Q

what is absolute risk reduction? *DI*

A
  • aka risk difference
  • absolute risk in control group - absolute risk in treatment group
  • no effect, ARR = 0
23
Q

what is number needed to treat NNT? *DI*

A

number of patients one would need to treat in order to prevent one event

= 1/ARR

  • higher means not as effective!
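For example (invented event rates):

```python
# NNT = 1 / ARR. Example rates are made up:
# control event rate 20%, treatment event rate 15%.
arr = 0.20 - 0.15   # absolute risk reduction = 0.05
nnt = 1 / arr
print(round(nnt))   # 20 -> treat 20 patients to prevent one event
```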
24
Q

what is relative risk RR? *DI*

A
  • proportion of original risk still remaining after therapy - basically a fraction
  • ARtx/ARct
  • no indication of what the baseline risk was
  • less than 1 means treatment more effective, 0.5 means risk of death cut in half (still no understanding for importance bc could be half of 20 or half of 1)
25
Q

what is relative risk reduction RRR? *DI*

A
  • proportion of original risk removed by therapy

= ARR/ARct = 1-RR

  • ratio instead of difference
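A sketch tying ARR, RR, and RRR together (same invented rates: 20% control vs 15% treatment):

```python
# Invented absolute risks: control 20%, treatment 15%.
ar_ct, ar_tx = 0.20, 0.15
arr = ar_ct - ar_tx   # absolute risk reduction
rr = ar_tx / ar_ct    # relative risk = 0.75
rrr = arr / ar_ct     # relative risk reduction = 0.25
print(round(rr, 2), round(rrr, 2))
assert abs(rrr - (1 - rr)) < 1e-9  # RRR = 1 - RR
```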
26
Q
  • recreate relative risk and odds ratio chart *DI*
  • which is used for prospective vs retrospective?
A
  • RR for prospective
  • OR for retrospective
27
Q

what is odds ratio? *DI*

A
  • odds in experimental/odds in control group
28
Q
  • when will OR approximate RR? - when is OR a poor estimate of RR? *DI*
A
  • good when incidence in control group or experimental group is low (baseline risk <10%)
  • bad when baseline risk is high (>10%)
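This can be checked numerically from a 2x2 table (invented counts; `rr_and_or` is a hypothetical helper):

```python
# 2x2 table: events/non-events in treatment vs control.
def rr_and_or(a, b, c, d):
    """a,b = events/non-events (treatment); c,d = events/non-events (control)."""
    rr = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a / b) / (c / d)
    return rr, odds_ratio

# Low baseline risk (5% in control): OR approximates RR.
print(rr_and_or(3, 97, 5, 95))
# High baseline risk (50% in control): OR diverges from RR.
print(rr_and_or(30, 70, 50, 50))
```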
29
Q

what is survival data?

A
  • looking at incidence of death or disease recurrence over time (ie starts at 100% survival and decreases w time)
  • interpreted similar to OR or RR
30
Q

what is a hazard ratio?

A
  • a type of survival data
  • the hazard (rate) = number of events over total observation time; the hazard ratio compares these rates btw groups
31
Q

what is CPHR (cox proportional hazard ratio)? *DI*

A
  • a type of survival data
  • statistical way to adjust for confounding factors for survival analysis
  • ie adjusting for imbalances btw groups in baseline prognostic factors
32
Q

describe how the CI is affected as time goes on for survival curves.

A
  • CI is smaller earlier and gets larger over time bc n-size is decreasing as people are dying
  • this is the same for comparing 2 groups (CI increases)
33
Q

what happens if we decide to test more participants on a statistically non-significant effect (p > 0.05)?

A
  • more subjects = lower standard error = smaller CI = greater chance of S.S. (better power to see the effect, meaning decreased random sampling error)
  • eventually CI does not cross the no difference line, meaning p is significant

**this works assuming the effect size stays the same**

** if we have a systematic error it won’t be fixed by adding people to the study - increasing n size does nothing to effect size!**
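A sketch of the mechanism (assuming a fixed effect size and SD, both invented): SE = SD/√n, so the 95% CI narrows as n grows and eventually stops crossing the no-difference line:

```python
import math

# SE = SD / sqrt(n): quadrupling n halves the standard error, so the
# 95% CI (roughly effect +/- 1.96*SE) narrows -- assuming the effect
# size itself stays the same. Values below are made up.
sd, effect = 12.0, 3.0
for n in (25, 100, 400):
    se = sd / math.sqrt(n)
    lo, hi = effect - 1.96 * se, effect + 1.96 * se
    print(f"n={n}: 95% CI [{lo:.2f}, {hi:.2f}]")
# At n=25 the CI crosses 0 (non-significant); by n=100 it no longer does.
```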

34
Q

describe CI wrt effect size

A
  • a CI is the interval within which the effect size is likely to be found
  • conventionally 95% CI (meaning 5% is left out so on either side of the curve p value is 0.025)
  • see notes on p 33
35
Q

what does power mean wrt increasing n-size?

A
  • there is an increasing number of events w increasing n-size
  • therefore a lower chance of random sampling error
  • for a dichotomous outcome, power increases as the number of events increases
  • for a continuous outcome, power increases as sample size increases
  • notes p 33
36
Q

what does the p-value tell you in terms of the importance of a difference (if detected)?

A
  • nothing
  • if you have a large n-size, even a small unimportant difference can reach statistical significance
  • if you have very few subjects, even an important difference may not reach statistical significance
37
Q

clinically important difference? what to take into account? (3)

A
  • a priori determination of magnitude of effect that would convince you to change your practice
  • threshold for declaring something effective or non effective
  • invasiveness to patient
  • convenience to clinician
  • staff cost
38
Q

define the difference btw clinically important and statistically significant? indeterminate or definitive?

A
  • SS = p value less than 0.05
  • clinically important = p value less than 0.05 AND the CI doesn’t cross the no difference line
  • indeterminate if the lower limit of the CI crosses the no difference line
  • definitive if it does not
  • p value could be the same for both of these!
39
Q

what is equality?

A
  • new treatment is no different than control (SS)
  • relies on p-value for interpretation
  • can’t give a clinical conclusion (be wary of people drawing conclusions from this)
  • p 35 & 39
40
Q

what is superiority?

A
  • new treatment is better than old
  • relies on a definition of superiority or better than (margin defined as what is better)
  • may give CIs but no clinical interpretation boundary (no line!)
  • p 35 & 39
41
Q

What is non-inferiority?

A
  • relies on def of non-inferiority/worse than (line defined as this)
  • willing to accept some harm
  • favours the Ct but doesn’t cross the line
  • opposite of superiority
  • 36 & 39
42
Q

what is equivalence?

A
  • relies on both superiority and non-inferiority
  • if CI falls btw 2 lines, say equal or the same as
  • defines what these CIs look like on graph basically
  • p 36 & 39
43
Q

compare sample size and power analysis

A
  • 36

sample size calc = a priori; power known (80%), T1e known (5%), effect size known, to estimate n

power analysis = post-hoc; T1e known (5%), effect size known, n known, to estimate power
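The a priori calculation can be sketched with the common two-group formula for comparing means (a sketch only; σ and δ are invented, and the course's exact formula may differ):

```python
import math

# A common a priori n-per-group formula for comparing two means:
#   n = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2
# z values for alpha = 0.05 (two-sided) and power = 80%:
z_alpha, z_beta = 1.96, 0.84
sigma = 10.0   # assumed SD (made up)
delta = 5.0    # smallest difference worth detecting (eg the MCID)

n_per_group = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
print(math.ceil(n_per_group))  # 63 -> a larger delta in the denominator shrinks n
```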

44
Q

look at n-size calculation components!

  • just label them now
A
  • p 37
  • delta is the diff btw the groups
45
Q

in n-size calculation, what is delta? how to find delta?

A
  • defines the difference btw groups that the study is statistically able to detect (in denominator, so larger means smaller n-size needed)
  • this is usually a small value, could try making value large to reduce n-size but then would need to assume that there would be a large effect!
  • for delta should use MCID (within or btw group)
  • find this with a lit review or using cohen’s d!
46
Q

what results in a larger delta, within or between group MCID?

A
  • within group (btw group is typically 20% of within group!)
47
Q

check out sample problems on p 38

A
  • slides 2 and 3
48
Q

overall what does n-size calculation depend on (for all types of research q’s)?

A
  • the expected effect wrt no effect line
49
Q

what does the sample size equation look like for superiority/non-inferiority?

A
  • m is added in the denominator (for the superiority/non-inferiority margin)
  • p 40
50
Q

what does the sample size calculation for equivalence look like?

A
  • add m in denominator
  • see p 41
51
Q

see examples for n size calculation

A
  • p 41 and 42
52
Q

what are types of analysis that affect precision?

A
  • independent comparisons btw multiple groups
  • multiple independent outcomes
  • multiple independent time points
  • interim analysis
  • subgroup analysis
53
Q

how do multiple comparisons affect precision?

A
  • in terms of multiple independent comparisons (of multiple groups), independent outcomes (not multiple outcomes needed to answer one question though!), and time points
  • increases our alpha (more likely to produce a false positive by chance!)
  • check out example on p 43
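Why alpha inflates: with k independent tests each at α = 0.05, the chance of at least one false positive is 1 − (1 − α)^k:

```python
# Familywise error rate for k independent comparisons at alpha = 0.05.
alpha = 0.05
for k in (1, 8, 14):
    fwer = 1 - (1 - alpha) ** k
    print(f"{k} comparisons -> {fwer:.0%} chance of >=1 false positive")
# 8 comparisons already gives roughly a 1-in-3 chance of a spurious "finding".
```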
54
Q

what is subgroup analysis?

A
  • determining whether there is a significant interaction effect
  • ie a comparison of treatment outcomes for patients subdivided by baseline characteristics
55
Q

when must a subgroup analysis be done?

A
  • it must precede the analysis (a priori) or be recognised as hypothesis generating (post-hoc = hypothesis generating)
  • a priori is preferred (though it needs correction for multiple-comparisons error)
  • using subgroups decreases power
56
Q
  • how do you appropriately analyse/interpret subgroup analysis?
A
  • it is not appropriate to test the effect in subgroups separately and then compare - must look at everyone together
  • ANOVA = the simplest interaction test
57
Q

for subgroup analysis how do we determine whether the magnitude of the effect is large?

A
  • if n-size is small, it may be due to chance alone
  • look at CIs to see if underpowered or conclusive
  • meta-analysis might be more informative (avg estimate of effect across studies - increasing n-size)
58
Q

how do you avoid false +ves for subgroup analysis?

A
  • support by evidence/biological plausibility
  • increase n size
  • apply correction factor
  • specify subgroups at the time of application (A PRIORI!)
59
Q

what is interim analysis? when is it done? (3)

A
  • analysis conducted before completion of trial
    1) efficacy (treatments are convincingly diff) - most common
    2) futility (treatments convincingly similar)
    3) harm (unacceptable side effects)
60
Q

look at data safety and monitoring board slide

A

p 45

61
Q
  • what are some red flags involved w interim analysis? - what does interim analysis do to type 1 error?
A
  • failing to plan ahead
  • failing to report that the study was stopped early
  • selecting unsatisfactory criteria for stopping (must involve few looks/stringent p-values)
  • p 46
  • increases prob of T1 error: say we have 2 interim analyses - that’s 2 diff time frames for data analysis; also include 2 ind outcomes and 2 subgroup analyses and that’s already 8 comparisons!! - see chart
62
Q

look at example on p 46

A
  • slide 3
63
Q

how do you apply a correction factor for number of comparisons made?

A
  • bonferroni correction (most common)
  • 0.05/number of comparisons
  • this is conservative
  • basically saying: if I want the overall type 1 error to be 0.05, each test now needs a smaller p-value threshold when I do stat testing (eg 0.025 for 2 comparisons)
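The correction itself is one line (Python sketch):

```python
# Bonferroni correction: divide the familywise alpha by the number of
# comparisons to get the per-test significance threshold.
def bonferroni(alpha: float, n_comparisons: int) -> float:
    return alpha / n_comparisons

print(bonferroni(0.05, 2))  # 0.025
print(bonferroni(0.05, 8))  # 0.00625
```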
64
Q

what are spending functions?

A
  • for stopping rules in interim analysis
  • upside-down triangle method = very hard to stop trials this way
  • make 4 comparisons (look at data 4 times), p must be very small to stop for the first 3, 0.05 at 4th
65
Q

look at examples for stopping interim analysis

A
  • p 47
66
Q
  • what is the p-value dependent on?
A
  • n size and number of events? double check this!