Statistics Flashcards
What is bimodal distribution?
2 peaks in data (two modes)
What is standard deviation?
Shows the spread of data around the mean
+/- 1SD 68.3% (of values, if data are normally distributed)
+/- 2SD 95.4%
+/- 3SD 99.7%
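The coverage figures on this card can be checked numerically. A minimal sketch, assuming the data follow a normal distribution; `coverage_within` is an illustrative name, not part of the deck.

```python
import math

def coverage_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"+/- {k}SD: {coverage_within(k) * 100:.2f}%")
```

The formula only applies to normally distributed data; skewed data will not follow these percentages.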
What does a large standard deviation mean?
Greater spread of data away from the mean
What are confidence intervals?
Ranges within which a true value lies
ie we only have mean of samples, we are guessing the true mean of the population
If the CIs of two groups do not overlap = significant
What does 95% CI mean
We are 95% sure the true mean lies within that range.
If the CI for a difference crosses 0, the result is not significant at the 5% level (nil impact of intervention cannot be excluded)
What will a larger study do to CI
narrow it
What does it mean if CI includes 1?
For a ratio (eg risk ratio or odds ratio), no statistically significant difference- the intervention may make no difference
What do the key components of a forest plot mean?
diamond= combined estimate of all studies, stat sig if does not cross the line of no effect
greatest impact= most positive/negative
left= intervention is better, right = intervention is worse (usual layout when the outcome is an adverse event; check the axis labels).
Line of no effect- if crosses this, no evidence intervention works
size of square= size of sample
line about square= confidence interval
What is the null hypothesis?
Intervention has no impact on outcome, any difference found is due to chance
What is a p value?
Probability of seeing a difference at least as large as the one observed if the null hypothesis is true (ie by chance alone)
What is a significant p value?
<0.05 = less than 1 in 20 chance of seeing this change by chance alone. Treatment probably did cause outcome.
0.01- highly significant
0.001- very highly significant
What are parametric tests used for?
Normally distributed data
Which parametric test can be used for >2 samples?
ANOVA, to see if means come from same population
Which parametric test is used for 2 samples
T/Student’s T, test that the samples come from a population with the same mean
Which test is used for categorical data
chi squared- compares proportions eg improvement with two treatments, gives p value (not parametric; 1 normally distributed sample uses Student’s T)
What is paired data?
Data from the same population ie the same people before and after treatment
What are non parametric tests used for?
Data that are not normally distributed (alternatively, data can sometimes be transformed into a normal distribution so a parametric test can be used)
What is the Mann Whitney U test used for?
Non parametric, compare medians (ranks) between 2 unpaired groups and give p value to see if significant
Which of these are parametric?
a) Kruskal Wallis
b) Friedman
c) Wilcoxon Signed Rank
d) ANOVA
d) ANOVA- all others are non parametric tests
What is risk?
Probability an event will happen 1 in 100 are sick, 1/100= 0.01
What is risk ratio?
risk in treated versus untreated group
>1= higher risk if exposed
<1= lower risk if exposed
if CI includes 1= not stat sig
What is odds?
Number of times event happens / number of times event does not happen
used in case control studies
ie 1 in 100 are sick. 1/99= 0.0101
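The difference between risk and odds is only in the denominator. A small sketch using the cards' own example of 1 sick person in 100; the function names are illustrative.

```python
def risk(events, total):
    """Probability of the event: events divided by everyone."""
    return events / total

def odds(events, total):
    """Events divided by non-events."""
    return events / (total - events)

print(risk(1, 100))            # 0.01
print(round(odds(1, 100), 4))  # 0.0101
```

For rare events the two are almost equal; they diverge as the event becomes common.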
What is odds ratio?
Odds of exposure in case v control
1= no difference
>1= increased if exposed
<1= decreased if exposed
eg OR = 2.64 = odds of disease 2.64x higher if exposed
What is ARR?
difference in event rate in intervention v control
100/NNT
80% improve in intervention, 60% improve in control. 80-60= 20%
What is NNT
100/ARR (ARR as a %)- how many need to be treated for one person to benefit
What is RRR
Proportion by which intervention reduces event rate
40% event rate in placebo and 20% in intervention
=50% RRR
What is NNH?
100 / (% with nausea in intervention - % nausea in control)
eg 100 / (6-1) = 100/5 = 20
1 extra person gets nausea for every 20 treated
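The ARR, NNT, RRR and NNH cards above all use the same arithmetic on percentages. A minimal sketch with the cards' own numbers, reading the RRR example as 40% with placebo and 20% with the intervention; the function names are illustrative.

```python
def absolute_difference(rate_a_pct, rate_b_pct):
    """Absolute difference between two event rates, in percentage points."""
    return abs(rate_a_pct - rate_b_pct)

def number_needed(difference_pct):
    """100 / absolute difference in %. Gives NNT for a benefit, NNH for a harm."""
    return 100 / difference_pct

def relative_risk_reduction(control_pct, intervention_pct):
    """Reduction in event rate as a % of the control rate."""
    return 100 * (control_pct - intervention_pct) / control_pct

arr = absolute_difference(80, 60)      # 80% improve v 60% improve
print(arr, number_needed(arr))         # ARR 20, NNT 5.0
print(relative_risk_reduction(40, 20)) # RRR 50.0
print(number_needed(6 - 1))            # NNH 20.0 (nausea 6% v 1%)
```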
What is correlation?
the strength of linear relationship between two variables
What is the correlation coefficient? (r)
Strength of the linear relationship between two variables
r= 1 (positive- directly related, as one increases so does other)
r=-1 (negative- inversely related, as one increases the other decreases)
r=0 (no line, random points)
Parametric- Pearson’s
Non-parametric- Kendalls/Spearmans
How do we interpret r as a value?
Correlation coefficient
0-0.2= meaningless
0.4-0.6= reasonable
0.6-0.8= high
0.8-1= suspiciously high
What tests can we use to assess correlation coefficient?
Pearsons= normal
Spearmans= not normal
What is r squared?
Proportion of the variation in one variable that is explained by the other
closer to 1= higher correlation
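Pearson's r and r squared can be computed by hand. A sketch assuming two equal-length lists of made-up numbers chosen to give a perfect positive and a perfect negative relationship; `pearson_r` is an illustrative name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]   # made-up: increases with x
falling = [10, 8, 6, 4, 2]  # made-up: decreases as x increases

r_up = pearson_r(xs, rising)
r_down = pearson_r(xs, falling)
print(r_up, r_up ** 2)      # r and r squared
print(r_down, r_down ** 2)
```

Note that r squared is the same for both series: it describes how much variation is explained, not the direction.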
What is regression?
How one set of data relates to (and can predict) another eg blood glucose and Hba1c; it does not prove one causes the other
We can use one to predict the other using a graph
slope of line= regression coefficient
univariate- 1 dependent (influenced by something) and 1 independent
multivariate- one dependent and 2 or more independent
What is regression constant?
Where line crosses vertical axis
What is regression equation?
y= a (constant) + b (coefficient) x
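The constant a and coefficient b can be estimated by ordinary least squares. A minimal sketch, assuming a single independent variable and made-up points that lie exactly on y = 2 + 3x; `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Return (a, b) for y = a + b*x fitted by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # regression coefficient (slope)
    a = mean_y - b * mean_x  # regression constant (where the line crosses the vertical axis)
    return a, b

a, b = fit_line([0, 1, 2, 3], [2, 5, 8, 11])
print(a, b)
print(a + b * 4)  # predicted y when x = 4
```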
When do we use logistic regression?
To look at outcome in 1 of 2 groups (has disease/has not)
When do we use poisson regression?
count data- number of events in a set time period (rates)
when do we use cox regression?
Survival analysis- time until a certain event eg death/discharge
What is Kaplan Meier?
Calculates new survival rate after each event
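The "new survival rate after each event" is a running product: at each event time, multiply by the fraction of those still at risk who survive. A sketch with made-up follow-up data, assuming each observation is a (time, event) pair where event is False for a censored subject; `kaplan_meier` is an illustrative name.

```python
def kaplan_meier(observations):
    """Return [(time, survival)] recalculated at each event time.

    observations: list of (time, event); event=True for the event (eg death),
    False if the subject was censored (left the study without the event).
    """
    survival = 1.0
    curve = []
    event_times = sorted({time for time, event in observations if event})
    for t in event_times:
        at_risk = sum(1 for time, _ in observations if time >= t)
        events = sum(1 for time, event in observations if time == t and event)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve

# Made-up data: 5 subjects, two of them censored
data = [(2, True), (3, False), (4, True), (5, True), (6, False)]
curve = kaplan_meier(data)
for time, surv in curve:
    print(time, round(surv, 3))
```

Censored subjects never cause a drop in the curve, but they reduce the number at risk for later events.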
What is log rank test?
Compares survival between groups
What is cox regression?
Explore relationship between event and variable eg death and smoking/BMI
1= same (exposure and control)
2= (double risk if exposure)
What is sensitivity?
a/(a+c)
pick up rate of a test
What is specificity?
d/(d+b)
how likely a person without disease tests negative
What is PPV?
a/ (a+b)
likelihood someone who tests positive has disease
What is NPV?
d/ (d+c)
Likelihood someone who tests negative does not have disease
What is likelihood ratio?
likelihood test result would be expected in someone with v someone without disease
sensitivity / (1-specificity)
LR =2, a +ve result is twice as likely in someone with disease as in someone without it
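The formulas on the sensitivity, specificity, PPV, NPV and likelihood ratio cards all come from one 2x2 table. A sketch assuming the layout those formulas imply (a = true positives, b = false positives, c = false negatives, d = true negatives) and made-up counts; `diagnostic_accuracy` is an illustrative name.

```python
def diagnostic_accuracy(a, b, c, d):
    """a = true +ve, b = false +ve, c = false -ve, d = true -ve (assumed layout)."""
    sensitivity = a / (a + c)
    specificity = d / (d + b)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),
        "npv": d / (d + c),
        "lr_positive": sensitivity / (1 - specificity),
    }

# Made-up counts: 100 people with disease, 900 without
results = diagnostic_accuracy(a=90, b=30, c=10, d=870)
for name, value in results.items():
    print(name, round(value, 3))
```

Sensitivity and specificity belong to the test; PPV and NPV also change with how common the disease is in the group tested.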
What is Kappa?
How well two raters or repeat tests agree, beyond the agreement expected by chance (categorical/ordinal data eg CIN1,2,3)
0= agreement no better than chance
0.5= good
0.7= very good
1= perfect
ie checking the same sample in two different labs
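Kappa compares the agreement actually seen with the agreement expected by chance. A sketch of unweighted Cohen's kappa (an assumption; ordinal grades are often analysed with a weighted version) using a made-up table of two labs grading the same 100 samples.

```python
def cohens_kappa(table):
    """table[i][j] = samples given category i by lab 1 and category j by lab 2."""
    total = sum(sum(row) for row in table)
    size = len(table)
    observed = sum(table[i][i] for i in range(size)) / total
    expected = sum(
        (sum(table[i]) / total) * (sum(row[i] for row in table) / total)
        for i in range(size)
    )
    return (observed - expected) / (1 - expected)

# Made-up: labs agree on 80 of 100 samples; 50% agreement expected by chance
kappa = cohens_kappa([[40, 10], [10, 40]])
print(round(kappa, 2))
```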
What is Bonferroni?
Multiple testing adjustment
More tests gives an increased chance of error
p=0.05= 5% chance error
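The usual form of the adjustment divides the significance threshold by the number of tests. A sketch showing why it is needed, assuming the tests are independent; the function names are illustrative.

```python
def bonferroni_alpha(alpha, n_tests):
    """Threshold each individual test must meet after Bonferroni adjustment."""
    return alpha / n_tests

def chance_of_any_false_positive(alpha, n_tests):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

print(round(chance_of_any_false_positive(0.05, 10), 3))  # unadjusted, 10 tests
print(bonferroni_alpha(0.05, 10))                        # adjusted threshold per test
```

With 10 unadjusted tests the chance of at least one error is far above 5%, which is the problem the adjustment addresses.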
What are tailed tests?
2 tailed= reject null hypothesis if test is better or worse (either direction)
1 tailed= reject null hypothesis only if test is better (one direction)
What is incidence?
Number of new cases over time
eg 15/1000 x 100 = 1.5%
What is Prevalence?
Existing cases at a point in time
Power
probability that it can detect a statistically significant difference
eg if expect 100% cure rate, does not need so many people
if expect 1% cure rate, needs a lot more people
Probability type 2 error will not be made (>0.8=adequate)
-80% likely to find a significant difference
-increases with sample size
What is Type 1 error?
REJECT TRUE null hypothesis
false positive
reduced by Bonferroni correction
What is Type 2 error?
ACCEPT FALSE null hypothesis
false negative ie if sample too small or variance too big
What is service evaluation?
designed to define/judge current care
what standard does this service achieve?
What is Quality?
patient experience
clinical effectiveness
patient safety
What is quality framework?
1) Clarify quality
2) Measure and publish results
3) Reward
4) Leadership
5) Innovate
6) Safety
Plan Do Study Act
introduce and test potential quality improvements
refine prior to wholesale implementation
Model for Improvement
decide on measurable QIs and test/refine prior to implementation
Performance benchmarking
drive quality improvement by increasing awareness of local/national targets
find and share best practice eg KPIs
Healthcare failure modes and effect analysis
Identify how a process may fail and assess impact of this
Process mapping
map patient journey to identify QI opportunities
Statistical process control
measure and control process quality against predefined parameters
ensure operating at full potential
Root Cause Analysis
identify causes after an event has occurred
physical, human or latent
fishbone cause and effect model
What is Evidence Based Medicine?
Making a clinical decision based on:
-research
-clinical expertise
-patient preference
What is internal validity?
To what extent does study measure what it set out to
(how well do the methods answer the research question)
What is external validity?
To what extent can results be generalised to wider population / real life setting
What is efficacy?
Impact under trial conditions
What is effectiveness?
Impact under ordinary setting
What is PICO?
Patient/problem
Intervention
Comparison
Outcome
What is Journal Impact factor?
frequency of citations
What is a confounder?
triangular relationship with exposure and outcome
associated with, but not consequence of, exposure and outcome
eg city living, stress and heart disease
+ve = confounder shows an association when there isn’t one
-ve = confounder masks association when there is one
What is Selection Bias?
issues in recruitment or allocation
What is performance bias?
influenced by researcher or participant
detection- reduce by blinding
attrition- selective drop out
reporting-
What is an observational descriptive study?
looking at what is observed in a population
What is an observational analytic study?
looking at similarities and differences between groups
What is an experimental study?
intervene in some way and compare outcome to control
What is a longitudinal study?
more than one point in time
assess something over days/months/years
What is a cross sectional study?
single point in time
What is a parallel study?
Looking at two interventions at the same time
What is a prospective study?
Present and future
collect data as you go along
What is a retrospective study?
present and past
collect data that already exists
What is an ecological study?
Population/community level data (not individuals)
What is an explanatory study?
takes place in an ideal setting with homogeneous subjects
What is a pragmatic study?
takes place in real life eg ward/clinic
more effectiveness/real life
difficult to blind and limit drop outs
Cohort
Observational study
Group exposed to risk factor v not exposed
prospective
attrition bias
retrospective- use existing study but add on another outcome
inception- recruited early on in disease process before outcome established
Case- control
Retrospective
Look at those who have outcome and do not and ask about exposure
quick and cheap
recall bias
Nested- take population from cohort study and ask about previous exposures
case cohort- control group is from initial at risk population
Austin Bradford Hill
9 considerations of association versus causation
-strength (strong/large)
-consistency (replicated in other studies)
-specificity (specific disease)
-temporality (exposure precedes disease)
-biological gradient (more exposure = higher risk)
-plausibility- can we explain causation with science
-coherence (consistent with natural history)
-experimental evidence (other studies)
-analogy (have we seen similar relationships)
Rothman and Greenland
Sufficient cause
-minimal conditions and events that inevitably cause disease
Component cause
-acts with others to cause disease (eg genetic/environmental factors)
Rothman’s pie
Cross-sectional
prevalence of exposure and outcome at single point in time
establish associations, not cause and effects
Uncontrolled
All participants get the same treatment
Controlled
two treatments and compare outcome
RCT
random allocation into groups
reduces selection bias
equally distribute confounders
measure efficacy
allows for meta analysis
Crossover trial
receive one treatment then switch to another treatment, check which made better outcome
eg if lack of subjects
n of 1 trial
single subject
repeated experimental analysis
minimal generalisability
Factorial study
assess impact of >1 intervention
eg group
intervention a- then intervention c or d
intervention b- then intervention c or d
Phase 0
human microdosing
give small doses and assess bioavailability/half life
Phase 1
small group of healthy people
dosage range/ side effects
Phase 2
People with that illness
look at effectiveness/safety profile
Phase 3
large groups of people
effectiveness, dose range, duration, side effects, new treatment v previous treatments
Good results= marketing authorisation
Phase 4
post-marketing surveillance
benefits and side effects in different populations
new safety concerns
Random sampling
all have equal chance
systematic/quasi random sampling
every nth person
stratified sampling
based on characteristics eg ethnicity
cluster sampling
population put into similar representative clusters, some clusters used
convenience sampling
whoever appears
snowball sampling
one patient tells their friends
Bias of sampling
admission rate- only those who attend healthcare are picked up
diagnostic purity- comorbidities excluded
membership bias- those in a club/group
historical control- subjects chosen over time as definitions change
Response bias
Those who volunteer to take part may not reflect population
Matching
Demographic
age/gender/ethnicity
Lifestyle
smoking
Disease
comorbidities
Treatment factors
distributing confounders between groups
Randomisation
simple- by subject
block- each block given a group of same numbers
stratified- like block but distributing characteristics
Minimisation
random allocation
impacted by those already allocated to keep groups similar
Blinding
reduces observation bias
single- researcher or participant
double- both
triple- both + analyst
Nocebo effect
Negative effects of a dummy pill
Concealed allocation
reduces selection bias
Ascertainment bias
researcher not blinded so changes the way questions are asked
asc= ask
Response bias
participant not blinded so responds differently
Hawthorne effect
Participant changes behaviour as aware they’re in a study
Recall bias
Selective remembering
eg case-control
What are endpoints?
Clinical- mortality/morbidity/survival
Surrogate-
predict a clinical benefit eg LDL
Composite- many clinical events, if one of these occurs
Secondary- other characteristics measured to help describe treatment
Validity
Face- does test measure what it’s supposed to
Content- test measures the variables that it should eg exercise ability as a surrogate for CVD
Criterion
concurrent- current test measures in the same way a good test would
predictive- current test predicts what it is supposed to
variable
an entity that can take on value eg gender
attribute
eg male/female
parameter
numeric quantity that characterises population eg mean or standard deviation
Accuracy
how close measurement is to the true value
Precision
how close repeats of the test are
Incidence
New cases over a time period
Mortality
Rate= deaths in time period/population size
ratio = rates of study v general population (lower = better)
Morbidity Rate
number of new cases/size of at risk population
Point prevalence
proportion of population with disease at a point in time
number with disease/number in population
Period prevalence
point prevalence (number with disease/number in population) in a set time period
Types of Data
Nominal- colours
ordinal- mild, moderate, severe
interval- temp (no true 0)
ratio- scale with a true 0
Probability distribution
likelihood of value of a random variable
ie heads/tails =0.5
2 heads= 0.25
discrete- only whole numbers like above
continuous= any numbers
Binomial distribution
two possible outcomes in a fixed number of runs, each run is independent
toss a coin 5 times
Bernoullis= only one turn
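The coin-toss figures can be reproduced from the binomial formula. A minimal sketch assuming a fair coin (p = 0.5) and independent tosses; `binomial_probability` is an illustrative name.

```python
import math

def binomial_probability(successes, trials, p):
    """Probability of exactly `successes` in `trials` independent runs."""
    return math.comb(trials, successes) * p**successes * (1 - p) ** (trials - successes)

print(binomial_probability(1, 1, 0.5))  # one toss, one head (Bernoulli case)
print(binomial_probability(2, 2, 0.5))  # 2 heads from 2 tosses
print(binomial_probability(3, 5, 0.5))  # exactly 3 heads from 5 tosses
```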
Poisson distribution
number of events in a fixed interval, when events occur independently at a steady average rate (no fixed number of turns)
eg if 5 births/day on average
what is the likelihood there will be 6 tomorrow
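The births question can be answered directly from the Poisson formula. A sketch assuming births occur independently at a steady average of 5 per day; `poisson_probability` is an illustrative name.

```python
import math

def poisson_probability(k, mean_rate):
    """Probability of exactly k events when the average is mean_rate per interval."""
    return math.exp(-mean_rate) * mean_rate**k / math.factorial(k)

p_six = poisson_probability(6, 5)
print(round(p_six, 3))  # chance of exactly 6 births tomorrow
```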
Normal distribution
symmetrical around mean
Modal
unimodal= one peak in data
bimodal= two peaks (multimodal= multiple peaks)
Variance
dispersion around the mean
standard deviation
degree of data spread around mean /precision
square root of variance
large sd= larger spread
Effect size
mean of experiment - mean of control
/by sd
larger= greater impact
Coefficient of variation
sd / mean- compares spread of data between two studies measured on different scales
Coefficient of skewness
symmetry of data
+ve- tail extends to right
-ve- tail extends to left
0= symmetrical
Coefficient of kurtosis
peakedness of data
Standard error of mean
sd of the sample means
95% CI = mean +/- 1.96 SE
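The standard error and the 95% range can be worked out from a single sample. A sketch assuming SE = sd / square root of n and the 1.96 multiplier from the card (for small samples a t-distribution multiplier is normally used instead); the sample values are made up.

```python
import math
import statistics

def mean_with_ci(sample, z=1.96):
    """Return (mean, standard error, (lower, upper)) for a sample."""
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, sem, (mean - z * sem, mean + z * sem)

mean, sem, (lower, upper) = mean_with_ci([4, 5, 6, 7, 8])
print(mean, round(sem, 3), round(lower, 2), round(upper, 2))
```

Because n is in the denominator, a larger sample gives a smaller standard error and a narrower interval.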
Confidence Interval
range in which we are 95% sure population result lies based on sample result
shown by error bars
Per protocol analysis
Only include those with full compliance to trial protocol
explanatory approach
Intention to Treat analysis
Include all subjects, whether they complied
Reflects real life
pragmatic approach
Imputation
substitute missing data so data can be analysed
Control event rate
C / (C+D)
Experimental event rate
A / (A+B)
ARR
CER - EER
-ve = increase
eg 0.8 in control, 0.4 in intervention. ARR = 0.4, ie 40 fewer cases of disease per 100 given rx
Relative risk
EER / CER
ratio of risk of outcome
=1 = same
>1 - increased risk if exposed
<1 = reduced risk
2= double risk
RRR
(CER - EER) / CER
NNT
1/ARR
lower = better
number of people you need to treat for one good outcome
Odds Ratio
(a/b) / (c/d)
how likely outcomes are between the groups
1= no effect
>1= more likely if exposed
<1 = less likely if exposed
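The event rate, relative risk and odds ratio cards share one 2x2 table. A sketch assuming the layout their formulas imply (A = experimental with event, B = experimental without, C = control with event, D = control without), with counts chosen to match the 0.8 v 0.4 example; `two_by_two` is an illustrative name.

```python
def two_by_two(a, b, c, d):
    """a, b = experimental group with/without event; c, d = control group with/without."""
    eer = a / (a + b)
    cer = c / (c + d)
    return {
        "EER": eer,
        "CER": cer,
        "ARR": cer - eer,
        "RR": eer / cer,
        "OR": (a / b) / (c / d),
    }

# 100 per group: 40 events with intervention, 80 events with control
measures = two_by_two(a=40, b=60, c=80, d=20)
for name, value in measures.items():
    print(name, round(value, 3))
```

With a common event like this one, the odds ratio sits much further from 1 than the relative risk does.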
NNH
Number of people needed to be exposed for one bad outcome
Risk benefit ratio
NNH (round down to whole number) : NNT (round up to whole number)
Null hypothesis
Assume any difference is due to chance
ie no relationship between exposure and outcome, any difference between groups is due to chance
if alpha = 0.05 and the null hypothesis is true, results this extreme occur 5% of the time
P value
probability of results at least this extreme if the null hypothesis is true (ie due to chance)
lower = less likely
<0.05 = stat sig
Tailed tests
1= 1 direction of interest (greater than or less than-only looking at one way)
2= 2 directions of interest (greater than and less than- accept may be either way)
1 sample- categorical test
chi squared
fisher’s exact if small
1 sample- non-normal test
Sign
Wilcoxon Signed Rank
1 sample- normal test
Student’s T
2 samples- categorical, unpaired
Chi squared
Fisher’s exact (small)
2 samples- categorical, paired
McNemar’s
2 sample- non normal and unpaired
Mann Whitney U
2 sample- non normal and paired
Wilcoxon
2 samples- normal
Student’s T
> 2 samples, categorical
unpaired- chi squared
paired- McNemar’s
> 2 samples non normal
unpaired- Kruskal-Wallis
paired- Friedman’s
> 2 samples normal
ANOVA one way (unpaired)
ANOVA repeated measures (paired)
Categorical data test
paired- McNemar’s
unpaired + large- chi squared
unpaired + small- Fisher’s exact
What do parametric/non parametric tests do?
Non-parametric
compare medians
Parametric
compare means
What does paired data mean?
Same individuals at different time points
unpaired= different subjects
Fragility Index
number of people who would need a different outcome for the trial result to become non significant
smaller- more fragile
Equivalence study
show equivalence between two drugs
- new rx as effective as established one
Non-inferiority study
new drug is no worse than established drug
Class effect
similar outcomes, therapeutic and adverse effects of two or more drugs
Serial testing
If one test is +ve we do another to confirm ie HIV/syphilis
Parallel testing
many tests run at the same time to increase sensitivity
What is a consort checklist?
A checklist used to increase quality of RCT reports
Hazard ratio
1= equal hazard rate
>1 = experiment has higher hazard rate
<1 experiment has lower hazard rate
use Cox regression
What is grounded theory?
qualitative study
do not start with a theory, theory is developed from data collection
What is a phenomenological study?
qualitative, looking into the meaning of a lived experience
What is an ethnographic study?
learning from a group to interpret something
What is a historical study?
anticipate future events by learning from the past
purposive sampling
select those with knowledge
quota sampling
select those with characteristics
homogeneity
studies have similar results
ie 0% heterogeneity
heterogeneity
25% low
50% moderate
75% high
variation in results between studies
fixed effects model= assumes no heterogeneity
test with- forest plot/Cochran’s Q/I²
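The I2 percentage is derived from Cochran's Q and the number of studies. A minimal sketch, assuming the usual definition I2 = 100 x (Q - df) / Q with df = number of studies - 1 and negative values set to 0; the Q value used is made up.

```python
def i_squared(q, n_studies):
    """I2 (%) from Cochran's Q; values below 0 are reported as 0."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

print(i_squared(20, 11))  # made-up Q of 20 across 11 studies
print(i_squared(8, 11))   # Q below df: no heterogeneity beyond chance
```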
How to test for publication bias
funnel plot/Galbraith plot
Hierarchy of evidence
1) Meta-analysis/Systematic Review/RCT
2) Systematic review of case control/cohort
3) Case control/cohort
4) Case report/series
5) Expert opinion