Statistics Flashcards
What is probability?
the number with outcome/the total number (scale 0 to 1)
Probability (p) = #favourable outcomes/ #all poss outcomes
= #cases/ total pop
What is the rate of change of probability over time?
Rate of change of probability over time:
• Instantaneous rate = hazard rate
o Instantaneous probability of disease within next small interval of time, given that subject is still at risk at time t
• Average rate = incidence rate
o Expresses risk of disease development per unit in time relative to size of pop at risk
What is conditional probability?
= p(A|B) = (P(A and B))/(P(B))
= probability event A will happen given that some other event B has already occurred
p(A|B) is not the same as p(B|A)
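A short worked example may make the asymmetry concrete. P(A|B) divides the joint probability by the probability of the conditioning event B, while P(B|A) divides by the probability of A. The counts and the smoking/cough labels are invented purely for illustration.

```python
# Hypothetical counts for illustration only (not from any real study).
total = 200        # people in the sample
n_b = 50           # people with event B (e.g. smokers)
n_a = 40           # people with event A (e.g. chronic cough)
n_a_and_b = 20     # people with both A and B

p_a = n_a / total
p_b = n_b / total
p_a_and_b = n_a_and_b / total

# Conditional probability: divide by the probability of the event we condition on.
p_a_given_b = p_a_and_b / p_b   # P(A|B): cough among smokers
p_b_given_a = p_a_and_b / p_a   # P(B|A): smokers among those with cough

print(p_a_given_b, p_b_given_a)
```

With these made-up counts the two values come out at roughly 0.4 and 0.5, so the two conditional probabilities differ even though they share the same numerator.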
What is sensitivity and what is the equation?
Sensitivity = probability of testing +ve given that you have the disease
= True Positives/ (True Positives + False Negatives)
What is specificity and what is the equation?
Specificity = probability of testing -ve given that you don’t have the disease
= True Negatives/ (False Positives + True Negatives)
What is the positive predictive value and what is the equation?
+ve predictive value = probability of having the disease given a +ve test
= TP/ (TP + FP)
What is the negative predictive value and what is the equation?
-ve predictive value = probability of not having the disease given a -ve test
= TN/ (FN + TN)
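The four diagnostic test measures can be computed together from the counts of true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). This is a minimal sketch; the function name `diagnostic_metrics` and the counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Return sensitivity, specificity, PPV and NPV from test-result counts."""
    return {
        "sensitivity": tp / (tp + fn),  # +ve test among those with disease
        "specificity": tn / (fp + tn),  # -ve test among those without disease
        "ppv": tp / (tp + fp),          # disease among those testing +ve
        "npv": tn / (fn + tn),          # no disease among those testing -ve
    }

# Invented counts: 100 people with disease, 200 without.
m = diagnostic_metrics(tp=90, fp=30, fn=10, tn=170)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity are read across the disease groups, whereas the predictive values are read across the test-result groups, which is why the predictive values change with how common the disease is in the sample.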
What is a percentage?
(proportion) *100 (scale 0 to 100)
What is a rate?
the number of times something happens per a quantifier (x per 100 people) (scale 0 to infinity)
What are odds?
the number with the outcome/the number without the outcome (scale 0 to infinity)
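Probability and odds describe the same counts on different scales, so one can be converted to the other. The sketch below assumes the definitions above (probability = with outcome / total, odds = with outcome / without outcome); the helper names are hypothetical.

```python
def probability_to_odds(p):
    """Odds = p / (1 - p); only defined for p < 1."""
    return p / (1 - p)

def odds_to_probability(odds):
    """Probability = odds / (1 + odds)."""
    return odds / (1 + odds)

# Invented example: 20 of 100 people have the outcome.
with_outcome, total = 20, 100
p = with_outcome / total                           # probability
odds = with_outcome / (total - with_outcome)       # odds from the counts

print(p, odds, probability_to_odds(p), odds_to_probability(odds))
```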
How do you work out a risk ratio? Also what is number needed to treat (NNT)/ number needed to harm (NNH)?
Divide one probability/percentage by the other
Whichever group on top is the focus
When we divide two numbers together, there are 3 potential outcomes:
Risk in group A is larger than in group B (RR > 1)
Risk in group A is same as group B (RR = 1)
Risk in group A is smaller than in group B (RR < 1)
We interpret RR in relation to 1 as we are trying to show difference
RR = 1 – no association, no effect
RR > 1 - +ve association, risk factor
RR < 1 - -ve association, protective factor
Indicates potential benefit/ risk of a clinical intervention
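RD = risk difference (risk in intervention group minus risk in comparison group)
NNT = 1/ |RD| ‘number of people needed to treat to prevent 1 outcome’
NNH = 1/ |RD| ‘number of people needed to treat to cause 1 harmful outcome’
If RD is positive, intervention is harmful rather than protective – use NNH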
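The risk ratio, risk difference and number needed to treat/harm can be sketched as follows. Group A (the intervention group) is taken as the focus, and the risk difference is assumed to be risk in group A minus risk in group B; the function name and trial counts are invented for illustration.

```python
def risk_measures(events_a, n_a, events_b, n_b):
    """Compare risk in group A (focus) with risk in group B."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    rr = risk_a / risk_b          # risk ratio, interpreted relative to 1
    rd = risk_a - risk_b          # risk difference, interpreted relative to 0
    # NNT/NNH use the size of the risk difference; its sign says which applies.
    # With rd == 0 there is no effect, so the number needed is infinite.
    number_needed = 1 / abs(rd) if rd != 0 else float("inf")
    label = "NNH" if rd > 0 else "NNT"
    return rr, rd, number_needed, label

# Invented trial: 10/100 events on treatment, 20/100 events on control.
rr, rd, number_needed, label = risk_measures(10, 100, 20, 100)
print(rr, rd, label, number_needed)
```

Here the RR is below 1 and the RD is negative, so the treatment looks protective and the result is reported as an NNT.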
How do you work out an odds ratio?
Divide the odds in one group by the other
Whichever group on top is focus
Odds in group A are larger than in group B (OR > 1)
Odds in group A are same as group B (OR = 1)
Odds in group A are smaller than in group B (OR < 1)
What is prevalence probability, what are its problems and why is it useful?
Prevalence probability = probability of having disease at given time point (right now)
Problems:
• Length-time bias: conditions with longer duration more likely to be captured
• Difficult to make meaningful comparisons of risks
Usefulness:
• Good for measuring disease burden on a pop to assess healthcare needs + allocate resources
• In etiologic research where outcomes do not have follow up data/ onset difficult to define
What is incidence proportion and what are the problems?
Incidence probability = probability of getting disease during specified time period (new cases in next 5 yrs)
Cohort (follow up) study
Problems:
Assumes fixed + complete follow up (competing risks – deaths to other causes, or losses to follow up)
Ignores time to event
Use incidence rates instead: #new cases during time period/ Σperson-time
Σperson-time = (size of pop at start) x (average duration of follow-up)
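The difference between incidence proportion and incidence rate shows up as soon as follow-up times differ between people. The sketch below sums each person's own follow-up time to get Σperson-time; the cohort data are invented.

```python
# Hypothetical cohort: (years of follow-up, developed disease?) per person.
follow_up = [(5.0, False), (2.5, True), (4.0, False), (1.0, True), (5.0, False)]

new_cases = sum(1 for _, diseased in follow_up if diseased)
person_years = sum(years for years, _ in follow_up)   # Σperson-time

incidence_proportion = new_cases / len(follow_up)     # ignores time to event
incidence_rate = new_cases / person_years             # cases per person-year

print(new_cases, person_years)
print(f"incidence proportion: {incidence_proportion:.2f}")
print(f"incidence rate: {incidence_rate:.3f} per person-year")
```

Summing individual follow-up times gives the same total as (size of pop at start) x (average duration of follow-up), but it also handles people who leave the study early.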
What are descriptive objectives?
Measure frequency of disease in a population
Estimate disease burden (mortality, financial cost etc)
Measure response
Estimate frequency of known/ suspected risk factor
What are analytic objectives?
Specify risk factors + causes
Quantify the effect of exposures or treatments
Assess the relative effectiveness of interventions
‘how + why’
What is an experimental study?
recording of outcome following planned intervention in the care of patients
• Always analytical
What is an observational study?
recording of outcome without intervening in the care of the patient in any way, other than what is routine clinical care
• Can be analytical if have comparison group or descriptive if not
What is a confounding variable?
distortion that should be prevented or controlled
What is effect modification?
interaction, useful information (ie new drug slow in elderly so ill for longer, age = effect modifier)
What is the difference between association and causation?
Association refers to statistical link between exposure + disease
• May not reflect cause-and-effect relationship
What is a cohort study? What are the advantages and disadvantages?
- Prospective (exposure → outcome)
- Take sample of population at risk of outcome, some of sample exposed then follow up and look at incidence of outcome
- Expensive + time consuming, not suitable for rare disease (sample too small) or for diseases with long latent periods
What is a cross-sectional study? What are the advantages and disadvantages?
- No direction of inquiry ‘what is happening’ at current moment
- Take sample of population, look at exposure + cases together
- Not suitable for rare disease/ outcomes, very limited potential to establish disease aetiology due to confounding, selection bias (cases with diseases with long duration + more survivors more easily included)
What is a case-control study? What are the advantages and disadvantages?
- Retrospective (outcome → exposure)
- Disease based sampling: 2 sample groups (cases with disease + controls without), backward follow up to look at exposure
- Only single outcome/ disease can be studied, no valid estimates of risk of disease, limited potential to establish causal effects due to confounding, selection bias + measurement bias
What is a randomised controlled trial? What are the advantages and disadvantages?
- Prospective
- Sample at risk population, give half treatment, other half = control, after follow up period look at incidence of outcome
- Treatment group allocation must be random to prevent confounding problems
- Ideally would have a double-blind trial – neither patient nor physician aware of treatment – prevents bias (differential follow-up or care otherwise)
- Most convincing evidence of cause-and-effect relationships but expensive, complicated by non-compliance, not always ethical
What is a 2x2 contingency table?
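For any type of analytical study, data can be summarised in a 2x2 table with exposed and not exposed on the left and diseased and not diseased on the top.
Cell labels used in the formulas: a = exposed + diseased, b = not exposed + diseased, c = exposed + not diseased, d = not exposed + not diseased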
RD = a/(a+c)- b/(b+d)
RR = (a × (b+d))/(b × (a+c))
OR = (a × d)/(b × c)
For case-control studies, only OR is meaningful
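The three effect measures can be computed directly from the four cell counts. The sketch assumes the cell labelling implied by the formulas above (a and c are the exposed group, b and d the unexposed group, a and b are diseased); the function name and counts are hypothetical.

```python
def two_by_two(a, b, c, d):
    """a: exposed, diseased      b: not exposed, diseased
    c: exposed, not diseased     d: not exposed, not diseased
    (labelling assumed so that it matches the RD/RR/OR formulas)."""
    risk_exposed = a / (a + c)
    risk_unexposed = b / (b + d)
    return {
        "RD": risk_exposed - risk_unexposed,
        "RR": risk_exposed / risk_unexposed,   # equals a(b+d) / b(a+c)
        "OR": (a * d) / (b * c),
    }

# Invented cohort: 100 exposed (30 diseased), 100 unexposed (10 diseased).
result = two_by_two(a=30, b=10, c=70, d=90)
print(result)
```

In a case-control study the numbers of cases and controls are fixed by the investigator, so the risks in this function are not valid estimates and only the OR entry should be used.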
What is non-probability sampling?
Non-probability samples:
• Convenience (ease of access), snowball (friend of friend), purposive (you choose who)
• Probability of being chosen is unknown
• Cheaper + easier to implement but unable to generalise, high potential for bias
What is simple random sampling?
- Each member of population has equal probability of being selected
- Use random mechanism (random number table)
- Need complete list of population – time-consuming, does not always achieve best representation - by bad luck sample not evenly spread across all sections of population
What is systematic random sampling?
- Members of population selected at equal intervals
- Need complete list of population, less precise than SRS but easy to carry out
What is stratified random sampling?
- Population partitioned into strata and sample selected from within each stratum (by SRS) – should be low within-strata variability
- Requires sampling frame, prior info about population being sampled, costly but increases representativeness and more statistically precise
What is cluster sampling?
- Population partitioned into clusters and sample selected (by SRS) – all people in selected clusters are included in sample, should be low between-groups variability
- Don’t need complete list (just list of clusters), cheap but decreases statistical precision
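The four probability sampling schemes (simple random, systematic, stratified, cluster) can be compared side by side on a toy sampling frame. This is only a sketch using Python's standard `random` module; the population of 100 ID numbers, the two strata and the ten clusters are all invented.

```python
import random

random.seed(1)  # fixed seed so the example is repeatable
population = list(range(1, 101))   # hypothetical sampling frame of 100 IDs

# Simple random sampling: every member has the same chance of selection.
srs = random.sample(population, 10)

# Systematic sampling: random start, then every k-th member.
k = len(population) // 10
start = random.randrange(k)
systematic = population[start::k]

# Stratified sampling: SRS within each stratum (two made-up strata here).
strata = {"low_ids": population[:50], "high_ids": population[50:]}
stratified = [pid for members in strata.values()
              for pid in random.sample(members, 5)]

# Cluster sampling: choose whole clusters at random, include everyone in them.
clusters = [population[i:i + 10] for i in range(0, 100, 10)]
cluster_sample = [pid for cluster in random.sample(clusters, 2)
                  for pid in cluster]

print(srs, systematic, stratified, cluster_sample, sep="\n")
```

Note that only the cluster scheme avoids needing the full list of individuals up front: it needs the list of clusters, then everyone inside the chosen ones.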
What is confidence?
Inferential statistics uses sample data to draw inferences about population represented
2 approaches: confidence intervals (estimate range of plausible values for population parameter) + tests of significance (assess degree of uncertainty in evidence using a p-value)
What is variability?
Effects seen by chance are due to variability
• Different samples produce different estimates of effects
• Sampling error
2 types of variability:
• Within sample: measured using standard deviation (learn equation)
• Between samples: measured using standard error (learn equation)
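The two measures of variability can be sketched with the standard formulas: the sample SD is the square root of the sum of squared deviations from the mean divided by n - 1, and the SE of the mean is SD divided by the square root of n. The measurements below are invented.

```python
import math
import statistics

sample = [4.0, 6.0, 8.0, 10.0, 12.0]   # hypothetical measurements

n = len(sample)
mean = statistics.mean(sample)

# Within-sample variability: sample standard deviation (n - 1 denominator).
sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))

# Between-sample variability of the mean: standard error.
se = sd / math.sqrt(n)

print(mean, sd, se)
# statistics.stdev uses the same n - 1 formula.
print(statistics.stdev(sample))
```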
What is normal distribution?
‘Normal reference range’ in clinical practice: mean ± 2 SDs (covers approx 95% of values if variable is normally distributed)
Normal distribution not normally encountered – few biological variables conform, most medically important variables are not symmetrical
What is the central limit theorem?
Had we repeatedly sampled a population + calculated the sample means, the sampling distribution of those means would be approx normal (provided the samples are large enough), even if the population itself is not normal
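A small simulation can illustrate the idea. The sketch below assumes a deliberately skewed population (exponential, chosen only for illustration), draws 1,000 samples of 50 and looks at the resulting sample means; the sizes and the seed are arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# A clearly skewed (non-normal) population.
population = [random.expovariate(1.0) for _ in range(10_000)]

# Repeatedly sample and record each sample mean.
sample_size = 50
sample_means = [
    statistics.mean(random.sample(population, sample_size))
    for _ in range(1_000)
]

# The sample means centre on the population mean...
print(statistics.mean(population), statistics.mean(sample_means))
# ...and their spread is close to SD / sqrt(n), i.e. the standard error.
print(statistics.stdev(population) / sample_size ** 0.5,
      statistics.stdev(sample_means))
```

Plotting a histogram of `sample_means` would show a roughly symmetrical bell shape even though the population values themselves are strongly right-skewed.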
What is standard error?
SE is low when sample is large + variability in data is low
Lower SE leads to lower uncertainty of mean + better precision of mean
What is a confidence interval?
More confidence –> wider interval –> less precision
95% CI: for every 100 studies performed, we expect about 95 to produce an interval that contains the true population value and about 5 to produce an interval that does not
CI for difference: if interval excludes zero, effect is statistically significant
CI for ratio: if interval excludes 1, effect is statistically significant
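As a rough sketch, a 95% CI for a mean can be built as mean ± 1.96 × SE. This assumes the large-sample normal approximation (for small samples a t-distribution multiplier would normally be used instead); the function name and data are hypothetical.

```python
import math
import statistics

def mean_ci_95(values):
    """Approximate 95% CI for a mean: mean ± 1.96 × SE (normal approximation)."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - 1.96 * se, m + 1.96 * se

# Invented differences in some outcome between paired measurements.
differences = [4.0, 6.0, 8.0, 10.0, 12.0]
lower, upper = mean_ci_95(differences)

print(lower, upper)
# For a difference, the effect is statistically significant if 0 is excluded.
print("excludes zero:", not (lower <= 0 <= upper))
```

For a ratio measure (RR, OR) the same check is made against 1 rather than 0.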
What is a null hypothesis?
statement being tested – statement of no difference, no association, no effect
What is a p-value?
P-value is probability of an effect as large as/ larger than observed effect in sample, assuming null hypothesis is true
• Small p-value: sample results unlikely when null hypothesis is true – data contradicts, reject null hypothesis – statistically significant effect
o <0.05 is often accepted as statistically significant
• Large p-value: sample results likely when null hypothesis is true – cannot reject null hypothesis – inconclusive
Statistical significance does not necessarily mean that effect is clinically significant – needs to be big enough to make worthwhile difference
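One common way to obtain a p-value from an estimate and its standard error is a two-sided z-test. The sketch below assumes the estimate is approximately normally distributed, which is only one of several possible tests; the function name and the numbers are invented.

```python
from statistics import NormalDist

def two_sided_p_value(estimate, se, null_value=0.0):
    """Two-sided p-value from a z-test (normal approximation assumed)."""
    z = (estimate - null_value) / se
    # Probability of a result at least this far from the null, in either direction.
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Invented example: observed mean difference of 5 with SE of 2.
p = two_sided_p_value(estimate=5.0, se=2.0)
print(f"p = {p:.4f}")
print("statistically significant at 0.05:", p < 0.05)
```

An estimate exactly 1.96 standard errors from the null gives a p-value of about 0.05, which ties this threshold to the 95% confidence interval.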
What is regression analysis?
Multivariable regression analysis = valuable tool for diagnostic, prognostic + aetiognostic research problems:
• Diagnosis: if disease is present
• Prognosis: future course of patient’s current standing + how this depends on choice of intervention
• Aetiology: factors that cause disease
Applications of Regression:
• Developing model for risk prediction of a clinical outcome
• Isolating effect of single variable on clinical outcome
• Identifying multiple important predictors of a clinical outcome and how they jointly affect outcome
• Covariate adjustment to improve efficiency in RCTs
Crude (unadjusted) does not take into account effect of confounding variables
Adjusted accounts for confounding variables – generated using multivariable regression analysis
• May still be residual confounding
Regression relates 2 kinds of variables:
• Outcome (or response) variable
o Eg BP, 90 day mortality
• Explanatory variables (or predictors)
o Eg age, comorbid conditions
Describe 4 types of regression models?
Model: Linear Regression
Outcome: Continuous
What is modelled?: Mean
Measure of effect: Mean Difference
Model: Logistic Regression
Outcome: Binary
What is modelled?: Log (odds)
Measure of effect: Odds Ratio
Model: Poisson Regression
Outcome: Count (number of events)
What is modelled?: Log (incidence rate)
Measure of effect: Incidence rate ratio (IRR)
Model: Cox Regression
Outcome: Time to event
What is modelled?: Log (hazard rate)
Measure of effect: Hazard ratio (HR)
What is simple linear regression?
Simple as only one ‘X’ variable (how does outcome Y change with X)
Look for correlation – linear relationship
/ = positive, \ = negative, — = no relationship
Mean y = a + bx
Mean y = mean of outcome variable
a = intercept (mean y when x = 0)
b = slope (∆y/∆x)
x = predictor variable
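The intercept a and slope b are usually estimated by ordinary least squares; that choice is an assumption here, since the card does not name a fitting method. A minimal sketch with invented data and a hypothetical helper `fit_line`:

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in mean y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                # slope: change in mean y per unit of x
    a = mean_y - b * mean_x      # intercept: mean y when x = 0
    return a, b

# Invented data lying exactly on y = 1 + 2x, so the fit is easy to check.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a, b = fit_line(xs, ys)
print(a, b)
```

A positive b corresponds to the upward-sloping pattern, a negative b to the downward-sloping one, and b near 0 to no linear relationship.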
What is multi-variable/multiple linear regression?
Mean y = a + b1X1 + b2X2 + b3X3 ….
P-values in the regression results table show which predictors have no statistically significant effect on the outcome (eg birth weight)
• P > 0.05 = not significant
Standard errors can be used to calculate confidence interval for the b coefficients:
• b ± 1.96 SE
• b coefficients quantify effect of each predictor on the mean outcome, adjusting for all other predictors
o eg study looking at mean birth weight values: b = 9 for mother’s height: increase of 1 cm in mother’s height is expected to produce an average increase in birth weight of 9g
o -ve coefficient means a decrease
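The b ± 1.96 SE rule can be sketched for a single coefficient. The value b = 9 g per cm is taken from the birth weight example; the standard error of 2 is invented, as is the function name.

```python
def coefficient_ci(b, se, z=1.96):
    """Approximate 95% CI for a regression coefficient: b ± 1.96 SE."""
    return b - z * se, b + z * se

# b = 9 g per cm of mother's height (from the example); SE = 2 is made up.
low, high = coefficient_ci(b=9.0, se=2.0)
print(low, high)

# A coefficient is a difference in means, so the null value is 0.
significant = not (low <= 0 <= high)
print("statistically significant:", significant)

# Expected change in mean birth weight for a 5 cm difference in height,
# holding the other predictors fixed.
print(9.0 * 5, "g")
```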
What is chance?
= random error (imprecision) that produces different observations for replicate experiments or repeated samples
Controlling:
• Prevention: sufficiently large sample size
• Detection + evaluation: confidence intervals, p-values
What is bias?
= any systematic error in the design/ conduct of research resulting in an estimated effect which is different from the truth, 2 types:
• Selection bias: participants (inc factors affecting recruitment/ retention of subjects)
• Information bias: information obtained from participants (eg due to improperly calibrated measuring device, recall or interviewer bias)
Controlling:
• Before study: appropriate selection of study participants + correct data measurements + definitions
• After study: assess how well this was done
Good precision: SE small, all results close together
Good accuracy: bias low, all results in expected range
What is confounding?
= distortion in measure of effect as other variables (that will have an effect) are not controlled for
Confounding pulls observed (crude) effect away from the true effect – can over- or under-estimate the truth
Controlling
• Before study: randomise (comparable groups), restrict (eliminates confounders – ie only use one gender in study) + matching (control group that resembles case group with respect to confounders)
• After study: stratified analysis, regression modelling, adjusted effect measures
Criteria for confounding:
• Confounder must be associated with the outcome
• Confounder must be associated with the exposure
• Confounder cannot be an intervening variable between exposure + outcome
Effect modifier has different effects across strata (statistically significant differences) – belongs to nature so is useful
What are the criteria for causal associations?
Strength of association
• How far from null value measure of effect is
Temporal sequence
• Did exposure precede outcome?
Consistency of effect
• Has the effect been seen by others? Is study reproducible in different settings?
Dose-response relation
• Does increased exposure result in greater effect?
Biological plausibility
• Does association make biological sense?
Experimental evidence
• Has an RCT been done?