Epidemiology and Statistics Flashcards
What are the major epidemiological areas?
Descriptive, aetiological, evaluative, health services, clinical
What are the epidemiological measures of frequency?
Prevalence: number of cases/population at risk
Incidence: number of new cases/population at risk over time
What are the epidemiological measures of association?
Relative risk, rate ratio, odds ratio
What are the epidemiological measures of impact?
Attributable risk, vaccine efficacy and effectiveness
What is the epidemiological measures sequence?
Measures of disease frequency –> measures of association (ratio and difference) –> measures of impact (AR, AR%, PAR, PAR%)
What are the different epidemiological study designs?
- Experimental: randomised controlled trials or non-randomised controlled trials, manipulation, control, randomisation, blinding
- Observational: descriptive study (no comparison group), analytical study: cohort study (exposure –> outcome), case-control study (outcome –> exposure), cross-sectional study (exposure and outcome at the same time)
What are the epidemiological studies errors?
- Selection bias: self-selection, nonresponse, attrition, selective survival
- Information bias: reporting bias, false positives/negatives, errors and omissions in medical records
- Confounding: difference in age, gender, health status
What are the epidemiological data sources?
- Aggregate data: vital statistics, census, disease registries, monitoring systems
- Individual level data: vital events, disease registries, medical records, national surveys, questionnaire data
What is intention-to-treat analysis?
The primary analysis is a direct comparison of the treatment groups and this is performed with subjects being included in the group to which they were originally allocated
What is per-protocol analysis?
Only patients who completed the treatment as specified in the protocol are analysed, according to the treatment they actually received
What are the limitations of case-control studies?
Choice of control group affects comparison, data reported by subjects or from records - usually retrospective, so may be incomplete, inaccurate or biased
What are the limitations of cohort studies?
Need big numbers, often need long follow-up, need to keep in touch with participants, may be expensive
How is continuous data summarised?
Measures of the centre of data: mean, median
Measures of variability: standard deviation, range (min and max), interquartile range
How do you calculate standard deviation?
square root of variance; variance = (sum of squared differences between mean and each value)/(n-1)
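A minimal sketch of this calculation in Python; the function name `sample_sd` and the example values are illustrative assumptions, not part of the original card:

```python
import math


def sample_sd(values):
    """Sample standard deviation: square root of the (n-1) variance."""
    n = len(values)
    mean = sum(values) / n
    # Sum of squared differences between the mean and each value
    squared_diffs = sum((x - mean) ** 2 for x in values)
    variance = squared_diffs / (n - 1)
    return math.sqrt(variance)


# Hypothetical example data
print(round(sample_sd([2, 4, 4, 4, 5, 5, 7, 9]), 3))
```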
Which summary measure would you use for continuous data with skewed distribution?
Centre of distribution: consider median
Spread of data: consider interquartile range
What are histograms?
rectangles (bins) have heights or areas which are proportional to the frequencies in each category, y scale is frequency per interval
What is a box and whisker plot?
Contains: median (horizontal line in the box), upper and lower quartile, maximum (top of whisker), minimum (bottom of whisker)
What are positive and negative skews?
Positive skew: tail on the right is longer (more common)
Negative skew: opposite (gestational age, birthweight)
Which graphical methods are used to display categorical data?
Bar charts: each category is given its own bar along the horizontal axis (there are spaces), height of bar is proportional to the frequency or percentage of observations
Pie charts
Why is it important to summarise data?
To monitor data quality, to check for invalid or missing data entries, to describe characteristics of participants in a study, before doing a complex analysis
How do you interpret normal distribution curves?
95% lies in +/- 2SD; 68% in +/-1 SD
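The rounded figures above can be checked with a short sketch. It assumes an exactly normal distribution and uses the error function from the standard library; the helper name `normal_coverage` is hypothetical:

```python
import math


def normal_coverage(k):
    """Proportion of a normal distribution within +/- k SD of the mean."""
    return math.erf(k / math.sqrt(2))


# 1 SD, the exact 95% multiplier (1.96), and the rounded value 2
for k in (1, 1.96, 2):
    print(k, round(normal_coverage(k), 4))
```

The exact multiplier for 95% is 1.96; 2 SD is the usual rounded rule of thumb and covers slightly more than 95%.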
What is considered a large sample?
For means: a sample size of at least 100 is considered large –> the sample mean follows a normal distribution; for smaller samples –> the data themselves need to follow a normal distribution, and the t distribution is used to calculate the CI
For proportions: considered large if r and n-r are both greater than five (r = number with the characteristic, n = sample size); if not –> an exact binomial CI is calculated
Which sample mean gives the most precise estimate of population mean?
- Bigger sample size
- Smaller spread of data (SD), estimate closer to true mean
How is standard error defined?
Indication of the extent of the sampling error; how much a sample mean tends to vary from the true population mean; it provides an estimate of the precision of the sample mean
How is SE(mean) calculated?
SE(mean) = SD/square root of n
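A short sketch of the formula, extended to an approximate 95% CI. The 1.96 multiplier assumes a large sample (normal approximation); the function names and the example numbers are illustrative assumptions:

```python
import math


def se_mean(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


def ci95_mean(mean, sd, n):
    """Approximate 95% CI for a mean, assuming a large sample (z = 1.96)."""
    se = se_mean(sd, n)
    return mean - 1.96 * se, mean + 1.96 * se


# Hypothetical example: mean 120, SD 15, n = 100
print(se_mean(15, 100))
print(ci95_mean(120, 15, 100))
```

Quadrupling the sample size halves the standard error, which is why a bigger sample gives a more precise estimate.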
What are the assumptions for calculating CI of a population mean?
Normal data or large sample, sample is chosen at random from population, observations are independent of each other, the sample is not small (at least 60)
What are the assumptions for calculating CI of a population proportion?
The sample is chosen at random from the population, the observations are independent of each other, the proportion with the characteristics is not close to 0 or 1, np and n(1-p) are each greater than 5
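As a sketch under these assumptions, the usual normal-approximation (Wald) interval can be written as follows. The function name `ci95_proportion` is hypothetical, and the size check mirrors the "greater than 5" rule on the card:

```python
import math


def ci95_proportion(r, n):
    """Approximate 95% CI for a proportion r/n (normal approximation)."""
    # With p = r/n, np equals r and n(1-p) equals n - r
    if r <= 5 or n - r <= 5:
        raise ValueError("sample too small for the normal approximation")
    p = r / n
    se = math.sqrt(p * (1 - p) / n)
    return p - 1.96 * se, p + 1.96 * se


# Hypothetical example: 30 of 100 participants have the characteristic
print(ci95_proportion(30, 100))
```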
What are Type I and Type II errors?
Type I error: getting a significant result in a sample when null hypothesis is true (false significant result), probability is 5%
Type II error: non-significant result in a sample when null hypothesis is false in population (false non-significant), probability should not be more than 20%
What is a P value?
The probability, given the null hypothesis is true, of obtaining data as extreme or more extreme; commonly, P<0.05 is statistically significant
What are the factors that influence the size of the P value?
- The size of the real effect in the population sampled
- The sample size
- The variability of the measure involved
Should we use 2-sided or 1-sided tests and why?
2-sided; 1-sided tests do not distinguish between no effect and a harmful effect
What are the assumptions of t-tests and what happens if the assumptions do not hold?
- Continuous data; normally distributed
- Variances (SDs) are the same in the groups being compared
If the assumptions do not hold: transform the data; the t-test is less robust to unequal variances than to slight skewness
When are the requirements for the data to be normal in t-tests less critical?
> 50 per group for 2-sample test, >100 for the paired test
What is t in the t-test?
t = mean difference/SE(mean difference)
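For the paired case, this formula could be sketched as below. The function name `paired_t` and the data are illustrative assumptions; the P value would then come from a t distribution with n-1 degrees of freedom, which is not computed here:

```python
import math
import statistics


def paired_t(before, after):
    """t statistic for paired data: mean difference / SE(mean difference)."""
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    # SE of the mean difference = SD of the differences / sqrt(n)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / se


# Hypothetical measurements on the same five subjects
print(round(paired_t([10, 10, 10, 10, 10], [11, 12, 13, 14, 15]), 3))
```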
What are the assumptions of the chi-squared test?
Large sample: at least 80% of expected frequencies must be greater than 5; for a 2x2 table, all expected frequencies should be >5
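A minimal sketch of how this check could be done. Expected frequencies are computed as row total x column total / grand total; the function names are hypothetical, and the sketch assumes the 2x2 rule refers to expected (not observed) frequencies:

```python
def expected_frequencies(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]


def large_enough(table):
    """Apply the 'greater than 5' rule to the expected frequencies."""
    expected = [e for row in expected_frequencies(table) for e in row]
    if len(table) == 2 and len(table[0]) == 2:
        # 2x2 table: every expected frequency must exceed 5
        return all(e > 5 for e in expected)
    # Larger tables: at least 80% of expected frequencies must exceed 5
    return sum(e > 5 for e in expected) / len(expected) >= 0.8


# Hypothetical 2x2 table of observed counts
print(expected_frequencies([[10, 20], [30, 40]]))
print(large_enough([[10, 20], [30, 40]]))
```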
What is the theory of demographic transition?
Beginning: death rate and birth rate are high, total population is low
Death rate decreases, birth rate stays same, population increases rapidly
Death rate keeps decreasing until plateau, birth rate decreases until plateau, population plateaus
What is population census?
Every 10 years, questionnaire to households, complete coverage of population; methods: enumeration districts (students listed at term time address); topics: number of persons, age and sex, household relationships, accommodation, health care, educational qualifications, cultural characteristics, employment, work place and journey to work
Which are the hard to reach groups?
Disabled or elderly, ethnic minorities, faith communities, migrants, non-English speakers, unemployed, people with low income, students and other young adults, Gypsies
What are the new questions in 2011 census?
Main language and English language proficiency, month/year of entry into the UK, intended length of stay, passport held, national identity
What are the uses of census data?
define national and local populations, population estimates, population projections, demographic and social indicators (age, ethnicity, disability and deprivation)
How are population estimates calculated?
Produced annually, cohort component method: take previous mid-year resident population, age everyone by one year, add births during the year, remove deaths, allow for migration in and out; general practice populations: problems of list inflation
How are population projections calculated?
Every two years, assumptions: base population, fertility, migration and mortality
How are indicators of social deprivation obtained?
Area-based measures, summarise household characteristics, Townsend and Jarman scores (old), census derived information, summary measures of affluence or social deprivation
What are the Bradford-Hill criteria?
Minimal conditions necessary to provide adequate evidence of a causal relationship between an exposure and an outcome: strength of association, consistency, specificity, temporality, dose response relationship, biological plausibility, coherence, experimental evidence, analogy
What are the advantages and disadvantages of case-control studies?
Advantages: inexpensive and quick, good for rare outcomes and multiple risk factors, can look at risk factors in detail
Disadvantages: not good for rare exposures, selection bias, provide no estimates of disease risk
What is the odds ratio and when is it used?
Used in case-control studies; estimates the ratio of the odds of disease in the exposed group to the odds of disease in the unexposed group in the study population
How do admission, diagnostic, survival, non-response, recall and interviewer bias influence OR?
admission: increased a –> overestimation of OR
diagnostic: increased a –> overestimation of OR
survival: decreased a –> underestimation of OR
non-response: decreased d –> underestimation of OR
recall: increased a –> overestimation of OR
interviewer: increased a –> overestimation of OR
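The directions above follow from OR = ad/bc. The sketch below assumes the conventional 2x2 labelling (a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls), which the card does not spell out; the counts are hypothetical:

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: (a * d) / (b * c)."""
    return (a * d) / (b * c)


# Hypothetical unbiased table
baseline = odds_ratio(20, 10, 30, 40)

# A bias that inflates cell a (e.g. recall bias) pushes the OR up
inflated_a = odds_ratio(30, 10, 30, 40)

# A bias that shrinks cell d (e.g. non-response) pulls the OR down
reduced_d = odds_ratio(20, 10, 30, 30)

print(baseline, inflated_a, reduced_d)
```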
What are misclassification errors?
Non-differential: random error, not a bias, weakens measure of association
Differential: systematic error, related to exposure or outcome status, bias, measure of association distorted in any direction
How are prevalence ratio and odds ratio interpreted?
Prevalence ratio: IV drug users are X times more likely to be infected with HIV than non-IV drug users
Prevalence odds ratio: the odds that an HIV+ person uses IV drugs is X times the odds that an HIV- person uses IV drugs
What are the advantages and disadvantages of cross-sectional studies?
Advantages: quick, easy to conduct, measure prevalence for all factors, multiple outcomes and exposures, generating hypotheses
Disadvantages: difficult to determine time order, unsuitable for studying rare diseases, reflects determinants of survival and aetiology, unable to measure incidence, difficult to interpret, susceptible to bias
When do you use incidence rate vs incidence proportion?
Incidence rate: when interested in how fast a disease develops
Incidence proportion: when interested in what has happened by the end of the given period
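A sketch of the two measures, using their standard definitions (new cases over the number at risk at the start, and new cases over total person-time at risk). The function names and the numbers are illustrative assumptions:

```python
def incidence_proportion(new_cases, at_risk_at_start):
    """Risk: proportion of the at-risk group who become cases in the period."""
    return new_cases / at_risk_at_start


def incidence_rate(new_cases, person_time_at_risk):
    """Rate: new cases per unit of person-time (e.g. per person-year)."""
    return new_cases / person_time_at_risk


# Hypothetical cohort: 10 new cases among 200 people, 950 person-years observed
print(incidence_proportion(10, 200))
print(incidence_rate(10, 950))
```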
What are the potential bias sources in cohort studies?
Selection bias: sampling, ascertainment, participation
Information bias: misclassification bias, ecological fallacy
Confounding
Chance
What are the advantages and disadvantages of cohort studies?
Advantages: exposure measured before disease onset, multiple outcomes per exposure including incidence, can study rare exposures by selection, less recall bias
Disadvantages: inefficient for rare diseases, prospective (expensive and time consuming), retrospective (inadequate records), validity can be affected by losses to follow-up
What is equipoise?
A state of genuine uncertainty: there is no existing evidence that the intervention or drug being tested will be superior to existing treatments, or effective at all
What are the basic RCT design questions?
PICO: Population Intervention Comparison Outcome
What is power and significance?
Power: 1 - probability of type II error (as %); 80% or 90% power is usually accepted
Significance: probability of type I error, usually accept 5% significance (P<0.05)
How do you calculate NNT and what is it?
1/ARR
ARR = (risk in control) - (risk in intervention)
NNT = number of patients who would have to receive treatment of interest in order to prevent adverse event in one patient
How do you calculate RRR?
RRR = ARR/risk in control
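The ARR, NNT and RRR formulas can be sketched together. The function names and the example risks (20% in control, 15% in intervention) are hypothetical:

```python
def arr(risk_control, risk_intervention):
    """Absolute risk reduction: risk in control minus risk in intervention."""
    return risk_control - risk_intervention


def nnt(risk_control, risk_intervention):
    """Number needed to treat: 1 / ARR."""
    return 1 / arr(risk_control, risk_intervention)


def rrr(risk_control, risk_intervention):
    """Relative risk reduction: ARR / risk in control."""
    return arr(risk_control, risk_intervention) / risk_control


# Hypothetical trial: 20% risk in control, 15% in intervention
print(round(arr(0.20, 0.15), 4))
print(round(nnt(0.20, 0.15), 1))
print(round(rrr(0.20, 0.15), 4))
```

With these numbers the ARR is about 0.05, so roughly 20 patients would need to be treated to prevent one adverse event.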
What are internal and external validity?
Internal validity: lack of bias
External validity: generalisability; explanatory trial (efficacy: can the intervention work under ideal conditions?), pragmatic trial (effectiveness: does the intervention work in usual clinical practice?)
What is screening?
actively identifying disease, or pre-disease, in apparently healthy subjects who may benefit from early treatment
What are primary and secondary preventions and when does screening occur?
Primary prevention: healthy stage
Secondary prevention: early disease stage, before or at the first signs and symptoms
Screening: at secondary stage
Which are the NHS screenings?
Cancer: breast cancer (women 50-70 every 3 years), cervical cancer, bowel cancer
Cardiovascular: abdominal aortic aneurysm, diabetic retinopathy (all diabetics every year)
Antenatal and newborn: sickle cell and thalassaemia (1st trimester), Down syndrome, phenylketonuria, hypothyroidism, SCD
Bowel cancer:
What are the criteria for implementing screening?
Condition, test, treatment, screening programmes, role of UK national screening committee
What is lead time?
interval between the diagnosis of a disease at screening and the usual time of diagnosis (by symptoms)
What are the biases associated with screening?
Lead-time bias; length-time bias; selection bias; overdiagnosis bias
What are the pros and cons for screening?
Pros: improved diagnosis for true positives, less radical treatment required, may be resource savings, reassurance for those with true negatives
Cons: longer period of unawareness for TP whose prognosis is unaltered, over-treatment of borderline abnormalities, false reassurance of FN, anxiety and hazard for FP, hazard of screening test to all recipients
What are sensitivity/specificity used for as compared to PPV/NPV?
sensitivity/specificity: performance/validity of diagnostic test
PPV/NPV: clinical usefulness of diagnostic test
How do you calculate positive and negative likelihood ratios?
Positive: LR+ = sensitivity/(1-specificity)
Negative: LR- = (1-sensitivity)/specificity
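A sketch that computes these measures from the four cells of a diagnostic 2x2 table, using the standard definitions LR+ = sensitivity/(1 - specificity) and LR- = (1 - sensitivity)/specificity. The function name and the counts are hypothetical:

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Validity and usefulness measures from true/false positives/negatives."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Undefined (division by zero) if specificity is exactly 1
        "lr_positive": sensitivity / (1 - specificity),
        # Undefined (division by zero) if specificity is exactly 0
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical test results: 100 diseased, 100 non-diseased
result = diagnostic_summary(tp=90, fp=20, fn=10, tn=80)
for name, value in result.items():
    print(name, round(value, 3))
```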
How are bias and precision defined in agreement between two measures of the same quantity?
Bias: mean difference with 95% CI
Precision: SD (or 95% range) of differences
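One way to sketch these two quantities is shown below. It assumes the 95% range is taken as mean difference +/- 1.96 SD of the differences; the function name `agreement` and the paired readings are hypothetical:

```python
import statistics


def agreement(method_a, method_b):
    """Bias (mean difference), SD of differences, and 95% range."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    # Assumed 95% range of the differences: bias +/- 1.96 SD
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return bias, sd, limits


# Hypothetical paired readings from two methods
bias, sd, limits = agreement([10, 12, 14, 16], [9, 12, 15, 14])
print(bias, round(sd, 3), limits)
```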
How is agreement measured?
Agreement between two measurements of the same quantity is measured as departure from the line of identity
What is heterogeneity?
the presence of variation in true effect sizes underlying the different studies
What is the variance of effect in a random-effects model study?
SE squared + inter-trial variance (tau squared)
How do you measure weight of a study?
Weight = 1/variance
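A sketch of inverse-variance weighting, combining this card with the random-effects variance (SE squared + tau squared). The function name, the study estimates and the tau squared value are illustrative assumptions; estimating tau squared itself is not shown:

```python
def pooled_estimate(estimates, standard_errors, tau_squared=0.0):
    """Inverse-variance weighted mean of study effect estimates.

    tau_squared = 0 gives fixed-effect weights (1 / SE^2); a positive
    value gives random-effects weights (1 / (SE^2 + tau^2)).
    """
    weights = [1 / (se ** 2 + tau_squared) for se in standard_errors]
    weighted_sum = sum(w * e for w, e in zip(weights, estimates))
    return weighted_sum / sum(weights)


# Two hypothetical studies: a precise one and a less precise one
fixed = pooled_estimate([0.2, 0.5], [0.1, 0.2])
random_effects = pooled_estimate([0.2, 0.5], [0.1, 0.2], tau_squared=0.05)
print(round(fixed, 3), round(random_effects, 3))
```

Adding the inter-trial variance makes the weights more equal, so the random-effects estimate moves towards the simple average of the studies.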
What is tertiary prevention?
Reducing the risk of a disease or injury negatively impacting on a person’s quality of life or ability to function (cardiac rehabilitation)
How can we help people to prevent ill-health or injury?
Health protection, health improvement, prevention (preventive health services)
What are the five models of health promotion?
Medical model, behavioural model, empowerment model, social change model, environmental model
What are the 3 E’s for promoting health?
Environment, empowerment, engagement