Epidemiology and Statistics Flashcards
What are the major epidemiological areas?
Descriptive, aetiological, evaluative, health services, clinical
What are the epidemiological measures of frequency?
Prevalence: number of existing cases/total population at a given point in time
Incidence: number of new cases/population at risk over time
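A minimal sketch of these measures of frequency, using hypothetical counts. The function names are illustrative and not part of the cards. Incidence is shown both as a proportion (cumulative incidence) and as a rate per person-year, which is one common way of making "over time" concrete.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Proportion of the population that has the disease at one point in time."""
    return existing_cases / population


def cumulative_incidence(new_cases: int, population_at_risk: int) -> float:
    """Proportion of an initially disease-free population that develops the disease during follow-up."""
    return new_cases / population_at_risk


def incidence_rate(new_cases: int, person_years: float) -> float:
    """New cases per person-year of follow-up."""
    return new_cases / person_years


# Hypothetical numbers, for illustration only.
print(prevalence(50, 1000))
print(cumulative_incidence(10, 950))
print(incidence_rate(10, 1900.0))
```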
What are the epidemiological measures of association?
Relative risk, rate ratio, odds ratio
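The three measures of association can be sketched from a 2x2 table. The cell labels a, b, c, d (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases) and the function names are assumptions made for this example, and the counts are hypothetical.

```python
def relative_risk(a: int, b: int, c: int, d: int) -> float:
    """Risk in the exposed divided by risk in the unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of disease in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)


def rate_ratio(cases_exposed: int, person_time_exposed: float,
               cases_unexposed: int, person_time_unexposed: float) -> float:
    """Incidence rate in the exposed divided by incidence rate in the unexposed."""
    return (cases_exposed / person_time_exposed) / (cases_unexposed / person_time_unexposed)


# Hypothetical 2x2 table: 30/100 exposed and 10/100 unexposed develop the disease.
print(relative_risk(30, 70, 10, 90))
print(odds_ratio(30, 70, 10, 90))
print(rate_ratio(30, 1000.0, 10, 1000.0))
```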
What are the epidemiological measures of impact?
Attributable risk, vaccine efficacy and effectiveness
What is the sequence of epidemiological measures?
Measures of disease frequency –> measures of association (ratio and difference) –> measures of impact (AR, AR%, PAR, PAR%)
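As a sketch of the final step in the sequence, the impact measures can be computed from the same kind of 2x2 counts. The formulas used here are the usual textbook definitions (AR = risk in exposed minus risk in unexposed; PAR = risk in total population minus risk in unexposed); the function name, cell labels and counts are assumptions for illustration.

```python
def impact_measures(a: int, b: int, c: int, d: int) -> dict:
    """Return AR, AR%, PAR and PAR% from a 2x2 table.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    risk_total = (a + c) / (a + b + c + d)

    ar = risk_exposed - risk_unexposed   # attributable risk (risk difference)
    par = risk_total - risk_unexposed    # population attributable risk
    return {
        "AR": ar,
        "AR%": 100 * ar / risk_exposed,
        "PAR": par,
        "PAR%": 100 * par / risk_total,
    }


# Hypothetical counts, for illustration only.
for name, value in impact_measures(30, 70, 10, 90).items():
    print(name, round(value, 3))
```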
What are the different epidemiological study designs?
- Experimental: randomised or non-randomised controlled trials; key features are manipulation, control, randomisation and blinding
- Observational: descriptive study (no comparison group) or analytical study: cohort study (exposure –> outcome), case-control study (outcome –> exposure), cross-sectional study (exposure and outcome at the same time)
What are the sources of error in epidemiological studies?
- Selection bias: self-selection, nonresponse, attrition, selective survival
- Information bias: reporting bias, false positives/negatives, errors and omissions in medical records
- Confounding: difference in age, gender, health status
What are the epidemiological data sources?
- Aggregate data: vital statistics, census, disease registries, monitoring systems
- Individual level data: vital events, disease registries, medical records, national surveys, questionnaire data
What is intention-to-treat analysis?
The primary analysis is a direct comparison of the treatment groups and this is performed with subjects being included in the group to which they were originally allocated
What is per-protocol analysis?
Only patients who completed the treatment to which they were allocated, as set out in the protocol, are analysed
What are the limitations of case-control studies?
Choice of control group affects the comparison; data are reported by subjects or taken from records and are usually retrospective, so they may be incomplete, inaccurate or biased
What are the limitations of cohort studies?
Need big numbers, often need long follow-up, need to keep in touch with participants, may be expensive
How is continuous data summarised?
Measures of the centre of data: mean, median
Measures of variability: standard deviation, range (min and max), interquartile range
How do you calculate standard deviation?
square root of variance; variance = (sum of squared differences between mean and each value)/(n-1)
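The calculation can be sketched step by step and checked against Python's standard library. The function name and the data are hypothetical; `statistics.stdev` uses the same n-1 denominator.

```python
import math
import statistics


def sample_sd(values: list) -> float:
    """Sample standard deviation: square root of the (n-1) variance."""
    n = len(values)
    mean = sum(values) / n
    squared_differences = [(x - mean) ** 2 for x in values]
    variance = sum(squared_differences) / (n - 1)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(sample_sd(data))
print(statistics.stdev(data))  # should agree with the manual calculation
```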
Which summary measure would you use for continuous data with skewed distribution?
Centre of distribution: consider median
Spread of data: consider interquartile range
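A short sketch of why the median and interquartile range suit skewed data, using a hypothetical sample with one very large value. Quartile conventions differ between packages; this example assumes the default method of `statistics.quantiles`.

```python
import statistics


def median_and_iqr(values: list) -> tuple:
    """Return (median, lower quartile, upper quartile, interquartile range)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # three cut points: Q1, median, Q3
    return q2, q1, q3, q3 - q1


# Hypothetical positively skewed data: one extreme value pulls the mean upwards.
data = [1, 2, 2, 3, 3, 4, 5, 8, 40]
median, q1, q3, iqr = median_and_iqr(data)
print("mean:", statistics.mean(data))
print("median:", median, "IQR:", iqr)
```

The mean is dragged towards the extreme value, while the median stays with the bulk of the data.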
What are histograms?
Plots for continuous data in which rectangles (bins) have areas (or heights, when the intervals are of equal width) proportional to the frequency in each interval; the y scale is frequency per interval
What is a box and whisker plot?
Contains: median (horizontal line in the box), upper and lower quartile, maximum (top of whisker), minimum (bottom of whisker)
What are positive and negative skews?
Positive skew: tail on the right is longer (more common)
Negative skew: tail on the left is longer (e.g. gestational age, birthweight)
Which graphical methods are used to display categorical data?
Bar charts: each category is given its own bar along the horizontal axis (there are spaces), height of bar is proportional to the frequency or percentage of observations
Pie charts
Why is it important to summarise data?
To monitor data quality, to check for invalid or missing data entries, to describe characteristics of participants in a study, and as a first step before doing a complex analysis
How do you interpret normal distribution curves?
Approximately 95% of values lie within +/- 2 SD of the mean (more precisely, 1.96 SD); approximately 68% lie within +/- 1 SD
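These percentages can be checked with the standard normal distribution in Python's `statistics` module. The helper name `coverage` is an assumption made for this sketch.

```python
from statistics import NormalDist


def coverage(k: float) -> float:
    """Proportion of a normal distribution lying within k SD of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1.0, 1.96, 2.0):
    print(k, round(coverage(k), 4))
```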
What is considered a large sample?
For means: a sample size of 100 is considered large –> the sample mean follows a normal distribution; if smaller than this –> the data need to follow a normal distribution and the t distribution is used to calculate the CI
For proportions: the sample is considered large if r (the number with the characteristic) and n-r are both greater than five; if not –> an exact binomial CI is calculated
Which sample mean gives the most precise estimate of population mean?
- Bigger sample size
- Smaller spread of data (SD), estimate closer to true mean
How is standard error defined?
Indication of the extent of the sampling error; how much a sample mean tends to vary from the true population mean; it provides an estimate of the precision of the sample mean
How is SE(mean) calculated?
SE(mean) = SD/square root of n
What are the assumptions for calculating CI of a population mean?
Normal data or large sample, sample is chosen at random from population, observations are independent of each other, the sample is not small (at least 60)
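A minimal sketch of SE(mean) = SD/square root of n and a large-sample 95% CI (mean +/- 1.96 x SE). It assumes the sample is large enough for the normal approximation; for small samples a t-distribution multiplier would be needed, which this sketch does not cover. The function name and data are hypothetical.

```python
import math
import statistics


def mean_with_ci(values: list, confidence: float = 0.95) -> tuple:
    """Return (mean, standard error, lower limit, upper limit) using a normal multiplier."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n-1 denominator
    se = sd / math.sqrt(n)         # SE(mean) = SD / sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean, se, mean - z * se, mean + z * se


# Hypothetical sample of 100 observations.
sample = list(range(1, 101))
mean, se, lower, upper = mean_with_ci(sample)
print(mean, round(se, 3), round(lower, 2), round(upper, 2))
```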
What are the assumptions for calculating CI of a population proportion?
The sample is chosen at random from the population, the observations are independent of each other, the proportion with the characteristic is not close to 0 or 1, np and n(1-p) are each greater than 5
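A sketch of the large-sample CI for a proportion (p +/- 1.96 x SE, with SE = square root of p(1-p)/n). It refuses to run when the large-sample condition on r and n-r is not met. The function name and the example counts are assumptions for illustration.

```python
import math
from statistics import NormalDist


def proportion_ci(r: int, n: int, confidence: float = 0.95) -> tuple:
    """Return (p, lower, upper) for r individuals with the characteristic out of n."""
    if r <= 5 or n - r <= 5:
        # Large-sample condition fails: an exact binomial CI would be needed instead.
        raise ValueError("r and n - r must both be greater than 5")
    p = r / n
    se = math.sqrt(p * (1 - p) / n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return p, p - z * se, p + z * se


# Hypothetical example: 40 of 100 participants have the characteristic.
p, lower, upper = proportion_ci(40, 100)
print(p, round(lower, 3), round(upper, 3))
```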
What are Type I and Type II errors?
Type I error: getting a significant result in a sample when the null hypothesis is true in the population (false significant result); the probability is conventionally set at 5%
Type II error: getting a non-significant result in a sample when the null hypothesis is false in the population (false non-significant result); the probability should not be more than 20%
What is a P value?
The probability, given the null hypothesis is true, of obtaining data as extreme as or more extreme than those observed; commonly, P<0.05 is considered statistically significant
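As an illustration, a two-sided P value can be obtained from a z statistic under the normal distribution. This is a sketch assuming a large-sample z test; the function name is hypothetical and other tests use other reference distributions.

```python
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Probability under the null of a z statistic at least as extreme as the one observed."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


for z in (0.5, 1.96, 3.0):
    print(z, round(two_sided_p(z), 4))
```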
What are the factors that influence the size of the P value?
- The size of the real effect in the population sampled
- The sample size
- The variability of the measure involved
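The three factors can be seen together in a simple one-sample z test, where z = effect/(SD/square root of n). This is a sketch under that assumption only; the function name and all numbers are hypothetical.

```python
import math
from statistics import NormalDist


def p_value_for(effect: float, sd: float, n: int) -> float:
    """Two-sided P value for an observed mean difference `effect` from the null value."""
    se = sd / math.sqrt(n)
    z = effect / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


baseline = p_value_for(effect=2.0, sd=10.0, n=50)
print("baseline:", round(baseline, 4))
print("larger effect:", round(p_value_for(4.0, 10.0, 50), 4))        # smaller P
print("larger sample:", round(p_value_for(2.0, 10.0, 200), 4))       # smaller P
print("more variability:", round(p_value_for(2.0, 20.0, 50), 4))     # larger P
```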