Epidemiology and Statistics Flashcards

1
Q

What are the major epidemiological areas?

A

Descriptive, aetiological, evaluative, health services, clinical

2
Q

What are the epidemiological measures of frequency?

A

Prevalence: number of cases/population at risk
Incidence: number of new cases/population at risk over a specified period

3
Q

What are the epidemiological measures of association?

A

Relative risk, rate ratio, odds ratio

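A minimal Python sketch of two of these measures. It assumes a 2×2 table with cells a (exposed, outcome), b (exposed, no outcome), c (unexposed, outcome) and d (unexposed, no outcome). The function names and all numbers are hypothetical illustrations, not taken from the source.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """Relative risk: risk in exposed / risk in unexposed (hypothetical helper)."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def rate_ratio(cases_exposed: int, person_time_exposed: float,
               cases_unexposed: int, person_time_unexposed: float) -> float:
    """Rate ratio: incidence rate in exposed / incidence rate in unexposed."""
    rate_exposed = cases_exposed / person_time_exposed
    rate_unexposed = cases_unexposed / person_time_unexposed
    return rate_exposed / rate_unexposed


# Hypothetical cohort: 30/100 exposed and 10/100 unexposed develop the outcome
rr = risk_ratio(30, 70, 10, 90)             # about 3.0
# Hypothetical person-time data: 20 cases in 1000 person-years vs 10 in 2000
irr = rate_ratio(20, 1000.0, 10, 2000.0)    # about 4.0
print(round(rr, 2), round(irr, 2))
```

The risk ratio uses people at risk as the denominator, whereas the rate ratio uses person-time, so it can handle unequal follow-up.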
4
Q

What are the epidemiological measures of impact?

A

Attributable risk, vaccine efficacy and effectiveness

5
Q

What is the sequence of epidemiological measures?

A

Measures of disease frequency –> measures of association (ratio and difference) –> measures of impact (AR, AR%, PAR, PAR%)

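The impact measures at the end of this sequence can be sketched in Python as follows. The sketch assumes risks are expressed as proportions. It also assumes the population risk is obtained by weighting the exposed and unexposed risks by a hypothetical proportion of the population that is exposed. Names and figures are illustrative only.

```python
def population_risk(risk_exposed: float, risk_unexposed: float,
                    prop_exposed: float) -> float:
    """Overall risk in a population where a fraction prop_exposed is exposed."""
    return prop_exposed * risk_exposed + (1 - prop_exposed) * risk_unexposed


def impact_measures(risk_exposed: float, risk_unexposed: float,
                    risk_population: float) -> dict:
    """Attributable risk (AR, AR%) and population attributable risk (PAR, PAR%)."""
    ar = risk_exposed - risk_unexposed        # excess risk in the exposed
    ar_pct = 100 * ar / risk_exposed          # % of risk in exposed due to exposure
    par = risk_population - risk_unexposed    # excess risk in the whole population
    par_pct = 100 * par / risk_population     # % of population risk due to exposure
    return {"AR": ar, "AR%": ar_pct, "PAR": par, "PAR%": par_pct}


# Hypothetical example: risk 0.20 in exposed, 0.05 in unexposed, 20% exposed
r_pop = population_risk(0.20, 0.05, 0.20)   # about 0.08
measures = impact_measures(0.20, 0.05, r_pop)
print({k: round(v, 3) for k, v in measures.items()})
```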
6
Q

What are the different epidemiological study designs?

A
  1. Experimental: randomised controlled trials or non-randomised controlled trials, manipulation, control, randomisation, blinding
  2. Observational: descriptive study (no comparison group), analytical study: cohort study (exposure –> outcome), case-control study (outcome –> exposure), cross-sectional study (exposure and outcome at the same time)
7
Q

What are the sources of error in epidemiological studies?

A
  1. Selection bias: self-selection, nonresponse, attrition, selective survival
  2. Information bias: reporting bias, false positives/negatives, errors and omissions in medical records
  3. Confounding: differences between groups in age, gender, health status
8
Q

What are the epidemiological data sources?

A
  1. Aggregate data: vital statistics, census, disease registries, monitoring systems
  2. Individual level data: vital events, disease registries, medical records, national surveys, questionnaire data
9
Q

What is intention-to-treat analysis?

A

The primary analysis is a direct comparison of the treatment groups and this is performed with subjects being included in the group to which they were originally allocated

10
Q

What is per-protocol analysis?

A

Patients are analysed according to the treatment they actually received

11
Q

What are the limitations of case-control studies?

A

Choice of control group affects the comparison; data are reported by subjects or taken from records, usually retrospectively, so they may be incomplete, inaccurate or biased

12
Q

What are the limitations of cohort studies?

A

Need big numbers, often need long follow-up, need to keep in touch with participants, may be expensive

13
Q

How is continuous data summarised?

A

Measures of the centre of data: mean, median

Measures of variability: standard deviation, range (min and max), interquartile range

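A short Python sketch computes each of these summaries with the standard-library statistics module on a hypothetical, positively skewed sample. Note that statistics.quantiles uses its 'exclusive' method by default, and other software may calculate quartiles slightly differently.

```python
import statistics

# Hypothetical sample with one large value, giving a positive skew
data = [4.1, 5.0, 5.2, 5.8, 6.3, 6.9, 7.4, 12.0]

mean = statistics.mean(data)                # centre: pulled up by the large value
median = statistics.median(data)            # centre: robust to the large value
sd = statistics.stdev(data)                 # variability: sample SD (n - 1)
data_range = (min(data), max(data))         # variability: min and max
q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                               # variability: interquartile range

print(f"mean={mean:.2f} median={median:.2f} sd={sd:.2f} "
      f"range={data_range} IQR={iqr:.2f}")
```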
14
Q

How do you calculate standard deviation?

A

square root of variance; variance = (sum of squared differences between mean and each value)/(n-1)

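The formula can be checked with a from-scratch Python function. sample_sd is a hypothetical helper, compared here against the standard library's statistics.stdev, which also divides by n - 1.

```python
import math
import statistics


def sample_sd(values: list) -> float:
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_diffs = [(x - mean) ** 2 for x in values]  # squared difference from the mean
    variance = sum(squared_diffs) / (n - 1)
    return math.sqrt(variance)


values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean 5
print(round(sample_sd(values), 3), round(statistics.stdev(values), 3))
```

Dividing by n instead of n - 1 would give 2.0 for these data, which is the population SD and slightly underestimates the spread when working from a sample.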
15
Q

Which summary measure would you use for continuous data with skewed distribution?

A

Centre of distribution: consider median

Spread of data: consider interquartile range

16
Q

What are histograms?

A

Rectangles (bins) whose areas are proportional to the frequencies in each interval (heights are proportional only when all bins have the same width); the y scale is frequency per unit interval

17
Q

What is a box and whisker plot?

A

Contains: median (horizontal line in the box), upper and lower quartiles (top and bottom of the box), maximum (top of whisker), minimum (bottom of whisker)

18
Q

What are positive and negative skews?

A

Positive skew: the tail on the right is longer (more common)

Negative skew: the opposite, with a longer tail on the left (e.g. gestational age, birthweight)

19
Q

Which graphical methods are used to display categorical data?

A

Bar charts: each category is given its own bar along the horizontal axis (with gaps between bars); the height of each bar is proportional to the frequency or percentage of observations
Pie charts

20
Q

Why is it important to summarise data?

A

To monitor data quality, to check for invalid or missing data entries, to describe the characteristics of study participants, and as a first step before any complex analysis

21
Q

How do you interpret normal distribution curves?

A

About 68% of values lie within +/- 1 SD of the mean and about 95% within +/- 2 SD (more precisely, +/- 1.96 SD)

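These coverage figures can be checked with the standard-library statistics.NormalDist class (Python 3.8+). The helper name proportion_within is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def proportion_within(k: float) -> float:
    """Proportion of a normal distribution lying within +/- k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


print(round(proportion_within(1), 3))     # 0.683
print(round(proportion_within(1.96), 3))  # 0.95
print(round(proportion_within(2), 3))     # 0.954
# The multiplier that captures exactly 95% is about 1.96
print(round(standard_normal.inv_cdf(0.975), 2))  # 1.96
```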
22
Q

What is considered a large sample?

A

For means: a sample size of 100 or more is considered large, and the sample mean then follows a normal distribution; for smaller samples the data need to follow a normal distribution, and the t distribution is used to calculate the CI
For proportions: the sample is considered large if r and n-r are both greater than 5; if not, an exact binomial CI is calculated

23
Q

Which sample mean gives the most precise estimate of population mean?

A
  1. Bigger sample size
  2. Smaller spread of data (smaller SD), giving an estimate closer to the true mean

24
Q

How is standard error defined?

A

Indication of the extent of the sampling error; how much a sample mean tends to vary from the true population mean; it provides an estimate of the precision of the sample mean

25
Q

How is SE(mean) calculated?

A

SE(mean) = SD/square root of n

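A minimal Python sketch of SE(mean) and the resulting large-sample 95% CI (mean +/- 1.96 × SE), using hypothetical summary figures. For small samples, the 1.96 multiplier would be replaced by the appropriate value from the t distribution, as noted for small samples earlier.

```python
import math


def se_mean(sd: float, n: int) -> float:
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def ci_mean(mean: float, sd: float, n: int, z: float = 1.96) -> tuple:
    """Large-sample 95% CI for a population mean (normal approximation)."""
    se = se_mean(sd, n)
    return mean - z * se, mean + z * se


# Hypothetical sample: mean systolic BP 120 mmHg, SD 15 mmHg, n = 100
print(se_mean(15, 100))             # 1.5
low, high = ci_mean(120, 15, 100)   # about 117.06 to 122.94
print(round(low, 2), round(high, 2))
```

Because n sits under a square root, quadrupling the sample size only halves the SE.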
26
Q

What are the assumptions for calculating CI of a population mean?

A

Normal data or large sample, sample is chosen at random from population, observations are independent of each other, the sample is not small (at least 60)

27
Q

What are the assumptions for calculating CI of a population proportion?

A

The sample is chosen at random from the population, the observations are independent of each other, the proportion with the characteristic is not close to 0 or 1, np and n(1-p) are each greater than 5

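A minimal Python sketch of the large-sample CI for a proportion, p +/- 1.96 × sqrt(p(1-p)/n). The "not close to 0 or 1" condition is approximated here by the np and n(1-p) > 5 check. The function name and survey figures are hypothetical.

```python
import math


def proportion_ci(r: int, n: int, z: float = 1.96) -> tuple:
    """Large-sample 95% CI for a proportion: p +/- z * sqrt(p(1-p)/n)."""
    p = r / n
    # Large-sample condition from the card: np and n(1-p) both greater than 5
    if n * p <= 5 or n * (1 - p) <= 5:
        raise ValueError("sample too small for the normal approximation; "
                         "use an exact binomial CI instead")
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


# Hypothetical survey: 40 of 200 participants have the characteristic
low, high = proportion_ci(40, 200)
print(round(low, 3), round(high, 3))  # 0.145 0.255
```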
28
Q

What are Type I and Type II errors?

A

Type I error: getting a significant result in a sample when the null hypothesis is true (false significant result); its probability is the significance level, usually 5%
Type II error: getting a non-significant result in a sample when the null hypothesis is false in the population (false non-significant result); its probability should not be more than 20%

29
Q

What is a P value?

A

The probability, given the null hypothesis is true, of obtaining data as extreme or more extreme; commonly, P<0.05 is statistically significant

30
Q

What are the factors that influence the size of the P value?

A
  1. The size of the real effect in the population sampled
  2. The sample size
  3. The variability of the measure involved
31
Q

Should we use 2-sided or 1-sided tests and why?

A

2-sided tests, because a 1-sided test cannot distinguish between no effect and a harmful effect

32
Q

What are the assumptions of t-tests and what happens if the assumptions do not hold?

A
  1. Continuous, normally distributed data
  2. Equal variances (the SDs in the groups are the same)
    If the assumptions do not hold: transform the data; note that the t-test is less robust to unequal variances than to slight skewness
33
Q

When are the requirements for the data to be normal in t-tests less critical?

A

> 50 per group for 2-sample test, >100 for the paired test

34
Q

What is t in the t-test?

A

t = mean difference/SE(mean difference)

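In a paired design, the mean difference and its SE are calculated from the within-pair differences. The Python sketch below uses hypothetical before/after blood pressure readings and stops at the t statistic and its degrees of freedom. Turning t into a P value needs the t distribution (for example from SciPy), which the standard library does not provide.

```python
import math
import statistics


def paired_t(before: list, after: list) -> tuple:
    """Paired t statistic: mean difference / SE(mean difference)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se_diff = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / se_diff, n - 1  # t and its degrees of freedom


# Hypothetical systolic BP (mmHg) in five patients before and after treatment
before = [140, 150, 135, 160, 145]
after = [132, 144, 130, 150, 141]
t, df = paired_t(before, after)
print(round(t, 2), df)  # -6.13 4
```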
35
Q

What are the assumptions of the chi-squared test?

A

Large sample: at least 80% of expected frequencies must be greater than 5; for a 2×2 table, all expected frequencies should be greater than 5

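A Python sketch of the Pearson chi-squared statistic for a 2×2 table, including the expected-frequency check from this card. It applies no continuity correction, and the table counts are hypothetical.

```python
def chi_squared_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-squared statistic (no continuity correction) for a 2x2 table."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    # Expected count in each cell = row total * column total / grand total
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)]
                for i in range(2)]
    # Assumption check from the card: all expected frequencies > 5 for a 2x2 table
    if any(e <= 5 for row in expected for e in row):
        raise ValueError("expected frequency <= 5; chi-squared test unreliable")
    return sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))


# Hypothetical table: 30/100 exposed vs 10/100 unexposed with the outcome
chi2 = chi_squared_2x2(30, 70, 10, 90)
print(chi2)  # 12.5, on 1 degree of freedom
```

With 1 degree of freedom, 12.5 exceeds 3.84, the 5% critical value, so this hypothetical difference would be statistically significant.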
36
Q

What is the theory of demographic transition?

A

Beginning: death rate and birth rate are high, total population is low
Death rate decreases, birth rate stays the same, population increases rapidly
Death rate keeps decreasing until plateau, birth rate decreases until plateau, population plateaus

37
Q

What is population census?

A

Every 10 years, questionnaire to households, complete coverage of population; methods: enumeration districts (students listed at term time address); topics: number of persons, age and sex, household relationships, accommodation, health care, educational qualifications, cultural characteristics, employment, work place and journey to work

38
Q

Which are the hard to reach groups?

A

Disabled or elderly people, ethnic minorities, faith communities, migrants, non-English speakers, unemployed people, people with low income, students and other young adults, Gypsies

39
Q

What are the new questions in the 2011 census?

A

Main language and English language proficiency, month/year of entry into the UK, intended length of stay, passports held, national identity

40
Q

What are the uses of census data?

A

define national and local populations, population estimates, population projections, demographic and social indicators (age, ethnicity, disability and deprivation)

41
Q

How are population estimates calculated?

A

Produced annually using the cohort component method: take the previous mid-year resident population, age it on by one year, add births during the year, remove deaths, and allow for migration in and out; general practice populations: problems of list inflation

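A heavily simplified Python sketch of the cohort component steps. It assumes single-year age groups stored in a dictionary, with deaths and migration already counted by age at the new mid-year. Real national estimates handle many more details, and all figures and names here are hypothetical.

```python
def roll_forward(population: dict, births: int, deaths: dict,
                 immigrants: dict, emigrants: dict) -> dict:
    """One year of the cohort component method (simplified illustration)."""
    # Age the previous mid-year population on by one year
    next_pop = {age + 1: count for age, count in population.items()}
    # Add births during the year as the new age-0 group
    next_pop[0] = births
    # Remove deaths and allow for migration in and out, age by age
    ages = set(next_pop) | set(deaths) | set(immigrants) | set(emigrants)
    for age in ages:
        next_pop[age] = (next_pop.get(age, 0) - deaths.get(age, 0)
                         + immigrants.get(age, 0) - emigrants.get(age, 0))
    return next_pop


previous = {0: 100, 1: 95, 2: 90}
estimate = roll_forward(previous, births=105,
                        deaths={0: 2, 2: 1, 3: 1},
                        immigrants={1: 3}, emigrants={3: 2})
print(dict(sorted(estimate.items())))  # {0: 103, 1: 103, 2: 94, 3: 87}
```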
42
Q

How are population projections calculated?

A

Every two years, assumptions: base population, fertility, migration and mortality

43
Q

How are indicators of social deprivation obtained?

A

Area-based measures, summarise household characteristics, Townsend and Jarman scores (old), census derived information, summary measures of affluence or social deprivation

44
Q

What are the Bradford-Hill criteria?

A

Minimal conditions necessary to provide adequate evidence of a causal relationship between an exposure and an outcome: strength of association, consistency, specificity, temporality, dose-response relationship, biological plausibility, coherence, experimental evidence, analogy

45
Q

What are the advantages and disadvantages of case-control studies?

A

Advantages: inexpensive and quick, good for rare outcomes and multiple risk factors, can look at risk factors in detail
Disadvantages: not good for rare exposures, selection bias, provide no estimates of disease risk

46
Q

What is an odds ratio and when is it used?

A

Used in case-control studies; estimates the ratio of the odds of disease in the exposed group to the odds of disease in the unexposed group in the study population

47
Q

How do admission, diagnostic, survival, non-response, recall and interviewer bias influence OR?

A

admission: increased a –> overestimation of OR
diagnostic: increased a –> overestimation of OR
survival: decreased a –> underestimation of OR
non-response: decreased d –> underestimation of OR
recall: increased a –> overestimation of OR
interviewer: increased a –> overestimation of OR

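The Python sketch below shows why these biases push the OR in the directions listed. It assumes the usual case-control layout: a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls, so that OR = ad/bc. The counts are hypothetical.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a case-control 2x2 table: (a*d)/(b*c)."""
    return (a * d) / (b * c)


unbiased = odds_ratio(40, 60, 20, 80)      # hypothetical "true" table
recall_bias = odds_ratio(50, 60, 20, 80)   # cases over-report exposure: a increases
non_response = odds_ratio(40, 60, 20, 60)  # unexposed controls drop out: d decreases

print(round(unbiased, 2), round(recall_bias, 2), round(non_response, 2))
# 2.67 3.33 2.0
```

Because a and d sit in the numerator, any bias that inflates a (or d) overestimates the OR, and any bias that deflates them underestimates it.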
48
Q

What are misclassification errors?

A

Non-differential: random error, not a bias, weakens measure of association
Differential: systematic error, related to exposure or outcome status, bias, measure of association distorted in any direction

49
Q

How are prevalence ratio and odds ratio interpreted?

A

Prevalence ratio: IV drug users are X times more likely to be infected with HIV than non-IV drug users
Prevalence odds ratio: the odds that an HIV+ person uses IV drugs is X times the odds that an HIV- person uses IV drugs

50
Q

What are the advantages and disadvantages of cross-sectional studies?

A

Advantages: quick, easy to conduct, measure prevalence for all factors, multiple outcomes and exposures, generating hypotheses
Disadvantages: difficult to determine time order, unsuitable for studying rare diseases, reflects determinants of survival and aetiology, unable to measure incidence, difficult to interpret, susceptible to bias

51
Q

When do you use incidence rate vs incidence proportion?

A

Incidence rate: when interested in how fast a disease develops
Incidence proportion: when interested in what has happened by the end of the given period

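A small Python sketch contrasts the two measures on a hypothetical cohort of five people followed for up to two years, where follow-up stops at disease onset. The tuple-based data layout is an assumption made for illustration.

```python
# Each tuple: (years of follow-up, developed the disease during follow-up?)
cohort = [(2.0, False), (0.5, True), (2.0, False), (1.5, True), (2.0, False)]

new_cases = sum(1 for _, became_case in cohort if became_case)
person_years = sum(years for years, _ in cohort)

# Incidence rate: how fast cases occur, new cases per person-year at risk
incidence_rate = new_cases / person_years
# Incidence proportion: share of the initial population at risk who became cases by the end
incidence_proportion = new_cases / len(cohort)

print(f"{incidence_rate:.3f} per person-year")    # 0.250 per person-year
print(f"{incidence_proportion:.0%} over 2 years")  # 40% over 2 years
```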
52
Q

What are the potential bias sources in cohort studies?

A

Selection bias: sampling, ascertainment, participation
Information bias: misclassification bias, ecological fallacy
Confounding
Chance

53
Q

What are the advantages and disadvantages of cohort studies?

A

Advantages: exposure measured before disease onset, multiple outcomes per exposure can be studied (including incidence), rare exposures can be studied by selecting exposed groups, less recall bias
Disadvantages: inefficient for rare diseases, prospective (expensive and time consuming), retrospective (inadequate records), validity can be affected by losses to follow-up

54
Q

What is equipoise?

A

no existing evidence that the intervention or drug being tested will be superior to existing treatments or effective at all

55
Q

What are the basic RCT design questions?

A
PICO:
Population
Intervention
Comparison
Outcome
56
Q

What is power and significance?

A

Power: 1-probability of type II error (as %), usually accept 80% or 90% power
Significance: probability of type I error, usually accept 5% significance (P<0.05)

57
Q

How do you calculate NNT and what is it?

A

1/ARR
ARR = (risk in control) - (risk in intervention)
NNT = number of patients who would need to receive the treatment of interest to prevent one adverse event

58
Q

How do you calculate RRR?

A

RRR = ARR/risk in control

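A minimal Python sketch of ARR, RRR and NNT from the event risks in each trial arm. It assumes the intervention reduces risk (ARR > 0) and uses the common convention of rounding NNT up to a whole patient. The trial figures and function name are hypothetical.

```python
import math


def trial_effect_measures(risk_control: float, risk_intervention: float) -> dict:
    """ARR, RRR and NNT from the outcome risks in the two trial arms."""
    arr = risk_control - risk_intervention   # absolute risk reduction
    rrr = arr / risk_control                 # relative risk reduction
    # NNT = 1/ARR, rounded up to a whole patient; rounding to 9 decimals first
    # stops floating-point noise (e.g. 25.0000000001) bumping it up by one
    nnt = math.ceil(round(1 / arr, 9))
    return {"ARR": arr, "RRR": rrr, "NNT": nnt}


# Hypothetical trial: 10% of controls vs 6% of treated patients have the event
effects = trial_effect_measures(0.10, 0.06)
print(effects)  # ARR about 0.04, RRR about 0.4, NNT 25
```

The same RRR of 40% would give a much larger NNT if the control risk were lower, which is why NNT is often more informative clinically.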
59
Q

What are internal and external validity?

A

Internal validity: lack of bias
External validity: generalisability; explanatory trial (efficacy: can the intervention work under ideal conditions?), pragmatic trial (effectiveness: does the intervention work in usual clinical practice?)

60
Q

What is screening?

A

actively identifying disease, or pre-disease, in apparently healthy subjects who may benefit from early treatment

61
Q

What are primary and secondary preventions and when does screening occur?

A

Primary prevention: healthy stage
Secondary prevention: early disease stage, before signs and symptoms appear
Screening: at secondary stage

62
Q

Which are the NHS screenings?

A

Cancer: breast cancer (women 50-70 every 3 years), cervical cancer, bowel cancer
Cardiovascular: abdominal aortic aneurysm, diabetic retinopathy (all diabetics every year)
Antenatal and newborn: sickle cell and thalassaemia (1st trimester), Down syndrome, phenylketonuria, hypothyroidism, SCD

63
Q

What are the criteria for implementing screening?

A

Condition, test, treatment, screening programmes, role of UK national screening committee

64
Q

What is lead time?

A

interval between the diagnosis of a disease at screening and the usual time of diagnosis (by symptoms)

65
Q

What are the biases associated with screening?

A

Lead-time bias; length bias; selection bias; overdiagnosis bias

66
Q

What are the pros and cons for screening?

A

Pros: improved prognosis for some true positives, less radical treatment required, possible resource savings, reassurance for those with true negative results
Cons: longer period of morbidity (awareness of disease) for TP whose prognosis is unaltered, over-treatment of borderline abnormalities, false reassurance of FN, anxiety and hazard for FP, hazard of screening test to all recipients

67
Q

What are sensitivity/specificity used for as compared to PPV/NPV?

A

sensitivity/specificity: performance/validity of diagnostic test
PPV/NPV: clinical usefulness of diagnostic test

68
Q

How do you calculate positive and negative likelihood ratios?

A
positive = sensitivity/(1-specificity)
negative = (1-sensitivity)/specificity
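A minimal Python check of both likelihood ratios, using hypothetical test characteristics (sensitivity 0.90, specificity 0.95). The positive LR is sensitivity/(1-specificity) and the negative LR is (1-sensitivity)/specificity.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Positive and negative likelihood ratios of a diagnostic test."""
    lr_positive = sensitivity / (1 - specificity)   # factor applied to pre-test odds after a positive result
    lr_negative = (1 - sensitivity) / specificity   # factor applied to pre-test odds after a negative result
    return lr_positive, lr_negative


lr_pos, lr_neg = likelihood_ratios(0.90, 0.95)
print(round(lr_pos, 1), round(lr_neg, 3))  # 18.0 0.105
```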
69
Q

How are bias and precision defined in agreement between two measures of the same quantity?

A

Bias: mean difference with 95% CI
Precision: SD (or 95% range) of differences

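A Python sketch of these two quantities in the style of a Bland-Altman analysis, using hypothetical paired readings from two methods. The 1.96 multiplier for the CI of the bias assumes a reasonably large sample; for a very small sample like this illustrative one, a t value would be more appropriate.

```python
import math
import statistics


def agreement(method_a: list, method_b: list) -> dict:
    """Bias (mean difference, with 95% CI) and precision (SD and 95% range of differences)."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    se_bias = sd / math.sqrt(n)
    return {
        "bias": bias,
        "bias_95ci": (bias - 1.96 * se_bias, bias + 1.96 * se_bias),
        "sd_diff": sd,
        "range_95": (bias - 1.96 * sd, bias + 1.96 * sd),  # limits of agreement
    }


# Hypothetical readings of the same quantity by two methods in five subjects
method_a = [10.2, 11.0, 9.8, 12.1, 10.5]
method_b = [10.0, 10.6, 9.9, 11.5, 10.1]
result = agreement(method_a, method_b)
print(round(result["bias"], 2), round(result["sd_diff"], 2))  # about 0.3 and 0.26
```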
70
Q

How is agreement measured?

A

Agreement between two measurements of the same quantity is measured as the departure from the line of identity (y = x)

71
Q

What is heterogeneity?

A

the presence of variation in true effect sizes underlying the different studies

72
Q

What is the variance of a study's effect estimate in a random-effects model?

A

SE squared + inter-trial variance (tau squared)

73
Q

How do you measure weight of a study?

A

Weight = 1/variance

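A Python sketch of inverse-variance pooling, combining the two preceding cards. Each study's weight is 1/variance, where the variance is SE² in a fixed-effect model and SE² + tau² in a random-effects model. Here tau² is simply supplied as a hypothetical value; in practice it would be estimated from the data (for example with the DerSimonian-Laird method), which this sketch does not do.

```python
import math


def pooled_estimate(effects: list, standard_errors: list,
                    tau_squared: float = 0.0) -> tuple:
    """Inverse-variance pooled effect; tau_squared = 0 gives the fixed-effect result."""
    weights = [1 / (se ** 2 + tau_squared) for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se, weights


# Hypothetical log odds ratios and SEs from three trials
effects = [0.20, 0.50, 0.35]
ses = [0.10, 0.20, 0.10]

fixed, fixed_se, fixed_w = pooled_estimate(effects, ses)
re_pooled, re_se, re_w = pooled_estimate(effects, ses, tau_squared=0.01)
print(round(fixed, 3), round(re_pooled, 4))  # about 0.3 and 0.3125
```

The random-effects weights are more even across studies (about 50:20:50 rather than 100:25:100), so the smaller study gains relative influence and the pooled SE widens.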
74
Q

What is tertiary prevention?

A

Reducing the risk of a disease or injury negatively impacting on a person’s quality of life or ability to function (cardiac rehabilitation)

75
Q

How can we help people to prevent ill-health or injury?

A

Health protection, health improvement, prevention (preventive health services)

76
Q

What are the five models of health promotion?

A

Medical model, behavioural model, empowerment model, social change model, environmental model

77
Q

What are the 3 E’s for promoting health?

A

Environment, empowerment, engagement