Week 9 Reading: Measuring and Summarising Data - Ben-Shlomo, Brookes Flashcards
Medical Variable Types
2
- Numerical variables
- Continuous
- Discrete
- Categorical variables
- Ordered
- Unordered
Numerical Variable Types
2
Continuous = measurements on a continuous scale
Discrete = counts
generally treated the same way
Categorical Variables
=, 2
= variables that take nonnumerical values and refer to categories of data
- Unordered = class observations into named groups
- Ordered = rank observations according to an ordered classification
Continuous Numerical Variables
= measurements on a continuous scale
e.g. height, haemoglobin, systolic blood pressure
Discrete numerical variables
= counts
e.g. no. children in a family, no. asthma attacks in a week
Unordered Categorical Variables
= class observations into named groups
e.g. ethnic group, marital status, disease categories
Binary/dichotomous = special case, class observations into 2 groups usually indicating presence or absence of a characteristic
Ordered Categorical Variables
= rank observations according to an ordered classification
e.g. social class, severity of disease (mild, moderate, severe), stages in development of cancer
often in epidemiological studies a variable is measured as numerical and then categorised
e.g. height measured then <5ft, 5ft-5ft 5in, 5ft 5in-6ft, >6ft
Binary/Dichotomous Unordered Categorical Variables
= special case of unordered categorical variables classing observations into 2 groups, generally indicating presence or absence of a characteristic
e.g. presence vs absence of chest pain, smoker vs non-smoker, vaccinated vs unvaccinated
Measures of Central Tendency
=, 3, sub 3
- Mean = sum of all values in a set divided by no. values
- Median = middle value when set arranged in order. If there is an even number of values, take the mean of the 2 middle values
- Mode = most frequently occurring value/peak on frequency distribution histogram
- Unimodal = single mode/peak
- Bimodal = 2 modes/peaks
- Multimodal = >1 mode/peak
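A minimal sketch of the three measures using Python's standard-library statistics module. The sample values are invented for illustration. Note that multimode() counts repeated raw values, which is only a rough stand-in for reading peaks off a histogram.

```python
from statistics import mean, median, multimode

# Hypothetical sample, invented for illustration
values = [2, 3, 3, 5, 7, 8]

avg = mean(values)          # sum of values / no. values = 28 / 6
mid = median(values)        # even count: mean of the 2 middle values (3 and 5)
modes = multimode(values)   # most frequently occurring value(s)

# Hypothetical sample with two equally frequent values (bimodal)
bimodal_modes = multimode([1, 1, 2, 3, 3])

print(avg, mid, modes, bimodal_modes)
```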
Measures of Variability
=, 3
Variability = extent to which values of a variable in a distribution are spread
1. Range = difference between largest and smallest values
2. Interquartile range = range between the lower quartile (25th centile) and the upper quartile (75th centile), i.e. the middle 50% of values
- Quantiles = divisions of set of values into equal, ordered subgroups
- can have tertiles, quartiles, quintiles, deciles, centiles etc.
3. Standard Deviation (SD) = spread of observations about the mean, based on differences/deviations from mean
- differences are squared to remove effect of sign
- SD is calculated as the square root of (sum of squared deviations)/(no. observations - 1)
- SD squared = variance
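The measures of variability can be sketched in Python as below, assuming a small invented sample. The SD is computed by hand from the card's definition and checked against the standard library. Quartile conventions differ between software packages, so the exact IQR may vary slightly depending on the interpolation rule used.

```python
import math
from statistics import quantiles, stdev, variance

# Hypothetical sample, invented for illustration
values = [4, 8, 15, 16, 23, 42]

# Range: largest minus smallest value
data_range = max(values) - min(values)

# Quartile cut points (Python's default "exclusive" method) and the IQR
q1, q2, q3 = quantiles(values, n=4)
iqr = q3 - q1

# SD by hand: square root of sum of squared deviations / (n - 1)
m = sum(values) / len(values)
sd_manual = math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

print(data_range, iqr, sd_manual, stdev(values), variance(values))
```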
Normal/Gaussian Distribution
- mean, median and mode are identical, define location of curve
- SD determines shape of curve
- Small SD –> tall, narrow
- Large SD –> short, wide
- use mean and SD to determine proportion of data lying between 2 values; rules apply regardless of values of mean and SD:
- 68.3% lie within 1 SD of mean
- 95.4% lie within 2 SD of mean
- 99.7% lie within 3 SD of mean
- Because of symmetry:
- 15.85% lie more than 1 SD above the mean, and 15.85% lie more than 1 SD below it
- 2.3% lie more than 2 SD above the mean, and 2.3% lie more than 2 SD below it
- 95.0% of observations lie between mean - 1.96 x SD and mean + 1.96 x SD
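Because these rules hold whatever the mean and SD, they can be checked on the standard Normal curve (mean 0, SD 1). A minimal sketch using Python's statistics.NormalDist follows; the helper names within() and upper_tail() are hypothetical. The exact one-sided tail beyond 1 SD is about 15.87%; the 15.85% figure comes from halving 100 - 68.3.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # standard Normal curve

def within(k):
    """Proportion of a Normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

def upper_tail(k):
    """Proportion more than k SDs above the mean (same below, by symmetry)."""
    return 1 - std_normal.cdf(k)

for k in (1, 1.96, 2, 3):
    print(f"within {k} SD: {within(k):.1%}, above: {upper_tail(k):.2%}")
```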
Case Series
= describing frequency of characteristics in a patient sample
Proportion
=, 2
= (number with disease)/(total number)
can be x 100 to make it a percentage
- Prevalence = proportion (or %) with disease at a particular point in time
- Cumulative incidence/Risk = proportion (or %) of new cases of disease occurring in a specified time period
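A minimal sketch of both proportions, using an invented population. The function proportion() is a hypothetical helper. It illustrates that prevalence uses the whole population at one time point, while risk uses only those initially free of disease as its denominator.

```python
def proportion(numerator, denominator, as_percent=False):
    """Numerator / denominator, optionally multiplied by 100."""
    p = numerator / denominator
    return p * 100 if as_percent else p

# Hypothetical population of 1000, invented for illustration
population = 1000
existing_cases = 50                      # have the disease on the survey date
prevalence = proportion(existing_cases, population)

# Follow only those initially free of disease over a specified period
initially_free = population - existing_cases
new_cases = 19
risk_percent = proportion(new_cases, initially_free, as_percent=True)

print(prevalence, risk_percent)
```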
Prevalence
=, =
= proportion (or %) with disease at a particular point in time
Prevalence = (no. with disease at a particular time)/(total population at that time)
Cumulative Incidence/Risk
=, =
= proportion (or %) of new cases of disease occurring in a specified time period
Risk = (no. new cases in a period)/(no. initially free of disease)
Incidence
=, =
= how fast new cases are occurring
Incidence rate = (no. new cases)/(total no. at risk x time interval), i.e. new cases per unit of person-time
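"Total no. x time interval" assumes everyone is followed for the whole interval. A common refinement, sketched below, is to add up each person's own follow-up time (person-time). The follow-up times and case count are invented, and incidence_rate() is a hypothetical helper.

```python
def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time (here, per person-year)."""
    return new_cases / person_time

# Hypothetical follow-up times (years) per person, invented for illustration
follow_up_years = [2.0, 2.0, 1.5, 0.5, 2.0, 2.0]
new_cases = 2

person_time = sum(follow_up_years)        # 10 person-years
rate = incidence_rate(new_cases, person_time)

# If all 5 people had been followed for the full 2 years: total no. x time interval
rate_equal_follow_up = incidence_rate(new_cases, 5 * 2.0)

print(rate, rate_equal_follow_up)
```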
TPP
=, 2
= Time Place Person, how epidemiologists describe disease patterns
useful for:
- planning healthcare services
- generating aetiological hypotheses
Potential Explanations for Increased or Decreased Risk
6
- Chance = random fluctuations
- Ascertainment = change in diagnostic techniques
- Demography = change in age distribution of population
- Coding = changes in the rules by which mortality is coded (ICD). Assessed with bridge coding = comparing new rates using the old rules
- Treatment effects = new medical therapies can increase or decrease disease frequency or mortality
- True changes in incidence = genuine increase or decrease in incidence, implying a change in underlying risk factors
Bridge Coding
= comparing new rates using old coding rules
Null hypothesis
= assumption of no association between exposure and outcome
Difference in Means
=, =
= measure of the association between an exposure and outcome when the exposure is dichotomous/binary and the outcome numerical
difference in means = mean in exposed - mean in unexposed
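A minimal sketch with invented systolic blood pressure readings. It shows that, as an absolute measure, the difference in means keeps the units of the outcome (mmHg here).

```python
from statistics import mean

# Hypothetical systolic blood pressures (mmHg), invented for illustration
exposed = [130, 142, 138, 150]
unexposed = [120, 128, 125, 131]

# Difference in means = mean in exposed - mean in unexposed
difference_in_means = mean(exposed) - mean(unexposed)
print(difference_in_means, "mmHg")
```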
Risk Difference/Attributable Risk
=, =, +, 0, -
= measure the association between exposure and outcome when both are dichotomous/binary
risk difference = risk among exposed - risk among unexposed
+ve value indicates increased risk
0 indicates no difference
-ve indicates reduced risk
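A minimal sketch of the risk difference and its sign interpretation, using invented cohort counts. The helper names are hypothetical. The small tolerance in interpret_rd() guards against floating-point rounding when the two risks are equal.

```python
def risk(cases, total):
    return cases / total

def risk_difference(risk_exposed, risk_unexposed):
    return risk_exposed - risk_unexposed

def interpret_rd(rd, tol=1e-12):
    """+ve -> increased risk, 0 -> no difference, -ve -> reduced risk."""
    if abs(rd) <= tol:
        return "no difference"
    return "increased risk" if rd > 0 else "reduced risk"

# Hypothetical cohort: 30/200 exposed and 10/200 unexposed develop disease
rd = risk_difference(risk(30, 200), risk(10, 200))   # 0.15 - 0.05
print(round(rd, 3), interpret_rd(rd))
```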
Absolute measures of association
3
- difference in means
- risk difference/attributable risk
- population attributable risk
have units
Population Attributable Risk
=, =
= measures how much of overall population risk is attributable to an exposure
population attributable risk = overall risk - risk among unexposed
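A minimal sketch with an invented population in which 200 of 1000 people are exposed. The overall risk pools exposed and unexposed people, so the population attributable risk depends on how common the exposure is as well as on how harmful it is.

```python
# Hypothetical population of 1000, invented for illustration
exposed_cases, exposed_total = 30, 200
unexposed_cases, unexposed_total = 40, 800

overall_risk = (exposed_cases + unexposed_cases) / (exposed_total + unexposed_total)
risk_unexposed = unexposed_cases / unexposed_total

# Population attributable risk = overall risk - risk among unexposed
par = overall_risk - risk_unexposed
print(round(overall_risk, 3), round(risk_unexposed, 3), round(par, 3))
```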
Relative Measures of Association
4
- Risk ratio/relative risk
- Odds of disease
- Odds ratio
- Hazard Ratio
unitless
Risk Ratio/Relative Risk
=, =, 3
= how much more likely the outcome is among those exposed compared to unexposed, used when both outcome and exposure are dichotomous/binary
risk ratio = (risk in exposed individuals)/(risk in unexposed individuals)
- >1 = increased risk
- 1 = no difference in risk
- <1 = reduced risk
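A minimal sketch of the risk ratio and its interpretation, using invented cohort counts. The helper names are hypothetical. math.isclose() is used for the "no difference" check because floating-point division rarely lands exactly on 1.

```python
import math

def risk_ratio(cases_exp, total_exp, cases_unexp, total_unexp):
    """(risk in exposed) / (risk in unexposed)."""
    return (cases_exp / total_exp) / (cases_unexp / total_unexp)

def interpret_rr(rr):
    if math.isclose(rr, 1.0):
        return "no difference in risk"
    return "increased risk" if rr > 1 else "reduced risk"

# Hypothetical cohort: 30/200 exposed vs 10/200 unexposed develop disease
rr = risk_ratio(30, 200, 10, 200)
print(round(rr, 2), interpret_rr(rr))
```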
Odds of Disease
=
odds of disease = (no. with disease)/(no. without disease)
Odds Ratio
odds ratio = (odds of disease in exposed individuals)/(odds of disease in unexposed individuals)
= (odds of exposure in individuals with disease)/(odds of exposure in individuals without disease)
= (d1/d0)/(h1/h0)
= (d1 x h0)/(d0 x h1)
where d1 = no. exposed in disease group, d0 = no. unexposed in disease group, h1 = no. exposed in healthy group, h0 = no. unexposed in healthy group
second form of odds ratio is used in case-control studies
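The sketch below checks, with invented case-control counts, that all three forms of the odds ratio give the same answer. It uses Python's fractions.Fraction so the equality is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

# Hypothetical case-control counts, invented for illustration
d1, d0 = 40, 60   # disease group: exposed, unexposed
h1, h0 = 20, 80   # healthy group: exposed, unexposed

# Odds of exposure in diseased / odds of exposure in healthy
or_exposure = Fraction(d1, d0) / Fraction(h1, h0)

# Cross-product form: (d1 x h0) / (d0 x h1)
or_cross = Fraction(d1 * h0, d0 * h1)

# Odds of disease in exposed / odds of disease in unexposed
or_disease = Fraction(d1, h1) / Fraction(d0, h0)

print(or_exposure, or_cross, or_disease, float(or_cross))
```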
Hazard ratio
=
= used for time to event data (as in survival analysis)
Reference Range
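= range of values between which a specified proportion of observations lie, usually the central 95% (mean - 1.96 x SD to mean + 1.96 x SD for Normally distributed data)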
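A minimal sketch, assuming the reference range is taken as the central 95% of an approximately Normal distribution, i.e. mean ± 1.96 x SD. The haemoglobin values are invented for illustration. With skewed data this rule would be misleading, and centiles would be a better choice.

```python
from statistics import mean, stdev

# Hypothetical haemoglobin values (g/L), invented for illustration;
# assumes the underlying distribution is approximately Normal
values = [130, 135, 140, 145, 150]

m, s = mean(values), stdev(values)
lower, upper = m - 1.96 * s, m + 1.96 * s   # central 95% of a Normal distribution
print(f"reference range: {lower:.1f} to {upper:.1f} g/L")
```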