Week 9 Reading: Measuring and Summarising Data - Ben-Schlomo, Brookes Flashcards
Medical Variable Types
2
- Numerical variables
- Continuous
- Discrete
- Categorical variables
- Ordered
- Unordered
Numerical Variable Types
2
Continuous = measurements on a continuous scale
Discrete = counts
both are generally treated the same way
Categorical Variables
=, 2
= variables that take nonnumerical values and refer to categories of data
- Unordered = class observations into named groups
- Ordered = rank observations according to an ordered classification
Continuous Numerical Variables
= measurements on a continuous scale
e.g. height, haemoglobin, systolic blood pressure
Discrete numerical variables
= counts
e.g. no. children in a family, no. asthma attacks in a week
Unordered Categorical Variables
= class observations into named groups
e.g. ethnic group, marital status, disease categories
Binary/dichotomous = special case, class observations into 2 groups usually indicating presence or absence of a characteristic
Ordered Categorical Variables
= rank observations according to an ordered classification
e.g. social class, severity of disease (mild, moderate, severe), stages in development of cancer
often in epidemiological studies a variable is measured as numerical and then categorised
e.g. height measured then <5ft, 5ft-5ft 5in, 5ft 5in-6ft, >6ft
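The categorisation step above can be sketched in Python. This is only an illustration: the function name `height_band` is hypothetical, heights are assumed to be in inches, and because the bands on the card share the 5ft 5in and 6ft boundaries, assigning a boundary value to the lower band is an assumption made here.

```python
def height_band(height_in: float) -> str:
    """Assign a height in inches to one of four ordered bands.

    Boundary handling is an assumption: a value exactly on a shared
    boundary (5ft 5in or 6ft) is placed in the lower band.
    """
    if height_in < 60:       # under 5ft
        return "<5ft"
    elif height_in <= 65:    # 5ft up to and including 5ft 5in
        return "5ft-5ft 5in"
    elif height_in <= 72:    # above 5ft 5in, up to and including 6ft
        return "5ft 5in-6ft"
    else:                    # above 6ft
        return ">6ft"


# Made-up measurements, one per band plus one boundary value
heights = [58.0, 62.5, 65.0, 70.0, 74.0]
bands = [height_band(h) for h in heights]
print(bands)
```

The result is an ordered categorical variable: the bands have a natural order, but the exact measured values are no longer available.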
Binary/Dichotomous Unordered Categorical Variables
= special case of unordered categorical variables classing observations into 2 groups, generally indicating presence or absence of a characteristic
e.g. presence vs absence of chest pain, smoker vs non-smoker, vaccinated vs unvaccinated
Measures of Central Tendency
=, 3, sub 3
- Mean = sum of all values in a set divided by no. values
- Median = middle value when set arranged in order. If even no., take mean of 2 middle values
- Mode = most frequently occurring value/peak on frequency distribution histogram
- Unimodal = single mode/peak
- Bimodal = 2 modes/peaks
- Multimodal = >1 mode/peak
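A minimal sketch of the three measures using Python's standard `statistics` module. The data values are made up for illustration.

```python
import statistics

# Made-up data set with an even number of values
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of values / no. of values = 30 / 6
median = statistics.median(values)  # even count, so the mean of the 2 middle values (3 and 5)
mode = statistics.mode(values)      # most frequently occurring value

# multimode returns every value tied for most frequent,
# so a bimodal data set gives two results
modes = statistics.multimode([1, 1, 2, 4, 4])

print(mean, median, mode, modes)
```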
Measures of Variability
=, 3
Variability = extent to which values of a variable in a distribution are spread
1. Range = difference between largest and smallest values
2. Interquartile range = range between the lower and upper quartiles (25th and 75th centiles), i.e. the spread of the middle 50% of values
- Quantiles = divisions of set of values into equal, ordered subgroups
- can have tertiles, quartiles, quintiles, deciles, centiles etc.
3. Standard Deviation (SD) = spread of observations about the mean, based on differences/deviations from mean
- differences are squared to remove effect of sign
- SD = square root of [(sum of squared deviations) / (no. of observations - 1)]
- SD squared = variance
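The three measures above can be sketched as follows, with made-up data. The SD is computed step by step as described on the card and can be cross-checked against `statistics.stdev`. Note that several conventions exist for computing quartiles; `statistics.quantiles` is used here with its default method, so other software may give slightly different quartile values.

```python
import math
import statistics

# Made-up data set
values = [4, 8, 6, 5, 3, 7, 9, 6]

# 1. Range: largest value minus smallest value
value_range = max(values) - min(values)

# 2. Interquartile range: upper quartile minus lower quartile
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# 3. Standard deviation, following the card's steps
mean = statistics.mean(values)
squared_devs = [(x - mean) ** 2 for x in values]  # squaring removes the sign
variance = sum(squared_devs) / (len(values) - 1)  # divide by n - 1
sd = math.sqrt(variance)                          # SD squared = variance

print(value_range, iqr, variance, sd)
```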
Normal/Gaussian Distribution
- mean, median and mode are identical, define location of curve
- SD determines shape of curve
- Small SD –> tall, narrow
- Large SD –> short, wide
- use mean and SD to determine proportion of data lying between 2 values, rules apply regardless of values of mean and SD:
- 68.3% lie within 1 SD of mean
- 95.4% lie within 2 SD of mean
- 99.7% lie within 3 SD of mean
- Because of symmetry:
- 15.85% lie more than 1 SD above the mean and 15.85% lie more than 1 SD below it, i.e. (100 - 68.3)/2
- 2.3% lie more than 2 SD above the mean and 2.3% lie more than 2 SD below it, i.e. (100 - 95.4)/2
- 95.0% of observations lie between mean - 1.96 x SD and mean + 1.96 x SD
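The proportions above can be checked numerically with `statistics.NormalDist` from the Python standard library. This is a sketch; the helper name `within` and the example means and SDs are illustrative choices, not from the source.

```python
from statistics import NormalDist


def within(dist: NormalDist, k: float) -> float:
    """Proportion of a normal distribution lying within k SDs of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


standard = NormalDist(mu=0, sigma=1)
other = NormalDist(mu=120, sigma=15)  # arbitrary mean and SD

for k in (1, 2, 3, 1.96):
    # The proportion is the same whatever the mean and SD
    print(k, round(within(standard, k) * 100, 1), round(within(other, k) * 100, 1))
```

Because the curve is symmetric, the proportion in each tail beyond k SDs is half of what lies outside the central band, for example `(1 - within(standard, 2)) / 2` for the upper tail beyond 2 SD.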
Case Series
= describing frequency of characteristics in a patient sample
Proportion
=, 2
= (number with disease)/(total number)
can be multiplied by 100 to give a percentage
- Prevalence = proportion (or %) with disease at a particular point in time
- Cumulative incidence/Risk = proportion (or %) of new cases of disease occurring in a specified time period
Prevalence
=, =
= proportion (or %) with disease at a particular point in time
Prevalence = (no. with disease at particular time)/(total no. population at that time)
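As a small worked example of this formula, here is a sketch in Python. The function name `prevalence` and the counts are hypothetical.

```python
def prevalence(cases_at_time: int, population_at_time: int) -> float:
    """Proportion of the population with the disease at a particular point in time."""
    if population_at_time <= 0:
        raise ValueError("population must be positive")
    return cases_at_time / population_at_time


# Hypothetical survey: 50 people have the disease out of 1000 examined on one day
p = prevalence(50, 1000)
print(p, p * 100)  # as a proportion, then multiplied by 100 for a percentage
```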
Cumulative Incidence/Risk
=, =
= proportion (or %) of new cases of disease occurring in a specified time period
Risk = (no. new cases in a period)/(no. initially free of disease)
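A sketch of the risk calculation in Python, with hypothetical counts. The point it illustrates is the denominator: people who already have the disease at the start are excluded, because they cannot become new cases.

```python
def cumulative_incidence(new_cases: int, initially_disease_free: int) -> float:
    """Proportion of initially disease-free people who become new cases in the period."""
    if initially_disease_free <= 0:
        raise ValueError("no. initially free of disease must be positive")
    return new_cases / initially_disease_free


# Hypothetical cohort: 1000 people, 50 of whom already have the disease at the start
total = 1000
existing_cases = 50
at_risk = total - existing_cases  # 950 initially free of disease

# 19 new cases arise during the follow-up period
risk = cumulative_incidence(19, at_risk)
print(risk, risk * 100)
```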