Displaying data Flashcards
Types of data
Categorical:
Binary
Ordinal
Nominal
Numerical:
Discrete
Continuous
Summarising categorical data
Proportion
Percentage
Rate
Odds
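As a rough illustration of the four summaries above, the sketch below uses made-up counts (30 events among 120 people, followed for a hypothetical 240 person-years). The function name and arguments are illustrative only.

```python
def summarise_binary(events, total, person_years):
    """Return proportion, percentage, rate and odds for a binary outcome."""
    proportion = events / total        # events out of everyone
    percentage = 100 * proportion      # the proportion expressed per 100
    rate = events / person_years       # events per unit of follow-up time
    odds = events / (total - events)   # events versus non-events
    return proportion, percentage, rate, odds


# Hypothetical counts, not taken from any real study
proportion, percentage, rate, odds = summarise_binary(30, 120, 240.0)
```

Note that the odds (30 to 90) differ from the proportion (30 out of 120) because the denominator excludes the events themselves.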
Summarising numeric data
Normal distribution, symmetric data (mean+SD)
Non-normal distribution, skewed (Median+IQR)
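A minimal sketch of both summaries using Python's `statistics` module, with invented values that include one large outlier. The `summarise` helper is hypothetical, and the quartiles use the module's default method, which may differ slightly from other software.

```python
import statistics


def summarise(values):
    """Return mean, SD, median and the lower and upper quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three cut points: Q1, Q2, Q3
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),            # sample SD (n - 1 denominator)
        "median": statistics.median(values),
        "iqr": (q1, q3),
    }


# Invented, upward-skewed data: the single value of 40 drags the mean up
skewed = [1, 2, 2, 3, 3, 4, 5, 40]
summary = summarise(skewed)
```

Here the mean is pulled well above the median by the outlier, which is why median and IQR are preferred for skewed data.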
Minimisation steps
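Tabulate the confounders across the groups to see which group the new patient would fit best in (the allocation that keeps the groups most balanced)
Choose a minimisation factor (e.g. 80 would mean an 80% chance of being allocated to the best-fit group)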
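The two steps above can be sketched roughly as follows, assuming two arms, equally weighted confounders, and ties going to the first arm listed. The patient records, factor names and helper functions are all hypothetical.

```python
import random


def best_fit_arm(allocated, new_patient, arms=("A", "B")):
    """Find the arm with the fewest existing patients sharing the new patient's factor levels."""
    totals = {arm: 0 for arm in arms}
    for patient, arm in allocated:
        # One point for every confounder level shared with the new patient
        totals[arm] += sum(patient[f] == new_patient[f] for f in new_patient)
    # Ties go to the first arm listed (an arbitrary choice for this sketch)
    return min(arms, key=lambda arm: totals[arm]), totals


def allocate(allocated, new_patient, p_best=0.8, rng=random, arms=("A", "B")):
    """Allocate to the best-fit arm with probability p_best, otherwise to another arm."""
    best, _ = best_fit_arm(allocated, new_patient, arms)
    if rng.random() < p_best:
        return best
    return rng.choice([arm for arm in arms if arm != best])


# Hypothetical trial so far: two patients in arm A, one in arm B
allocated = [
    ({"sex": "F", "smoker": True}, "A"),
    ({"sex": "F", "smoker": False}, "A"),
    ({"sex": "M", "smoker": True}, "B"),
]
new_patient = {"sex": "F", "smoker": True}
best, totals = best_fit_arm(allocated, new_patient)
```

The random element (here `p_best=0.8`) keeps the allocation from being fully predictable.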
Transforming data
Tukey’s ladder of transformations
For upward skew: x^(1/2), log(x), -1/x, -1/(x^2)
To correct downward skew: x^2, x^3, antilog(x)
Back-transform the mean and SD calculated from the transformed data to return to the original scale
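A small sketch of transforming and back-transforming, assuming a log10 transformation and invented upward-skewed values. The back-transformed mean of the logs is the geometric mean, which is not the same as the ordinary mean.

```python
import math
import statistics

# Invented, upward-skewed values
values = [1.0, 10.0, 100.0]

# Step down the ladder: take logs to pull in the long upper tail
logged = [math.log10(v) for v in values]
mean_log = statistics.mean(logged)

# Back-transform (antilog) only at the end; this is the geometric mean
back_transformed_mean = 10 ** mean_log
arithmetic_mean = statistics.mean(values)
```

The back-transformed value sits much closer to the bulk of the data than the arithmetic mean, which is dominated by the largest value.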
Quantify differences between groups
Difference between 2 means only if both groups are normally distributed
Difference between 2 medians always valid
Quantifying associations between groups
Pearson’s correlation coefficient is parametric and assumes a linear association
Spearman’s correlation coefficient is non-parametric (rank-based) and doesn’t need the association to be linear
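To illustrate the contrast, here is a hand-rolled sketch of both coefficients, with Spearman's computed as Pearson's on ranks (ties share the average rank). The helper names and the data are made up for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Rank from 1, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented data: y always rises with x, but along a curve rather than a line
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
```

Because the relationship is perfectly monotonic, Spearman's coefficient reaches 1 while Pearson's falls short of it.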
Standard Error
Measure of precision, spread of sample means
SE = SD / sqrt(n)
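As a quick sketch of the formula, assuming the sample SD (n - 1 denominator) is the SD intended. The `standard_error` helper and the 25 values are illustrative only.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return statistics.stdev(values) / math.sqrt(len(values))


# Illustrative sample of 25 values (10, 11, ..., 34)
sample = list(range(10, 35))
se = standard_error(sample)
```

Quadrupling the sample size halves the SE, since n sits under a square root.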
Requirements to calculate standard error with single mean
Sample size >20
Sample normally distributed
Confidence intervals
Range of values for the population mean that is compatible with the sample, e.g. a 95% CI is sample mean ± 1.96 x SE; for 95% of samples, the CI will contain the population mean
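A minimal sketch of a 95% CI for a single mean, using the standard error of the mean and the normal multiplier 1.96. The `mean_ci95` helper and the data are hypothetical.

```python
import math
import statistics


def mean_ci95(values):
    """95% confidence interval for the mean: mean ± 1.96 x SE."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - 1.96 * se, mean + 1.96 * se


# Illustrative sample of 25 values (10, 11, ..., 34)
sample = list(range(10, 35))
lower, upper = mean_ci95(sample)
```

The interval is symmetric around the sample mean, and it narrows as the sample size grows.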
Requirements to calculate standard error with diff between 2 means
Both normally distributed
Both groups have a sample size of more than 20
Similar SDs (neither more than 2x the other)
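The card does not give the formula itself. The sketch below assumes the pooled-variance version, which is the one that relies on similar SDs, and adds a hypothetical helper for the "no more than 2x" rule of thumb. The data are invented.

```python
import math
import statistics


def sds_similar(a, b):
    """Rule of thumb from the card: neither SD is more than twice the other."""
    s1, s2 = statistics.stdev(a), statistics.stdev(b)
    return max(s1, s2) <= 2 * min(s1, s2)


def se_diff_means(a, b):
    """SE of the difference between two means, using a pooled variance (assumed formula)."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return math.sqrt(pooled_var * (1 / n1 + 1 / n2))


# Invented groups of 25 values each, with identical spread
group_a = list(range(1, 26))
group_b = list(range(11, 36))
similar = sds_similar(group_a, group_b)
se_diff = se_diff_means(group_a, group_b)
```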
Standard error for a proportion
SE = sqrt(p(1-p)/n)
Assumes n>20
Assumes 0.1< p <0.9
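A short sketch of the formula together with a check of the two assumptions listed above. The helper names and the counts (20 events out of 100) are made up.

```python
import math


def assumptions_hold(events, n):
    """Check the card's conditions: n > 20 and 0.1 < p < 0.9."""
    p = events / n
    return n > 20 and 0.1 < p < 0.9


def se_proportion(events, n):
    """Standard error of a proportion: sqrt(p(1 - p) / n)."""
    p = events / n
    return math.sqrt(p * (1 - p) / n)


# Hypothetical counts
ok = assumptions_hold(20, 100)
se_p = se_proportion(20, 100)
```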
SE and CIs for relative risk
RR/OR is transformed to an approximately normal scale by taking the natural log
Confidence intervals are calculated on the log scale, then back-transformed at the very end
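A sketch of the log-scale calculation for a relative risk, using the usual large-sample SE of ln(RR), sqrt(1/a - 1/n1 + 1/c - 1/n2), which is not stated on the card. The function name and counts are hypothetical.

```python
import math


def relative_risk_ci95(a, n1, c, n2):
    """RR and 95% CI for a events out of n1 (exposed) vs c events out of n2 (unexposed)."""
    rr = (a / n1) / (c / n2)
    # Work on the natural-log scale, where the estimate is roughly normal
    log_rr = math.log(rr)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    # Back-transform the limits only at the very end
    lower = math.exp(log_rr - 1.96 * se_log_rr)
    upper = math.exp(log_rr + 1.96 * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 20/100 events in the exposed group, 10/100 in the unexposed group
rr, rr_lower, rr_upper = relative_risk_ci95(20, 100, 10, 100)
```

The resulting interval is symmetric on the log scale but not on the original scale, so the upper limit lies further from the RR than the lower limit.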
SE and CI of Pearson correlation coefficient
Calculate the SE on the ln-transformed scale, then back-transform the CI limits right at the end
z = 0.5 x ln((1+r)/(1-r))
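A minimal sketch of this transformation (Fisher's z), assuming the usual large-sample SE of 1/sqrt(n - 3) on the transformed scale, which the card does not state. The function names and the values r = 0.5, n = 28 are invented.

```python
import math


def fisher_z(r):
    """Transform r with 0.5 x ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))


def correlation_ci95(r, n):
    """95% CI for a Pearson correlation, computed on the z scale then back-transformed."""
    z = fisher_z(r)
    se = 1 / math.sqrt(n - 3)          # assumed large-sample SE on the z scale
    # tanh is the inverse of the transformation above
    return math.tanh(z - 1.96 * se), math.tanh(z + 1.96 * se)


# Hypothetical correlation of 0.5 from 28 pairs
r_lower, r_upper = correlation_ci95(0.5, 28)
```

Back-transforming with tanh keeps both limits between -1 and 1, which a CI built directly on r would not guarantee.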
Non-parametric CI
Bootstrapping, resampling with replacement
95% CI runs from the 2.5th to the 97.5th centile of the resampled estimates
Median from this is best estimate of population average
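The steps above can be sketched as a percentile bootstrap of the median. The number of resamples (2000), the fixed seed and the data are arbitrary choices for illustration, and the helper name is hypothetical.

```python
import random
import statistics


def bootstrap_median_ci(values, n_boot=2000, seed=0):
    """Percentile-bootstrap estimate and 95% CI for the median."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    estimates = sorted(
        # Resample with replacement, same size as the original sample
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    )
    lower = estimates[n_boot * 25 // 1000]        # about the 2.5th centile
    upper = estimates[n_boot * 975 // 1000 - 1]   # about the 97.5th centile
    return statistics.median(estimates), lower, upper


# Invented, skewed sample
sample = [1, 2, 2, 3, 3, 4, 5, 7, 9, 40]
estimate, boot_lower, boot_upper = bootstrap_median_ci(sample)
```

No distributional assumption is needed, because the interval is read straight off the centiles of the resampled medians.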