Representing Data Flashcards
Representing centre of continuous data
- Mean
- Median
Measures of variability of continuous data
- Standard deviation (variance)
- Range (minimum, maximum)
- Interquartile range
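As a rough illustration, the measures listed above can be computed with Python's standard statistics module. The sample values are made up for this sketch, and the quartiles use the module's default method; other software may give slightly different quartiles for the same data.

```python
import statistics

# Hypothetical sample of a continuous measurement (made up for illustration)
values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]

# Centre
mean = statistics.mean(values)
median = statistics.median(values)

# Variability
sd = statistics.stdev(values)            # sample standard deviation (n - 1 denominator)
variance = statistics.variance(values)   # sample variance, the square of the SD
data_range = (min(values), max(values))  # range reported as (minimum, maximum)

# Quartiles with the module's default method; the IQR is Q3 minus Q1
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

print(mean, median, sd, data_range, iqr)
```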
Problems with dichotomising data
A great deal of information is discarded and statistical power is lost in the analysis
The nature of any relationship may be masked. For example, a curved relationship may appear weaker once the data are categorized, and a U-shaped relationship may be totally obscured by categorization.
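A small sketch of the U-shaped case, using made-up data in which y is exactly x squared. Splitting x into two groups at zero is an assumption chosen for illustration; any real cut-point would be study-specific.

```python
import statistics

# Hypothetical data with a perfectly U-shaped relationship: y = x squared
x = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
y = [xi ** 2 for xi in x]

# Dichotomise x into "low" (below 0) and "high" (0 or above)
y_low = [yi for xi, yi in zip(x, y) if xi < 0]
y_high = [yi for xi, yi in zip(x, y) if xi >= 0]

mean_low = statistics.mean(y_low)
mean_high = statistics.mean(y_high)

# The two group means are identical, so the comparison suggests no relationship,
# even though y is completely determined by x
print(mean_low, mean_high)
```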
Ordinal
Can be arranged from smallest to largest
Quantitative data are always ordinal
Categorical data
Fall into classes
Number of classes can differ
E.g. dead or alive = 2 classes
Cancer stage: I-IV = 4 classes
Cancer stage is not a continuous variable as the difference between two cancer stages cannot be defined
Standard deviation
= spread
Measure of the typical distance between each data value and the mean, calculated as the square root of the variance (the average squared deviation from the mean)
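The calculation can be sketched step by step as follows. The data are made up, and the n - 1 denominator (the sample standard deviation) is an assumption; dividing by n instead gives the population version.

```python
import math
import statistics

# Hypothetical sample (made up for illustration)
values = [1.0, 2.0, 3.0, 4.0, 5.0]

mean = sum(values) / len(values)

# Squared difference between each data value and the mean
squared_devs = [(v - mean) ** 2 for v in values]

# Sample variance: sum of squared deviations divided by n - 1
sample_variance = sum(squared_devs) / (len(values) - 1)

# Standard deviation: square root of the variance, in the original units
sample_sd = math.sqrt(sample_variance)

# Cross-check against the standard library
print(sample_sd, statistics.stdev(values))
```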
Geometric mean
Calculated using log-transformed data – each data value is replaced by its logarithm to base e.
The arithmetic mean is then calculated on the new log-transformed scale and this is back-transformed using the exponential transformation to give a mean that is in the same units as the original data.
Most non-symmetrical data distributions have a positive skew, that is, the tail of the distribution is longer on the right-hand side. In such cases the arithmetic mean will be disproportionately inflated by the small number of high values in the upper tail of the distribution and so the geometric mean may be preferred.
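The log-transform and back-transform steps can be sketched as below. The positively skewed values are made up for illustration; statistics.geometric_mean (Python 3.8 or later) is used only as a cross-check.

```python
import math
import statistics

# Hypothetical positively skewed sample: one large value in the upper tail
values = [1.0, 2.0, 4.0, 8.0, 100.0]

# Step 1: replace each value by its natural logarithm (base e)
logs = [math.log(v) for v in values]

# Step 2: arithmetic mean on the log-transformed scale
mean_log = sum(logs) / len(logs)

# Step 3: back-transform with the exponential to return to the original units
geometric_mean = math.exp(mean_log)

# The arithmetic mean is pulled upwards by the single large value
arithmetic_mean = statistics.mean(values)

print(geometric_mean, arithmetic_mean, statistics.geometric_mean(values))
```

In this made-up sample the geometric mean is much smaller than the arithmetic mean, because the single value of 100 has far less influence on the log scale.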
Harmonic mean
Harmonic mean is also based on transformed data values and is the back-transformation (reciprocal) of the arithmetic mean of the reciprocals of the data values (1/value). It can be used when the data are highly positively skewed, but it is not commonly seen in practice.
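A minimal sketch of the same idea for the harmonic mean, again with made-up values; statistics.harmonic_mean is used only as a cross-check.

```python
import statistics

# Hypothetical sample of positive values (made up for illustration)
values = [1.0, 2.0, 4.0]

# Step 1: transform each value to its reciprocal (1/value)
reciprocals = [1.0 / v for v in values]

# Step 2: arithmetic mean of the reciprocals
mean_reciprocal = sum(reciprocals) / len(reciprocals)

# Step 3: back-transform by taking the reciprocal of that mean
harmonic_mean = 1.0 / mean_reciprocal

print(harmonic_mean, statistics.harmonic_mean(values))
```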