Summarising data Flashcards
What are the two types of variables?
Categorical
Numerical
What is a categorical variable?
Give 3 subtypes:
Non-numerical (each value is associated with a category)
- Ordered categorical (ordinal) e.g. social class = assigned a number
- Unordered categorical (nominal) e.g. blood group = named group
- Dichotomous/binary e.g. gender = two categories only
Give two subtypes of numerical variable:
- Continuous e.g. height = infinite no. of distinct values
- Discrete/counts e.g. number of siblings = only specific values are possible
What type of variable is severity (low, moderate or severe) of dental erosion?
Ordinal
What type of study is ALSPAC (Avon Longitudinal Study of Parents and Children)?
Prospective cohort study
Recruited pregnant women living in Avon with a due date between April 1991 and Dec 1992
How can one categorical variable be shown?
Bar chart
Pie chart
Frequency table
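A frequency table can be sketched in a few lines of Python. This is only an illustration: the blood-group values and the helper name `frequency_table` are made up for the example.

```python
from collections import Counter

# Hypothetical blood-group observations (made up for illustration).
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]


def frequency_table(values):
    """Return {category: (count, percentage of all observations)}."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100 * n / total) for category, n in counts.items()}


table = frequency_table(blood_groups)

# Print one row per category: name, count, percentage.
for category, (count, percent) in sorted(table.items()):
    print(f"{category:<3} {count:>3} {percent:5.1f}%")
```

The same counts are what a bar chart or pie chart would display graphically.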
How can one continuous variable be presented?
Histogram
Bar chart (after grouping values into categories)
Pie chart (after grouping values into categories)
How can a categorical outcome and categorical exposure be presented?
Contingency (2 way) table
Outcome = columns
Exposure = rows
Each cell usually shows count and % within exposure = gives some idea of relationship between outcome and exposure
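A minimal sketch of a 2-way table with counts and % within exposure, assuming made-up (exposure, outcome) pairs. The labels and the helper name `contingency_table` are hypothetical.

```python
from collections import Counter

# Hypothetical (exposure, outcome) pairs, made up for illustration.
records = [
    ("exposed", "outcome"), ("exposed", "outcome"),
    ("exposed", "no outcome"), ("exposed", "no outcome"),
    ("unexposed", "outcome"), ("unexposed", "no outcome"),
    ("unexposed", "no outcome"), ("unexposed", "no outcome"),
]


def contingency_table(pairs):
    """Return {exposure: {outcome: (count, % within that exposure row)}}."""
    cell_counts = Counter(pairs)
    row_totals = Counter(exposure for exposure, _ in pairs)
    table = {}
    for (exposure, outcome), n in cell_counts.items():
        row_percent = 100 * n / row_totals[exposure]
        table.setdefault(exposure, {})[outcome] = (n, row_percent)
    return table


table = contingency_table(records)

# Rows = exposure, columns = outcome, each cell = count and row %.
for exposure, row in table.items():
    cells = ", ".join(f"{o}: {n} ({p:.0f}%)" for o, (n, p) in row.items())
    print(f"{exposure:<10} {cells}")
```

Comparing the row percentages (here 50% vs 25% with the outcome) is what gives the rough idea of a relationship.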
How can a numerical outcome and categorical exposure be presented?
Box and whisker plot (n.b. different people use the whiskers to represent different measures)
Can compare distributions in >2 groups
How can a numerical outcome and numerical exposure be presented?
Scatter plot (can include a regression line)
What is the most appropriate graph for displaying adult height according to social class?
Box and whisker plot
(adult height = outcome = continuous; social class = exposure = categorical)
What are the 3 different measures of central tendency?
Mean
Median (more useful if there are extreme/outlying values or data is not symmetrically distributed)
Mode (depends on precision of data -> if sufficiently precise, each reading can be distinguished from the others = mode won't exist)
N.B. these vary!
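A short sketch using Python's standard `statistics` module shows how the three measures can differ. The values are made up, with one extreme observation included on purpose.

```python
import statistics

# Hypothetical values with one extreme observation (made up for illustration).
values = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(values)      # pulled upwards by the extreme value 40
median = statistics.median(values)  # middle value, unaffected by the 40
mode = statistics.mode(values)      # most frequent value

print(mean, median, mode)

# With very precise readings every value is distinct, so no single mode
# stands out: multimode() simply returns every value.
precise = [1.71, 1.64, 1.82, 1.77]
modes = statistics.multimode(precise)
print(modes)
```

Here the mean (9) sits well above the median (4), which is why the median is often preferred when there are outlying values.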
What are the 4 measures of variability (extent of spread around the centre)?
Range (depends solely on the two extreme values = may be unrepresentative of the whole set)
Interquartile range (often more useful when the median is used instead of the mean)
Standard deviation (data must be approximately symmetrically distributed for it to be meaningful)
Variance (SD^2)
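The four measures can be sketched with the standard `statistics` module. The values are made up, and this sketch assumes the sample (n - 1) versions of SD and variance. Quartile conventions differ between packages, so the IQR from other software may not match exactly.

```python
import statistics

# Hypothetical values (made up for illustration).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: depends only on the two extreme values.
data_range = max(values) - min(values)

# Interquartile range: upper quartile minus lower quartile.
# statistics.quantiles uses the "exclusive" method by default.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Sample standard deviation and variance (variance = SD squared).
sd = statistics.stdev(values)
variance = statistics.variance(values)

print(data_range, iqr, sd, variance)
```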
What are the 3 types of distributions?
Normal (symmetrical)
Positively skewed (long tail to right)
Negatively skewed (long tail to left)
What do we do to positively skewed data to convert it to approximate normality?
Log transformation (must remember to transform any means and SDs back to the original units for comparison)
n.b. other transformations may be required for negatively skewed data
What is the geometric mean?
The exponential of the mean calculated from the log-transformed data
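As a rough sketch, the geometric mean can be computed by logging the data, taking the mean, and exponentiating to get back to the original units. The values are made up for illustration.

```python
import math
import statistics

# Hypothetical positively skewed values (made up for illustration).
values = [1, 10, 100]

# Step 1: log-transform each observation (natural log).
logged = [math.log(v) for v in values]

# Step 2: take the mean on the log scale.
mean_log = statistics.fmean(logged)

# Step 3: back-transform to the original units.
geometric_mean = math.exp(mean_log)

arithmetic_mean = statistics.fmean(values)
print(geometric_mean, arithmetic_mean)

# The standard library offers the same calculation directly (Python 3.8+).
builtin_geometric_mean = statistics.geometric_mean(values)
```

For these values the geometric mean is about 10, far below the arithmetic mean of 37, because it is less affected by the long right tail.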
What is a normal distribution?
95% of observations enclosed within mean +/- 1.96 SD
Mean & median = identical
SD determines shape (small = tall and narrow, large = shorter and fatter)
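These properties can be checked with `statistics.NormalDist` (Python 3.8+). The mean of 170 and the two SDs below are arbitrary illustrative values, and `coverage_within` is a hypothetical helper name.

```python
from statistics import NormalDist


def coverage_within(z, dist):
    """Proportion of a normal distribution lying within mean +/- z SDs."""
    lower = dist.mean - z * dist.stdev
    upper = dist.mean + z * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)


# Two normal curves with the same mean but different SDs (illustrative values).
narrow = NormalDist(mu=170, sigma=5)
wide = NormalDist(mu=170, sigma=15)

# About 95% of observations lie within mean +/- 1.96 SD, whatever the SD.
coverage_narrow = coverage_within(1.96, narrow)
coverage_wide = coverage_within(1.96, wide)

# Height of each curve at its centre: small SD = taller peak.
peak_narrow = narrow.pdf(narrow.mean)
peak_wide = wide.pdf(wide.mean)

print(coverage_narrow, coverage_wide, peak_narrow, peak_wide)
```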
What is a reference range?
A further measure of variability = amount of variation between individual observations of data = used in clinic to determine if a patient is clinically normal or not
e.g. 95% reference range = mean +/- 1.96 SD
Can also have 90 & 99% reference ranges etc.
Defined using properties of normal distribution
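A minimal sketch of a reference range calculation, assuming the measurements are approximately normally distributed. The values and the helper names `z_for` and `reference_range` are made up for illustration.

```python
import statistics
from statistics import NormalDist

# Hypothetical measurements (made up for illustration).
values = [4.1, 4.5, 4.8, 5.0, 5.2, 5.5, 5.9]


def z_for(coverage):
    """Multiplier of the SD for a central range with the given coverage."""
    return NormalDist().inv_cdf(0.5 + coverage / 2)


def reference_range(data, coverage=0.95):
    """Return (lower, upper) = mean -/+ z * SD, using the sample SD."""
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)
    z = z_for(coverage)
    return mean - z * sd, mean + z * sd


low90, high90 = reference_range(values, 0.90)
low95, high95 = reference_range(values, 0.95)
low99, high99 = reference_range(values, 0.99)

print(f"90%: {low90:.2f} to {high90:.2f}")
print(f"95%: {low95:.2f} to {high95:.2f}")
print(f"99%: {low99:.2f} to {high99:.2f}")
```

A higher coverage gives a wider range, since the multiplier of the SD grows (roughly 1.64, 1.96 and 2.58 for 90%, 95% and 99%).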
The curve of a normal distribution that has a large standard deviation is ________ than one with a small standard deviation
Shorter/wider