Stats - box and whisker plots, forest plots Flashcards
What diagram is used to display information about the range, median and the quartiles?
Box and Whisker plots
What are the 5 lines on the box plot and what do they represent (top to bottom)?
What else is on the graph?
1) highest value that isn’t an outlier
(2,3,4 make up the box)
2) Upper quartile (Q3)
3) Median (Q2)
4) Lower quartile (Q1)
5) lowest value that isn’t an outlier
Outlier dots - values more than 1.5 × IQR beyond either end of the box (important definition of outliers!)
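The five lines and the outlier rule above can be sketched in a few lines of Python. This is only an illustration: the function name `box_plot_summary` and the sample data are made up, and the quartiles come from the standard library's `statistics.quantiles`, whose default "exclusive" method is one of several quartile conventions.

```python
import statistics

def box_plot_summary(values):
    """Return the five box-plot lines plus any outliers (1.5 x IQR rule)."""
    data = sorted(values)
    # Default 'exclusive' method; other quartile conventions give slightly different values.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "upper_whisker": max(inside),  # highest value that isn't an outlier
        "q3": q3,
        "median": q2,
        "q1": q1,
        "lower_whisker": min(inside),  # lowest value that isn't an outlier
        "outliers": outliers,
    }

# Made-up sample: ten ordinary values and one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 40]
summary = box_plot_summary(sample)
print(summary)
```

In this sample the value 40 lies beyond the upper fence, so it would be drawn as a dot and the upper whisker would stop at 10.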
What is the definition of an outlier on the Box and Whisker plot?
What does this correspond to?
A value more than 1.5 × IQR beyond either end of the box, i.e. below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR
The 1.5 multiplier corresponds to approximately ±2.7 SD and 99.3% coverage of the data for a normal distribution.
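The ±2.7 SD and 99.3% figures can be checked numerically. A minimal sketch using `statistics.NormalDist` from the Python standard library, assuming a standard normal distribution (mean 0, SD 1):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

q1 = z.inv_cdf(0.25)  # lower quartile, in SD units
q3 = z.inv_cdf(0.75)  # upper quartile, in SD units
iqr = q3 - q1

upper_fence = q3 + 1.5 * iqr  # end of the box plus 1.5 x IQR
coverage = z.cdf(upper_fence) - z.cdf(-upper_fence)  # share of data inside both fences

print(round(upper_fence, 2), round(coverage * 100, 1))
```

The fence comes out at roughly 2.7 SD from the mean, with about 99.3% of the distribution between the two fences.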
What is another name for the interquartile range? How is it defined?
Midspread - equal to the difference between the 3rd and 1st quartiles.
The median divides the data into two halves. How do you calculate the median from a set of ordered numbers?
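Take the value at position (n+1)/2, i.e. if there are 11 numbers in a set, the median is the 6th value. If n is even, the position falls between two values and the median is their average.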
What are the 1st and 3rd quartile?
How are they calculated in a set of ordered numbers?
1st quartile - equivalent to the 25th percentile - the value at position (n+1)/4 (i.e. in a set of 11 this would be the 3rd value)
3rd quartile - equivalent to the 75th percentile - the value at position 3(n+1)/4 (i.e. in a set of 11 this would be the 9th value)
How do you calculate the interquartile range?
Q3 minus Q1
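The position rules for the median, Q1, Q3 and the IQR can be sketched as follows. The helper `value_at_position` and the data are hypothetical; interpolating between neighbours when a position is not a whole number is an assumption here, since the cards only cover the case where the position is exact.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in an ordered list.

    Assumption: a fractional position is resolved by linear interpolation
    between its two neighbours, and the position stays within 1..n.
    """
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

# Made-up data set with n = 11 values.
data = sorted([12, 15, 17, 20, 22, 25, 28, 30, 33, 36, 40])
n = len(data)

q1 = value_at_position(data, (n + 1) / 4)      # position 3
median = value_at_position(data, (n + 1) / 2)  # position 6
q3 = value_at_position(data, 3 * (n + 1) / 4)  # position 9
iqr = q3 - q1

print(q1, median, q3, iqr)
```

With 11 values the positions are whole numbers (3, 6 and 9), so no interpolation is needed and the quartiles are simply the 3rd, 6th and 9th values.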
What is a percentile (centile)?
How does this relate to quartile data?
This is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value (or score) below which 20% of the observations may be found.
75% of the data set is below Q3
50% of the data set is less than Q2
25% of the data set is below Q1
50% of the data set is between Q1 and Q3
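These percentages can be checked on a simple data set. A small sketch, assuming the integers 1 to 100 as illustrative data and a hypothetical helper `percent_below`; the quartiles use `statistics.quantiles` with its default method.

```python
import statistics

def percent_below(values, threshold):
    """Percentage of observations strictly below the threshold."""
    return 100 * sum(1 for v in values if v < threshold) / len(values)

data = list(range(1, 101))  # illustrative data: the integers 1 to 100
q1, q2, q3 = statistics.quantiles(data, n=4)

below = {name: percent_below(data, q) for name, q in (("Q1", q1), ("Q2", q2), ("Q3", q3))}
between_q1_q3 = below["Q3"] - below["Q1"]

print(below, between_q1_q3)
```

On small data sets the percentages will only be approximately 25, 50 and 75, because a handful of values cannot be split into exact quarters.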
What is skewing in a box and whisker plot?
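If a distribution is symmetric, the median sits in the middle of the box and the two whiskers are of roughly equal length.
If most of the observations are concentrated on the low end of the scale, the distribution is skewed right: the median typically lies closer to Q1 than to Q3 and the whisker on the high side is longer (a histogram of the same data would be higher on the left side).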
And vice versa.
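One way to put a number on what the box shows is the quartile (Bowley) skewness coefficient, which compares the two halves of the box. This measure is not mentioned in the cards; it is brought in here only as an illustration, and the data sets are made up.

```python
import statistics

def quartile_skewness(values):
    """Bowley skewness: ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1).

    Positive: upper half of the box is wider (skewed right).
    Negative: lower half of the box is wider (skewed left).
    Assumes Q3 > Q1, otherwise the ratio is undefined.
    """
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)

right_skewed = [1, 1, 2, 2, 3, 3, 4, 5, 8, 12, 20]  # most values at the low end
symmetric = list(range(1, 12))                      # evenly spread values
left_skewed = [-x for x in right_skewed]            # mirror image of the first set

for name, values in (("right", right_skewed), ("symmetric", symmetric), ("left", left_skewed)):
    print(name, round(quartile_skewness(values), 2))
```

A positive value matches the "skewed right" case above, zero matches a symmetric box, and a negative value matches the "vice versa" case.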
What is a forest plot also known as?
What is it used to display and how does this work?
Blobbogram
It is the main method for illustrating the results of a meta-analysis. It takes all the relevant studies which ask the same question, identifies a statistic common to all of them and displays the results on a single image. Doing this allows direct comparison.
What is on the horizontal axis of a forest plot?
The horizontal axis usually represents the statistic that the studies being profiled report. This could be a relative statistic such as an odds ratio (OR) or a relative risk (RR), or an absolute one such as Absolute Risk Reduction (ARR) or Standardised Mean Difference (SMD).
What is the vertical line on a forest plot?
The vertical line is known as the ‘line of null (or no) effect’. This line is placed at the value where there is no association between an exposure and outcome or no difference between two interventions (for example, at 1 for an odds ratio). Remember that relative statistics like OR or RR have a null effect value of 1, whereas absolute statistics like ARR or SMD have a null difference value of 0.
What 2 components are on each study line (horizontal) in a forest plot?
A point estimate of the study result (represented by a black box). The size of the boxes represents the weight or the relative contribution of each study to the overall meta-analysis. This weight is typically determined by the precision of the study’s estimate, which is often related to the sample size and the variance of the outcome.
A horizontal line representing the 95% confidence interval of the study result, with each end of the line representing a boundary of the confidence interval. Note which lines cross the line of no effect (any study line which crosses the line of null effect does not illustrate a statistically significant result).
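The "does the line cross the line of no effect" check can be sketched as a simple comparison of each confidence interval against the null value for its statistic. The study names and numbers below are invented purely for illustration, and `crosses_null` is a hypothetical helper.

```python
# Null value for each type of statistic: 1 for ratios, 0 for differences.
NULL_VALUE = {"OR": 1.0, "RR": 1.0, "ARR": 0.0, "SMD": 0.0}

def crosses_null(ci_low, ci_high, statistic):
    """True if the 95% confidence interval includes the null value."""
    null = NULL_VALUE[statistic]
    return ci_low <= null <= ci_high

# Invented studies: (name, statistic, 95% CI lower bound, 95% CI upper bound)
studies = [
    ("Study A", "OR", 0.60, 1.07),
    ("Study B", "OR", 0.55, 0.89),
    ("Study C", "OR", 0.60, 1.50),
    ("Study D", "SMD", -0.50, -0.10),
]

not_significant = [
    name for name, statistic, low, high in studies
    if crosses_null(low, high, statistic)
]
print(not_significant)
```

Studies A and C have intervals that include 1, so their lines would cross the line of null effect; Study D reports an SMD, so its interval is compared against 0 instead.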
What does the diamond represent on a forest plot?
The diamond represents the point estimate and confidence interval obtained when all the individual studies are combined into a weighted average. If you drew a vertical line through the vertical points of the diamond, that represents the point estimate of the averaged studies. The horizontal points of the diamond represent the 95% confidence interval of this combined point estimate. If the horizontal tips of the diamond cross the vertical line, the combined result is not statistically significant (as you cannot be certain that the null value isn’t the true value).
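As a rough sketch of where the diamond and the box sizes can come from, the example below pools odds ratios with fixed-effect inverse-variance weighting on the log scale. This is one common approach and an assumption here: the cards do not say which pooling model is used, the study data are invented, and each standard error is back-calculated from the 95% confidence interval.

```python
import math

Z95 = 1.96  # normal quantile used for 95% confidence intervals

# Invented studies: (name, odds ratio, 95% CI lower bound, 95% CI upper bound)
studies = [
    ("Study A", 0.80, 0.60, 1.07),
    ("Study B", 0.70, 0.55, 0.89),
    ("Study C", 0.95, 0.60, 1.50),
]

def pooled_fixed_effect(studies):
    """Fixed-effect inverse-variance pooling of ratios on the log scale."""
    weights = []
    weighted_sum = 0.0
    for _, ratio, low, high in studies:
        log_ratio = math.log(ratio)
        # Standard error recovered from the width of the CI on the log scale.
        se = (math.log(high) - math.log(low)) / (2 * Z95)
        weight = 1 / se ** 2  # more precise studies get more weight
        weights.append(weight)
        weighted_sum += weight * log_ratio
    total = sum(weights)
    pooled_log = weighted_sum / total
    pooled_se = math.sqrt(1 / total)
    centre = math.exp(pooled_log)  # vertical points of the diamond
    tips = (
        math.exp(pooled_log - Z95 * pooled_se),  # left tip of the diamond
        math.exp(pooled_log + Z95 * pooled_se),  # right tip of the diamond
    )
    percent_weights = [100 * w / total for w in weights]  # relative box sizes
    return centre, tips, percent_weights

centre, tips, percent_weights = pooled_fixed_effect(studies)
print(round(centre, 2), [round(t, 2) for t in tips], [round(w, 1) for w in percent_weights])
```

With these invented numbers the study with the narrowest interval (Study B) gets the largest weight, and both tips of the diamond fall below 1, so the pooled result would not touch the line of null effect.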