1. Interpreting Data Flashcards
How do you calculate the standard deviation?
Take the square root of the average squared distance from the mean.
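A minimal sketch of that calculation in Python. The values are made up for illustration, and `population_sd` is a hypothetical helper name. It divides by the number of values (the "average" squared distance); a sample SD would usually divide by n - 1 instead, which is what `statistics.stdev` does.

```python
import math
import statistics

def population_sd(values):
    """Square root of the average squared distance from the mean."""
    mean = sum(values) / len(values)
    squared_distances = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_distances) / len(values))

# Made-up example values (not from the flashcards).
example_values = [2, 4, 4, 4, 5, 5, 7, 9]
sd = population_sd(example_values)
print(sd)                                  # SD computed by hand
print(statistics.pstdev(example_values))   # same quantity from the standard library
```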
What is the interquartile range?
When would you use IQR over standard deviation?
When would you use the mean or the median?
How should the following data be summarised (pic):
A. Median and standard deviation
B. Mean and interquartile range
C. Mean and standard deviation
D. Median and interquartile range
25th to 75th centile
If you have outliers
Median if you have outliers, otherwise either will do
D (always use IQR with median)
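To see why outliers push you towards the median and IQR, here is a small sketch with made-up numbers. `summarise` is a hypothetical helper, and the quartiles use the default method of `statistics.quantiles`; other quartile conventions can give slightly different IQRs.

```python
import statistics

def summarise(values):
    """Return mean/SD and median/IQR for the same data."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # 25th, 50th, 75th centiles
    return {
        "mean": statistics.mean(values),
        "sd": statistics.pstdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

# Made-up data: identical except that the largest value becomes an outlier.
clean = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]

clean_summary = summarise(clean)
outlier_summary = summarise(with_outlier)
print(clean_summary)
print(outlier_summary)
```

In this example the single outlier changes the mean and SD but leaves the median and IQR unchanged.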
How should this data be summarised?
A. Median and standard deviation
B. Mean and interquartile range
C. Mean and standard deviation
D. Median and interquartile range
What is this distribution called?
C
Gaussian
In a Gaussian distribution (pic) what happens if you change age to include older/younger people?
Older = curve shifts right, but the shape stays the same. Younger = curve shifts left
In a Gaussian distribution (pic) what happens if you change the standard deviation (decrease/increase)?
If SD decreases = curve taller and narrower. If SD increases = curve flatter and wider. The centre point and the area underneath always stay the same.
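A quick way to check both effects is to compare peak heights with `statistics.NormalDist`. The means and SDs below are made-up values, and `peak_height` and `central_area` are hypothetical helper names used only for this sketch.

```python
from statistics import NormalDist

def peak_height(dist):
    """Height of the curve at its centre (the mean)."""
    return dist.pdf(dist.mean)

def central_area(dist):
    """Area under the curve within 1.96 SDs of the mean."""
    lower = dist.mean - 1.96 * dist.stdev
    upper = dist.mean + 1.96 * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

baseline = NormalDist(mu=40, sigma=10)
older = NormalDist(mu=50, sigma=10)   # mean shifted right, same SD
narrow = NormalDist(mu=40, sigma=5)   # smaller SD
wide = NormalDist(mu=40, sigma=20)    # larger SD

for name, dist in [("baseline", baseline), ("older", older),
                   ("narrow", narrow), ("wide", wide)]:
    print(name, peak_height(dist), central_area(dist))
```

Shifting the mean leaves the peak height unchanged; a smaller SD gives a taller peak and a larger SD a lower one, while the area within 1.96 SDs stays the same for every curve.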
What are the 3 reference ranges that Gaussian distributions can be used to show?
What is the standard error?
What is the standard error of the mean?
99% range (0.5th to 99.5th centile) = mean ± 2.58 SDs
95% range (2.5th to 97.5th centile) = mean ± 1.96 SDs
90% range (5th to 95th centile) = mean ± 1.64 SDs
The ranges get narrower going down the list
The standard deviation of the sampling distribution of an estimate (approximately Gaussian); it’s a measure of the statistical accuracy of an estimate.
The standard deviation of the distribution of all possible sample means. Estimated from a single sample as:
Standard error of the mean = standard deviation / √sample size
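The multipliers 2.58, 1.96 and 1.64 come from the standard Gaussian curve, and can be recovered as a rough check. This is a sketch only: `sd_multiplier` and `standard_error_of_mean` are hypothetical helper names, and the SD and sample size in the last line are made up.

```python
import math
from statistics import NormalDist

def sd_multiplier(coverage):
    """Number of SDs either side of the mean that covers `coverage` of a Gaussian."""
    tail = (1 - coverage) / 2              # e.g. 2.5% in each tail for a 95% range
    return NormalDist().inv_cdf(1 - tail)

def standard_error_of_mean(sd, n):
    """Standard deviation divided by the square root of the sample size."""
    return sd / math.sqrt(n)

for coverage in (0.99, 0.95, 0.90):
    print(coverage, round(sd_multiplier(coverage), 2))

print(standard_error_of_mean(sd=10, n=25))  # made-up SD and sample size
```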
How do you calculate the 95% confidence interval (CI) of a sample mean?
If the 95% CI for BMI was 21.4 - 22.6, what 2 ways could you describe the results?
95% CI = sample mean ± 1.96 × standard error
- We would expect 95% of samples of the same size to have a mean BMI between 21.4 and 22.6.
- In the population we are 95% sure that the mean BMI could be as low as 21.4 or as high as 22.6
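As a worked sketch of the CI formula: the sample mean, SD and sample size below are made-up inputs (the card does not give them), chosen so that the interval comes out close to the BMI example. `confidence_interval_95` is a hypothetical helper name.

```python
import math

def confidence_interval_95(sample_mean, sd, n):
    """95% CI = sample mean ± 1.96 × standard error, with SE = SD / √n."""
    standard_error = sd / math.sqrt(n)
    margin = 1.96 * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs: mean BMI 22.0, SD 3.0, 100 people.
lower, upper = confidence_interval_95(sample_mean=22.0, sd=3.0, n=100)
print(round(lower, 1), round(upper, 1))
```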
95% confidence interval for the mean weight of a sample of 30 adult men is 75kg to 81kg. Which is the correct definition?
A. In the population we are 95% sure that the mean weight could be as low as 75kg or as high as 81kg
B. In the population the mean weight will be between 75kg and 81kg
C. In the population 95% of men will weigh between 75kg and 81kg
D. In this study 95% of men weighed between 75kg and 81kg
A
(C describes the 95% range, not the CI)
When do you use standard deviation or standard error?
What happens to the 95% range as the sample size increases (in pic)?
What happens to the 95% CI?
Use SD for ranges (for individual values) and SE for CIs (for means)
Stays the same
Gets narrower (because it is calculated using the SE, and the SE gets smaller as sample size increases, since you divide by the square root of the sample size)
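A small sketch of the difference, assuming for simplicity that the SD stays fixed at a made-up value of 3.0 as the sample grows (in a real sample the SD would fluctuate a little). `range_width` and `ci_width` are hypothetical helper names.

```python
import math

def range_width(sd):
    """Width of the 95% range (mean ± 1.96 SDs): does not depend on n."""
    return 2 * 1.96 * sd

def ci_width(sd, n):
    """Width of the 95% CI (mean ± 1.96 SEs): shrinks as n grows."""
    return 2 * 1.96 * sd / math.sqrt(n)

sd = 3.0                    # made-up SD, held fixed
sizes = [25, 100, 400]      # made-up sample sizes
ci_widths = [ci_width(sd, n) for n in sizes]

for n, width in zip(sizes, ci_widths):
    print(n, range_width(sd), width)
```

Each time the sample size is quadrupled the CI width halves, while the 95% range is unchanged.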
How would you describe this relationship?
What is r?
Birth weight is positively correlated with gestational age.
Correlation coefficient, always between -1 and 1
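For reference, r here is taken to be the Pearson correlation coefficient (an assumption; the card does not name it). A minimal sketch with made-up data, where `pearson_r` is a hypothetical helper name:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(spread_x * spread_y)

# Made-up data: one perfectly increasing and one perfectly decreasing relationship.
xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 6, 8, 10])
r_down = pearson_r(xs, [10, 8, 6, 4, 2])
print(r_up, r_down)
```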
What would the r value be in A - C?
How is linear regression represented?
What is the equation?
A) r = 0, no correlation
B) r = 1, perfect positive correlation
C) r = -1, perfect negative correlation
Line of best fit
y = a + bx
(y = outcome/dependent variable, x = predictor/independent variable, b = slope (difference in y / difference in x), a = intercept: if the line was continued, where it crosses the y axis when x = 0)
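A sketch of fitting y = a + bx by ordinary least squares, which is the usual way a line of best fit is found (an assumption; the cards do not say which method is used). The data are made up and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares estimates of the intercept a and slope b in y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x   # the fitted line passes through (mean_x, mean_y)
    return a, b

# Made-up data lying exactly on y = 2 + 3x.
xs = [0, 1, 2, 3]
ys = [2, 5, 8, 11]
a, b = fit_line(xs, ys)
print(a, b)
```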
Predicting gestational age from crown rump length.
Which regression should you be doing?
A (whatever we’re predicting on the vertical axis, and what we’re using to predict on the horizontal axis)
Predicting PAPP-A from gestational age
Which regression should you be doing?
B
What 2 things would you look at to determine whether an observed difference was due to chance or statistically significant?
CIs and p-values
What is the p-value?
Using the data (pic), how would you calculate the probability of observing at least 17 heads or at least 17 tails (the p-value)? We don’t know which side the coin is biased to.
What does it mean if the p-value is <0.05?
When can p-values be calculated?
The probability of observing a result as or more extreme than the sample result if the underlying assumption in the population is true.
0.008 + 0.008 = 0.016 (This is a two-tailed p-value; if we thought it was biased to heads it’d just be 0.008 = one-tailed p-value)
The result is statistically significant. If the p-value is >0.05, a chance effect can’t be ruled out.
For a comparison of 2 means (are they different?), an association (are the observed results different from the expected results?), or a regression (is the slope different from 0?)
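The coin-flip p-value can be sketched as a binomial tail probability under the assumption that the coin is fair. The number of flips in the picture is not given in the cards, so the example below uses a made-up case (at least 9 of 10 flips landing the same way). `tail_probability` and `two_tailed_p` are hypothetical helper names, and doubling the tail is only valid when the cut-off is more than half the flips, so the two tails do not overlap.

```python
from math import comb

def tail_probability(n_flips, at_least):
    """One-tailed: P(at least `at_least` heads in `n_flips` flips of a fair coin)."""
    favourable = sum(comb(n_flips, k) for k in range(at_least, n_flips + 1))
    return favourable / 2 ** n_flips

def two_tailed_p(n_flips, at_least):
    """Two-tailed: heads tail plus the equally likely tails tail."""
    return 2 * tail_probability(n_flips, at_least)

# Made-up example: at least 9 heads or at least 9 tails in 10 flips.
one_tailed = tail_probability(10, 9)
two_tailed = two_tailed_p(10, 9)
print(one_tailed, two_tailed)
```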