Interpreting Data Flashcards
What are the two main types of data?
Qualitative and quantitative
What are the two types of quantitative data?
Discrete and continuous
What are the two types of qualitative data?
Nominal (unordered) and ordinal (ordered)
What is nominal data split into?
Binary and categorical
What is the median?
The middle value when the values are ordered from smallest to largest
What is the median of the following values?
2, 3, 6, 7, 10, 11, 14
7
What is the mode?
Most common value
What is the mean?
The average. It is the sum of all the values divided by the number of values.
Calculate the mean.
2, 3, 4, 7, 8, 8, 11
6.1
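The median, mode and mean answers above can be checked with Python's standard statistics module. This is only a quick sketch; the variable names are illustrative and not part of the flashcards.

```python
import statistics

# The two lists of values used in the flashcards above.
median_values = [2, 3, 6, 7, 10, 11, 14]
mean_values = [2, 3, 4, 7, 8, 8, 11]

median_result = statistics.median(median_values)  # middle of the 7 ordered values
mode_result = statistics.mode(mean_values)        # most common value
mean_result = statistics.mean(mean_values)        # sum / count = 43 / 7

print(median_result)          # 7
print(mode_result)            # 8
print(round(mean_result, 1))  # 6.1
```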
What does standard deviation mean?
A measure of spread: roughly the typical distance of the values from the mean
How is standard deviation calculated?
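Subtract the mean from each value and square the result. Add these squared differences together and divide by the number of values (or by the number of values minus 1 when the data are a sample). Then take the square root of this answer.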
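The steps in the card above can be sketched in Python as follows. The values are reused from the mean card. Dividing by n - 1 for a sample is a common textbook convention added here as an assumption, so both versions are shown.

```python
import math
import statistics

values = [2, 3, 4, 7, 8, 8, 11]
n = len(values)
mean = sum(values) / n

# Step 1: (each individual value - mean) squared.
squared_deviations = [(x - mean) ** 2 for x in values]

# Step 2: divide the sum by the number of values, then take the square root.
population_sd = math.sqrt(sum(squared_deviations) / n)

# Variant (assumption): divide by n - 1 when the values are a sample.
sample_sd = math.sqrt(sum(squared_deviations) / (n - 1))

print(round(population_sd, 2), round(sample_sd, 2))
```

The standard library functions statistics.pstdev and statistics.stdev give the same two results.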
What centile is the median?
50th
What is the interquartile range?
The range from the 25th to the 75th centile (the middle 50% of the values)
When is it better to use a median rather than a mean?
To avoid the influence of outliers, i.e. if there is an outlier that is very different to the rest of the data.
When is it better to use IQR rather than the standard deviation?
To avoid the influence of outliers
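A small sketch of why the median and IQR are preferred when outliers are present. The outlier of 100 is a made-up value added purely for illustration, and iqr is a hypothetical helper, not a standard function.

```python
import statistics

values = [2, 3, 4, 7, 8, 8, 11]
with_outlier = values + [100]  # hypothetical extreme value

def iqr(data):
    """Interquartile range: 75th centile minus 25th centile."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

# How much each summary moves when the outlier is added.
mean_change = statistics.mean(with_outlier) - statistics.mean(values)
median_change = statistics.median(with_outlier) - statistics.median(values)
sd_change = statistics.stdev(with_outlier) - statistics.stdev(values)
iqr_change = iqr(with_outlier) - iqr(values)

print("mean moved by", round(mean_change, 2), "median moved by", median_change)
print("SD moved by", round(sd_change, 2), "IQR moved by", iqr_change)
```

The mean and SD move a long way, while the median and IQR barely change.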
What is the Gaussian distribution determined by?
Mean and standard deviation
If the mean is reduced from 120 to 110, what happens to the Gaussian distribution?
It shifts to the left.
If the mean is increased from 120 to 130, what happens to the Gaussian distribution?
It shifts to the right.
What happens to the Gaussian distribution if the standard deviation is decreased from 15 to 10?
The curve becomes narrower and taller
What happens to the Gaussian distribution if the standard deviation is increased from 15 to 20?
The curve becomes wider and flatter
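The four cards above can be illustrated with statistics.NormalDist from the Python standard library. This is a rough sketch using the same means and standard deviations as the cards; the height of the curve at its centre stands in for "taller" or "flatter".

```python
from statistics import NormalDist

# Changing the mean moves the centre of the curve without changing its shape.
height_at_110 = NormalDist(mu=110, sigma=15).pdf(110)
height_at_120 = NormalDist(mu=120, sigma=15).pdf(120)
height_at_130 = NormalDist(mu=130, sigma=15).pdf(130)

# Changing the SD changes the shape: a smaller SD gives a taller, narrower curve.
peak_heights = {sd: NormalDist(mu=120, sigma=sd).pdf(120) for sd in (10, 15, 20)}

for sd, peak in peak_heights.items():
    print("SD", sd, "peak height", round(peak, 4))
```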
What is a useful property of Gaussian distributions?
A constant proportion of values will lie within any specified number of Standard Deviations above or below the mean (reference ranges).
What percentage of values lies within one standard deviation either side of the mean?
68%
What percentage of values lies within 1.64 standard deviations either side of the mean?
90%
What percentage of values lies within 1.96 standard deviations either side of the mean?
95%
What is the 99% range? How is it calculated?
0.5th centile to 99.5th centile
Mean +/- 2.58 SDs
What is the 95% range? How is it calculated?
2.5th centile to 97.5th centile
Mean +/- 1.96 SDs
What is the 90% range? How is it calculated?
5th centile to 95th centile
Mean +/- 1.64 SDs
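The percentages and ranges above can be checked with a short sketch. The helper names proportion_within and reference_range are hypothetical, and the mean of 120 and SD of 15 are borrowed from the earlier cards as example numbers.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def proportion_within(z):
    """Proportion of a Gaussian distribution within z SDs of the mean."""
    return standard_normal.cdf(z) - standard_normal.cdf(-z)

def reference_range(mean, sd, z):
    """Range given by mean +/- z SDs."""
    return (mean - z * sd, mean + z * sd)

for z in (1.0, 1.64, 1.96, 2.58):
    print(z, "SDs covers about", round(proportion_within(z) * 100), "%")

low_95, high_95 = reference_range(120, 15, 1.96)
print("95% range:", round(low_95, 1), "to", round(high_95, 1))
```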
If the sample size isn’t too small then the distribution of the sample mean will be…?
Gaussian
What is the standard error?
The standard deviation of the sampling distribution of an estimate (for example, the Gaussian distribution of the sample mean). It is a measure of the statistical accuracy of an estimate.
What is the standard error of the mean?
The standard deviation of the distribution of all possible sample means. This cannot be measured directly in practice, so it is estimated.
How is standard error of the mean estimated?
Standard deviation divided by the square root of the sample size.
How is the 95% confidence interval of a sample mean calculated?
95% CI = sample mean +/- (1.96 x standard error)
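A worked sketch of the two formulas above. The sample mean, standard deviation and sample size are hypothetical numbers chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical sample summary (not taken from real data).
sample_mean = 120
sample_sd = 15
sample_size = 100

# Standard error = standard deviation / square root of the sample size.
standard_error = sample_sd / math.sqrt(sample_size)

# 95% CI = sample mean +/- (1.96 x standard error).
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print("standard error:", standard_error)
print("95% CI:", round(ci_low, 2), "to", round(ci_high, 2))
```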
What does the 95% confidence interval mean?
If we repeatedly took samples of the same size and calculated a 95% confidence interval from each, we would expect about 95% of those intervals to contain the true population mean.
In the population we are 95% sure that the mean could be as low as ___ or as high as ___.
When calculating confidence intervals and ranges, what should be used for each?
Standard deviation for ranges
Standard error for intervals
When the sample size increases, the 95% range…
Stays the same
When the sample size increases, the 95% confidence interval…
Gets narrower
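The last two cards can be seen numerically in a short sketch. It assumes the sample standard deviation stays at 15 as the sample grows, and the sample sizes are hypothetical.

```python
import math

def reference_range_width(sd):
    """Width of the 95% range: uses the standard deviation only."""
    return 2 * 1.96 * sd

def confidence_interval_width(sd, n):
    """Width of the 95% CI: uses the standard error, sd / sqrt(n)."""
    return 2 * 1.96 * sd / math.sqrt(n)

sd = 15  # assumed to stay the same for every sample size
range_width = reference_range_width(sd)
ci_widths = {n: confidence_interval_width(sd, n) for n in (25, 100, 400)}

for n, width in ci_widths.items():
    print("n =", n, "range width", round(range_width, 2), "CI width", round(width, 2))
```

The range width does not depend on n, while the CI width halves each time the sample size is multiplied by four.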
What is ‘r’? What two values is it always between?
Correlation coefficient
-1 and 1
What does r=1 tell you?
Perfect positive correlation
What does r=-1 tell you?
Perfect negative correlation
What does r=0 tell you?
No linear correlation
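A minimal sketch of how r could be calculated, using the usual Pearson formula. The function name pearson_r and the small data sets are hypothetical, chosen so that r comes out at 1, -1 and 0.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_positive = pearson_r(xs, [2, 4, 6, 8, 10])  # y rises exactly in line with x
r_negative = pearson_r(xs, [10, 8, 6, 4, 2])  # y falls exactly in line with x
r_none = pearson_r(xs, [3, 1, 4, 1, 3])       # no linear trend

print(r_positive, r_negative, round(r_none, 6))
```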
What is the equation for a linear regression?
y = a + bx,
where y is the outcome and x is the predictor
What does the line of best fit do?
Minimises the sum of the squared vertical distances between the points and the line
Regression - whatever we are predicting, should it be on the vertical or horizontal axis?
Vertical
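A minimal least-squares sketch for y = a + bx, assuming the standard formulas b = Sxy / Sxx and a = mean of y - b x mean of x. The function names and the five data points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_vertical_distances(xs, ys, a, b):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]   # predictor (horizontal axis)
ys = [2, 4, 5, 4, 5]   # outcome (vertical axis)

a, b = fit_line(xs, ys)
best = sum_squared_vertical_distances(xs, ys, a, b)
# Any other line, such as one with a slightly steeper slope, does worse.
other = sum_squared_vertical_distances(xs, ys, a, b + 0.1)

print("a =", round(a, 2), "b =", round(b, 2))
print("fitted line:", round(best, 2), "other line:", round(other, 2))
```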
Statistical significance - what does this mean and how is it determined?
An observed sample difference between groups might be due to chance. Statistically significant means the result is unlikely to be due to chance.
Use confidence intervals and p-values
What does a p-value mean?
A p-value for a result is the probability of observing a result as extreme as, or more extreme than, the sample result if the null hypothesis (the underlying assumption about the population, e.g. no difference) is true.
What does the p-value have to be less than to be statistically significant?
0.05 (by convention)
When can p-values be calculated?
When there is a comparison:
2 means – are they different i.e. is their difference different from 0?
Association – are the observed results different from those expected?
Regression – is the slope different from 0?
How are p-values calculated?
Using a statistical test suited to the comparison, for example the chi-squared test for an association
If the 95% CI for a difference excludes 0 then what can be said about the p-value?
p<0.05
If the 95% CI for a difference contains 0 then what can be said about the p-value?
p≥0.05
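The link between the 95% CI and the p-value can be sketched as follows. This assumes a Gaussian (z) approximation for the difference, which the flashcards do not specify; the function name and the two example differences and standard errors are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()
z_95 = standard_normal.inv_cdf(0.975)  # about 1.96

def ci_and_p_value(difference, standard_error):
    """95% CI and two-sided p-value for a difference (Gaussian approximation)."""
    ci_low = difference - z_95 * standard_error
    ci_high = difference + z_95 * standard_error
    z = abs(difference) / standard_error
    p_value = 2 * (1 - standard_normal.cdf(z))
    return ci_low, ci_high, p_value

# Hypothetical example 1: the CI excludes 0.
low_1, high_1, p_1 = ci_and_p_value(difference=5, standard_error=2)
# Hypothetical example 2: the CI contains 0.
low_2, high_2, p_2 = ci_and_p_value(difference=3, standard_error=2)

print(round(low_1, 2), round(high_1, 2), round(p_1, 3))
print(round(low_2, 2), round(high_2, 2), round(p_2, 3))
```

In the first example the interval lies entirely above 0 and p is below 0.05; in the second the interval spans 0 and p is above 0.05.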