Data Flashcards
What is the median?
The middle value when the values are ordered from smallest to largest (with an even number of values, the mean of the two middle values)
What is the mode?
The most common value
What is the mean?
The average value: sum of all values divided by the number of values
What is the standard deviation?
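A measure of spread: roughly the average distance of the values from the mean (strictly, the square root of the average squared distance from the mean)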
What is the interquartile range?
This is the difference between the 75th centile and the 25th centile
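As a rough illustration of the five summary values above, here is a minimal sketch using Python's standard `statistics` module. The data values are made up, and `statistics.quantiles` uses one of several quartile conventions, so other tools may give a slightly different IQR.

```python
import statistics

# Made-up example data (an assumption for illustration only)
values = [2, 3, 3, 5, 7, 8, 9, 11]

median = statistics.median(values)  # middle value (mean of the two middle values here)
mode = statistics.mode(values)      # most common value
mean = statistics.mean(values)      # sum of all values / number of values
sd = statistics.stdev(values)       # sample SD (divides by n - 1)

# Quartile cut points: q1 = 25th centile, q2 = 50th, q3 = 75th
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                       # interquartile range

print(median, mode, mean, round(sd, 2), iqr)
```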
What are the best values to use to avoid the influence of outliers?
If the data is NOT symmetrical (skewed or with outliers): use the median rather than the mean, and the IQR rather than the standard deviation
If the data IS symmetrical: use the mean and the SD
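A small sketch of why the median and IQR are preferred when outliers are present. The numbers are invented; adding a single extreme value moves the mean and SD a lot, while the median and IQR barely change.

```python
import statistics

def summarise(data):
    """Return (mean, median, sd, iqr) for a list of numbers."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (
        statistics.mean(data),
        statistics.median(data),
        statistics.stdev(data),
        q3 - q1,
    )

# Invented data: the second list is the first plus one outlier
data = [4, 5, 5, 6, 7, 8, 9]
with_outlier = data + [100]

mean_a, median_a, sd_a, iqr_a = summarise(data)
mean_b, median_b, sd_b, iqr_b = summarise(with_outlier)

# How far each summary value moved when the outlier was added
mean_shift = abs(mean_b - mean_a)
median_shift = abs(median_b - median_a)
sd_shift = abs(sd_b - sd_a)
iqr_shift = abs(iqr_b - iqr_a)

print(mean_shift, median_shift, sd_shift, iqr_shift)
```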
What is the Gaussian distribution?
This is a curve representing symmetrical data - its shape and position are fully determined by the mean and the standard deviation
The graph is of a symmetrical bell shape curve
(The peak of the curve represents the mean)
What effect does a changing standard deviation have on the Gaussian distribution?
A larger SD makes the curve flatter and wider; a smaller SD makes it narrower and taller
BUT the curves will all have the same area beneath them
What effect does a changing mean have on the Gaussian distribution?
A change in the mean will cause the shape of the curve to remain the same but the location of the curve will shift further left or right (the peak of the curve represents the mean)
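The effects of changing the SD and the mean can be checked numerically. This is a sketch using `statistics.NormalDist` (Python 3.8+); the particular means and SDs, and the helper `area_under_curve`, are assumptions chosen for illustration.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)     # same mean, larger SD
shifted = NormalDist(mu=3, sigma=1)  # same SD, different mean

def area_under_curve(dist, half_width_sds=8, steps=4000):
    """Approximate the area under the curve with the trapezium rule."""
    lo = dist.mean - half_width_sds * dist.stdev
    hi = dist.mean + half_width_sds * dist.stdev
    step = (hi - lo) / steps
    heights = [dist.pdf(lo + i * step) for i in range(steps + 1)]
    return step * (sum(heights) - 0.5 * (heights[0] + heights[-1]))

# Larger SD: lower peak, but the same total area
print(narrow.pdf(0), wide.pdf(0))
print(area_under_curve(narrow), area_under_curve(wide))

# Different mean: same peak height, located at the new mean
print(narrow.pdf(0), shifted.pdf(3))
```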
Why is the Gaussian distribution useful?
A constant proportion of values will lie within any specified number of SDs above or below the mean, whatever the mean and SD are e.g. about 68% of values lie within 1 SD of the mean and 95% lie within 1.96 SDs
What is the ‘reference range’?
This is the range of values lying within a specified number of SDs above and below the mean e.g. mean ± 1.96 SD covers 95% of values -> the reference range is the 2.5th to 97.5th centile
Commonly, the reference range is the central 95% of values
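A minimal sketch of a 95% reference range for Gaussian data, again using `statistics.NormalDist`. The mean of 140 and SD of 2 are hypothetical values, not taken from any real measurement.

```python
from statistics import NormalDist

# Hypothetical population values (assumptions for illustration)
mean, sd = 140.0, 2.0
dist = NormalDist(mu=mean, sigma=sd)

# Reference range as mean +/- 1.96 SD
lower = mean - 1.96 * sd
upper = mean + 1.96 * sd

# The same limits expressed as centiles of the distribution
centile_2_5 = dist.inv_cdf(0.025)
centile_97_5 = dist.inv_cdf(0.975)

# Proportion of values inside the range, and within 1 SD of the mean
coverage = dist.cdf(upper) - dist.cdf(lower)
within_1_sd = dist.cdf(mean + sd) - dist.cdf(mean - sd)

print(lower, upper, centile_2_5, centile_97_5, coverage, within_1_sd)
```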
Why are samples used to estimate data?
What is the role of confidence values?
It is not practical or feasible to measure the data from every single person in the country - so a sample is measured instead and used to estimate values for the entire population
The confidence interval indicates how reliably the information from the sample can be applied to the whole population i.e. it tells you how precisely the sample estimates the population values (a narrower interval means a more precise estimate)
How can the sample size be determined from the results?
With a large enough sample size, the distribution of the sample mean is approximately Gaussian, even if the underlying data are not
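A simulation sketch of this idea. It assumes, purely for illustration, that the underlying data follow a skewed exponential distribution with a population mean of 1; the sample size of 50 and the 2000 repeats are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

sample_size = 50
n_samples = 2000

# Draw many samples from a skewed population and record each sample's mean
sample_means = [
    statistics.mean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(n_samples)
]

mean_of_means = statistics.mean(sample_means)
median_of_means = statistics.median(sample_means)
sd_of_means = statistics.stdev(sample_means)

# For a roughly symmetrical (Gaussian-like) distribution,
# the mean and median of the sample means should be close together
print(mean_of_means, median_of_means, sd_of_means)
```

In this setup the spread of the sample means should be close to the population SD divided by the square root of the sample size (here 1 / sqrt(50), about 0.14).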
What is meant by standard error?
Standard error is the standard deviation of the sampling distribution of an estimate - this is a measure of the statistical precision of the estimate
What is the standard error of the mean?
How is this calculated?
This is the standard deviation of the distribution of all possible sample means - your sample is only one of many potential samples, each of which could give a different result, so this level of error must be accounted for
SEM = SD / √n, where n is the sample size
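A short worked sketch of the SEM calculation, with an approximate 95% confidence interval for the mean built from it. The sample values are made up, and the 1.96 multiplier assumes the sample mean is roughly Gaussian; for small samples a t-distribution multiplier is usually more appropriate.

```python
import math
import statistics

# Made-up sample measurements (an assumption for illustration)
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1]

n = len(sample)
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample SD (divides by n - 1)

# Standard error of the mean: SD / square root of the sample size
sem = sd / math.sqrt(n)

# Approximate 95% confidence interval for the population mean
ci_low = mean - 1.96 * sem
ci_high = mean + 1.96 * sem

print(round(mean, 3), round(sem, 3), round(ci_low, 3), round(ci_high, 3))
```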