Statistics - Location and spread Flashcards
What is a measure of location?
Single value which describes a position in a data set
What is a measure of central tendency?
Single value that describes the centre of the data
What is the mean?
Sum of data values / number of data values
What is the median?
The middle value when the data values are put in order
What is the mode?
The value or class that occurs most often
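A minimal Python sketch of the three averages above, using a hypothetical data set. The function names are illustrative only; for an even number of values the median is taken as the mean of the two middle values, which is the usual convention.

```python
from collections import Counter

def mean(values):
    """Sum of the data values divided by the number of data values."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """Every value that occurs most often (one value, or several if tied)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

data = [3, 7, 7, 2, 9, 4, 7, 5]  # hypothetical example data
print(mean(data))    # 5.5
print(median(data))  # 6.0
print(modes(data))   # [7]
```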
When should the median be used?
When there are extreme values, since the median is not affected by them
With quantitative data
When should the mean be used?
With quantitative data, when you want a value that uses all the data values
But it is affected by extreme values
When should the mode be used?
With qualitative or quantitative data
When the data has a single mode or two modes (bimodal)
Not very informative if each value occurs only once
How can the mean of data in a frequency table be calculated?
Mean = Sum of fx / Sum of f (multiply each value x by its frequency f, add these products, then divide by the sum of the frequencies)
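A short Python sketch of the frequency-table mean, assuming the table is given as two parallel lists (the values and their frequencies). The goals data is hypothetical.

```python
def frequency_table_mean(values, frequencies):
    """Mean = sum of (value x frequency) / sum of frequencies."""
    sum_fx = sum(x * f for x, f in zip(values, frequencies))
    return sum_fx / sum(frequencies)

# Hypothetical table: goals scored per match (x) and number of matches (f)
goals = [0, 1, 2, 3]
matches = [4, 6, 7, 3]
print(frequency_table_mean(goals, matches))  # 29 / 20 = 1.45
```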
What is the lower quartile?
Q1
One-quarter of the way through the data set
What is the upper quartile?
Q3
Three-quarters of the way through the data set
How is the data split by the 85th percentile?
85% of the data values are below the 85th percentile
15% of the data values are above the 85th percentile
How can you calculate the lower quartile for discrete data?
n/4
If n/4 is a whole number, Q1 is halfway between the (n/4)th data value and the one above
If n/4 is not a whole number, round UP and take that data value
How can you calculate the upper quartile for discrete data?
3n/4
If 3n/4 is a whole number, Q3 is halfway between the (3n/4)th data value and the one above
If 3n/4 is not a whole number, round UP and take that data value
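A sketch of the n/4 and 3n/4 rules in Python, assuming the data is a list of discrete values. `quartile_discrete` is a hypothetical helper name, and positions are counted from 1 in the ordered data.

```python
import math

def quartile_discrete(values, k):
    """Q1 (k=1) or Q3 (k=3) for discrete data using the kn/4 rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = k * n / 4  # position in the ordered data, counting from 1
    if position.is_integer():
        i = int(position)
        # Whole number: halfway between the i-th value and the one above it
        return (ordered[i - 1] + ordered[i]) / 2
    # Not a whole number: round UP and take that data value
    return ordered[math.ceil(position) - 1]


data = [12, 5, 9, 2, 15, 20, 6, 3, 17, 11, 8, 14]  # hypothetical, n = 12
print(quartile_discrete(data, 1))  # n/4 = 3 is whole: (5 + 6) / 2 = 5.5
print(quartile_discrete(data, 3))  # 3n/4 = 9 is whole: (14 + 15) / 2 = 14.5
print(quartile_discrete(list(range(1, 11)), 1))  # n/4 = 2.5: round up, 3rd value is 3
```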
How can you calculate Q1, Q2 and Q3 from a cumulative frequency table?
Q1 = (n/4)th data value
Q2 = (n/2)th data value
Q3 = (3n/4)th data value
NO ROUNDING (estimate each value by interpolation)
Define percentile
The value below which a given percentage of the data falls
What is interpolation?
Technique used to estimate quartiles and percentiles from grouped data
This assumes the data values are evenly distributed within each class
What is the equation for linear interpolation?
Lower class boundary + ((position of quartile - cumulative freq. below the class) / freq. of the class) x class width
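A Python sketch of linear interpolation for grouped data, assuming the classes are described by a list of class boundaries and a list of frequencies. The `interpolate_grouped` name and the example table are hypothetical. The position is n/4, n/2 or 3n/4 with no rounding, and the same function can estimate a percentile, e.g. the 85th percentile at position 85n/100.

```python
def interpolate_grouped(boundaries, frequencies, position):
    """Estimate the data value at `position` in grouped data.

    boundaries:  class boundaries [b0, b1, ..., bk] for k classes
    frequencies: the k class frequencies
    Assumes the data values are evenly distributed within each class.
    """
    cf_below = 0  # cumulative frequency below the current class
    for i, f in enumerate(frequencies):
        if f > 0 and cf_below + f >= position:
            lower = boundaries[i]
            width = boundaries[i + 1] - boundaries[i]
            # lower class boundary + ((position - cf below) / class freq.) x class width
            return lower + (position - cf_below) / f * width
        cf_below += f
    raise ValueError("position is larger than the total frequency")


# Hypothetical grouped data: time taken (minutes), classes 0-10, 10-20, 20-30, 30-40
boundaries = [0, 10, 20, 30, 40]
frequencies = [5, 12, 15, 8]
n = sum(frequencies)  # 40

q1 = interpolate_grouped(boundaries, frequencies, n / 4)          # position 10
q2 = interpolate_grouped(boundaries, frequencies, n / 2)          # position 20
q3 = interpolate_grouped(boundaries, frequencies, 3 * n / 4)      # position 30
p85 = interpolate_grouped(boundaries, frequencies, 85 * n / 100)  # position 34
print(round(q1, 2), round(q2, 2), round(q3, 2), p85)  # 14.17 22.0 28.67 32.5
```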
What is the range?
Difference between largest and smallest values in the data set
What is the interquartile range (IQR)?
Difference between the upper and lower quartiles
Q3 - Q1
Why is IQR used?
It is not affected by extreme values
It only considers the spread of the middle 50% of the data
What is the inter-percentile range?
Difference between the values for two given percentiles
What is variance a measure of?
The spread of the data
What is the equation for variance?
(Sum of x^2/n) - (Sum of x/n)^2
What is the equation for standard deviation?
Square root of the variance: Sqrt of [(Sum of x^2/n) - (Sum of x/n)^2]
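A minimal Python sketch of the variance formula (Sum of x^2/n) - (Sum of x/n)^2 and the standard deviation as its square root. The data set is hypothetical and chosen so the results are whole numbers.

```python
import math

def variance(values):
    """Variance = (sum of x^2 / n) - (sum of x / n)^2."""
    n = len(values)
    return sum(x * x for x in values) / n - (sum(values) / n) ** 2

def standard_deviation(values):
    """Standard deviation = square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data
print(variance(data))            # 232/8 - 5^2 = 4.0
print(standard_deviation(data))  # 2.0
```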
How is variance/standard dev. different for grouped data in a frequency table?
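Each x and x^2 is multiplied by its frequency f, and n is replaced by the sum of the frequencies: variance = (Sum of fx^2 / Sum of f) - (Sum of fx / Sum of f)^2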
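A Python sketch of variance and standard deviation for a frequency table, assuming the table is given as parallel lists of values and frequencies. The hypothetical table below is the list 2, 4, 4, 4, 5, 5, 7, 9 written with frequencies, so it gives the same variance as the raw list.

```python
import math

def frequency_table_variance(values, frequencies):
    """Variance = (sum of f*x^2 / sum of f) - (sum of f*x / sum of f)^2."""
    total_f = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(values, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(values, frequencies))
    return sum_fx2 / total_f - (sum_fx / total_f) ** 2

values = [2, 4, 5, 7, 9]  # hypothetical values x
freqs = [1, 3, 2, 1, 1]   # their frequencies f
var = frequency_table_variance(values, freqs)
print(var, math.sqrt(var))  # 4.0 2.0
```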
What is coding?
A technique to simplify statistical calculations
Transforms the data into values that are easier to work with
What is the equation for coding data?
y = (x-a)/b
What is the equation for the mean of coded data?
mean of y = (mean of x - a)/b
What is the standard dev. of coded data?
Coded standard dev. = standard dev. / b
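How does coding affect measures of location and spread?
Adding or subtracting a constant affects measures of location (e.g. the mean) but not measures of spread
Multiplying or dividing by a constant affects measures of both location and spread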
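A Python check of the coding rules, assuming hypothetical raw data and hypothetical constants a = 100 and b = 10. It codes the data with y = (x - a)/b and confirms that mean of y = (mean of x - a)/b and that the coded standard deviation is the original standard deviation divided by b (subtracting a has no effect on it).

```python
import math

def mean(values):
    return sum(values) / len(values)

def std_dev(values):
    """Standard deviation using (sum of x^2 / n) - (mean)^2."""
    n = len(values)
    return math.sqrt(sum(v * v for v in values) / n - mean(values) ** 2)

a, b = 100, 10  # hypothetical coding constants
x = [120, 140, 140, 140, 150, 150, 170, 190]  # hypothetical raw data
y = [(xi - a) / b for xi in x]  # coded data: y = (x - a) / b

print(mean(x), mean(y))        # 150.0 5.0, and (150 - 100) / 10 = 5.0
print(std_dev(x), std_dev(y))  # 20.0 2.0, and 20 / 10 = 2.0
```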
What is the formula for sample variance?
Sum of (x - mean of x)^2 / (n-1)
What is the formula for sample standard deviation?
Square root of [Sum of (x - mean of x)^2 / (n-1)]
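A Python sketch of the sample variance and sample standard deviation (dividing by n - 1) on hypothetical data. The standard library functions `statistics.variance` and `statistics.stdev` also divide by n - 1, so they are used here as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """Sum of (x - mean of x)^2 divided by (n - 1)."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical example data
print(sample_variance(data), statistics.variance(data))  # both 32/7, about 4.571
print(sample_sd(data), statistics.stdev(data))
```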
What are the advantages and disadvantages of range?
Adv: Easiest measure of dispersion to calculate
Dis: Heavily affected by extreme values; gives no information about the spread of the other values
What are the advantages and disadvantages of interquartile range?
Adv: Not affected by extreme values (used when outliers present)
Dis: Difficult to calculate for grouped data
What are the advantages and disadvantages of variance?
Adv: Depends on all data values
Dis: Difficult to calculate, affected by outliers, units are the square of the data's units
What are the advantages and disadvantages of standard deviation?
Adv: Depends on all data values, same units as data values
Dis: Difficult to calculate, affected by outliers