Summarising Data Flashcards
What is an average
A single value used to describe a data set
It is a measure of central tendency
The mode, median and mean are averages
What is the mode
The value that occurs most often
What is the median
The middle value when the data are in order
When the number of values in the data set is odd, how do you find the median
It is the 1/2(n+1)th observation
Where n is the number of values
How do you calculate the mean
Sum of x ÷ n
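The three averages above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the function names and the sample data are made up, and the handling of an even number of values (taking the mean of the two middle values) is the usual convention rather than something stated on these cards.

```python
from collections import Counter

def mode(values):
    """Return a list of the most frequent value(s); there can be more than one."""
    counts = Counter(values)
    highest = max(counts.values())
    return [v for v, c in counts.items() if c == highest]

def median(values):
    """Middle value of the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # the 1/2(n+1)th observation
    # Assumed convention for even n: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

def mean(values):
    """Sum of x divided by n."""
    return sum(values) / len(values)

data = [3, 5, 5, 7, 10]  # made-up sample data
print(mode(data), median(data), mean(data))
```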
What is the mode in classed frequency data
The category / class with the highest frequency
How do you find the median category in frequency table data
It is the class that contains the 1/2(n+1)th value
What is the modal class
The class with the highest frequency
How do you find the median for continuous data
It is the 1/2(n)th value
What is linear interpolation
A method used to estimate the median value in grouped data
How do you find the median / a quartile using linear interpolation
Find where the median is in the grouped data. This is done by finding the position, and then using cumulative frequency to see which group it is in.
Find out how far into the group your median is (median position - cumulative frequency before the group)
Then use the formula
Lower class boundary + ((median position - cumulative frequency before) ÷ frequency of the group) × class width
Basically, you are finding how far into the group your value is, expressing this as a proportion of the group's frequency, multiplying by the class width and adding the result to the lower class boundary
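The interpolation steps can be written as one small function. This is a sketch under assumptions: the function name, parameter names and the numbers in the usage line are invented for illustration, and the formula used is lower class boundary plus the fraction of the way into the group times the class width.

```python
def interpolate(position, lower_boundary, cum_freq_before, class_freq, class_width):
    """Estimate the value at a given position inside a group.

    position        -- position of the median or quartile (e.g. n/2)
    lower_boundary  -- lower class boundary of the group containing it
    cum_freq_before -- cumulative frequency before that group
    class_freq      -- frequency of that group
    class_width     -- width of that group
    """
    amount_into_group = position - cum_freq_before
    return lower_boundary + (amount_into_group / class_freq) * class_width

# Made-up figures: the 15th value lies in the group 20 < x <= 30,
# which has frequency 6 and a cumulative frequency of 12 before it.
estimate = interpolate(15, 20, 12, 6, 10)
print(estimate)  # 25.0
```

The same function works for quartiles: only the position changes.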
How do you calculate an estimated mean from grouped data
Sum of (f×midpoint) ÷ sum of f
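As a quick illustration of the estimated mean, here is a minimal Python sketch. The table layout (lower bound, upper bound, frequency) and the figures are assumptions made up for the example.

```python
def estimated_mean(classes):
    """classes is a list of (lower, upper, frequency) tuples."""
    total_fx = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
    total_f = sum(f for _, _, f in classes)
    return total_fx / total_f

# Made-up grouped data: midpoints are 5, 15 and 25
table = [(0, 10, 2), (10, 20, 5), (20, 30, 3)]
print(estimated_mean(table))  # 16.0
```

It is only an estimate because every value in a class is treated as if it sat at the midpoint.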
What happens to the averages when you increase / decrease every value by a set percentage
The averages increase or decrease by the same percentage
Why do we transform data
To make it easier to calculate the mean
How do you transform data with decimals
Subtract the same integer from each decimal
Then multiply them by a power of 10 until they are whole numbers
Now calculate your average
Then reverse the calculations you did to find the mean of the original data (divide by your power of 10 and add back the integer you subtracted)
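The transform and reverse steps might look like this in Python. It is a sketch only: the function name, the choice of 5 as the integer to subtract and 10 as the multiplier, and the data are all made up for the example.

```python
def mean_by_transforming(values, subtract, multiply):
    """Code the data, average the coded values, then undo the coding."""
    # Subtract the same integer from each value, then scale up to whole numbers.
    # round() removes tiny floating-point errors; it assumes the scaled values
    # really are whole numbers.
    coded = [round((x - subtract) * multiply) for x in values]
    coded_mean = sum(coded) / len(coded)
    # Reverse the steps in the opposite order: divide, then add back.
    return coded_mean / multiply + subtract

data = [5.1, 5.2, 5.6]                      # made-up data
result = mean_by_transforming(data, 5, 10)  # coded values are 1, 2, 6
print(result)
```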
What is a geometric mean
An average found by multiplying all n values together and taking the nth root of the product
It is more appropriate than the arithmetic mean for averaging percentage changes or growth rates
How do we calculate geometric mean
n√(v₁ × v₂ × v₃ × … × vₙ)
n is the number of values
Take the nth root of the product of the values
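A minimal Python sketch of the geometric mean, assuming all values are positive. The function name and the sample values are invented for illustration.

```python
import math

def geometric_mean(values):
    """nth root of the product of the n values (values assumed positive)."""
    product = math.prod(values)
    return product ** (1 / len(values))

gm = geometric_mean([2, 8])  # product is 16, square root is 4
print(gm)
```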
Why are weighted means used
For data where each value or group carries a different weighting (importance)
How are weighted means calculated
Sum of (value × weight) ÷ sum of weights
(Total wx) ÷ total w
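The weighted mean formula can be checked with a short Python sketch. The function name and the marks and weights below are made up for the example.

```python
def weighted_mean(pairs):
    """pairs is a list of (value, weight) tuples."""
    total_wx = sum(value * weight for value, weight in pairs)
    total_w = sum(weight for _, weight in pairs)
    return total_wx / total_w

# Made-up example: a mark of 60 with weight 2 and a mark of 80 with weight 3
wm = weighted_mean([(60, 2), (80, 3)])
print(wm)  # 72.0
```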
What is a range
Largest value - smallest value
What is an interquartile range
Upper quartile - lower quartile
What are the upper and lower quartiles
Upper = the value 3/4 of the way through the ordered data
Lower = the value 1/4 of the way through the ordered data
How do you calculate quartiles in discrete data
The same way as the median but using the quartile's fraction rather than 1/2
E.g. the lower quartile is the 1/4(n+1)th value
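Here is a rough Python sketch of finding quartiles by position in discrete data. The function names and data are made up, and one part is an assumption not covered by these cards: when the position is not a whole number, the sketch interpolates between the two neighbouring values, which is only one of several conventions in use.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position in ordered data."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    # Assumed convention: move the fractional part of the way to the next value
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

def quartiles(values):
    """Return (lower quartile, upper quartile) using 1/4(n+1) and 3/4(n+1)."""
    ordered = sorted(values)
    n = len(ordered)
    lq = value_at_position(ordered, (n + 1) / 4)
    uq = value_at_position(ordered, 3 * (n + 1) / 4)
    return lq, uq

data = [1, 3, 4, 6, 7, 8, 10, 12, 13, 15, 18]  # made-up data, n = 11
lq, uq = quartiles(data)                       # 3rd and 9th values
print(lq, uq, uq - lq)                         # 4 13 9
```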
How do you calculate the range in a frequency table
Take the largest possible value and smallest possible value from the table
Subtract these values
How do you calculate quartiles in continuous data
The same way you would calculate the median in continuous data
E.g. the lower quartile is the 1/4(n)th value
What are percentiles
The values that divide an ordered data set into 100 equal parts
What are deciles
The values that divide an ordered data set into 10 equal parts
What is an interdecile / interpercentile range
The difference between two percentiles or deciles
What is standard deviation
A measure of how far the values deviate from the mean value
How do you calculate standard deviation
√((sum of x^2 ÷ n) - (mean)^2)
Where n is the number of values
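A short Python sketch of this calculation, where the square root covers the whole expression (mean of the squares minus the square of the mean). The function name and data are made up for illustration.

```python
import math

def standard_deviation(values):
    """Square root of (sum of x^2 / n) minus (mean)^2."""
    n = len(values)
    mean = sum(values) / n
    mean_of_squares = sum(x ** 2 for x in values) / n
    return math.sqrt(mean_of_squares - mean ** 2)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data: mean 5, mean of squares 29
sd = standard_deviation(data)
print(sd)  # 2.0
```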
How do you calculate standard deviation for grouped data
√((sum of f×x^2 ÷ sum of f) - (sum of fx ÷ sum of f)^2)
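For grouped data the same idea applies with frequencies. In this sketch the second term is subtracted, not divided, and x is taken to be the class midpoint; that choice, the function name and the figures are assumptions for the example.

```python
import math

def grouped_standard_deviation(rows):
    """rows is a list of (midpoint, frequency) tuples."""
    total_f = sum(f for _, f in rows)
    total_fx = sum(f * x for x, f in rows)
    total_fx2 = sum(f * x ** 2 for x, f in rows)
    variance = total_fx2 / total_f - (total_fx / total_f) ** 2
    return math.sqrt(variance)

# Made-up table: midpoints 5, 15, 25 with frequencies 2, 5, 3
rows = [(5, 2), (15, 5), (25, 3)]
gsd = grouped_standard_deviation(rows)  # 3050/10 - 16^2 = 49
print(gsd)  # 7.0
```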
What does a box plot show
Maximum and minimum values
Median
Upper and lower quartiles
How do you find an outlier using quartiles
Small outlier < (lq - 1.5×iqr)
Large outlier > (uq + 1.5×iqr)
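The two outlier boundaries can be sketched in Python like this. The function names, quartile values and data are made up for illustration.

```python
def outlier_limits(lq, uq):
    """Return the (lower, upper) limits beyond which a value is an outlier."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr

def find_outliers(values, lq, uq):
    low, high = outlier_limits(lq, uq)
    return [x for x in values if x < low or x > high]

# Made-up example: lq = 4 and uq = 13 give iqr = 9
limits = outlier_limits(4, 13)
outliers = find_outliers([1, 3, 4, 7, 10, 13, 18, 40], 4, 13)
print(limits, outliers)  # (-9.5, 26.5) [40]
```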
How do you find an outlier using standard deviation
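An outlier is any value outside mean + / - (3×standard deviation)
Small outlier < mean - (3×standard deviation)
Large outlier > mean + (3×standard deviation)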
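A minimal Python sketch of the standard deviation test for outliers. The function name and data are made up; the data set is deliberately fairly large, because with only a handful of values no single value can lie 3 standard deviations from the mean.

```python
import math

def sd_outliers(values):
    """Values more than 3 standard deviations from the mean."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum(x ** 2 for x in values) / n - mean ** 2)
    low, high = mean - 3 * sd, mean + 3 * sd
    return [x for x in values if x < low or x > high]

data = [10] * 19 + [50]  # made-up data: mean 12, sd about 8.7
print(sd_outliers(data))  # [50]
```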
What are the 3 types of skew
Symmetrical distribution - median in the centre
Positive - median closer to lower quartile
Negative - median is closer to upper quartile
What does mean>median>mode indicate
Positive skew
What does mode>median>mean indicate
Negative skew
How do you calculate skew
3(mean-median) / standard deviation
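The skew formula is easy to check in Python. The function name and the figures are made up for illustration; a positive result indicates positive skew and a negative result indicates negative skew.

```python
def skew(mean, median, standard_deviation):
    """3(mean - median) / standard deviation."""
    return 3 * (mean - median) / standard_deviation

# Made-up figures: mean 6, median 5, standard deviation 2
s = skew(6, 5, 2)
print(s)  # 1.5, so positive skew
```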
Advantages and disadvantages of using the mode to show average
A:
Easy to find
Can be used with any data type
Unaffected by open-ended or extreme values
The mode is always an actual data value
D:
There may be no mode or multiple modes
Cannot be used to calculate a measure of spread
Advantages and disadvantages of using the median to show average
A:
Easy to calculate
Unaffected by extreme values
Best to use when data is skewed
Can be used to calculate quartiles
D:
May not be a data value
Advantages and disadvantages of using the mean to show average
A:
Uses all the data
Can be used to calculate standard deviation (statistical calculations)
D:
Always affected by extreme values
Can be distorted by open-ended classes
What do you need to do when comparing data sets
You need to compare an average and a measure of spread
You can also compare the shape of the distributions (skew)
What is the distribution in a box plot like
50% of data is less than the median
50% of the data is more than the median
25% of the data is less than the lq
25% of the data is greater than the uq
50% of the data is between the quartiles
Linear interpolation example
Estimate the median amount of time spent watching tv
Median = 12th term
Median group = 10 < x ≤ 15
Cumulative frequency before the group is 10
Cumulative frequency at the end of the group is 20, so the group frequency is 20 - 10 = 10
The median is found 12 - 10 = 2 into the group
LCB + (amount into group ÷ group frequency) × class width
10 + (2 ÷ 10) × 5 = 11
What does it mean to transform data
To alter data in order to make calculations easier.
Once you have finished your calculation you must reverse the transformation to get back to the original data