Numerical Measures Flashcards
What is the formula for the mean of a data set?
(Σx)/n where x represents the values of data and n is the number of values
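As a minimal sketch of the formula above in Python (the function name and the sample values are illustrative assumptions, not part of the flashcards):

```python
def mean(values):
    """Return (Σx)/n for a non-empty list of numbers."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / n


data = [2, 4, 4, 5, 10]  # illustrative values only
print(mean(data))        # Σx = 25 and n = 5, so this prints 5.0
```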
How is the median value found for non-grouped data?
The median is the ((n+1)/2)th value of the ordered data, where n is the number of values
How is the median value found for grouped data?
The median is the (n/2)th value, where n is the total frequency
(If you get an nth value ending in .5, work out the mean of the value in front and the value behind to get the value the median corresponds to)
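The (n+1)/2 rule for non-grouped data, including the .5 case, might be sketched in Python as follows. The function name and the sample lists are assumptions made for illustration.

```python
def median_ungrouped(values):
    """Median of non-grouped data using the ((n+1)/2)th value rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    position = (n + 1) / 2          # 1-based position in the ordered list
    whole = int(position)
    if position == whole:           # e.g. n = 5 gives the 3rd value
        return ordered[whole - 1]
    # Position ends in .5: take the mean of the values either side of it
    return (ordered[whole - 1] + ordered[whole]) / 2


print(median_ungrouped([7, 1, 3, 9, 5]))      # 3rd of 5 ordered values: 5
print(median_ungrouped([1, 3, 5, 7, 9, 11]))  # 3.5th value: (5 + 7) / 2 = 6.0
```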
What is the mode?
The mode is the most common value in a data set.
Note that grouped data has a modal class, which is defined as the class in which the modal value is contained.
Also note, not all samples have a mode.
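A short Python sketch of finding the mode is given below. It assumes that a sample in which every value is equally common counts as having no mode; that convention and the function name are assumptions for this example.

```python
from collections import Counter


def modes(values):
    """Return the most common value(s), or [] if no value stands out."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    # Assumption: if every distinct value is equally common, there is no mode
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))     # [2]
print(modes([1, 1, 2, 2, 3]))  # [1, 2] (two modes)
print(modes([1, 2, 3]))        # [] (no mode)
```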
When working out the mean, median, mode or quartiles what information do you need?
You need a cumulative frequency column
What is the range?
The range is the difference between the highest and lowest value in a data set
What is the lower quartile?
The data value at the 25th percentile of the sample.
For non-grouped data, the nth value that represents the lower quartile is found by 0.25(n+1) where n is the total frequency (the final cumulative frequency)
For grouped data, the nth value that represents the lower quartile is found by 0.25(n) where n is the total frequency (the final cumulative frequency)
(if you get an nth value ending in .5 work out the mean between the value in front and the value behind to get the value the lower quartile corresponds to)
What is the interquartile range?
The difference between the upper and lower quartiles
(Q3 - Q1)
(this is a value)
What is the upper quartile?
The data value at the 75th percentile of the sample.
For non-grouped data, the nth value that represents the upper quartile is found by 0.75(n+1) where n is the total frequency (the final cumulative frequency)
For grouped data, the nth value that represents the upper quartile is found by 0.75(n) where n is the total frequency (the final cumulative frequency)
(if you get an nth value ending in .5 work out the mean between the value in front and the value behind to get the value the upper quartile corresponds to)
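The quartile rules for non-grouped data, and the interquartile range Q3 - Q1, could be sketched in Python like this. The function names are illustrative, and because the cards only describe whole and .5 positions, the sketch refuses any other position rather than guessing a rule.

```python
def value_at_position(ordered, position):
    """Value at a 1-based position that is whole or ends in .5."""
    whole = int(position)
    fraction = position - whole
    if fraction == 0:
        return ordered[whole - 1]
    if fraction == 0.5:
        # Mean of the value in front and the value behind
        return (ordered[whole - 1] + ordered[whole]) / 2
    raise ValueError("only whole and .5 positions are covered by this sketch")


def quartiles_ungrouped(values):
    """Return (Q1, Q3, IQR) using 0.25(n+1) and 0.75(n+1)."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, 0.25 * (n + 1))
    q3 = value_at_position(ordered, 0.75 * (n + 1))
    return q1, q3, q3 - q1


# 11 values: Q1 is the 3rd value and Q3 is the 9th value
print(quartiles_ungrouped([3, 5, 7, 8, 12, 13, 14, 18, 21, 22, 25]))  # (7, 21, 14)
# 9 values: Q1 is the 2.5th value and Q3 is the 7.5th value
print(quartiles_ungrouped([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (2.5, 7.5, 5.0)
```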
How are variance and standard deviation related?
Variance = (Standard Deviation)²
What actually is variance?
The mean of the squared distances between each data point and the mean; it therefore represents the spread of the data
How is variance found?
- Find the mean of the data points
- Calculate the difference between each data point and the mean value (write this as a new list of values)
- Square each of these differences
- Find the sum of the squared differences
- Divide this sum by the number of data points, n
- Write the final answer as the relevant unit squared
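The steps for finding the variance can be sketched in Python as below. Dividing the sum of squared differences by n gives the variance, and its square root is the standard deviation. The function name and data are illustrative assumptions.

```python
def variance(values):
    """Variance found step by step: mean, differences, squares, sum, divide by n."""
    n = len(values)
    mean = sum(values) / n                        # find the mean
    differences = [x - mean for x in values]      # difference from the mean
    squared = [d ** 2 for d in differences]       # square each difference
    return sum(squared) / n                       # sum, then divide by n


data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values with mean 5
print(variance(data))             # 32 / 8 = 4.0
print(variance(data) ** 0.5)      # standard deviation: 2.0
```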
How do you find the variance, SD, median and quartiles with your calculator?
- MENU (6)
- 1-Variable (1)
- Enter data and frequency
- AC
- OPTN
- 1-Variable Calc (2)
For grouped data, find the midpoint of each class and enter it as the data value
Note: do not enter the cumulative frequency into the calculator; enter the frequency of each class
How do you deal with grouped data when inputting into the calculator to find numerical measures?
Use the midpoint of each class as the value to input
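What the calculator does with midpoints can be imitated in a few lines of Python. This is only a sketch: the class table and the function name are made up for illustration, and the result is an estimate because the real values inside each class are unknown.

```python
def estimated_mean(classes):
    """Estimate the mean of grouped data from ((lower, upper), frequency) pairs."""
    midpoints = [(lower + upper) / 2 for (lower, upper), _ in classes]
    frequencies = [frequency for _, frequency in classes]
    n = sum(frequencies)
    # Each midpoint stands in for every value in its class
    return sum(m * f for m, f in zip(midpoints, frequencies)) / n


table = [((0, 10), 3), ((10, 20), 5), ((20, 30), 2)]  # illustrative classes
print(estimated_mean(table))  # (5*3 + 15*5 + 25*2) / 10 = 14.0
```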
What is grouped and non-grouped data?
Grouped data refers to data given in class intervals (e.g. 10-20)
Non-grouped data refers to individual pieces of data (e.g. 6, 24, 69, 420)
How can you convert grouped data into non-grouped data?
Write out the heading of the group as many times as the frequency states
(e.g. a group of 3 people with 4 cats each becomes 4, 4, 4)
Note this also works in reverse
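Both directions of this conversion can be sketched in Python. The function names and the (value, frequency) table layout are assumptions chosen for the example.

```python
from collections import Counter


def expand(frequency_table):
    """Write out each value as many times as its frequency states."""
    values = []
    for value, frequency in frequency_table:
        values.extend([value] * frequency)
    return values


def tally(values):
    """Reverse direction: count how often each value occurs."""
    return sorted(Counter(values).items())


print(expand([(4, 3), (2, 1)]))  # [4, 4, 4, 2]
print(tally([4, 4, 4, 2]))       # [(2, 1), (4, 3)]
```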
When there are gaps in a continuous grouped data set (lengths 0-9, 10-19, 20-29), what do you always do first?
Adjust the class boundaries to the values at which a measurement would no longer round into the original class
(0-9, 10-19) becomes (0-9.5, 9.5-19.5)
Then find the midpoint column
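Closing the gaps and then building the midpoint column might look like this in Python. The sketch assumes every gap is the same size, that there are at least two classes, and that the first lower boundary stays where it is (a length cannot be below 0); the function name is hypothetical.

```python
def close_gaps(classes):
    """Move each class boundary by half the gap so that classes meet."""
    gap = classes[1][0] - classes[0][1]   # e.g. 10 - 9 = 1
    half = gap / 2
    adjusted = []
    for index, (lower, upper) in enumerate(classes):
        # Assumption: the very first lower boundary is left unchanged
        new_lower = lower if index == 0 else lower - half
        adjusted.append((new_lower, upper + half))
    return adjusted


boundaries = close_gaps([(0, 9), (10, 19), (20, 29)])
print(boundaries)  # [(0, 9.5), (9.5, 19.5), (19.5, 29.5)]

midpoints = [(lower + upper) / 2 for lower, upper in boundaries]
print(midpoints)   # [4.75, 14.5, 24.5]
```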
When there are gaps in grouped age data (ages 0-5, 6-10, 11-15), what do you always do first?
Adjust the class boundaries so that the upper boundary of each class is the lower boundary of the next (ages are rounded down, so someone is aged 5 until their 6th birthday)
(0-5, 6-10, 11-15, …) becomes (0-6, 6-11, 11-16, etc.)
Then find the midpoint column
What is continuous data?
Data which can take any value within a range (e.g. girth, length and height)
What is discrete data?
Data which can be counted and can only take specific, separate values (e.g. the number of sausages, boys or pens)
What is the ‘formula’ for linear interpolation?
(UB-LB)/(UF-LF) = (Q-LB)/(N-LF)
This states that the ratio of the boundary range (UB - LB) to the frequency range (UF - LF) is the same as the ratio of (Q - LB) to (N - LF), where Q is the median or quartile being found and N is its position
State an assumption of linear interpolation
Data is evenly spread within the boundaries
How do you find ‘N’ in linear interpolation?
For the median, N is (total frequency / 2)
For the LQ, N is (total frequency / 4)
For the UQ, N is 3 x (total frequency / 4)
(the total frequency is the final cumulative frequency)
What are the steps of linear interpolation?
– Adjust class widths of grouped data for any gaps
– Add a cumulative frequency column
– Sub the total frequency into the relevant equation (median / quartile) to find N
– Find the class in which this value for n falls
– Draw interpolation diagram
– Find UB and LB by reading the class boundaries
– Find UF and LF by finding cumulative frequency on either side of the class
– Sub these values including N into the equation and solve for Q
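The steps above can be sketched in Python by rearranging the formula to Q = LB + (N - LF) x (UB - LB) / (UF - LF). The function name, the table layout and the sample classes are assumptions for illustration, and the class boundaries are assumed to have had any gaps closed already.

```python
def interpolate(classes, fraction):
    """Estimate a median or quartile by linear interpolation.

    classes:  list of ((LB, UB), frequency) pairs with gaps already closed
    fraction: 0.5 for the median, 0.25 for the LQ, 0.75 for the UQ
    """
    total = sum(frequency for _, frequency in classes)
    target = fraction * total            # this is N
    lower_cf = 0                         # LF: cumulative frequency before the class
    for (lb, ub), frequency in classes:
        upper_cf = lower_cf + frequency  # UF: cumulative frequency after the class
        if frequency > 0 and target <= upper_cf:
            # Q = LB + (N - LF) * (UB - LB) / (UF - LF)
            return lb + (target - lower_cf) * (ub - lb) / (upper_cf - lower_cf)
        lower_cf = upper_cf
    raise ValueError("no class contains the requested position")


table = [((0, 10), 4), ((10, 20), 10), ((20, 30), 6)]  # total frequency 20
print(interpolate(table, 0.5))   # N = 10 falls in 10-20: 10 + 6*10/10 = 16.0
print(interpolate(table, 0.25))  # N = 5 falls in 10-20: 10 + 1*10/10 = 11.0
```

This relies on the assumption that the data is evenly spread within each class.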
What is the equation for standard deviation as coded data?
Sy = Sx / b
Where y is the coded data, x is the original data and b is the constant the data was divided by
What are the advantages and disadvantages of using the median as a measure of location?
Advantages:
- Not affected by outliers
- Can be used for non-numerical data that can be ranked
Disadvantages:
- Does not use all data
- Not always an observed data value (for an even number of values it is the mean of the middle two)
How do you draw a linear interpolation diagram?
- Draw a horizontal straight line
- Draw 3 short vertical lines across it: one at each end and one in between
- Write upper and lower boundaries on the top as well as Q
- Write upper and lower frequencies on the bottom as well as the value of n (calculated by frequency equation initially)
- Solve for Q using the equation
True or False: the data given in linear interpolation questions is always grouped
True, you will never be given non-grouped data
What is the equation for variance?
Sxx/n
or
((Σx²)/n) - x̄²
What is the equation for standard deviation?
√(Sxx/n)
or
√(((Σx²)/n) - x̄²)
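A quick Python check that the two forms give the same answer is sketched below. It uses the standard definition Sxx = Σ(x - x̄)², which the cards do not spell out; the function names and data are illustrative.

```python
from math import isclose, sqrt


def variance_from_sxx(values):
    """Variance as Sxx/n, where Sxx is the sum of (x - mean) squared."""
    n = len(values)
    mean = sum(values) / n
    sxx = sum((x - mean) ** 2 for x in values)
    return sxx / n


def variance_from_squares(values):
    """Variance as (Σx²)/n minus the mean squared."""
    n = len(values)
    mean = sum(values) / n
    return sum(x ** 2 for x in values) / n - mean ** 2


data = [1, 3, 5, 7, 9]                # illustrative values with mean 5
print(variance_from_sxx(data))        # 40 / 5 = 8.0
print(variance_from_squares(data))    # 165 / 5 - 25 = 8.0
print(sqrt(variance_from_sxx(data)))  # standard deviation, about 2.83
```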
What is the general equation for coded data?
y = (x-a) / b
where y is the coded data value, x is the original data and a and b are constants
What is the equation to find the coded mean?
ȳ = (x̄ - a)/b
Where ȳ is the coded mean, x̄ is the original mean and a and b are constants
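The coding equations can be checked numerically with a short Python sketch. The data and the constants a = 100 and b = 10 are made up for illustration; `statistics.pstdev` is used because it divides by n, matching the Sxx/n form of the variance.

```python
from math import isclose
from statistics import mean, pstdev

a, b = 100, 10                       # illustrative coding constants
x = [110, 120, 130, 140, 150]        # original data (made up)
y = [(value - a) / b for value in x] # coded data: y = (x - a) / b

print(y)                             # [1.0, 2.0, 3.0, 4.0, 5.0]
print(mean(y), (mean(x) - a) / b)    # both 3.0: coded mean = (original mean - a) / b
print(isclose(pstdev(y), pstdev(x) / b))  # True: Sy = Sx / b
```

Subtracting a shifts the mean but leaves the spread alone, which is why a does not appear in the standard deviation equation.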
What are the advantages and disadvantages of using the mode as a measure of location?
Advantages:
- Not affected by an outlier
- Useful for non-numerical data
Disadvantages:
- Does not use all data
- May be multiple modes
What are advantages and disadvantages of using the mean as a measure of location?
Advantages:
- Large data set makes outliers negligible
- Uses all data values
Disadvantages:
- Affected by outliers in small data sets
When you have discrete data with gaps do you amend the gaps or not?
You do not amend the gaps. You only amend gaps in continuous grouped data
What are the advantages and disadvantages of using the range as a measure of spread?
Advantages:
- Reflects the full data set
Disadvantages:
- Affected by outliers
What are the advantages and disadvantages of using the interquartile range as a measure of spread?
Advantages:
- Not affected by outliers
Disadvantages:
- Does not reflect the full data set
What are the advantages and disadvantages of using the standard deviation as a measure of spread?
Advantages:
- Outliers are negligible in large data sets
Disadvantages:
- Outliers have a big impact on small data sets
What does sigma (Σ) notation express?
Sigma (Σ) refers to the ‘sum of’
For example, sigma x (Σx) means the sum of all the values of x
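In Python the built-in `sum` plays the role of Σ. The sketch below, using made-up values, also shows that Σx² and (Σx)² are different quantities.

```python
x = [1, 2, 3, 4]                            # illustrative values

sum_x = sum(x)                              # Σx: add up all the values
sum_x_squared = sum(v ** 2 for v in x)      # Σx²: square each value, then add
square_of_sum = sum(x) ** 2                 # (Σx)²: add first, then square

print(sum_x, sum_x_squared, square_of_sum)  # 10 30 100
```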
What does standard deviation actually mean?
A measure of how far each data point is from the mean, and therefore represents the spread of the data
Does addition/subtraction (when coding data) affect the mean and standard deviation?
Adding or subtracting will affect the mean of the data but not the standard deviation. This is because all data points have increased/decreased by the same value and so the distance from the mean is no different.
The mean will change by the same value as the addition or subtraction
Does multiplication/division (when coding data) affect the mean and standard deviation?
Multiplying or dividing affects both the mean and standard deviation
The mean and the standard deviation will both change by the same factor as the division or multiplication (for a positive factor)
What is the mean?
The mean is the sum of all data divided by the number of pieces of data
It is calculated in the same way for grouped and non-grouped data.