Statistics - Chapter 2 - Measures Of Location And Spread Flashcards
measure of location
a value that describes a position in a data set.
if it's the centre of the data, it's called a measure of central tendency
- mean, mode, median
mode/modal class
- value/class that occurs most often
- for both qualitative and quantitative data.
- a data set can have a single mode or 2 modes (bimodal).
- not very informative if each value occurs only once
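A quick sketch of finding the mode in Python, using the standard library's `statistics.multimode`, which returns every value tied for most frequent. The shirt sizes and numbers below are invented for illustration and are not from the notes.

```python
from statistics import multimode

# Invented shirt sizes sold in a day (qualitative data).
sizes = ["M", "L", "M", "S", "L", "M", "XL"]

# multimode returns a list of all values tied for most frequent.
modes = multimode(sizes)
print(modes)

# Two values tied for most frequent: the data is bimodal.
bimodal = multimode([1, 2, 2, 3, 3, 4])
print(bimodal)

# Every value occurs once, so every value comes back: not informative.
no_clear_mode = multimode([5, 6, 7])
print(no_clear_mode)
```

The last case matches the point above: when each value occurs only once, the mode tells you nothing useful.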
?explain why a shirt manufacturer might use the mode when planning production numbers
“it provides information on the most common size or item that is in demand among customers”
?write down the modal class
“34-36”
median (Q2)
- the middle value when the data values are put in order. splits the data set into 2 equal halves (50% each)
- for quantitative data. usually used when there are extreme values, as they do not affect it
- for discrete data: the median is the (n+1)/2 th value
- for grouped data: the median is the n/2 th value
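The (n+1)/2 rule for discrete data can be sketched in Python as below. The function name `median_discrete` is made up for this example. When the position ends in .5, the sketch averages the two values either side of it, which is the usual convention.

```python
def median_discrete(values):
    """Median of ungrouped discrete data using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2  # 1-based position of the median
    if position.is_integer():
        return ordered[int(position) - 1]
    # Position ends in .5: average the two values either side of it.
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2


print(median_discrete([3, 1, 2]))          # odd n: the 2nd value
print(median_discrete([1, 2, 3, 4]))       # even n: the 2.5th value
print(median_discrete([1, 2, 3, 4, 100]))  # the extreme value does not move it
```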
mean
x̄ = (∑ x) / n
- for quantitative data. uses all the pieces of data therefore gives a true measure of the data, but is affected by extreme values
(be specific in answer for the outliers)
“the mean is affected by the extreme value 26”
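To see the effect of an extreme value, here is a small sketch comparing the mean and median with and without it. The data set is invented. Only the extreme value 26 echoes the answer above.

```python
from statistics import mean, median

# Invented data; only the extreme value 26 echoes the flashcard answer.
data = [2, 3, 3, 4, 5, 26]
without_extreme = [x for x in data if x != 26]

# The mean shifts a lot when 26 is included; the median barely moves.
print(mean(data), median(data))
print(mean(without_extreme), median(without_extreme))
```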
estimate the mean in a grouped frequency table:
take the midpoint of each class interval, multiply it by the class frequency, add these up and divide by the total frequency
e.g.
((30.5 × 2) + (32.5 × 25)) / 27 ≈ 32.35
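The same calculation as a short Python sketch, using the midpoints (30.5, 32.5) and frequencies (2, 25) from the example above. The variable names are only illustrative.

```python
# Class midpoints and frequencies taken from the worked example.
midpoints = [30.5, 32.5]
frequencies = [2, 25]

# Sum of (midpoint x frequency), divided by the total frequency.
total_fx = sum(m * f for m, f in zip(midpoints, frequencies))
total_f = sum(frequencies)
estimated_mean = total_fx / total_f

print(round(estimated_mean, 2))  # 32.35
```

This is only an estimate, because the midpoints stand in for the real data values in each class.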
Quartiles
Lower quartile - Q1, 1/4 of the way through the data set (25%). position is 1/4 of n
Upper quartile - Q3, 3/4 of the way through the data set (75%). position is 3/4 of n
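for discrete data, work out n/4: if it's a decimal, round UP to the next whole number; if it's a whole number, add 0.5 (halfway between that value and the next)
e.g. n = 16: 16/4 = 4, so Q1 = 4.5th value
n = 45: 45/4 = 11.25, so Q1 = 12th value
same goes for the upper quartile (using 3n/4)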
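A minimal sketch of the discrete-data quartile rule (round up a decimal position, add 0.5 to a whole-number position). The function name `quartile_discrete` and the test data are made up for illustration.

```python
def quartile_discrete(values, quarter):
    """Quartile of discrete data; quarter is 1 for Q1 or 3 for Q3."""
    ordered = sorted(values)
    n = len(ordered)
    # Position is quarter * n / 4; divmod keeps the arithmetic exact.
    whole, remainder = divmod(quarter * n, 4)
    if remainder == 0:
        # Whole number: go halfway between this value and the next one.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Decimal: round up and take that data value (1-based position whole + 1).
    return ordered[whole]


sixteen = list(range(1, 17))    # n = 16, so 16/4 = 4 -> 4.5th value
forty_five = list(range(1, 46)) # n = 45, so 45/4 = 11.25 -> 12th value

print(quartile_discrete(sixteen, 1), quartile_discrete(sixteen, 3))
print(quartile_discrete(forty_five, 1), quartile_discrete(forty_five, 3))
```

Because the test lists are just 1, 2, 3, ..., each result equals the position it was taken from, which makes the rule easy to check by eye.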
Percentiles
- percentiles split the data set into 100 parts.
- the 10th percentile lies 1/10 of the way through the data (10%). written with P, e.g. P₁₀
- 85% of the data values are less than the 85th percentile and 15% are greater
to calculate:
the 85th percentile is 85% of the way through the data, so it's the
85/100 × n th value
interpolation
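Interpolation is used with grouped data to estimate a median, quartile or percentile, assuming the values are spread evenly across the class that contains it. The sketch below is one way to write that. The function name `interpolate` and the grouped table it uses are hypothetical.

```python
def interpolate(target, lower_bound, class_width, cum_freq_before, class_freq):
    """Estimate the value at position `target` inside a class, assuming the
    data values are spread evenly across that class."""
    fraction = (target - cum_freq_before) / class_freq
    return lower_bound + fraction * class_width


# Hypothetical grouped table with n = 40:
#   class 30-40 has frequency 16, with 12 values below it
#   class 40-50 has frequency 12, with 28 values below it
n = 40

# Median: the n/2 = 20th value, which falls in the class 30-40.
median_estimate = interpolate(n / 2, 30, 10, 12, 16)

# 85th percentile: the 85/100 x n = 34th value, which falls in the class 40-50.
p85_estimate = interpolate(85 * n / 100, 40, 10, 28, 12)

print(median_estimate, p85_estimate)
```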
Measure of spread/dispersion/variation = a measure of how spread out the data is
range - difference between largest and smallest value in dataset.
interquartile range (IQR) - difference between the upper quartile and lower quartile. not affected by extreme values but only considers the spread of the middle 50% of the data
interpercentile range - difference between the values for two given percentiles. e.g. the 10th to 90th interpercentile range is often used since it's not affected by extreme values but still considers 80% of the data in its calculation
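A small sketch of how an extreme value changes the range but not the IQR. The two data sets are invented and have 8 ordered values each, so the quartile positions are fixed (n/4 = 2 and 3n/4 = 6, giving the 2.5th and 6.5th values). The helper names are hypothetical.

```python
def range_of(ordered):
    """Largest value minus smallest value."""
    return max(ordered) - min(ordered)


def iqr_of_eight(ordered):
    """IQR for exactly 8 ordered values: Q1 is the 2.5th, Q3 the 6.5th value."""
    q1 = (ordered[1] + ordered[2]) / 2
    q3 = (ordered[5] + ordered[6]) / 2
    return q3 - q1


# Invented data: identical except for the largest value.
with_extreme = [2, 4, 5, 7, 8, 10, 11, 40]
without_extreme = [2, 4, 5, 7, 8, 10, 11, 12]

print(range_of(with_extreme), iqr_of_eight(with_extreme))
print(range_of(without_extreme), iqr_of_eight(without_extreme))
```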
variance - a measure of spread that uses the fact that each data point deviates from the mean by the amount x - x̄
variance = “mean of the squares minus the square of the mean” (msmsm): σ² = (∑ x²) / n - ((∑ x) / n)²
standard deviation - square root of the variance
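The sketch below computes the variance two ways: from the deviations x - x̄, and from the equivalent shortcut "mean of the squares minus the square of the mean". The standard deviation is then the square root. The function names and data are invented for illustration.

```python
import math


def variance_from_deviations(values):
    """Average of the squared deviations (x - mean)^2."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def variance_shortcut(values):
    """Mean of the squares minus the square of the mean."""
    n = len(values)
    mean_of_squares = sum(x * x for x in values) / n
    square_of_mean = (sum(values) / n) ** 2
    return mean_of_squares - square_of_mean


data = [2, 4, 4, 4, 5, 5, 7, 9]  # invented data with mean 5

var_a = variance_from_deviations(data)
var_b = variance_shortcut(data)
std_dev = math.sqrt(var_b)

print(var_a, var_b, std_dev)
```

Both routes should give the same variance (up to floating-point rounding). The shortcut is usually quicker by hand because it only needs ∑x and ∑x².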
Coding
- adding or subtracting a constant doesn't affect how spread out the data is, so the constant can be ignored when finding the standard deviation. hence
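A rough check of the point above, using `statistics.pstdev` (the population standard deviation) from the Python standard library. The data and the coding y = x - 1000 are invented for illustration.

```python
from statistics import pstdev

# Invented data with large values that are awkward to work with by hand.
data = [1002, 1004, 1004, 1004, 1005, 1005, 1007, 1009]

# Coding y = x - 1000: subtract a constant from every value.
coded = [x - 1000 for x in data]

sd_original = pstdev(data)
sd_coded = pstdev(coded)

# Subtracting the constant shifts every value but leaves the spread unchanged.
print(sd_original, sd_coded)
```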