Lecture 4-Measures of Central Tendency and Dispersion Flashcards
Where is the dataset centred?
Measures of central tendency
How dispersed is the dataset about its centre?
Measures of dispersion
A measure of central tendency for a dataset
a number that indicates the ‘centre’ of the distribution of the dataset.
Three main measures of central tendency
1. Mean 2. Mode 3. Median, plus the Tri-Mean (trimmed mean), which is not one of the main three
The Mean
Ideally, we should be able to summarize all the facts about a group of data in one figure
*In fact, this is frequently done by a measure that is at the core of a data set - the average.
*But what is an average?
*In all cases, what we are really aiming at is some notion of a typical value. You may already be familiar with the most popular average measure: the arithmetic mean.
For grouped data, we compute the mean from the FREQUENCY TABLE / DISTRIBUTION
*Assume that all the datapoints that fall in a class are centred at the class mark
*For each class, find the product of the class mark and the class frequency
*The Mean is found by adding these products and dividing the sum by the total frequency
The Mean example
Let us denote the number of runs that the batsman makes in any one innings as X. Better yet, let us define as X1 the number he makes in the first innings, X2 what he makes in the second, X5 what he makes in the fifth and so on.
*Generically, we may let Xi stand for the number he scores in the ith innings, where i can be made equal to 1, or 2, or 5, or any value up to n, the total number of innings.
–For three innings, the arithmetic mean of his scores would be:
(X1 + X2 + X3) / 3
–For five innings it would be:
(X1 + X2 + X3 + X4 + X5) / 5
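The formulas above can be sketched in Python. The innings scores below are hypothetical values chosen only for illustration, and `arithmetic_mean` is a helper name introduced here, not something from the lecture.

```python
def arithmetic_mean(scores):
    """Add up all the scores and divide by how many there are."""
    if not scores:
        raise ValueError("need at least one score")
    return sum(scores) / len(scores)

# Hypothetical scores X1..X5 for five innings (illustrative values only).
innings = [30, 45, 12, 60, 23]

mean_first_three = arithmetic_mean(innings[:3])  # (X1 + X2 + X3) / 3
mean_all_five = arithmetic_mean(innings)         # (X1 + X2 + X3 + X4 + X5) / 5

print(mean_first_three)  # 29.0
print(mean_all_five)     # 34.0
```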
The mean from grouped data
Class     Freq f
10 - 14   2
15 - 19   12
20 - 24   23
25 - 29   60
30 - 34   77
35 - 39   38
40 - 44   8
Total     220
*Consider how to find a mean value from a grouped data set
*We have 220 datapoints. To find the mean, we want to add them up and divide by 220
*HOWEVER, individual values are lost in a frequency distribution!
*What to do?
Mean pt 2
Assume that all the datapoints that fall in a class are centred at the class mark
*So, we know 2 datapoints fell within the 10-14 class, but that is all
*We find the mid-point (class mark) of the class: (10 + 14) / 2 = 12
*We assume the values of those 2 datapoints to be 12
*In reality, they may not be this at all! But this is the price we pay with grouped data
So, the first two datapoints are 12
*To begin calculating a mean, we would start with 12+12+…
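*In other words, we multiply the frequency of each class by the mid-point, then add up the products
*Sum of products = 2×12 + 12×17 + 23×22 + 60×27 + 77×32 + 38×37 + 8×42 = 6560
*The Mean = 6560/220 ≈ 29.8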
*Note that, if we were able to use the actual datapoints, the value we derive for the Mean may well be different
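As a minimal sketch, the grouped-mean calculation can be reproduced in Python from the frequency table on the previous card. The class limits and frequencies are taken from that table; the function name `grouped_mean` and the tuple layout are assumptions made for this example.

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [
    (10, 14, 2), (15, 19, 12), (20, 24, 23), (25, 29, 60),
    (30, 34, 77), (35, 39, 38), (40, 44, 8),
]

def grouped_mean(classes):
    """Estimate the mean by treating every datapoint as its class mark."""
    total_freq = sum(f for _, _, f in classes)
    # class mark = (lower + upper) / 2, weighted by the class frequency
    weighted_sum = sum(((lo + hi) / 2) * f for lo, hi, f in classes)
    return weighted_sum, total_freq, weighted_sum / total_freq

weighted_sum, total_freq, mean = grouped_mean(classes)
print(weighted_sum, total_freq, round(mean, 1))  # 6560.0 220 29.8
```

The result is only an estimate: it depends on the assumption that each class is centred at its class mark.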
Define Tri-Mean (trimmed mean)
The trimmed mean is the mean recalculated after a fixed percentage of the lowest and highest values has been removed from the ordered dataset
*If the trimmed mean does not differ considerably from the mean, then we know that the extreme values of the data did not significantly bias the mean calculation
*If they differ, however, then we know that our data set was characterized by untypical, extreme values
*If left unrecognised, this fact could lead us to draw erroneous conclusions by relying on the Mean alone
Tri-mean examples
Scores of a student over 20 courses, arranged in ascending order:
0, 89, 89, 89, 89, 89, 89, 90, 90, 90,
90, 90, 90, 90, 91, 91, 91, 91, 91, 91
*Mean of all 20 scores = 85.5 (not a typical score)
Quick inspection of the data reveals:
* Scores typically between 89 and 91
*‘0’ is not typical; it is an outlier
*If we trim off the lowest 5% and the highest 5% of the dataset (one score from each end: the 0 and one 91), the mean of the remaining 18 scores = 89.9
*Tri-Mean = 89.9
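The example above can be checked with a short Python sketch. The scores are the 20 values listed on this card; `trimmed_mean` and its `percent` parameter are names chosen for this illustration, and the rule of dropping a whole number of values from each end is an assumption about how the trimming is rounded.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the values from each end."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # values dropped from EACH end
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)

scores = [0] + [89] * 6 + [90] * 7 + [91] * 6  # the 20 scores above

plain_mean = sum(scores) / len(scores)
tri_mean = trimmed_mean(scores, 5)

print(plain_mean)          # 85.5
print(round(tri_mean, 1))  # 89.9
```

The gap between 85.5 and 89.9 is the signal that an extreme value (the 0) is pulling the ordinary mean down.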
Other types of mean
Harmonic mean
*Geometric mean
*Quadratic mean
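The card only names these means. As a hedged illustration using their standard definitions (not taken from the lecture), the sketch below computes each one for a small hypothetical dataset. `statistics.harmonic_mean` and `statistics.geometric_mean` are Python standard-library functions (the latter needs Python 3.8 or newer); the quadratic mean is computed by hand.

```python
import math
import statistics

values = [2.0, 4.0, 8.0]  # hypothetical data, for illustration only

# Harmonic mean: n divided by the sum of the reciprocals
harmonic = statistics.harmonic_mean(values)

# Geometric mean: nth root of the product of the n values
geometric = statistics.geometric_mean(values)

# Quadratic mean (root mean square): square root of the mean of the squares
quadratic = math.sqrt(sum(x * x for x in values) / len(values))

print(round(harmonic, 3), round(geometric, 3), round(quadratic, 3))
```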
Define the Median
The median divides a data set into halves
*The Median is therefore the datapoint which lies at the centre of the dataset when arranged in ascending or descending order
*It is the point below which 50% of the data lies, and above which 50% of the data lies
*For this reason, it is also known as the 50th Percentile
Median example
Suppose we had the following data set and we wanted to find the median point: {22, 27, 8, 5, 13}
*The first thing to do would be to rank the data in ascending order – {5, 8, 13, 22, 27}
*In this manner the median or middle value of 13, becomes obvious: half of the population is to the left of the median, and the other half is to the right.
*The median value is therefore that value which cuts the population into half
*In other words, with n = 5 values, the position of the median is (n + 1) / 2 = (5 + 1) / 2 = 3, i.e. the 3rd value
Median of odd-sized and even-sized data sets
The two cases differ in whether the data set contains an odd or an even number of values
*With an odd number of values, the median is simply the middle value after the data has been ranked in ascending order (the value in position (n + 1) / 2)
*With an even number of values, however, the median is found by taking the average of the two middle values
*Therefore, depending on the data set, the median does not necessarily have to be a value of the data set
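A minimal Python sketch of both cases follows. The odd-sized set is the one from the median example card; the even-sized set is hypothetical (the value 30 is added purely for illustration). The helper name `median` is an assumption of this sketch.

```python
def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # position (n + 1) / 2 when counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [22, 27, 8, 5, 13]        # from the median example
even_set = [22, 27, 8, 5, 13, 30]   # hypothetical: one extra value added

print(median(odd_set))   # 13
print(median(even_set))  # 17.5
```

In the even case the result, 17.5, is not itself a member of the data set.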
Define Mode
The Mode is the most frequently occurring datapoint in a dataset
*It can be read off a Stem-and-Leaf Display. The leaf value that repeats most often within a stem points us to the Mode.
*In the case of grouped data presented in a frequency table, we can identify the modal class (the class with the highest frequency) and proceed to estimate the Mode by the class mark of that class
The Mode may not necessarily be affected by a change in one datapoint
*It may not be unique
*If it is unique, the dataset is unimodal
*Otherwise the dataset may be bi-modal, multi-modal or even possess no mode at all
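The cases above can be sketched in Python. The `scores` list reuses the 20 course scores from the Tri-mean example and `classes` reuses the grouped frequency table; the other lists are hypothetical. Treating a dataset in which no value repeats as having "no mode" is one common convention, assumed here, and `modes` is a helper name introduced for this sketch.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1 and len(counts) > 1:
        return []  # assumed convention: no value repeats, so no mode
    return sorted(v for v, c in counts.items() if c == highest)

scores = [0] + [89] * 6 + [90] * 7 + [91] * 6
print(modes(scores))           # [90]    unimodal
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  bi-modal (hypothetical data)
print(modes([1, 2, 3]))        # []      no mode (hypothetical data)

# Grouped data: estimate the Mode by the class mark of the modal class.
classes = [
    (10, 14, 2), (15, 19, 12), (20, 24, 23), (25, 29, 60),
    (30, 34, 77), (35, 39, 38), (40, 44, 8),
]
lo, hi, freq = max(classes, key=lambda c: c[2])  # class with highest frequency
modal_class_mark = (lo + hi) / 2
print(lo, hi, modal_class_mark)  # 30 34 32.0
```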