Lecture 4-Measures of Central Tendency and Dispersion Flashcards

Question 1

Q

Where is the dataset centred?

Answer

A

Measures of central tendency

Question 2

Q

How dispersed is the dataset about its centre?

Answer

A

Measures of dispersion

Question 3

Q

A measure of central tendency for a dataset

Answer

A

a number that indicates the ‘centre’ of the distribution of the dataset.

Question 4

Q

Three main measures of central tendency

Answer

A

1. Mean 2. Mode 3. Median. The Tri-Mean (trimmed mean) is a further measure, but not one of the main three.

Question 5

Q

The Mean

Answer

A

Ideally, we should be able to summarize all the facts about a group of data in one figure
*In fact, this is frequently done by a measure that is at the core of a data set - the average.
*But what is an average?
*In all cases, what we are really aiming at is some notion of a typical value. You may be already familiar with the most popular of average measures - the arithmetic mean.
For grouped data, we compute the mean from the frequency table (frequency distribution)
*Assume that all the datapoints that fall in a class are centred at the class mark
*For each class, find the product of the class mark and the class frequency
*The Mean is found by adding these products and dividing the sum by the total frequency

Question 6

Q

The Mean example

Answer

A

Let us denote the number of runs that the batsman makes in any one innings as X. Better yet, let us define X1 as the number he makes in the first innings, X2 as the number he makes in the second, X5 as the number he makes in the fifth, and so on.
*Generically, we may let Xi stand for the number he scores in the ith innings, where i can equal 1, 2, 5, or any value up to n.
–For three innings, the arithmetic mean of his scores would be:
(X1 + X2 + X3) / 3
–For five innings it would be:
(X1 + X2 + X3 + X4 + X5) / 5
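
The pattern above generalises to any number of innings. A minimal Python sketch follows; the function name `arithmetic_mean` and the three scores are hypothetical and are not taken from the lecture.

```python
def arithmetic_mean(scores):
    """Add up all the scores and divide by how many there are."""
    if not scores:
        raise ValueError("the mean of an empty dataset is undefined")
    return sum(scores) / len(scores)


# Hypothetical scores X1, X2, X3 for three innings (illustration only).
innings = [30, 45, 60]
print(arithmetic_mean(innings))  # (30 + 45 + 60) / 3 = 45.0
```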

Question 7

Q

The mean from grouped data

Answer

A

Class     Freq f
10-14     2
15-19     12
20-24     23
25-29     60
30-34     77
35-39     38
40-44     8
Total     220
*Consider how to find a mean value from a grouped data set
*We have 220 datapoints. To find the mean, we want to add them up and divide by 220
*HOWEVER, individual values are lost in a frequency distribution!
*What to do?

Question 8

Q

Mean pt 2

Answer

A

Assume that all the datapoints that fall in a class are centred at the class mark
*So, we know 2 datapoints fell within the 10-14 class, but that is all
*We find the mid-point (class mark) of the class: (10 + 14) / 2 = 12
*We assume the values of those 2 datapoints to be 12
*In reality, they may not be this at all! But this is the price we pay with grouped data
So, the first two datapoints are 12
*To begin calculating a mean, we would start with 12+12+…
*In other words, we multiply the frequency of each class by the mid-point
*Summing these products over all seven classes gives 6560, so the Mean = 6560/220 = 29.8
*Note that, if we were able to use the actual datapoints, the value we derive for the Mean may well be different
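
The grouped-mean calculation can be checked with a short Python sketch. The class limits and frequencies are those of the frequency table in this deck; the names `classes` and `grouped_mean` are illustrative assumptions.

```python
# (lower limit, upper limit, frequency) for each class in the frequency table.
classes = [
    (10, 14, 2),
    (15, 19, 12),
    (20, 24, 23),
    (25, 29, 60),
    (30, 34, 77),
    (35, 39, 38),
    (40, 44, 8),
]


def grouped_mean(table):
    """Estimate the mean by treating every datapoint as its class mark."""
    total_frequency = sum(freq for _, _, freq in table)
    # The class mark is the mid-point of the class limits.
    sum_of_products = sum(((low + high) / 2) * freq for low, high, freq in table)
    return sum_of_products / total_frequency


print(round(grouped_mean(classes), 1))  # 6560 / 220, rounded to 29.8
```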

Question 9

Q

Define Tri-Mean

Answer

A

The trimmed mean (Tri-Mean) is the mean recalculated after discarding a fixed percentage of the smallest and largest values.
*If the trimmed mean does not differ considerably from the mean, then we know that the extreme values of the data did not significantly bias the mean calculation
*If they differ, however, then we know that our data set was characterized by untypical, extreme values
*If left unrecognised, this fact could lead us to draw erroneous conclusions by relying on the Mean alone

Question 10

Q

Tri-mean examples

Answer

A

Scores of a student over 20 courses, arranged in ascending order, are as follows:
0, 89, 89, 89, 89, 89, 89, 90, 90, 90,
90, 90, 90, 90, 91, 91, 91, 91, 91, 91
*Mean of all 20 scores = 85.5 (not a typical score)
Quick inspection of the data reveals:
* Scores typically between 89 and 91
*‘0’ is not typical; it is an outlier
*If we trim off the first 5% and the last 5% of the dataset, the mean of the remaining 18 scores = 89.9
*Tri-Mean = 89.9
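
A small Python sketch of the trimming step, using the 20 scores from this card. The helper name `trimmed_mean` and its integer `percent` parameter are assumptions made for illustration.

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent` % of the values from each end."""
    ordered = sorted(data)
    cut = len(ordered) * percent // 100  # values removed from each end
    kept = ordered[cut:len(ordered) - cut]
    if not kept:
        raise ValueError("trimming removed every datapoint")
    return sum(kept) / len(kept)


scores = [0] + [89] * 6 + [90] * 7 + [91] * 6

print(sum(scores) / len(scores))           # 85.5, pulled down by the outlier 0
print(round(trimmed_mean(scores, 5), 1))   # 89.9, after dropping the 0 and one 91
```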

Question 11

Q

Other types of mean

Answer

A

Harmonic mean
*Geometric mean
*Quadratic mean

Question 12

Q

Define the Median

Answer

A

The median divides a data set into halves
*The Median is therefore the datapoint which lies at the centre of the dataset when arranged in ascending or descending order
*It is the point below which 50% of the data lies, and above which 50% of the data lies
*For this reason, it is also known as the 50th Percentile

Question 13

Q

Example of the median

Answer

A

Suppose we had the following data set and we wanted to find the median point – {22, 27, 8, 5, 13}
*The first thing to do would be to rank the data in ascending order – {5, 8, 13, 22, 27}
*In this manner the median or middle value of 13, becomes obvious: half of the population is to the left of the median, and the other half is to the right.
*The median value is therefore that value which cuts the population into half
*In other words, the position of the median is (n + 1) / 2 = (5 + 1) / 2 = 3, i.e. the 3rd value

Question 14

Q

Difference between the median of odd- and even-sized data sets

Answer

A

The difference between the two cases is that one data set contains an odd number of values, while the other contains an even number
*With odd-numbered data sets, the median value is simply the middle value after the data has been ranked in ascending order (the value in position (n + 1) / 2)
*With even-numbered data sets, however, the median is found by taking the average value of the two middle values
*Therefore, depending on the data set, the median does not necessarily have to be a value of the data set
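
Both cases can be sketched in a few lines of Python. The odd-sized data set is the one from the median example; the even-sized one is a hypothetical variant obtained by removing the value 13.

```python
def median(data):
    """Middle value of the ranked data, or the average of the two middle values."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the value in position (n + 1) / 2, which is index n // 2.
        return ordered[middle]
    # Even n: average the two values on either side of the centre.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([22, 27, 8, 5, 13]))  # 13
print(median([22, 27, 8, 5]))      # (8 + 22) / 2 = 15.0, not a value in the data set
```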

Question 15

Q

Define Mode

Answer

A

The Mode is the most frequently occurring datapoint in a dataset
*It can be read off a Stem-and-Leaf Display. The leaf with the highest frequency points us to the Mode.
*In the case of grouped data presented in a frequency table, we can identify the modal class (the class with the highest frequency) and proceed to estimate the Mode by the class mark of that class
The Mode may not necessarily be affected by a change in one datapoint
*It may not be unique
*If it is unique, the dataset is unimodal
*Otherwise the dataset may be bi-modal, multi-modal or even possess no mode at all
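
A rough Python sketch of finding the mode(s). It assumes one common convention: when every distinct value occurs equally often, the dataset is treated as having no mode. The function name `modes` and the short lists are hypothetical; the 20 scores are those from the Tri-mean example.

```python
from collections import Counter


def modes(data):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(data)
    # Assumed convention: all values equally frequent means there is no mode.
    if len(counts) > 1 and len(set(counts.values())) == 1:
        return []
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


scores = [0] + [89] * 6 + [90] * 7 + [91] * 6
print(modes(scores))           # [90]   unimodal
print(modes([1, 1, 2, 2, 3]))  # [1, 2] bi-modal
print(modes([1, 2, 3]))        # []     no mode
```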

Question 16

Q

key points to remember

Answer

A

*If we seek a measure in the context of a ‘most frequently occurring value’ – select the Mode
*If we seek a measure in the context of an ‘average value’ – select the Mean
*If we seek a measure in the context of a ‘middle value’ – select a Median
*If Mean, Mode & Median agree, the dataset is said to be symmetric.
*Otherwise it is said to be skewed

Question 17

Q

Symmetry vs Skewness

Answer

A

Relationship between mean, median and mode:
*Symmetric: Mean = Median = Mode
*Positively skewed: Mode < Median < Mean
*Negatively skewed: Mean < Median < Mode

Question 18

Q

Skewness

Answer

A

You must be able to distinguish between a Positive Skew and a Negative Skew.
*You must be able to estimate the Degree of Skewness of a dataset by using either Pearson’s First or Second coefficient:
*First coefficient: (mean – mode) / standard deviation
*Second coefficient: 3(mean – median) / standard deviation
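
The two coefficients translate directly into code. The sketch below uses hypothetical summary values (mean 52, median 50, mode 46, standard deviation 6) chosen only to illustrate a positive skew; the function names are assumptions.

```python
def pearson_first(mean, mode, sd):
    """Pearson's first coefficient of skewness: (mean - mode) / sd."""
    return (mean - mode) / sd


def pearson_second(mean, median, sd):
    """Pearson's second coefficient of skewness: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / sd


# Hypothetical summary statistics with Mode < Median < Mean.
print(pearson_first(mean=52, mode=46, sd=6))     # 1.0
print(pearson_second(mean=52, median=50, sd=6))  # 1.0
```

A positive result points to a positive skew, a negative result to a negative skew, and a value near zero to an approximately symmetric dataset.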

Question 19

Q

Intro to Measures of Dispersion

Answer

A

Consider two hypothetical countries, country A and country B, in which only three individuals live.
*Clearly, if we were to rely solely on the mean measure as an index of well-being in both countries, we would conclude that it is fundamentally similar.
*However, it is clear that there are fundamental differences between these 2 countries

Question 20

Q

The Range

Answer

A

Perhaps the most rudimentary measure of dispersion or variability is the range.
*It is calculated as the difference between the highest and the lowest recorded values in a data set. For country A, the range is calculated as $1,000 while for country B it is established as $4,000. The greater level of dispersion in country B tells us, among other things, that the mean value of $4,000 is less reliable as an index of well-being in country B than it is in country A
The Range = Largest Value - Smallest Value
*It is easy to compute
*It depends on only two data points
*It responds to any change(s) in these two data points
Shortcomings:
–Two significantly different datasets can have the same range
–Any error in one of the two extreme datapoints will bias the range
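
Both shortcomings can be seen in a short Python sketch. The two datasets are hypothetical and were chosen only so that they share a range while differing in spread.

```python
def value_range(data):
    """Range = largest value - smallest value."""
    return max(data) - min(data)


# Hypothetical datasets: one tightly clustered, one evenly spread.
clustered = [1, 5, 5, 5, 9]
spread = [1, 2, 5, 8, 9]
print(value_range(clustered), value_range(spread))  # 8 8

# A recording error in one extreme value (9 typed as 90) changes the range.
with_error = [1, 5, 5, 5, 90]
print(value_range(with_error))  # 89
```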

Question 21

Q

The Inter-Quartile Range

Answer

A

The usefulness of the range as a measure of dispersion is limited by the fact that its value can be artificially affected by the presence of a few extreme outliers.
*A possible solution to this dilemma would be to calculate something like a “trimmed” range, similar in spirit to the trimmed mean we met above.
*A frequently employed adaptation of such a procedure is to eliminate the lowest and highest 25% of the values and to consider only the range of the remaining values. The measure so obtained is called the interquartile range (IQR).
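
The description above can be followed literally in Python: drop the lowest and highest quarter of the ranked values and take the range of what is left. This is only a sketch on hypothetical data; textbook definitions of the IQR as Q3 - Q1 use quartile conventions that may give a slightly different figure.

```python
def middle_half_range(data):
    """Range of the values left after removing the lowest and highest 25%."""
    ordered = sorted(data)
    cut = len(ordered) // 4  # number of values dropped from each end
    kept = ordered[cut:len(ordered) - cut]
    return max(kept) - min(kept)


# Hypothetical dataset with one extreme outlier (40).
values = [1, 3, 4, 5, 6, 7, 9, 40]
print(max(values) - min(values))  # 39, dominated by the outlier
print(middle_half_range(values))  # 3, the range of [4, 5, 6, 7]
```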

Question 22

Q

More on Deviations from the Mean

Answer

A

Point by point, the deviations for country B are larger than for country A, confirming once again the greater inequality of income in country B.
*As the data set becomes larger, it becomes more difficult to make point by point comparisons, especially if there is no point by point superiority of one set over another.

Question 27

Q

Mean Absolute Deviation (MAD)

Answer

A

The deviations from the mean are also measures of dispersion or spread. It is tempting to get a summary measure by adding them up and dividing by 3 to get an average value of this dispersion.
*For Country A: [(-1) + (0) + (1)] / 3 = 0
*For Country B: [(-3) + (0) + (3)] / 3 = 0
*The deviations sum to zero. This is, in fact, a standard result of summing deviations from the mean.
A solution in this case would be to ignore the sign attached to the deviations and calculate the mean of the absolute values of the deviations from the mean.
*For Country A: [(1) + (0) + (1)] / 3 = 2/3
*For Country B: [(3) + (0) + (3)] / 3 = 2
*These figures are examples of a measure of dispersion known as the Mean Absolute Deviation (MAD).
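
A brief Python sketch of the MAD. The income lists are an assumption: they are not given on this card, and were chosen so that the mean is 4 and the deviations are -1, 0, 1 for Country A and -3, 0, 3 for Country B.

```python
def mean_absolute_deviation(data):
    """Mean of the absolute deviations from the mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


# Assumed incomes, consistent with the deviations quoted on the card.
country_a = [3, 4, 5]
country_b = [1, 4, 7]

# Signed deviations always cancel out.
mean_a = sum(country_a) / len(country_a)
print(sum(x - mean_a for x in country_a))   # 0.0

print(mean_absolute_deviation(country_a))   # 2/3, about 0.667
print(mean_absolute_deviation(country_b))   # 2.0
```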

Question 28

Q

Lastly on Mean Absolute Deviation

Answer

A

Yet another approach is to focus on the deviations from the mean
*Given a datapoint x, its deviation from the mean is given by (Actual Value x – Mean Value of the dataset)
*Some of these deviations will be positive; others negative; some even zero
*Any attempt to find a mean deviation from the mean would always result in a zero answer
*Instead we drop the sign on each deviation, thereby creating absolute deviations, and proceed to find the mean of these absolute deviations; this is called the Mean Absolute Deviation (MAD)
*However, this is mathematically clumsy

Question 29

Q

The Variance

Answer

A

The reason for taking absolute values was to lose the negative sign from the deviations from the mean (or else the deviations would sum to 0)
*However, there is another way to deal with negative values
*We speak of the Variance, and its associated measure, the Standard Deviation
*Instead of the absolute value of the deviation from the mean, the variance uses squared deviations.
Country A                        Country B
Deviation   Squared Deviation    Deviation   Squared Deviation
-1          1                    -3          9
0           0                    0           0
1           1                    3           9

Variance, Country A = (1+0+1) / 3 = 2/3
*Variance, Country B = (9+0+9) / 3 = 6
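
The same calculation as a Python sketch, dividing by N as this card does (the population variance). The income lists are assumed values, chosen to reproduce the deviations -1, 0, 1 and -3, 0, 3 about a mean of 4.

```python
def population_variance(data):
    """Mean of the squared deviations from the mean (divides by N)."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)


# Assumed incomes, consistent with the deviations in the table.
country_a = [3, 4, 5]
country_b = [1, 4, 7]

print(population_variance(country_a))  # (1 + 0 + 1) / 3, about 0.667
print(population_variance(country_b))  # (9 + 0 + 9) / 3 = 6.0
```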

Question 30

Q

Facts about the Variance

Answer

A

Variance is never negative
*It eliminates the clumsy absolute deviation that we encountered in the MAD
*It attaches a greater penalty to greater deviations from the mean (each deviation is penalised by its square)
*The greater the dispersion the greater the variance
*Unfortunately it is not expressed in the same unit of measure as the datapoints

Question 31

Q

Standard Deviation

Answer

A

Getting around the shortcoming of Variance requires that we find its square root
*The square root will have the same unit of measure as the data in the dataset.
*The square root of variance is called standard deviation.
*SD of Country A = √ (2/3) = 0.8165
*SD of Country B = √ (6) = 2.449
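
These figures can be reproduced with Python's standard `statistics` module, whose `pstdev` function divides by N before taking the square root. The income lists are assumed values that give variances of 2/3 and 6.

```python
import statistics

# Assumed incomes with mean 4 and variances 2/3 and 6 respectively.
country_a = [3, 4, 5]
country_b = [1, 4, 7]

sd_a = statistics.pstdev(country_a)  # population standard deviation
sd_b = statistics.pstdev(country_b)

print(round(sd_a, 4))  # 0.8165
print(round(sd_b, 3))  # 2.449
```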

Question 32

Q

Example of Squared Deviation

Answer

A

Country A’s income is distributed with a mean of $4 and a standard deviation of $0.8165
*Country B’s is distributed with a mean of $4 and a standard deviation of $2.449
*Looking at the Means alone tells us that the data sets are similar (and we know they are not)
*Looking at the Means together with, say, the Standard Deviations tells us that the data sets are in fact very different.
*The wider dispersion in country B is an indication that the mean is not as reliable, say, as a measure of economic well being as it is for country A.

Question 33

Q

The Variance and SD with Grouped Data

Answer

A

For grouped data the variance is found by a modified approach:
–Assume again that all data points in a class are centred at the class mark
–For each class compute the square of the deviation of the class mark from the mean
–For each class find the product of the squared deviation and the class frequency
–Sum these products over all classes in the frequency table
–Divide the total by N i.e. the sum of the frequencies
–Variance = [Σ fi (xi – Mean)²] / N
–Standard Deviation = Positive Square Root of the Variance

Question 34

Q

Standard Deviation chart

Answer

A

Class     Mid x   Freq f   xf      (x-µ)²    f(x-µ)²
10-14     12      2        24      316.84    633.68
15-19     17      12       204     163.84    1966.08
20-24     22      23       506     60.84     1399.32
25-29     27      60       1620    7.84      470.40
30-34     32      77       2464    4.84      372.68
35-39     37      38       1406    51.84     1969.92
40-44     42      8        336     148.84    1190.72
Total             220      6560              8002.80

(µ = 6560 / 220 = 29.8)
Variance = 8002.8 / 220 = 36.38
Standard deviation = square root of 36.38 = 6.03
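
The grouped calculation can be reproduced with a short Python sketch. The class marks and frequencies are those in the chart; the names `table` and `grouped_variance` are illustrative. The sketch uses the unrounded mean, whereas the chart rounds µ to 29.8, so intermediate figures differ slightly while the rounded results agree.

```python
import math

# (class mark x, frequency f) for each class in the chart.
table = [(12, 2), (17, 12), (22, 23), (27, 60), (32, 77), (37, 38), (42, 8)]


def grouped_variance(rows):
    """Variance of grouped data: sum of f * (x - mean)^2, divided by N."""
    n = sum(freq for _, freq in rows)
    mean = sum(mark * freq for mark, freq in rows) / n
    return sum(freq * (mark - mean) ** 2 for mark, freq in rows) / n


variance = grouped_variance(table)
standard_deviation = math.sqrt(variance)  # positive square root of the variance

print(round(variance, 2))            # 36.38
print(round(standard_deviation, 2))  # 6.03
```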

Question 35

Q

Summary

Answer

A

Where is the dataset centred?
–This is answered by way of measures of central tendency
–Mean, Tri-Mean, Median, Mode
*How dispersed is the dataset about its centre?
–This is answered by way of measures of dispersion
–Range, IQR, QD, MAD, Variance, SD
