Measures of Location and Spread Flashcards
What is the median of a dataset?
The median is the number that divides the (ordered) data in half: the smallest number that is at least as big as half the data. At least half the data are equal to or smaller than the median, and at least half the data are equal to or greater than the median.
The median minimizes the sum of absolute differences: among all numbers c, the sum of |x − c| over the data x is smallest when c is the median.
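The minimizing property of the median can be checked numerically. The sketch below is only an illustration: the data values, the helper name sum_abs_diff, and the grid of candidate centers are assumptions chosen for the example, not part of the original cards.

```python
from statistics import median

def sum_abs_diff(data, c):
    """Sum of |x - c| over all data values x."""
    return sum(abs(x - c) for x in data)

data = [1, 2, 4, 7, 11]  # illustrative values (assumed for this example)
m = median(data)

# Try candidate centers 0.0, 0.1, ..., 12.0 and keep the one with the smallest sum.
candidates = [c / 10 for c in range(0, 121)]
best = min(candidates, key=lambda c: sum_abs_diff(data, c))

print(m, best, sum_abs_diff(data, m))
```

With an odd number of data the minimizer is unique. With an even number, every value between the two middle data gives the same minimal sum.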
What is the mode of a dataset?
The mode of a set of data (as opposed to the mode of a histogram) is the most common value among the data. It is rare that several data coincide exactly, unless the variable is discrete, or the measurements are reported with low precision.
The mode minimizes the sum of distances to the data when the distance between two numbers is defined to be zero if the numbers are equal, and one if they are not equal.
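With the zero-or-one distance, the total distance from a candidate value to the data is simply the number of data that differ from it, so the most common value gives the smallest total. A minimal sketch, using made-up data and the hypothetical name mismatches:

```python
from collections import Counter

data = [2, 3, 3, 5, 3, 2, 7]  # illustrative discrete data (assumed)

# The mode is the value with the highest count.
mode_value, mode_count = Counter(data).most_common(1)[0]

# Zero-one distance: each datum contributes 0 if equal to c, 1 otherwise.
mismatches = {c: sum(1 for x in data if x != c) for c in set(data)}
closest = min(mismatches, key=mismatches.get)

print(mode_value, mode_count, mismatches[mode_value], closest)
```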
What is the mean of a dataset?
The mean (more precisely, the arithmetic mean) is commonly called the average. It is the sum of the data, divided by the number of data.
The mean minimizes the sum of squared differences: among all numbers c, the sum of (x − c)^2 over the data x is smallest when c is the mean.
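The same kind of numerical check works for the mean. This is a sketch under assumed inputs: the data, the helper name sum_sq_diff, and the candidate grid are illustrative only.

```python
from statistics import mean

def sum_sq_diff(data, c):
    """Sum of (x - c)^2 over all data values x."""
    return sum((x - c) ** 2 for x in data)

data = [1, 2, 4, 7, 11]  # illustrative values (assumed for this example)
m = mean(data)           # (1 + 2 + 4 + 7 + 11) / 5

# Try candidate centers 0.0, 0.1, ..., 12.0 and keep the one with the smallest sum.
candidates = [c / 10 for c in range(0, 121)]
best = min(candidates, key=lambda c: sum_sq_diff(data, c))

print(m, best, sum_sq_diff(data, m))
```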
What is the range of a dataset?
The range of a list is the largest value minus the smallest value. It is the width of the smallest interval that contains all the data, so it measures spread. It is not resistant, because changing just one datum can make it arbitrarily large.
What is the IQR of a dataset? (Interquartile range)
The IQR is the upper quartile (75th percentile) minus the lower quartile (25th percentile). It is the width of the interval that contains the middle 50% of the data, and thus is a measure of spread. It is insensitive to the most extreme value of the data (assuming that there are more than four data). The IQR is resistant: changing just one datum has a limited effect on it. Note that neither the range nor the IQR is a range of numbers, despite their names: each is a single number.
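The difference in resistance between the range and the IQR shows up when a single datum is replaced by an extreme value. In this sketch the data are made up, and the quartiles come from Python's statistics.quantiles with method="inclusive". Quartile conventions differ between textbooks and software, so other definitions may give slightly different IQR values.

```python
from statistics import quantiles

def data_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

def iqr(data):
    """Upper quartile minus lower quartile (inclusive convention, an assumption)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]             # illustrative values
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 1000]  # one datum changed

print(data_range(data), iqr(data))
print(data_range(with_outlier), iqr(with_outlier))
```

Changing one datum moves the range from 8 to 999, while the IQR stays the same for these data.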
What is the SD of a dataset? (Standard deviation)
The standard deviation (SD) of a list is the “typical size” of the difference between elements of the list and the mean of the list, measured by the RMS. The SD measures how spread out the data are around their mean. To find the SD, we first find the mean of the list, then make a list of deviations from the mean, and finally find the RMS of that list of deviations (the square root of the average of the squares of the deviations).
What is the RMS of a dataset? (Root mean square)
The RMS (root mean square) of a list measures the average size of its entries. It is defined as follows:
RMS = ((sum of the squares of the entries)/(number of entries))^0.5
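The RMS formula, and the SD recipe that builds on it (mean, then deviations, then RMS of the deviations), can be sketched as follows. The function names rms and sd and the sample data are assumptions made for illustration. Note that this SD divides by the number of entries, matching the "average of the squares" wording, rather than by one less than the number of entries.

```python
import math

def rms(values):
    """Square root of the average of the squared entries."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def sd(values):
    """RMS of the deviations of the entries from their mean."""
    mean = sum(values) / len(values)
    deviations = [v - mean for v in values]
    return rms(deviations)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values (assumed)

print(rms(data))  # typical size of the entries themselves
print(sd(data))   # typical size of the deviations from the mean
```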
What is an affine transformation?
An affine transformation is a particularly simple change of variables. Affine transformations have the equation of a line:
(transformed value of x) = a × (original value of x) + b
where a and b are constants. (Some books call this a linear transformation because it has the equation of a straight line. Strictly speaking, in mathematics an affine transformation is a linear transformation plus a constant.)
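A familiar affine transformation is the conversion from degrees Celsius to degrees Fahrenheit, with a = 1.8 and b = 32. The sketch below uses made-up temperatures and a hypothetical helper named affine. It also checks two standard facts that the cards do not state: the mean transforms the same way as the data, and the SD is multiplied by the absolute value of a.

```python
from statistics import fmean, pstdev

def affine(values, a, b):
    """Apply (transformed value) = a * (original value) + b to every entry."""
    return [a * v + b for v in values]

celsius = [0, 10, 20, 30]                   # illustrative temperatures (assumed)
fahrenheit = affine(celsius, a=1.8, b=32)

# Mean of the transformed list versus the transformed mean.
print(fmean(fahrenheit), 1.8 * fmean(celsius) + 32)
# SD of the transformed list versus |a| times the original SD.
print(pstdev(fahrenheit), abs(1.8) * pstdev(celsius))
```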
What is Markov’s inequality?
If the mean of a list of numbers is M and the list contains no negative numbers, then for every positive number x:
[fraction of numbers in the list that are greater than or equal to x] ≤ M/x
Example:
There are 200 students in a class. The average amount of money in their pockets is $15. How many could have $75 or more in their pockets?
Solution. No student can have a negative amount of money in his or her pocket, so Markov’s inequality applies. Markov’s inequality guarantees that
[fraction of students with at least $75 in their pockets] ≤ $15/$75 = 0.2 = 20%.
Thus at most 20% of the students (40 students) could have $75 or more in their pockets.
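The arithmetic in this example can be reproduced in a few lines. The function name markov_bound is hypothetical, and the list pockets is a constructed extreme case (40 students with exactly $75, the rest with nothing) showing that the bound can actually be reached.

```python
def markov_bound(mean, x):
    """Upper bound on the fraction of a nonnegative list that is >= x."""
    if mean < 0 or x <= 0:
        raise ValueError("need a nonnegative mean and a positive threshold")
    return min(1.0, mean / x)

n_students = 200
bound = markov_bound(15, 75)
max_students = round(bound * n_students)

# Constructed extreme case: all the money sits with as many students as possible.
pockets = [75] * 40 + [0] * 160
observed_mean = sum(pockets) / len(pockets)
observed_fraction = sum(1 for p in pockets if p >= 75) / len(pockets)

print(bound, max_students, observed_mean, observed_fraction)
```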
What is Chebychev’s inequality?
If the mean of a list of numbers is M and the standard deviation of the list is SD, then for every positive number k,
[fraction of numbers in the list that are k × SD or further from M] ≤ 1/k^2
Chebychev’s inequality says that not too many of the numbers in a list can be far from the mean, where far is measured in standard deviations. Conversely, if a large fraction of the values are far from the mean, the SD of the list must be large.
Example:
The mean weight of the students in a certain class is 140 lbs., and the SD of their weights is 30 lbs. What fraction of the students weigh between 90 lbs. and 190 lbs.?
Solution. We cannot get an exact answer, but we can get a lower bound using Chebychev’s inequality. The range from 90 lbs. to 190 lbs. is the mean, plus or minus 50 lbs. 50 lbs. is 1 2/3 times the SD of the weights, so according to Chebychev’s inequality, the fraction of students who weigh less than 90 lbs. or more than 190 lbs. is at most
1/(1 2/3)^2 = 1/(1.6667^2) = 0.36 = 36%.
Thus the fraction who weigh between 90 lbs. and 190 lbs. is at least 100% − 36% = 64%.
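The calculation in this example can be sketched in code. The function name chebyshev_bound and the variable names are assumptions made for illustration. The bound is capped at 1 because a fraction cannot exceed 100%, which matters only when k is less than 1.

```python
def chebyshev_bound(k):
    """Upper bound on the fraction of a list that is k SDs or further from the mean."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k ** 2)

mean_weight, sd_weight = 140, 30
k = (190 - mean_weight) / sd_weight   # 50 lbs. expressed in SDs, i.e. 1 2/3

outside_at_most = chebyshev_bound(k)  # weigh less than 90 or more than 190
inside_at_least = 1 - outside_at_most # weigh between 90 and 190

print(round(outside_at_most, 2), round(inside_at_least, 2))
```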