measures of spread Flashcards

Question 1

Q

measures of spread

Answer

A

range
interquartile range
deviation
variance - sample and population
standard deviation

Question 2

Q

range

Answer

A

the distance between its smallest and largest values

Question 3

Q

interquartile range

Answer

A

a slightly more useful measure than the range is the interquartile range (IQR)
involves splitting the data into quarters:
find the median to split the data in half
split each of the halves into half again
the IQR is the range covered by the middle 2 quarters (50%) of the data
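The splitting steps above can be sketched in Python. The data are made up for illustration, and the handling of an odd number of values (dropping the middle value from both halves) is one common convention assumed here; other quartile conventions give slightly different answers.

```python
from statistics import median

scores = sorted([7, 2, 9, 4, 12, 5, 4, 8])   # made-up data
n = len(scores)

# Step 1: split the sorted data in half at the median.
# With an odd number of values the middle value is left out of both halves
# (an assumed convention, not stated in the card).
lower = scores[: n // 2]
upper = scores[n // 2 + n % 2 :]

# Step 2: the median of each half marks the edge of the middle two quarters.
q1 = median(lower)
q3 = median(upper)

data_range = scores[-1] - scores[0]   # largest minus smallest
iqr = q3 - q1                         # width of the middle 50% of the data

print(data_range, iqr)
```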

Question 4

Q

range and IQR

Answer

A

the range and the IQR only tell us limited information
two datasets can have the same range and IQR but still look very different
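A small illustration of this point, using two made-up datasets and a hypothetical helper `range_and_iqr` (quartiles taken as the medians of each half, which is an assumed convention):

```python
from statistics import median

def range_and_iqr(values):
    """Return (range, IQR), with quartiles taken as the medians of each half."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[n // 2 + n % 2 :])
    return data[-1] - data[0], q3 - q1

a = [0, 3, 3, 5, 5, 7, 7, 10]   # some values sit in the middle
b = [0, 3, 3, 3, 7, 7, 7, 10]   # nothing between 3 and 7

summary_a = range_and_iqr(a)
summary_b = range_and_iqr(b)
print(summary_a, summary_b)
```

Both summaries come out the same even though the values are arranged differently, because only the extremes and the two quartiles enter the calculation.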

Question 5

Q

deviation

Answer

A

the range and IQR depend on only two points
to get a more fine-grained idea of the spread we need to take every data-point into account
one way to do this is to take each data-point and calculate how far it is away from some reference point, such as the mean
this is known as the deviation
once we have the deviation values, then what do we do with them?
if we add them up then the sum will just be bigger whenever we have more data
but it’s possible to have bunched up large datasets and spread out small datasets and our measure should be able to account for this.
instead of adding up the deviations we could work out the average of the deviations
but some deviations will be negative and some will be positive, so when the reference point is the mean they’ll always average out to 0
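A quick check of why averaging the raw deviations does not work, using made-up data:

```python
from statistics import mean

data = [3, 5, 8, 10, 14]                 # made-up data, mean is 8
m = mean(data)

deviations = [x - m for x in data]       # distance of each point from the mean
total = sum(deviations)                  # negatives and positives cancel
average = mean(deviations)

print(deviations, total, average)
```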

Question 6

Q

squared deviations

Answer

A

we can make sure all the deviations are positive by squaring the values
the mean of the squared deviations will be the basis for our next measure of spread, the variance
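As a sketch, squaring the deviations of some made-up data and then averaging them looks like this:

```python
from statistics import mean

data = [3, 5, 8, 10, 14]                   # made-up data, mean is 8
m = mean(data)

squared = [(x - m) ** 2 for x in data]     # squaring removes the signs
mean_squared = mean(squared)               # no longer cancels to 0

print(squared, mean_squared)
```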

Question 7

Q

variance

Answer

A

the population variance - the mean of the squared deviations from the population mean
but we don’t usually know the value of the population mean, so can we just use the sample mean instead?
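For reference, the population variance described at the start of this answer can be written as a formula. The notation is an assumption, not taken from the card: the population has N values x_1, ..., x_N with mean \mu.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
```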

Question 8

Q

squared deviations from the population mean

Answer

A

We’ll start off with a population where we know the population mean (100) and the variance of the population (225).
We’ll take samples from this population, and work out the average of the squared deviations from the population mean.
The value we calculate varies from sample to sample, but what does it do on average?
We can repeat what we did with the sample mean and see what happens with the average squared deviations from the population mean.
Figure: the running average of the average squared deviations from the population mean.
On average, the mean squared deviation from the population mean will be equal to the variance of the population.
Now let’s repeat the process but use the deviation from the sample mean instead.
In figure 5, we can see the running average of the average squared deviations from the sample mean.
Now we can see the problem of using deviation from the sample mean instead of deviation from the population mean.
Our calculated value will on average not be the same as the variance of the population.
So what’s the solution?
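The experiment described above can be approximated with a short simulation. This is only a sketch: the normal shape of the population, the sample size, the number of samples and the seed are all assumptions, since the card only gives the population mean (100) and variance (225).

```python
import random
from statistics import fmean

random.seed(1)                        # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 100, 15            # population variance = 15**2 = 225
SAMPLE_SIZE, N_SAMPLES = 10, 20000    # assumed values, not from the card

around_pop_mean = []      # average squared deviation from the population mean
around_sample_mean = []   # average squared deviation from the sample mean

for _ in range(N_SAMPLES):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    m = fmean(sample)
    around_pop_mean.append(fmean((x - POP_MEAN) ** 2 for x in sample))
    around_sample_mean.append(fmean((x - m) ** 2 for x in sample))

# Average of each quantity across all the samples.
avg_pop = fmean(around_pop_mean)
avg_sample = fmean(around_sample_mean)
print(avg_pop, avg_sample)
```

With these settings the first average should land close to 225, while the second should come out noticeably lower (around 225 × 9/10 = 202.5 for samples of size 10).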

Question 9

Q

sample variance

Answer

A

when we only have access to information from the sample (e.g., sample mean) then we have to calculate a quantity known as the sample variance
Dividing by N-1 rather than taking a simple average (dividing by N) means that on average the sample variance will be equal to the variance of the population.
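Written as a formula, under assumed notation (not taken from the card) where the sample has N values x_1, ..., x_N and sample mean \bar{x}:

```latex
s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2
```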

Question 10

Q

sample variance and population variance

Answer

A

If you have access to the entire population (e.g., you can compute the population mean) then you can calculate the population variance (divide by N).
If you only have access to the sample characteristics (e.g., you can only calculate the sample mean) then you must calculate the sample variance (divide by N-1).
The confusing part is that the sample variance is an unbiased estimator of the variance of the population.
This just means that, averaged over many samples, the sample variance will converge to the variance of the population.
Using the population variance formula with sample values gives a biased estimator of the variance of the population.
This just means that, averaged over many samples, it won’t converge to the variance of the population.
Remember, what we really want to know are the features of the population (its mean and variance) but we need to estimate these from the sample.
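In Python's standard library the two formulas correspond to `statistics.pvariance` (divide by N) and `statistics.variance` (divide by N-1). A minimal sketch on made-up data:

```python
from statistics import pvariance, variance

sample = [3, 5, 8, 10, 14]          # made-up values

divide_by_n = pvariance(sample)     # population formula: sum of squares / N
divide_by_n_minus_1 = variance(sample)   # sample formula: sum of squares / (N - 1)

print(divide_by_n, divide_by_n_minus_1)
```

The N-1 version is always a little larger for the same data, which is what compensates for measuring deviations from the sample mean rather than the population mean.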

Question 11

Q

standard deviation

Answer

A

the variance is a good measure of spread, and it’s a commonly used measure, but it can be a little difficult to interpret
For example, think back to the salary example from lecture 6
- If salary is measured in USD
  - Then the variance is measured in USD squared
fortunately there is a solution, just take the square root of the variance
this measure is called standard deviation
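A sketch of the units argument, using made-up salaries (not the data from the lecture) and treating them as a whole population for simplicity:

```python
from math import sqrt
from statistics import pstdev, pvariance

salaries_usd = [40_000, 50_000, 60_000, 70_000, 80_000]   # made-up values

var = pvariance(salaries_usd)   # in USD squared, hard to read as a salary
sd = sqrt(var)                  # back in USD, same units as the data

print(var, sd)
print(pstdev(salaries_usd))     # the library's own square-root-of-variance
```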

Question 12

Q

why the squared deviations and not the absolute value?

Answer

A

When we worked out the deviations, we squared them to turn the negative values into positive values.
But could we just take the absolute value?
Below we have two data sets made up of four data points each
The data in A are more spread out than the data in B
So let’s calculate the average of the squared deviations and the average of the absolute value of the deviations.
First, the data in A:
- The mean of the absolute deviations is 70
- The mean of the squared deviations is 7400
Then the data in B:
- The mean of the absolute deviations is 70
- The mean of the squared deviations is 4900
So even though the two sets of data have different amounts of spread, the mean of the absolute deviations doesn’t pick it up, but the mean of the squared deviations does.
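The original data points are not shown in the card, so the two datasets below are hypothetical values chosen to reproduce the same four numbers. The helper name `spread_summaries` is also just for illustration.

```python
from statistics import mean

def spread_summaries(data):
    """Return (mean absolute deviation, mean squared deviation) about the mean."""
    m = mean(data)
    abs_dev = mean(abs(x - m) for x in data)
    sq_dev = mean((x - m) ** 2 for x in data)
    return abs_dev, sq_dev

# Hypothetical data, both with mean 500.
a = [380, 480, 520, 620]   # deviations of 120, 20, 20, 120
b = [430, 430, 570, 570]   # deviations of 70, 70, 70, 70

result_a = spread_summaries(a)
result_b = spread_summaries(b)
print(result_a, result_b)
```

Squaring gives the two large deviations in A much more weight than the two small ones, which is why the squared measure separates the datasets while the absolute measure does not.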

Question 13

Q

the relationship between samples and populations

Answer

A

Now that we have tools for describing the centre/typical value of a set of measurements (mean) and the spread of a set of measurements (variance/standard deviation) we can put these two ideas together.
- In lecture 6 we saw that individual sample means were spread out around the population mean.
- We can quantify that spread using the idea of the standard deviation.
But we’re no longer calculating the spread of our sample or even the spread of the population.
We’re now calculating the spread of sample means around the population mean.
This kind of standard deviation has a special name - the standard error of the mean.
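The idea of measuring the spread of sample means can be sketched with a simulation. The population shape (normal, mean 100, standard deviation 15), the sample size, the number of samples and the seed are all assumptions made for illustration.

```python
import random
from statistics import fmean, pstdev

random.seed(2)                        # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 100, 15            # assumed population
SAMPLE_SIZE, N_SAMPLES = 25, 10000    # assumed values

# Draw many samples and keep only the mean of each one.
sample_means = [
    fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

# The standard deviation of those means is the quantity described above.
spread_of_means = pstdev(sample_means)
print(spread_of_means)
```

The result should be close to 3, which matches the textbook formula of the population standard deviation divided by the square root of the sample size (15 / 5); that formula is not stated in the cards and is mentioned here only as a check.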

Question 14

Q

the standard error of the mean

Answer

A

the standard error of the mean in technical terms is the standard deviation of the sampling distribution of the mean
to fully appreciate the concept of the standard error of the mean we’ll need to understand the concept of the sampling distribution
and to understand the sampling distribution we’ll first need to understand what distributions are, what they look like, and why they look the way they do

(14 cards)