measures of spread Flashcards

1
Q

measures of spread

A
  • range
  • interquartile range
  • deviation
  • variance - sample and population
  • standard deviation
2
Q

range

A
  • the distance between the smallest and largest values in a dataset (largest value minus smallest value)
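As a minimal sketch, the range is just the largest value minus the smallest. The function name `data_range` and the heights below are made up for illustration:

```python
def data_range(values):
    """Range: distance between the smallest and largest values."""
    return max(values) - min(values)

heights = [162, 175, 158, 181, 170]  # hypothetical data
print(data_range(heights))           # 181 - 158 = 23
```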
3
Q

interquartile range

A
  • a slightly more useful measure than the range is the interquartile range (IQR)
  • involves splitting the data into quarters:
  • find the median to split the data in half
  • split each of the halves into half again
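  • the IQR is the range covered by the middle two quarters (50%) of the data, i.e. the distance from the first quartile (Q1, the median of the lower half) to the third quartile (Q3, the median of the upper half)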
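The splitting steps above can be sketched in Python roughly as follows. This uses the median-split convention from the card. For an odd number of points, the sketch assumes the middle value is left out of both halves. Other conventions exist; for example, `numpy.percentile` interpolates by default and can give slightly different quartiles. The helper names `quartiles` and `iqr` and the scores are hypothetical:

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) by the median-split method.
    Needs at least 2 values. Assumption: with an odd count, the middle
    value is excluded from both halves."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]         # lower half
    upper = data[(n + 1) // 2 :]   # upper half
    return median(lower), median(data), median(upper)

def iqr(values):
    q1, _, q3 = quartiles(values)
    return q3 - q1

scores = [1, 3, 4, 6, 7, 9, 12, 15]  # hypothetical data
print(quartiles(scores))  # (3.5, 6.5, 10.5)
print(iqr(scores))        # 7.0
```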
4
Q

range and IQR

A
  • the range and the IQR only tell us limited information
  • two datasets can have the same range and IQR but still look very different
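A quick check of this claim uses two made-up datasets of eight values each. One is spread fairly evenly. The other is clumped onto just three values. Both give the same range and IQR under the median-split quartile convention described in the IQR card. The convention is an assumption, since other conventions exist.

```python
from statistics import median

def data_range(values):
    return max(values) - min(values)

def iqr(values):
    # median-split quartiles: Q3 (median of upper half) minus Q1 (median of lower half)
    data = sorted(values)
    n = len(data)
    return median(data[(n + 1) // 2 :]) - median(data[: n // 2])

spread_out = [0, 2, 3, 4, 6, 7, 8, 10]  # hypothetical, values spread evenly
clumped = [0, 0, 5, 5, 5, 5, 10, 10]    # hypothetical, only three distinct values

for name, values in [("spread_out", spread_out), ("clumped", clumped)]:
    print(name, data_range(values), iqr(values))
# spread_out 10 5.0
# clumped 10 5.0
```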
5
Q

deviation

A
  • the range and IQR depend on only two points
  • to get a more fine-grained idea of the spread we need to take every data-point into account
  • one way to do this is to take each data-point and calculate how far it is away from some reference point, such as the mean
  • this is known as the deviation
  • once we have the deviation values, then what do we do with them?
  • if we add them up, the sum will just get bigger whenever we have more data
  • but it’s possible to have large datasets that are bunched up and small datasets that are spread out, and our measure should be able to account for this
  • instead of adding up the deviations, we could work out the average of the deviations
  • but some deviations will be negative and some will be positive, and deviations from the mean always cancel out, so they average to exactly 0
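Both problems can be seen in a small made-up sample. The deviations from the mean average out to 0. Totalling the sizes of the deviations just grows when the same data is duplicated, even though the spread hasn't changed:

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
m = sum(data) / len(data)         # sample mean = 5.0

deviations = [x - m for x in data]
print(deviations)                         # [-3.0, -1.0, -1.0, -1.0, 0.0, 0.0, 2.0, 4.0]
print(sum(deviations) / len(deviations))  # 0.0: positives and negatives cancel out

# Totalling the sizes of the deviations is no better: duplicating the data
# keeps the same spread but doubles the total.
doubled = data * 2                        # same mean, same spread, twice as many points
size_total = sum(abs(d) for d in deviations)
size_total_doubled = sum(abs(x - m) for x in doubled)
print(size_total, size_total_doubled)     # 12.0 24.0
```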
6
Q

squared deviations

A
  • we can make sure all the deviations are positive by squaring the values
  • the mean of the squared deviations will be the basis for our next measure of spread, the variance
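Continuing with a made-up sample, squaring makes every deviation non-negative, so the values can no longer cancel. Their mean then gives a usable number. This is a sketch with hypothetical data:

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
m = sum(data) / len(data)         # mean = 5.0

squared = [(x - m) ** 2 for x in data]
print(squared)                    # [9.0, 1.0, 1.0, 1.0, 0.0, 0.0, 4.0, 16.0]
msd = sum(squared) / len(squared)
print(msd)                        # 4.0, the mean of the squared deviations
```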
7
Q

variance

A
  • the population variance - the mean of the squared deviations from the population mean
  • but we don’t usually know the value of the population mean, so can we just use the sample mean instead?
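Written as a formula, the population variance is shown below. The notation is assumed here rather than taken from the lecture: a population of N values from x_1 to x_N, with population mean μ.

```latex
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
```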
8
Q

squared deviations from the population mean

A
  • We’ll start off with a population where we know the population mean (100) and the population variance (225).
  • We’ll take samples from this population and work out the average of the squared deviations from the population mean.
  • The value we calculate varies from sample to sample, but what does it do on average?
  • We can repeat what we did with the sample mean and see what happens with the average squared deviations from the population mean.
  • To see this, we track the running average of the average squared deviations from the population mean across many samples.
  • On average, the mean squared deviation from the population mean is equal to the variance of the population.
  • Now let’s repeat the process, but use the deviations from the sample mean instead.
  • In figure 5, we can see the running average of the average squared deviations from the sample mean.
  • Now we can see the problem with using deviations from the sample mean instead of deviations from the population mean.
  • On average, our calculated value will be smaller than the variance of the population.
  • So what’s the solution?
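The experiment described above can be sketched as a small simulation. Several choices here are assumptions that the card does not specify: the population is normal with mean 100 and standard deviation 15, so the variance is 225; each sample has 5 values; and 20,000 samples are drawn. Deviations from the sample mean come out too small on average. This happens because, within a sample, the sample mean is the point that minimises the sum of squared deviations, so it always sits closer to the data than the population mean does.

```python
import random

random.seed(1)
POP_MEAN, POP_SD = 100, 15   # assumed normal population; variance = 15**2 = 225
N = 5                        # sample size (assumed)
REPS = 20_000                # number of samples drawn (assumed)

total_pop = 0.0     # running total: mean squared deviation from the population mean
total_sample = 0.0  # running total: mean squared deviation from the sample mean
for _ in range(REPS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_mean = sum(sample) / N
    total_pop += sum((x - POP_MEAN) ** 2 for x in sample) / N
    total_sample += sum((x - sample_mean) ** 2 for x in sample) / N

avg_pop = total_pop / REPS
avg_sample = total_sample / REPS
print(f"from population mean: {avg_pop:.1f}")     # close to 225
print(f"from sample mean:     {avg_sample:.1f}")  # close to 225 * (N - 1) / N = 180
```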
9
Q

sample variance

A
  • when we only have access to information from the sample (e.g., the sample mean), we have to calculate a quantity known as the sample variance: the sum of the squared deviations from the sample mean, divided by N-1
  • Dividing by N-1 rather than taking a simple average (dividing by N) means that, on average, the sample variance will be equal to the variance of the population.
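A minimal sketch of the N-1 calculation follows. It is checked against Python's standard-library `statistics.variance`, which also divides by N-1. The data and the helper name `sample_variance` are hypothetical:

```python
import statistics

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by N - 1."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]      # hypothetical sample
print(sample_variance(data))         # 32 / 7, about 4.571
print(statistics.variance(data))     # same value from the standard library
print(statistics.pvariance(data))    # divides by N instead: 4
```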
10
Q

sample variance and population variance

A
  • If you have access to the entire population (e.g., you can compute the population mean) then you can calculate the population variance (divide by N).
  • If you only have access to the sample characteristics (e.g., you can only calculate the sample mean), then you must calculate the sample variance (divide by N-1).
  • The confusing part is that the sample variance is an unbiased estimator of the variance of the population.
  • This just means that, averaged over many samples, the sample variance will converge to the variance of the population.
  • Using the population variance formula (divide by N) with sample values gives a biased estimator of the variance of the population.
  • This just means that, averaged over many samples, it won’t converge to the variance of the population; it will be too small.
  • Remember, what we really want to know are the features of the population (its mean and variance), but we need to estimate these from the sample.
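In symbols, the two estimators compare as shown below. The notation is assumed here: s^2 is the sample variance, \bar{x} is the sample mean, σ^2 is the population variance, and \mathbb{E} is the expected value (the long-run average over many samples). These are the standard results for a sample of N independent values:

```latex
\begin{aligned}
s^2 &= \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2, & \mathbb{E}[s^2] &= \sigma^2 \\
\hat{\sigma}^2 &= \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2, & \mathbb{E}[\hat{\sigma}^2] &= \frac{N-1}{N}\,\sigma^2
\end{aligned}
```

So on average the divide-by-N version falls short by a factor of (N-1)/N. This shortfall matters most for small samples.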
11
Q

standard deviation

A
  • the variance is a good measure of spread, and it’s a commonly used measure, but it can be a little difficult to interpret
  • For example, think back to the salary example from lecture 6:
    • If salary is measured in USD
      • then the variance is measured in USD squared (USD²), which is hard to interpret
  • fortunately there is a solution: just take the square root of the variance
  • this measure is called the standard deviation, and it is in the same units as the data (USD)
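To show the units issue, here is a sketch with three hypothetical salaries. These are not the figures from lecture 6. It uses the standard-library `statistics.variance` and `statistics.stdev`, both of which divide by N-1:

```python
import statistics

salaries_usd = [40_000, 50_000, 60_000]  # hypothetical salaries in USD

var = statistics.variance(salaries_usd)  # units: USD squared
sd = statistics.stdev(salaries_usd)      # units: USD, same as the data
print(f"variance: {var:,.0f} USD^2")     # variance: 100,000,000 USD^2
print(f"standard deviation: {sd:,.0f} USD")  # standard deviation: 10,000 USD
```

The standard deviation of 10,000 USD can be read directly against salaries of around 50,000 USD. The variance cannot.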
12
Q

why the squared deviations and not the absolute value?

A
  • When we worked out the deviations, we squared them to turn the negative values into positive values.
  • But could we just take the absolute value?
  • Take two data sets, A and B, made up of four data points each
  • The data in A are more spread out than the data in B
  • So let’s calculate the average of the squared deviations and the average of the absolute values of the deviations.
  • First, the data in A:
    • The mean of the absolute deviations is 70
    • The mean of the squared deviations is 7400
  • Then the data in B:
    • The mean of the absolute deviations is 70
    • The mean of the squared deviations is 4900
  • So even though the two sets of data have different amounts of spread, the mean of the absolute deviations doesn’t pick this up, but the mean of the squared deviations does.
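The lecture's actual data points aren't shown here. The sketch below therefore uses two hypothetical four-point datasets, both with mean 500, chosen so that their deviations reproduce the summary numbers quoted above:

```python
def mean_abs_deviation(values):
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

def mean_sq_deviation(values):
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / len(values)

# Hypothetical datasets (not the lecture's), both with mean 500
a = [380, 480, 520, 620]  # deviations: -120, -20, 20, 120
b = [430, 430, 570, 570]  # deviations: -70, -70, 70, 70

print(mean_abs_deviation(a), mean_sq_deviation(a))  # 70.0 7400.0
print(mean_abs_deviation(b), mean_sq_deviation(b))  # 70.0 4900.0
```

Squaring gives extra weight to large deviations. The two far-out points in A (±120) therefore push its mean squared deviation up. With absolute values, those large deviations are exactly offset by the small ones (±20).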
13
Q

the relationship between samples and populations

A
  • Now that we have tools for describing the centre/typical value of a set of measurements (mean) and the spread of a set of measurements (variance/standard deviation), we can bring these two ideas together.
    • In lecture 6 we saw that individual sample means were spread out around the population mean.
    • We can quantify that spread using the idea of the standard deviation.
  • But we’re no longer calculating the spread of our sample or even the spread of the population.
  • We’re now calculating the spread of sample means around the population mean.
  • This kind of standard deviation has a special name - the standard error of the mean.
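A rough simulation shows the spread of sample means directly. It assumes a normal population with mean 100 and standard deviation 15, samples of 25 independent values, and 5,000 samples. None of these settings come from the lecture. For independent samples, the standard result is that the standard error of the mean is σ/√N, which here is 15/5 = 3:

```python
import random
import statistics

random.seed(2)
POP_MEAN, POP_SD = 100, 15  # assumed normal population
N = 25                      # size of each sample (assumed)
REPS = 5_000                # number of samples (assumed)

sample_means = []
for _ in range(REPS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    sample_means.append(statistics.fmean(sample))

simulated_se = statistics.stdev(sample_means)  # spread of sample means around the population mean
theoretical_se = POP_SD / N ** 0.5             # sigma / sqrt(N) = 3.0
print(f"simulated SE:   {simulated_se:.2f}")
print(f"theoretical SE: {theoretical_se:.2f}")
```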
14
Q

the standard error of the mean

A
  • the standard error of the mean in technical terms is the standard deviation of the sampling distribution of the mean
  • to fully appreciate the concept of the standard error of the mean we’ll need to understand the concept of the sampling distribution
  • and to understand the sampling distribution we’ll first need to understand what distributions are, what they look like, and why they look the way they do