Exam 3 Lecture 2 Flashcards
Central tendency is ok, but it’s just not enough. What are some other words for variety?
The average doesn’t tell you the whole story. VARIETY is the spice of life! Range, variance, dispersion, variability.
Variety
Variance, dispersion, uncertainty
If everyone is the same, there is nothing to contemplate or predict.
- We’d just know. Oh, you’re human, your HR = 72.
It’s that people vary and we want to figure out what explains their differentness or predict what will happen.
- Oh, your HR = 120. Mine is 52. I wonder why?
- Differences: You’ve got heart disease. I’m an athlete.
- Oh, your HR = 120. Mine is 52. I wonder if I can run a mile faster than you?
- Prediction: Yep, I can!
Dispersion, range, and variance can ______________ while mean stays the same.
CHANGE. Range matters!
Town 1:
$105,000 - $6,700,000, median $312,000
Town 2:
$300,000 - $319,000, median $312,000
These towns have the same MEDIAN house price, so what’s different about them?
The amount of VARIABILITY in prices.
Why does this matter?
If you have $325,000, which town will be more likely to have the house of your dreams?
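A quick sketch of the two-town comparison. Only the minimum, maximum, and median come from the card; the other prices are made up for illustration, and `price_range` is a hypothetical helper name.

```python
from statistics import median

# Hypothetical listings: only the min, max, and median match the card.
town_1 = [105_000, 180_000, 312_000, 950_000, 6_700_000]
town_2 = [300_000, 305_000, 312_000, 316_000, 319_000]

def price_range(prices):
    """Spread from the cheapest to the most expensive house (max - min)."""
    return max(prices) - min(prices)

for name, prices in [("Town 1", town_1), ("Town 2", town_2)]:
    print(name, "median:", median(prices), "range:", price_range(prices))
```

Same median, very different spread: with $325,000, every house in Town 2 is within budget, while many in Town 1 are not.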
How to measure variation (easy method)
Range= simplest way to measure variation/variability/variance/variety
- The spread from smallest (minimum) to biggest (maximum) number in the dataset.
- Calculated as (max - min)
Range can be used to determine best and worst, fastest and slowest.
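A minimal sketch of max - min, assuming a made-up list of mile times in seconds (the numbers are not from the lecture). For race times, the minimum is the fastest and the maximum is the slowest.

```python
# Hypothetical mile times in seconds (not from the lecture).
mile_times = [480, 390, 666, 432, 564]

fastest = min(mile_times)   # smallest time = best runner
slowest = max(mile_times)   # biggest time = worst runner
spread = slowest - fastest  # range = max - min

print("fastest:", fastest, "slowest:", slowest, "range:", spread)
```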
Range shows
VARIATION/ VARIABILITY/ VARIANCE/ VARIETY. Shows the difference between the smallest and biggest numbers in the dataset.
Equation for range
Max value - min value
But range as a measure of variability is limited!
A range of 6,000,000 is:
- small if you are talking about national debt, which is in the trillions
- reasonable if you are talking about house prices in the Hollywood Hills
- enormous if you are talking about red blood cell count (should be 4-5 million)
Range is not grounded by the scale of the variable.
Range is limited because
It is simple to calculate but it doesn’t take into account the scale of the data.
Imagine two datasets:
Dataset 1: Values range from 10 to 20 (range of 10)
Dataset 2: Values range from 100 to 110 (range of 10)
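In both cases, the range is 10. However, the values in set 2 sit on a much larger scale than those in set 1: a spread of 10 is large next to values of 10 to 20 but small next to values of 100 to 110. The range alone cannot show that, so it is a limited measure of variability.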
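A sketch of the two datasets. Only the endpoints (10 to 20 and 100 to 110) come from the card; the middle values are made up. Dividing the range by the mean is just one illustrative way to bring in scale, not a method from the lecture, and both function names are hypothetical.

```python
from statistics import mean

# Endpoints come from the card; the middle values are hypothetical.
dataset_1 = [10, 12, 15, 18, 20]
dataset_2 = [100, 102, 105, 108, 110]

def value_range(values):
    """Range = max - min."""
    return max(values) - min(values)

def range_relative_to_mean(values):
    """Illustrative only: the range as a fraction of the mean."""
    return value_range(values) / mean(values)

for name, values in [("Dataset 1", dataset_1), ("Dataset 2", dataset_2)]:
    print(name, "range:", value_range(values),
          "range / mean:", round(range_relative_to_mean(values), 3))
```

Both ranges are 10, but relative to the mean the spread in Dataset 1 is much bigger than in Dataset 2.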
The better way to measure variation is
Standard Deviation
What is standard deviation?
A ‘standard’ tells you that the variation is being measured RELATIVE TO something else: the mean
- The standard deviation is, roughly, the typical amount that the numbers in a dataset differ from the mean.
Standard Deviation is ________. It’s the _________ from the mean.
Standardized. It’s the deviation from the mean.
What standard deviation formula should you use in Excel?
STDEV.S
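A sketch of what the sample standard deviation does, worked by hand and then checked against Python's `statistics.stdev`. Both divide by n - 1, which is also what Excel's STDEV.S does. The scores are made up for illustration.

```python
import math
import statistics

# Hypothetical scores (not from the lecture).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(scores) / len(scores)

# How far each score is from the mean, squared so negatives don't cancel.
squared_deviations = [(x - mean) ** 2 for x in scores]

# Sample standard deviation: divide by n - 1, then take the square root.
sample_sd = math.sqrt(sum(squared_deviations) / (len(scores) - 1))

print("by hand:", sample_sd)
print("statistics.stdev:", statistics.stdev(scores))
```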
If your data follow a ___________, then your data have a ‘normal distribution’. If your data have a normal distribution, then you can use __________.
If your data follow a bell curve, then your data have a ‘normal distribution’. If your data have a normal distribution, then you can use common statistical analysis tools.
Bell Curve
Most values in the data set are close to the middle/central tendency.
The further you get from the central tendency, the fewer the data points.
Normal Distributions
Any situation where most people are in the middle, some are better, some are worse. The peak at the central tendency = common values.
The data values in a dataset follow a pattern…
- The distribution of data values is as expected
- No weird numbers
- No clusters of numbers that are NOT near the central tendency
This is a naturally occurring distribution in many situations
- IQ, SAT, GPA, BP, height
BUT some data are NOT NORMAL
Tricks to know: Is it normal?
When the distribution is normal, the following are true:
- Mean = median = mode
- It is symmetrical
- Half of the data are on the left; half of the data are on the right
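A quick way to try the "is it normal?" tricks on a small dataset. The scores are made up and symmetric on purpose; matching mean, median, and mode is a hint that the data may be normal, not proof.

```python
from statistics import mean, median, mode

# Hypothetical symmetric scores (not from the lecture).
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]

print("mean:", mean(scores))
print("median:", median(scores))
print("mode:", mode(scores))

# Symmetry check: as many values below the median as above it.
below = sum(1 for x in scores if x < median(scores))
above = sum(1 for x in scores if x > median(scores))
print("below median:", below, "above median:", above)
```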
What is the peak and what are tails?
The peak represents the common values while the tails represent the rare values.
Normal Distributions
The shape of the curve allows us to predict how people will score.
The distribution is proportional
- 68% will score within 1 standard deviation from the mean
(34% will be a little better than average, 34% will be a little worse than average. The slope isn’t that steep).
- 95% will score within 2 standard deviations from the mean (27% more)
- Slope is really steep!
- 99.7% will score within 3 standard deviations from the mean (4.7% more)
- Slope starts to flatten out
- These #s are rare
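The 68 / 95 / 99.7 percentages can be checked with the error function from Python's `math` module. This is a sketch, and `share_within` is a hypothetical helper name. For a normal distribution, the share of values within k standard deviations of the mean is erf(k / sqrt(2)).

```python
import math

def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")

# How much each extra standard deviation adds.
print(f"1 to 2 SD adds: {share_within(2) - share_within(1):.1%}")
print(f"2 to 3 SD adds: {share_within(3) - share_within(2):.1%}")
```

The exact values are about 68.3%, 95.4%, and 99.7%, which the cards round to 68, 95, and 99.7.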
__________ is always center, but ________ is not necessarily the center.
Median is always center, but mean is not necessarily the center.
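A small sketch of why the median stays in the center while the mean can get pulled away. The incomes are made up (in thousands of dollars); the second list swaps one value for an extreme one.

```python
from statistics import mean, median

# Hypothetical incomes in thousands of dollars (not from the lecture).
typical = [30, 35, 40, 45, 50]
with_outlier = [30, 35, 40, 45, 500]  # one very high earner

for name, incomes in [("typical", typical), ("with outlier", with_outlier)]:
    print(name, "mean:", mean(incomes), "median:", median(incomes))
```

The median is the middle value in both lists, but the single extreme income drags the mean well above most of the data.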