Lesson 6 Flashcards
When talking about a quantitative
variable, there are (at least) three
important aspects to discuss:
The center, the variability
/ spread, and the shape
When we talk about the center, we’re talking
about an “average” or “typical” value in the
dataset. This could refer to any of the following:
Mean (most common), median, mode (rare cases)
If we’re talking about a sample mean, we use
x̄ (called “x bar”)
If we’re talking about a population mean, we
use
µ (the Greek letter mu)
In each case below, would it make more sense to use the mean, median, or mode to
describe the center of the data?
1. Income for residents of Elon, NC
2. Heights of newborn babies
3. Number of siblings
4. Exam scores in a class where all students did well
- Income for residents of Elon, NC
Median, since there would likely be outliers and the data would not be spread out evenly.
- Heights of newborn babies
Mean, since the data values would be spread out evenly.
- Number of siblings
Mode (also median or mean)
- Exam scores in a class where all students did well
Both mean and median
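To see why the median suits data with outliers, here is a small sketch using Python's standard statistics module. The incomes are made up for illustration (they are not real Elon data), and the last value plays the role of an outlier.

```python
import statistics

# Hypothetical annual incomes in dollars; the last value is an outlier.
incomes = [32000, 35000, 35000, 41000, 48000, 52000, 900000]

mean_income = statistics.mean(incomes)      # pulled upward by the outlier
median_income = statistics.median(incomes)  # middle value of the sorted data
mode_income = statistics.mode(incomes)      # most frequent value

print("mean:", round(mean_income))
print("median:", median_income)
print("mode:", mode_income)
```

With these made-up numbers the mean lands far above what most residents earn, while the median stays near the middle of the group.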
The simplest measure of spread / variability is the range. The range is
the biggest value minus the smallest value. In other words,
Range = maximum value – minimum value
An alternative way to measure the variability / spread is with the
standard deviation. The standard deviation is like a “typical” distance that a value might be
away from the mean. Let’s think about this with our exam example.
Which will be more resistant to outliers? The range or the standard
deviation?
The standard deviation is more resistant to outliers than the range.
Do you think the measure of spread you chose will be impacted much
by outliers or not really? What is your reasoning?
The standard deviation is probably still impacted by outliers (you can test this with
the Mike example), because it uses all of our data values in the calculation,
but it will be less impacted than the range.
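One way to check this is to compute both measures before and after adding an outlier. The sketch below uses hypothetical exam scores (not the class's exam or Mike data) and the sample standard deviation from Python's statistics module. The helper names are made up for this example.

```python
import statistics

# Hypothetical exam scores (not the class data referenced above).
scores = [70, 75, 80, 85, 90]
scores_with_outlier = scores + [10]  # add one very low score

def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

def summarize(values):
    """Return (range, sample standard deviation) for a list of values."""
    return value_range(values), statistics.stdev(values)

range_before, sd_before = summarize(scores)
range_after, sd_after = summarize(scores_with_outlier)

print("without outlier:", range_before, round(sd_before, 2))
print("with outlier:   ", range_after, round(sd_after, 2))
```

Both measures grow when the outlier is added. Comparing how much each one grows, relative to where it started, is a quick way to judge which is more sensitive.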
Percentiles are..
Percentiles are considered a measure of location. They tell
you where a data value lies within your data by describing
the percentage of the values that are below a specific
value.
E.g. Ben Wyatt’s SAT score was at the 98th percentile.
Traditionally, percentiles are
whole numbers. If you are at
the kth percentile, then k% of the sample (or population)
is below you.
Ben Wyatt’s SAT score was at the 98th percentile → 98% of
people had an SAT score below Ben’s score.
To calculate these, we count the number of other
observations below a value, divide by the total number of
observations, then convert to a percentage.
During the 2019-2020 season, Elon’s women’s soccer team
scored 2.1 goals per game. This was the 31st best out of 335
NCAA Division 1 teams. Find their percentile rank.
335 – 31 = 304 teams below Elon.
(304 / 335) * 100 = 90.7
Elon’s goals per game was at the 90th percentile
(Note: we will always round down for percentiles)
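The same calculation can be sketched in Python. The function name percentile_rank is made up for this example, and it follows the round-down rule from the note above.

```python
import math

def percentile_rank(num_below, total):
    """Percentage of observations below a value, rounded down."""
    return math.floor(num_below / total * 100)

total_teams = 335
elon_rank = 31                         # 31st best out of 335
teams_below = total_teams - elon_rank  # teams that scored fewer goals per game

print(percentile_rank(teams_below, total_teams))  # 90
```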
Quartiles are a..
specific type of percentile. They split our data into
quarters by representing the 25th, 50th, and 75th percentiles.
* First quartile (or Q1) has 25% of the data below it.
* Second quartile (or Q2) has 50% of the data below it.
* Third quartile (or Q3) has 75% of the data below it.
Question: What is another name for the second quartile?
Median
One other measure of spread/variability that you may
encounter is called the
interquartile range or IQR.
Rather than looking at the entire range (max – min) we look at the range between our
quartiles (Q3 – Q1).
Why won’t the IQR be impacted by outliers like the range was?
Any outliers will be below the first quartile or above the third quartile! This means the
IQR is more resistant to outliers than the range (and the standard deviation).
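Here is a minimal sketch of that idea, using hypothetical data and statistics.quantiles (Python 3.8 or later). There are several conventions for computing quartiles, so this function may give slightly different values than a by-hand method from class.

```python
import statistics

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data; the second list swaps the largest value for an outlier.
data = [1, 2, 3, 4, 5, 6, 7]
data_with_outlier = [1, 2, 3, 4, 5, 6, 70]

print("IQR:", iqr(data), "range:", max(data) - min(data))
print("IQR:", iqr(data_with_outlier),
      "range:", max(data_with_outlier) - min(data_with_outlier))
```

In this example the outlier sits above Q3, so the IQR is the same for both lists while the range jumps from 6 to 69.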
How do you find the weighted average?
* Multiply each data value by its weight
* Add these together (first sum)
* Add your weights together (second sum)
* Divide the first sum by the second sum
As a formula: weighted average = Σ(value × weight) / Σ(weight)
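The steps for the weighted average can be sketched in Python as follows. The grades and weights are hypothetical, chosen only to make the arithmetic easy to check by hand.

```python
def weighted_average(values, weights):
    """Divide the sum of value * weight by the sum of the weights."""
    first_sum = sum(v * w for v, w in zip(values, weights))  # values times weights
    second_sum = sum(weights)                                # weights only
    return first_sum / second_sum

# Hypothetical grades and the weight of each category.
grades = [90, 80, 70]
weights = [50, 30, 20]

print(weighted_average(grades, weights))  # 83.0
```

Here the first sum is 4500 + 2400 + 1400 = 8300 and the second sum is 100, so the weighted average is 83.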