Statistics Flashcards
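What is a KDE plot?
A kernel density estimate (KDE) plot. It draws a smooth curve showing the distribution of one-dimensional data. It is similar to a histogram but uses no bins, so it avoids the distortion that the choice of bin widths and edges can introduce.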
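As a rough illustration of what "no binning" means, the sketch below builds a Gaussian KDE by hand: each data point contributes a small bell curve, and the density at x is the average of those curves. The function name kde_at, the sample values, and the bandwidth of 1.0 are assumptions chosen for illustration; plotting libraries usually choose a bandwidth automatically.

```python
import math

def kde_at(x, data, bandwidth):
    """Estimate the density at x as the average of Gaussian bumps centered on each data point."""
    total = 0.0
    for xi in data:
        z = (x - xi) / bandwidth
        total += math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    return total / (len(data) * bandwidth)

# Hypothetical sample; any one-dimensional list of numbers works.
sample = [1.0, 2.0, 2.5, 3.0, 7.0]
bandwidth = 1.0  # assumed smoothing width; larger values give a smoother curve

# Evaluate the curve on a grid from -3.0 to 11.0 in steps of 0.5.
grid = [i * 0.5 for i in range(-6, 23)]
curve = [kde_at(x, sample, bandwidth) for x in grid]

# Crude text rendering of the curve: longer bars mean higher density.
for x, d in zip(grid, curve):
    print(f"{x:5.1f} {'#' * round(d * 100)}")
```

The curve is highest around 2 to 3, where the sample points cluster, and has a small separate bump near 7. Changing the bandwidth changes how smooth the curve looks, much as bin width changes a histogram.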
Violin Plots
Violin plots are less familiar and trickier to read, so let’s break down the different parts:
There are two KDE plots that are symmetrical along the center line.
A white dot represents the median.
The thick black line in the center of each violin represents the interquartile range.
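The thin lines that extend from the center are whiskers, like those on a box plot: they usually reach to the most extreme values within 1.5 × IQR of the quartiles. They are sometimes described as 95% confidence intervals, like the error bars on bar plots, but that is not what standard violin plots draw.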
Standard deviation and bootstrapped confidence intervals are two measurements that can be used for:
error bars
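A minimal sketch of both measurements, assuming the quantity being plotted is the mean of a small sample. The function name bootstrap_ci, the 2000 resamples, the fixed seed, and the sample values are illustrative assumptions; the percentile bootstrap shown here is one common variant, not the only one.

```python
import random
import statistics

def bootstrap_ci(data, n_resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of data."""
    rng = random.Random(seed)
    # Resample with replacement, record the mean of each resample, then sort.
    means = sorted(
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    lower_idx = round((1 - level) / 2 * n_resamples)
    upper_idx = round((1 + level) / 2 * n_resamples) - 1
    return means[lower_idx], means[upper_idx]

# Hypothetical sample values.
scores = [3.0, 5.0, 7.0, 8.0, 9.0, 11.0, 15.0, 16.0, 20.0, 21.0]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)      # sample standard deviation
low, high = bootstrap_ci(scores)

print(f"mean = {mean:.2f}")
print(f"error bar from standard deviation: {mean - sd:.2f} to {mean + sd:.2f}")
print(f"error bar from bootstrapped 95% CI: {low:.2f} to {high:.2f}")
```

The standard deviation bar describes how spread out the individual values are, while the bootstrapped interval describes the uncertainty in the mean itself, so the second is normally narrower.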
What does the box in the center of the violin plot represent?
interquartile range
What is the interquartile range?
The interquartile range (IQR) is a measure of where the “middle fifty” percent of a data set lies. Where the range measures the distance between the smallest and largest values, the interquartile range measures the spread of the bulk of the values. Because it ignores the extremes, it is often preferred over other measures of spread (e.g. the range or standard deviation) when reporting things like school performance or SAT scores.
The interquartile range formula is the first quartile subtracted from the third quartile:
IQR = Q3 – Q1.
How do you find the interquartile range for an odd number of values?
Step 1: Put the numbers in order.
1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27.
Step 2: Find the median.
1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27. The median is the middle (sixth) value, 9.
Step 3: Place parentheses around the numbers above and below the median.
Not necessary statistically, but it makes Q1 and Q3 easier to spot.
(1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27).
Step 4: Find Q1 and Q3
Think of Q1 as a median in the lower half of the data and think of Q3 as a median for the upper half of data.
(1, 2, 5, 6, 7), 9, (12, 15, 18, 19, 27). Q1 = 5 and Q3 = 18.
Step 5: Subtract Q1 from Q3 to find the interquartile range.
18 – 5 = 13.
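The steps above can be sketched in a few lines of Python. The function name iqr_median_of_halves is hypothetical, and the sketch assumes at least two values; it follows the hand method described here (split at the median, then take the median of each half), which is one of several quartile conventions.

```python
from statistics import median

def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR) using the median of the lower and upper halves."""
    ordered = sorted(values)              # Step 1: put the numbers in order
    half = len(ordered) // 2              # size of each half; the middle value is left out when the count is odd
    lower = ordered[:half]                # values below the median position
    upper = ordered[-half:]               # values above the median position
    q1, q3 = median(lower), median(upper) # Step 4: medians of the two halves
    return q1, q3, q3 - q1                # Step 5: IQR = Q3 - Q1

data = [1, 2, 5, 6, 7, 9, 12, 15, 18, 19, 27]
q1, q3, iqr = iqr_median_of_halves(data)
print(q1, q3, iqr)  # 5 18 13
```

Because the split is done by position, the same function also covers an even number of values, where both halves contain exactly half of the data.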
How do you find the interquartile range for an even number of values?
Sample question: Find the IQR for the following data set: 3, 5, 7, 8, 9, 11, 15, 16, 20, 21.
Step 1: Put the numbers in order.
3, 5, 7, 8, 9, 11, 15, 16, 20, 21.
Step 2: Make a mark in the center of the data:
3, 5, 7, 8, 9, | 11, 15, 16, 20, 21.
Step 3: Place parentheses around the numbers above and below the mark you made in Step 2. This makes Q1 and Q3 easier to spot.
(3, 5, 7, 8, 9), | (11, 15, 16, 20, 21).
Step 4: Find Q1 and Q3
Q1 is the median (the middle) of the lower half of the data, and Q3 is the median (the middle) of the upper half of the data.
(3, 5, 7, 8, 9), | (11, 15, 16, 20, 21). Q1 = 7 and Q3 = 16.
Step 5: Subtract Q1 from Q3.
16 – 7 = 9.
This is your IQR.
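Software does not always use the hand method above. As a hedged comparison, the sketch below runs the same data through Python's standard-library statistics.quantiles, which interpolates between neighboring values and offers two conventions ("exclusive" and "inclusive"). The variable names are illustrative, and the hand-method quartiles are copied from Step 4 rather than recomputed.

```python
from statistics import quantiles

data = [3, 5, 7, 8, 9, 11, 15, 16, 20, 21]

# Hand method from the steps above: medians of the lower and upper halves.
hand_q1, hand_q3 = 7, 16
print("median of halves:", hand_q1, hand_q3, hand_q3 - hand_q1)

# Library quartiles under each interpolation convention.
results = {}
for method in ("exclusive", "inclusive"):
    q1, _, q3 = quantiles(data, n=4, method=method)
    results[method] = (q1, q3, q3 - q1)
    print(method + ":", q1, q3, q3 - q1)
```

The three IQR values are close but not identical, so when comparing an IQR computed by hand with one from software, it helps to check which quartile convention the software uses.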