Sample statistics & Dispersion Flashcards
Calculate the average (mean) of a data set
- add all values, then divide by the number of individuals.
It is the “center of mass.”
The mean is best used to describe distributions that are fairly symmetrical and don’t have outliers.
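A small worked example of the rule above, as a sketch. The data values are made up for illustration and the function name `mean` is just a label chosen here.

```python
def mean(values):
    """Add all values, then divide by the number of individuals."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


data = [2, 4, 4, 5, 10]  # made-up sample: the sum is 25, n is 5
print(mean(data))        # 25 / 5 = 5.0
```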
Median - is ____
How to find it ____
the midpoint of a distribution—the number such that half of the observations are smaller, and half are larger.
1) Sort observations from smallest to largest. n = number of observations
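2) The location of the median is (n + 1)/2 in the sorted list. If n is even, this location falls between two observations, and the median is the mean of those two.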
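The two steps can be sketched in Python roughly as follows. This assumes the usual convention that, when n is even, the location (n + 1)/2 falls between two observations and the median is taken as their mean. The function name and sample values are illustrative only.

```python
def median(values):
    """Midpoint of a data set: sort, then look at location (n + 1)/2."""
    ordered = sorted(values)   # step 1: smallest to largest
    n = len(ordered)           # n = number of observations
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2               # 0-based index matching location (n + 1)/2
    if n % 2 == 1:
        return ordered[mid]    # odd n: a single middle observation
    # even n: (n + 1)/2 falls between two observations, so average them
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))      # sorted: 1, 3, 7 -> 3
print(median([7, 1, 3, 10]))  # sorted: 1, 3, 7, 10 -> (3 + 7) / 2 = 5.0
```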
Mean (x bar) VS Median (M) and skew
median is a measure of center that is resistant to skew and outliers. The mean is not.
symmetric distribution - Mean and median are the same
right-skewed distribution - the mean is pulled toward the long right tail and any high outliers, so it is usually larger than the median
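A quick way to see the resistance of the median is to replace one value with an outlier and compare. This sketch uses Python's standard `statistics` module; the two small data sets are made up for illustration.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]      # made-up symmetric sample
with_outlier = [1, 2, 3, 4, 50]  # same sample, largest value replaced by an outlier

# Symmetric data: mean and median agree (both equal 3).
print(mean(symmetric), median(symmetric))

# One high outlier: the mean jumps to 60 / 5 = 12, the median stays at 3.
print(mean(with_outlier), median(with_outlier))
```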
Quartiles are the
Measure of spread
1st quartile vs 3rd
first quartile, Q1, is the median of the values below the median in the sorted data set.
third quartile, Q3, is the median of the values above the median in the sorted data set.
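The definitions above can be sketched as below, assuming that "values below the median" excludes the median itself when n is odd. Statistical software often uses other interpolation rules, so library results may differ slightly from this hand method. The function names are hypothetical.

```python
def _median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, M, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]  # values below the median
    # values above the median; skip the middle observation when n is odd
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (_median_of_sorted(lower),
            _median_of_sorted(ordered),
            _median_of_sorted(upper))


print(quartiles([1, 2, 3, 4, 5, 6, 7]))     # (2, 4, 6)
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))  # (2.5, 4.5, 6.5)
```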
Standard Deviation - used to describe
used to describe the variation around the mean
Measure of spread - variability of data
How to calculate Standard Deviation
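1) Calculate the variance, s²
[Why divide by n - 1? Because the deviations from the mean always sum to 0, only n - 1 of them can vary freely. Note that s² = 0 only when all observations collapse to the same point.]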
2) Take the square root to get the standard deviation s
Find the mean, subtract it from each data point, square each of those deviations, sum the squared deviations, divide by n - 1, then take the square root to get the SD.
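The calculation can be sketched step by step as follows. Note that each deviation from the mean is squared before summing; without squaring, the deviations would always add up to 0. The function name `sample_std` and the sample values are illustrative only.

```python
import math


def sample_std(values):
    """Sample standard deviation s, dividing by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to divide by n - 1")
    x_bar = sum(values) / n                       # find the mean
    squared_devs = [(x - x_bar) ** 2 for x in values]  # square each deviation
    variance = sum(squared_devs) / (n - 1)        # step 1: variance s^2
    return math.sqrt(variance)                    # step 2: square root gives s


# Mean is 3; squared deviations are 4, 1, 0, 1, 4 (sum 10); s^2 = 10 / 4 = 2.5
print(sample_std([1, 2, 3, 4, 5]))  # square root of 2.5, about 1.58

# All values identical: every deviation is 0, so s = 0
print(sample_std([3, 3, 3]))        # 0.0
```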
boxplots symmetry or skew
Interquartile range (IQR)
is the distance between the first and third quartiles (the length of the box in the boxplot)
IQR = Q3 – Q1
How far outside the overall pattern does a value have to fall to be considered a suspected outlier?
Suspected low outlier: any value < Q1 – 1.5 × IQR
Suspected high outlier: any value > Q3 + 1.5 × IQR
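The 1.5 × IQR rule can be sketched as below. The quartiles are passed in as arguments so the example does not depend on any particular quartile convention. The function names and the sample data are made up for illustration.

```python
def outlier_fences(q1, q3):
    """Return the (low, high) cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1  # IQR = Q3 - Q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def suspected_outliers(values, q1, q3):
    """Values strictly below the low cutoff or strictly above the high cutoff."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]


data = [1, 2, 3, 4, 5, 6, 30]  # made-up sample with Q1 = 2 and Q3 = 6
print(outlier_fences(2, 6))            # IQR = 4 -> (-4.0, 12.0)
print(suspected_outliers(data, 2, 6))  # [30]
```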