Statistics Flashcards
three methods of graphing grouped quantitative data
- histogram - classes are marked on the horizontal axis; frequencies, relative frequencies, or percentages are marked on the vertical axis
- frequency polygon - graph formed by joining the midpoints of the tops of successive bars in a histogram with straight lines
- frequency distribution curve - the smooth curve that the frequency polygon approaches as the data set grows and the class width becomes smaller
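A minimal Python sketch of the first two methods, assuming equal-width classes of the form [lower, lower + width). It counts the frequency of each class (the histogram bar heights) and computes each class midpoint, so the (midpoint, frequency) pairs are the vertices of the frequency polygon. The function name `frequency_table` and the data are hypothetical, for illustration only.

```python
# Hypothetical helper: equal-width classes [lower, lower + width), [lower + width, ...), ...
def frequency_table(data, lower, width, num_classes):
    """Return (midpoints, frequencies) for equal-width classes starting at `lower`."""
    freqs = [0] * num_classes
    for x in data:
        i = int((x - lower) // width)  # index of the class containing x
        if 0 <= i < num_classes:       # values outside the classes are ignored
            freqs[i] += 1
    # Class midpoints: the x-coordinates of the frequency polygon's vertices
    midpoints = [lower + width * (i + 0.5) for i in range(num_classes)]
    return midpoints, freqs


# Illustrative data (made up for this example)
data = [3, 7, 8, 12, 14, 15, 18, 22, 25, 29]
midpoints, freqs = frequency_table(data, lower=0, width=10, num_classes=3)
print(list(zip(midpoints, freqs)))  # [(5.0, 3), (15.0, 4), (25.0, 3)]
```

Plotting the frequencies as bars gives the histogram; joining the (midpoint, frequency) points with straight lines gives the polygon.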
shapes of histograms
symmetric
skewed
uniform/rectangular
skewed right
tail is longer on the right side
the mean is typically greater than the median
skewed left
tail is longer on the left side
the mean is typically less than the median
cumulative relative frequency
cumulative frequency / total observations in the dataset
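A small Python sketch of this formula, assuming the class frequencies are already known and listed in class order. The function name and the frequencies are hypothetical, for illustration.

```python
from itertools import accumulate


def cumulative_relative_frequencies(freqs):
    """Running total of class frequencies divided by the total number of observations."""
    total = sum(freqs)
    return [c / total for c in accumulate(freqs)]


# Illustrative class frequencies (made up): 3, 4 and 3 observations
print(cumulative_relative_frequencies([3, 4, 3]))  # [0.3, 0.7, 1.0]
```

The last cumulative relative frequency is always 1, since it counts every observation.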
stem-and-leaf notation for the number 46
4|6
5|2: which is the stem and which is the leaf?
stem is 5
leaf is 2
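A minimal Python sketch, assuming two-digit whole numbers, where the stem is the tens digit and the leaf is the units digit (as in 4|6 and 5|2). The function name and sample values are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf_display(data):
    """Stem-and-leaf rows for two-digit whole numbers: stem = tens digit, leaf = units digit."""
    rows = defaultdict(list)
    for n in sorted(data):
        stem, leaf = divmod(n, 10)  # 46 -> (4, 6), 52 -> (5, 2)
        rows[stem].append(str(leaf))
    return [f"{stem}|{''.join(leaves)}" for stem, leaves in sorted(rows.items())]


print(stem_and_leaf_display([46]))                  # ['4|6']
print(stem_and_leaf_display([46, 52, 41, 58, 47]))  # ['4|167', '5|28']
```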
a major shortcoming of the mean as a measure of central tendency is that
it is very sensitive to outliers
median
value of the middle term of the data set ranked in increasing order
it is not strongly influenced by outliers
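A quick Python illustration of the last two cards, using made-up values: adding one extreme value drags the mean far away, while the median barely moves. It uses only the standard-library `statistics` module.

```python
from statistics import mean, median

data = [10, 12, 13, 15, 16]   # illustrative values
with_outlier = data + [200]   # the same values plus one extreme value

print(mean(data), median(data))                  # 13.2 13
print(mean(with_outlier), median(with_outlier))  # mean jumps to about 44.3, median only moves to 14.0
```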
for a left skew, is the mode greater than or less than the mean?
the mode is typically greater than both the median and the mean
order of mean, median, mode (smallest to largest) for a left skew
mean, median, mode
order of mean, median, mode (smallest to largest) for a right skew
mode, median, mean
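A Python check of both orderings on two small made-up data sets, one with a long right tail and its mirror image with a long left tail. This is the typical pattern, not a guarantee for every skewed data set.

```python
from statistics import mean, median, mode

# Illustrative data sets (made up): a long tail to the right, and its mirror image
right_skewed = [1, 2, 2, 3, 4, 6, 12]
left_skewed = [14 - x for x in right_skewed]  # [13, 12, 12, 11, 10, 8, 2]

print(mode(right_skewed), median(right_skewed), mean(right_skewed))  # mode < median < mean
print(mean(left_skewed), median(left_skewed), mode(left_skewed))     # mean < median < mode
```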
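variance
for a population: σ² = Σ(x - μ)² / N
for a sample: s² = Σ(x - x̄)² / (n - 1)
calculated as: the sum of the squared deviations (each value minus the mean, squared), divided by N for a population or by n - 1 for a sample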
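A minimal Python sketch of both formulas on a made-up data set, with the standard deviation taken as the positive square root of the variance. The function names are hypothetical; the standard-library `statistics.pvariance` and `statistics.variance` are used only as a cross-check.

```python
import math
import statistics


def population_variance(values):
    """sigma^2: sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean = 5

print(population_variance(data), math.sqrt(population_variance(data)))  # 4.0 2.0
print(sample_variance(data))                                            # 32/7, about 4.571
# Cross-check against the standard library
print(statistics.pvariance(data), statistics.variance(data))
```

Dividing by n - 1 instead of n makes the sample variance slightly larger than the population formula would give for the same numbers.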
standard deviation
positive square root of the variance
measures ABSOLUTE variability (dispersion), not relative variability
when we want to compare the variability of two different data sets, we have to use
the coefficient of variation
coefficient of variation
expresses the standard deviation as a percentage of the mean
CV = (σ / μ) x 100% for a population, or (s / x̄) x 100% for a sample
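A Python sketch comparing two made-up data sets that have the same sample standard deviation but very different means: their absolute variability is identical, while the CV shows how different their relative variability is. The function name is hypothetical; it computes the sample CV.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample CV in percent: s / x_bar * 100 (use statistics.pstdev for a population)."""
    return stdev(values) / mean(values) * 100


# Illustrative data sets with the same absolute spread (s = 2) but very different means
a = [10, 12, 14]
b = [100, 102, 104]

print(stdev(a), stdev(b))                                        # 2.0 2.0
print(coefficient_of_variation(a), coefficient_of_variation(b))  # about 16.7% vs about 1.96%
```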
mean of grouped data for a population/sample is calculated as
μ = Σmf / N or x̄ = Σmf / n: the sum of each class midpoint (m) multiplied by its frequency (f), divided by the population size (N) or sample size (n)
Chebyshev’s theorem for standard deviation
for any number k greater than 1, at least (1 - 1/k^2) of the data values lie within k standard deviations of the mean. k must be greater than 1; at k = 1 the bound is 0, which says nothing about the data.
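A small Python sketch of the bound. The function name is hypothetical, and it rejects k ≤ 1 to match the statement of the theorem.

```python
def chebyshev_min_fraction(k):
    """Lower bound on the fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is stated for k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(k, round(chebyshev_min_fraction(k) * 100, 1))
```

For example, at least 75% of the values lie within 2 standard deviations of the mean and at least about 88.9% lie within 3, whatever the shape of the distribution.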
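empirical rule (applies to bell-shaped distributions; Chebyshev’s theorem applies to any distribution)
about 68% of the observations lie within 1 standard deviation of the mean
about 95% of the observations lie within 2 standard deviations of the mean
about 99.7% of the observations lie within 3 standard deviations of the mean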
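A Python sketch showing where the three percentages come from, assuming the bell-shaped distribution is exactly normal. It uses the standard-library `statistics.NormalDist` (Python 3.8+); the helper name is hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1


def fraction_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(k, f"{fraction_within(k):.1%}")  # about 68.3%, 95.4%, 99.7%
```

Compared with Chebyshev's guaranteed minimum of 75% within 2 standard deviations, the normal curve puts about 95% there; the empirical rule is sharper because it assumes a bell shape.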
quartiles
three summary measures (Q1 Q2 Q3) that divide a ranked data set into four equal parts
first quartile (Q1) definition
the median of the values below the median; about 25% of the ranked data lie below Q1
third quartile (Q3)
the median of the values above the median; about 75% of the ranked data lie below Q3
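A minimal Python sketch of the median-of-halves method for Q1, Q2 and Q3, assuming that for an odd number of values the median itself is left out of both halves. The function name and data are hypothetical. Software such as `statistics.quantiles` or spreadsheet functions may use other interpolation conventions and give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method (needs at least 2 values)."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]        # values below the median position
    upper = data[(n + 1) // 2 :]  # values above the median position
    return median(lower), median(data), median(upper)


print(quartiles([1, 3, 4, 6, 8, 9, 12]))      # (3, 6, 9)
print(quartiles([1, 3, 4, 6, 8, 9, 12, 15]))  # (3.5, 7.0, 10.5)
```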
how do you find the median
rank all data points in increasing order and pick the middle one
the median (Q2) is the term at position (n + 1)/2 in the ranked data
if there is an even number of data points, that position falls halfway between two terms; add the two middle values and divide by two (see example on page 6 of 5.3)
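A Python sketch of the (n + 1)/2 location rule, assuming a non-empty list. The function name and data are hypothetical; the result matches `statistics.median`.

```python
def median_by_location(values):
    """Median found via the (n + 1) / 2 location rule on the ranked data."""
    data = sorted(values)
    n = len(data)
    location = (n + 1) / 2  # 1-based position of the median
    if location.is_integer():
        return data[int(location) - 1]  # odd n: a single middle term
    # even n: location ends in .5, so average the terms on either side of it
    below = data[int(location) - 1]
    above = data[int(location)]
    return (below + above) / 2


print(median_by_location([7, 1, 5, 3, 9]))      # location 3 -> 5
print(median_by_location([7, 1, 5, 3, 9, 11]))  # location 3.5 -> (5 + 7) / 2 = 6.0
```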
percentile definition
the summary measures that divide a ranked data set into 100 equal parts; each data set has 99 percentiles.
the kth percentile, Pk, is a value in a data set such that about k% of the measurements are smaller than Pk and about (100 - k)% of the measurements are greater than Pk
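A small Python sketch of the "about k% smaller" part of the definition: it returns the percentage of observations strictly smaller than a given value. Several conventions exist for computing Pk itself, so this only checks a candidate value. The function name and data are hypothetical.

```python
def percent_below(values, x):
    """Percentage of observations strictly smaller than x."""
    return 100 * sum(1 for v in values if v < x) / len(values)


data = list(range(1, 21))  # illustrative: the whole numbers 1 to 20
print(percent_below(data, 16))  # 75.0 -> 16 is roughly the 75th percentile
```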