Techniques for summarizing data Flashcards
what are numerical measures?
-central tendency:
mean, median, mode
-dispersion:
variance, standard deviation, coefficient of variation, interquartile range
-Relative standing: quantiles
what do variance and standard deviation measure?
they measure the average scatter around the mean
- the greater the spread or dispersion of the data, the larger the range, variance and SD
- the smaller the spread or dispersion of the data, the smaller the range, variance and SD
- if values are all the same (no variation in data), range, variance and SD will be 0
- range, variance and SD cannot be negative
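A minimal Python sketch of the points above, using the standard library `statistics` module. The three data sets are made-up illustration values (not from the cards), and the sample (n - 1) versions of variance and SD are assumed.

```python
import statistics

def spread_summary(values):
    """Return (range, sample variance, sample standard deviation)."""
    data_range = max(values) - min(values)
    variance = statistics.variance(values)  # sample variance (divides by n - 1)
    std_dev = statistics.stdev(values)      # square root of the sample variance
    return data_range, variance, std_dev

# Hypothetical data sets, each with mean 5 (illustration only)
identical = [5, 5, 5, 5]   # no variation at all
tight = [4, 5, 5, 6]       # values close to the mean
wide = [1, 3, 7, 9]        # values far from the mean

for name, data in [("identical", identical), ("tight", tight), ("wide", wide)]:
    print(name, spread_summary(data))
```

The identical set gives 0 for all three measures, and the wide set gives larger values than the tight set for all three.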
what is the coefficient of variation?
relative measure of variation, expressed as a percentage of the mean
denoted by the symbol CV
formula for coefficient of variation
CV = (S / X̄) × 100%
S= sample standard deviation
X̄= sample mean
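A short Python sketch of the CV formula, assuming the sample standard deviation from the standard library `statistics` module. The function name and the example values are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
import statistics

def coefficient_of_variation(values):
    """CV = (S / X-bar) * 100, where S is the sample standard deviation."""
    mean = statistics.mean(values)
    if mean == 0:
        # The CV is undefined when the mean is zero
        raise ValueError("coefficient of variation is undefined for a mean of 0")
    return statistics.stdev(values) / mean * 100

# Hypothetical data: mean = 20, sample SD = 10
example = [10, 20, 30]
cv_example = coefficient_of_variation(example)
print(f"CV = {cv_example:.2f}%")
```

For these values the mean is 20 and the sample SD is 10, so the SD is half of the mean and the CV is 50%.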
how to interpret coefficient of variation
When comparing two or more data sets:
CVcalories = 36.08%
CVsugar = 57.84%
Relative to the mean, the amount of sugar is more variable than the calories
When there is only one data set:
CVcalories = 36.08%
The standard deviation is 36.08% of the mean
Describe the Z score
It is useful in identifying outliers. Values located far away from the mean will have either very small (negative) Z scores
or very large (positive) Z scores
Formula for Z score
Z = (X - X̄) / S
when does a Z score indicate an outlier?
if the Z score is less than -3 or greater than +3
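A minimal Python sketch of the Z score formula and the -3 / +3 rule, assuming the sample standard deviation. The function names and the data (twenty values of 10 and one value of 100) are hypothetical, for illustration only.

```python
import statistics

def z_scores(values):
    """Z = (X - X-bar) / S for every value, using the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # All values are identical, so Z scores are undefined
        raise ValueError("Z scores are undefined when the standard deviation is 0")
    return [(x - mean) / sd for x in values]

def find_outliers(values, threshold=3.0):
    """Return the values whose Z score is below -threshold or above +threshold."""
    return [
        x
        for x, z in zip(values, z_scores(values))
        if z < -threshold or z > threshold
    ]

# Hypothetical data: one value sits far above the rest
sample = [10] * 20 + [100]
print(find_outliers(sample))
```

Here the value 100 has a Z score above +3, so it is the only value flagged.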
Describe skewness
Measures the extent to which a set of data is not symmetric
Left or negative skewed: Mean < median
Symmetric: Mean = median
Right or positive skewed: Mean > median
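The mean versus median comparison can be sketched in Python as follows. This only applies the rule on this card; the function name and the three data sets are hypothetical.

```python
import statistics

def skew_direction(values):
    """Classify skew by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean < median:
        return "left"       # negative skew: mean pulled down by low values
    if mean > median:
        return "right"      # positive skew: mean pulled up by high values
    return "symmetric"      # mean equals median

# Hypothetical data sets (illustration only)
print(skew_direction([1, 97, 98, 99, 100]))  # mean 79, median 98
print(skew_direction([1, 2, 3, 4, 5]))       # mean 3, median 3
print(skew_direction([1, 2, 3, 4, 100]))     # mean 22, median 3
```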
what are quartiles?
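quartiles split ranked data into four equal parts, using three cut points:
- First quartile (Q1): 25% of values are smaller than or equal to Q1 and 75% are larger than or equal to it
- Second quartile (Q2): the median; 50% of values are smaller than or equal to Q2 and 50% are larger than or equal to it
- Third quartile (Q3): 75% of values are smaller than or equal to Q3 and 25% are larger than or equal to it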
Formula for Q1
Q1 = the (n + 1)/4 ranked value
Formula for Q3
Q3 = the 3(n + 1)/4 ranked value
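A Python sketch of the ranked-value formulas for Q1 and Q3. The cards do not say what to do when the ranked position is not a whole number, so linear interpolation between the two neighbouring ranked values is an assumption here (textbooks differ on this). The function names and data are hypothetical.

```python
def quartile_position(n, k):
    """Ranked position of the k-th quartile (k = 1, 2, 3) among n values."""
    return k * (n + 1) / 4

def quartile(values, k):
    """Value at the k(n + 1)/4 ranked position of the sorted data.

    Assumption: fractional positions are resolved by linear interpolation.
    """
    ordered = sorted(values)
    pos = quartile_position(len(ordered), k)
    lower = int(pos)        # whole part of the 1-based ranked position
    frac = pos - lower      # fractional part of the position
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    # Convert the 1-based rank to a 0-based index, then interpolate
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

# Hypothetical data with n = 11, so the positions are whole numbers:
# Q1 is the 3rd ranked value and Q3 is the 9th ranked value
data = [12, 15, 18, 20, 22, 25, 27, 30, 33, 36, 40]
print(quartile_position(len(data), 1), quartile(data, 1))
print(quartile_position(len(data), 3), quartile(data, 3))
```

With 11 values, (n + 1)/4 = 3 and 3(n + 1)/4 = 9, so Q1 is the 3rd ranked value (18) and Q3 is the 9th ranked value (33).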
What is the formula for interquartile range?
Q3 - Q1
what does interquartile range measure?
interquartile range measures the spread in the middle 50% of the data (it is not influenced by extreme values)
e.g. IQR = 44 - 35 = 9
The interquartile range in the time to get ready is 9. The interval from 35 to 44 is referred to as the middle fifty
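A short Python sketch of the interquartile range, using `statistics.quantiles` from the standard library with `method="exclusive"`, which places the quartiles at the (n + 1)/4 style ranked positions. The data are made-up values, chosen so that Q1 = 35 and Q3 = 44 as in the example above; they are not the original data set.

```python
import statistics

def iqr(values):
    """Interquartile range: Q3 - Q1."""
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return q3 - q1

# Hypothetical times (n = 11), chosen so that Q1 = 35 and Q3 = 44
times = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52, 60]

# Same data with the largest value replaced by an extreme value
times_with_extreme = [29, 31, 35, 39, 39, 40, 43, 44, 44, 52, 600]

print(iqr(times), max(times) - min(times))
print(iqr(times_with_extreme), max(times_with_extreme) - min(times_with_extreme))
```

The extreme value changes the range a great deal but leaves the interquartile range unchanged, which is the sense in which the IQR is not influenced by extreme values.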
what does a box plot look like?