Describing Data Flashcards
sample mean (3)
- sum of all observations in a sample divided by n, the number of observations
- varies, depending on the composition of the sample
- symbol: Y bar
standard deviation (4)
- common measure of the spread of a distribution; indicates how far the measurements typically are from the mean
- for a bell-shaped distribution, roughly two-thirds (about 68%) of the data lie within Y bar +/- s and about 95% lie within Y bar +/- 2s
- the square root of the variance
- symbol: s
variance (3)
- a measure of the spread of the data around the mean, based on the squared deviations from the mean
- symbol (sample): s^2
- symbol (population): σ^2 (sigma squared)
deviation (2)
- the difference between a measurement and the mean
- formula: Yi - Y bar
sum of squares
- summation in the numerator of the variance formula: the sum of (Yi - Y bar)^2 over all observations
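The cards above (sample mean, deviation, sum of squares, variance, standard deviation) chain together, and a small sketch can make that visible. It assumes the usual n - 1 denominator for the sample variance; the function name `describe` and the sample values are made up for illustration.

```python
import math

def describe(sample):
    """Return mean, deviations, sum of squares, variance and SD of a sample."""
    n = len(sample)
    mean = sum(sample) / n                       # Y bar
    deviations = [y - mean for y in sample]      # Yi - Y bar
    sum_of_squares = sum(d ** 2 for d in deviations)
    variance = sum_of_squares / (n - 1)          # s^2 (assumed n - 1 denominator)
    sd = math.sqrt(variance)                     # s
    return mean, deviations, sum_of_squares, variance, sd

# Hypothetical measurements
mean, deviations, sum_of_squares, variance, sd = describe([2, 4, 4, 6, 9])
```

For these values the mean is 5, the sum of squares is 28, the variance is 7 and the standard deviation is the square root of 7.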
what is the general rule for rounding answers?
- round descriptive statistics to one decimal place more than the measurements themselves
coefficient of variation (4)
- standard deviation expressed as a percentage of the mean
- CV = (s/Y bar) x 100%
- higher CV means more variability and lower CV means individuals are more similar relative to the mean
- only applicable if all measurements are greater than or equal to 0
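A short sketch of the CV formula, using Python's standard `statistics` module. The function name `coefficient_of_variation` and both samples are hypothetical; the check for negative values reflects the last bullet above, and the check for a positive mean is an added assumption to avoid dividing by zero.

```python
import statistics

def coefficient_of_variation(sample):
    """CV = (s / Y bar) x 100%, for non-negative measurements only."""
    if min(sample) < 0:
        raise ValueError("CV only applies when all measurements are >= 0")
    mean = statistics.mean(sample)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")  # added assumption
    return statistics.stdev(sample) / mean * 100

# Two hypothetical samples with the same SD (2) but different means
cv_small = coefficient_of_variation([10, 12, 14])     # mean 12
cv_large = coefficient_of_variation([100, 102, 104])  # mean 102
```

Both samples have the same standard deviation, but the first has the higher CV because its spread is larger relative to its mean.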
how do you calculate mean from a frequency table? (3)
- calculate the sample size by adding all the frequencies
- multiply each value by its frequency, then add the products together
- divide the sum from the second step by the sample size from the first step
how do you calculate standard deviation from a frequency table?
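- subtract the mean from each value, square the deviation, and multiply it by that value's frequency; add the results to get the sum of squares, divide by n - 1, and take the square root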
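The two frequency-table cards above can be sketched as one function. This is a minimal illustration, assuming the table is stored as a dictionary mapping each value to its frequency and that the sample SD uses the n - 1 denominator; the name `mean_and_sd_from_frequencies` and the table itself are hypothetical.

```python
import math

def mean_and_sd_from_frequencies(freq_table):
    """freq_table maps each measured value to how often it occurs."""
    # Step 1: sample size is the sum of the frequencies
    n = sum(freq_table.values())
    # Step 2: multiply each value by its frequency and add the products
    total = sum(value * freq for value, freq in freq_table.items())
    # Step 3: divide the total by the sample size
    mean = total / n
    # SD: weight each squared deviation by its frequency
    sum_of_squares = sum(freq * (value - mean) ** 2
                         for value, freq in freq_table.items())
    sd = math.sqrt(sum_of_squares / (n - 1))
    return n, mean, sd

# Hypothetical table: value 0 seen twice, 1 seen five times, 2 seen three times
freq_table = {0: 2, 1: 5, 2: 3}
n, mean, sd = mean_and_sd_from_frequencies(freq_table)
```

The result should match what you get by writing out all ten observations and computing the mean and SD directly.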
median (2)
- middle measurement of a set of observations
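- if n is odd, median = Y([n+1]/2), the middle value of the sorted data; if n is even, median = the average of Y(n/2) and Y(n/2 + 1)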
interquartile range (2)
- difference between the third and first quartiles of the data and spans the middle 50% of the data
- IQR = third quartile - first quartile
first quartile
- middle value of the measurements lying below the median
second quartile
- median
third quartile
- middle value of the measurements larger than the median
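The median, quartile and IQR cards above can be sketched as follows. This follows the "middle value of each half" definition used on these cards and assumes the two middle values are averaged when a group has an even number of measurements; statistical software often uses other quantile conventions and may give slightly different quartiles. The function names and data are hypothetical.

```python
def median(sorted_values):
    """Middle value of already-sorted data (average of two middles if n is even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(sample):
    """Return (first quartile, median, third quartile)."""
    ordered = sorted(sample)
    n = len(ordered)
    lower_half = ordered[: n // 2]          # measurements below the median
    upper_half = ordered[(n + 1) // 2 :]    # measurements above the median
    return median(lower_half), median(ordered), median(upper_half)

# Hypothetical measurements
q1, q2, q3 = quartiles([1, 3, 4, 6, 8, 9, 12])
iqr = q3 - q1
```

For these seven values the quartiles are 3, 6 and 9, so the IQR is 6.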
box plot (4)
- displays median and interquartile range
- lower and upper edges of the box are the first and third quartiles, thus the IQR is visualized by the span of the box
- lines extend vertically from the box to represent the “non-extreme” values in the data
- “extreme” values are represented as isolated dots
“extreme” values
- values lying farther from the box edge than 1.5x the IQR
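A rough sketch of how the pieces of a box plot could be computed from the definitions above. It assumes quartiles are taken as the medians of the lower and upper halves of the sorted data (software may use other conventions); the name `box_plot_summary` and the data are hypothetical.

```python
import statistics

def box_plot_summary(sample):
    """Return box edges, extreme values, and the ends of the lines (whiskers)."""
    ordered = sorted(sample)
    n = len(ordered)
    q1 = statistics.median(ordered[: n // 2])        # lower edge of the box
    q3 = statistics.median(ordered[(n + 1) // 2 :])  # upper edge of the box
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Extreme values lie farther than 1.5 x IQR from the box edge
    extreme = [y for y in ordered if y < lower_fence or y > upper_fence]
    non_extreme = [y for y in ordered if lower_fence <= y <= upper_fence]
    # Lines extend to the smallest and largest non-extreme values
    whiskers = (min(non_extreme), max(non_extreme))
    return q1, q3, extreme, whiskers

# Hypothetical measurements with one unusually large value
q1, q3, extreme, whiskers = box_plot_summary([1, 3, 4, 6, 8, 9, 30])
```

Here the box spans 3 to 9, the lines reach 1 and 9, and 30 would be drawn as an isolated dot.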
mean vs median (2)
- median is the middle value of the data; it is less sensitive to extreme observations, so it is the more informative descriptor of the typical observation when the data are skewed or contain extreme values
- mean is the centre of gravity (balance point) of the data; it is strongly affected by skew and extreme observations, but it has better mathematical properties and is the more reliable measure to estimate
SD vs IQR (2)
- SD reflects the variation among all of the data points, but is very sensitive to extreme observations
- IQR is a better indicator of spread of the main part of the distribution when data is strongly skewed to one side or there are extreme observations
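A small illustration of the two comparison cards above: replacing one value with an extreme observation moves the mean and SD a lot, while the median and IQR stay put. The data are made up, and the IQR helper assumes quartiles are the medians of the lower and upper halves of the sorted data.

```python
import statistics

def iqr(sample):
    """IQR using medians of the lower and upper halves (assumed convention)."""
    ordered = sorted(sample)
    n = len(ordered)
    q1 = statistics.median(ordered[: n // 2])
    q3 = statistics.median(ordered[(n + 1) // 2 :])
    return q3 - q1

def summarize(sample):
    """Return (mean, median, SD, IQR) for a sample."""
    return (statistics.mean(sample), statistics.median(sample),
            statistics.stdev(sample), iqr(sample))

# Hypothetical data: identical except that the largest value becomes extreme
typical = [2, 3, 4, 4, 5, 6, 7, 8, 9]
with_extreme = [2, 3, 4, 4, 5, 6, 7, 8, 90]

mean_a, median_a, sd_a, iqr_a = summarize(typical)
mean_b, median_b, sd_b, iqr_b = summarize(with_extreme)
```

In this example the median (5) and IQR (4) are identical for both samples, whereas the mean and SD are both much larger in the second.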
percentile of a measurement
- specifies the percentage of observations less than or equal to it; the remaining observations exceed it
quantile of a measurement
- specifies the fraction of observations less than or equal to it
cumulative relative frequency
- at a given measurement, is the fraction of observations less than or equal to that measurement
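The percentile, quantile and cumulative relative frequency cards describe the same count expressed in different units, which a short sketch can show. The function name and sample values are hypothetical.

```python
def cumulative_relative_frequency(sample, measurement):
    """Fraction of observations less than or equal to the given measurement."""
    count = sum(1 for y in sample if y <= measurement)
    return count / len(sample)

# Hypothetical measurements
sample = [2, 4, 4, 6, 9]
quantile = cumulative_relative_frequency(sample, 4)  # fraction at or below 4
percentile = quantile * 100                          # same quantity as a percentage
```

Three of the five observations are at or below 4, so 4 sits at the 0.6 quantile, which is the 60th percentile.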
what are the 2 common descriptions of data? (2)
- location (or central tendency)
- width (spread)
what are the measures of location? (3)
- mean
- mode
- median
population mean
- symbol: μ
- fixed value that we try to estimate using the sample mean
what are the measures of width? (4)
- range
- variance
- standard deviation
- coefficient of variation
range (3)
- maximum minus the minimum
- poor measure of distribution width because small samples tend to give lower estimates of the range than large samples
- biased estimator of the true range of the population (tends to be lower)
skew
- measure of asymmetry; the direction of skew refers to the side with the long, pointy tail of the distribution
adding/multiplying mean by constant, c
- adding: Y bar + c
- multiplying: Y bar * c
adding/multiplying SD by constant, c
- adding: s (unchanged)
- multiplying: |c| * s
adding/multiplying variance by constant, c
- adding: s^2 (unchanged)
- multiplying: s^2 * c^2
adding/multiplying median by constant, c
- adding: M + c
- multiplying: M * c
adding/multiplying IQR by constant, c
- adding: IQR (unchanged)
- multiplying: |c| * IQR
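The rules for adding or multiplying by a constant can be checked numerically. This is a minimal sketch using the standard `statistics` module; the function name `check_constant_rules`, the sample and the constant are hypothetical, and the IQR rules are left out to keep it short.

```python
import math
import statistics

def check_constant_rules(sample, c):
    """Return True if the mean, SD, variance and median rules hold for this sample."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)
    var = statistics.variance(sample)
    med = statistics.median(sample)

    added = [y + c for y in sample]       # add c to every measurement
    scaled = [y * c for y in sample]      # multiply every measurement by c

    checks = [
        math.isclose(statistics.mean(added), mean + c),
        math.isclose(statistics.stdev(added), sd),            # unchanged
        math.isclose(statistics.variance(added), var),        # unchanged
        math.isclose(statistics.median(added), med + c),
        math.isclose(statistics.mean(scaled), mean * c),
        math.isclose(statistics.stdev(scaled), abs(c) * sd),
        math.isclose(statistics.variance(scaled), var * c ** 2),
        math.isclose(statistics.median(scaled), med * c),
    ]
    return all(checks)

# Hypothetical sample and a negative constant, to show why |c| is needed for SD
rules_hold = check_constant_rules([2, 4, 4, 6, 9], -3)
```

Using a negative constant is a useful test case because the SD must stay positive, which is why the rule uses |c| rather than c.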