week 8 - descriptive statistics flashcards
categorical data
nominal and ordinal
numerical data
interval and ratio
nominal LOM
- named categories
- can calculate mode
- researcher assigns a number to each category (numbers are labels only, with no order or magnitude)
- dichotomous or multiple
ordinal LOM
- rank order categories
- can calculate median, range and conduct non-parametric stats
interval LOM
- zero is arbitrary (no true zero), so ratios between values aren’t meaningful
- can calculate mean, SD and parametric stats
- equal intervals between numbers
ratio LOM
- meaningful zero
- can calculate mean, SD and parametric stats
- highest LOM
ways to describe data
frequency distributions, shapes of distributions, measures of central tendency, measures of variability, percentiles/quartiles, association/correlation
frequency distribution
- basic way of organizing data
- tally frequency of events
- display in tables or graphically (histogram, frequency polygon, bar graphs, box plot)
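A minimal sketch of tallying a frequency distribution, assuming a small set of made-up quiz scores (the data and the name `scores` are hypothetical):

```python
from collections import Counter

# Hypothetical quiz scores, invented for illustration
scores = [3, 5, 4, 5, 2, 5, 4, 3, 4, 5]

# Tally how often each score occurs
frequency = Counter(scores)

# Print a frequency table with a simple text histogram
for value in sorted(frequency):
    count = frequency[value]
    print(f"{value}: {count} {'*' * count}")
```

Each row shows a score, its frequency, and one asterisk per case, which is a rough text version of a histogram.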
shapes of distribution
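a) normal distribution: symmetric, bell-shaped curve; mean, median and mode all fall at the same point
b) skew: describes asymmetry (positive skew = longer right tail, negative skew = longer left tail)
c) kurtosis: describes peakedness and heaviness of the tails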
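Skew and kurtosis can be put into numbers. The sketch below assumes the moment-based population formulas (skew = m3 / m2^1.5, excess kurtosis = m4 / m2^2 - 3); statistical software often uses slightly different sample-adjusted versions. The helper names and data are hypothetical.

```python
from statistics import fmean

def central_moment(data, k):
    """k-th central moment: average of (x - mean) ** k."""
    mean = fmean(data)
    return fmean([(x - mean) ** k for x in data])

def skewness(data):
    """Moment-based skew: 0 for symmetric data, > 0 for a longer right tail."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based kurtosis minus 3, so a normal distribution scores about 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
right_skewed = [1, 1, 2, 2, 3, 10]   # hypothetical data with a long right tail

print(skewness(symmetric))
print(skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

The symmetric set gives a skew of zero, while the set with one large score gives a positive skew.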
measures of central tendency
a) mean: most frequently used measure
b) median: exact middle score
c) mode: value that occurs most frequently
mean
- mathematical average
- applies to interval or ratio level data
- very sensitive to extreme scores (pulled toward the tail of a skewed distribution)
median
- exact middle
- value above and below which 50% of the scores fall
- resistant to extreme scores, so preferred for skewed distributions
- applies to ordinal, interval and ratio level data
mode
- value that occurs most frequently
- least precise
- primarily describes typical values of categorical data
- bimodal (two modes), multimodal (several modes)
when to use each measure of central tendency?
- mode is best used with nominal data
- median is best used with ordinal data, and with interval/ratio data that have extreme scores
- mean is best used with interval/ratio (no extreme scores)
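A short sketch of the three measures using Python's standard `statistics` module, with made-up scores (hypothetical data) to show how one extreme score pulls the mean but barely moves the median:

```python
from statistics import mean, median, mode, multimode

scores = [2, 3, 3, 4, 5, 6, 7]   # hypothetical scores
with_outlier = scores + [50]     # same scores plus one extreme value

print(mean(scores), median(scores), mode(scores))
print(mean(with_outlier), median(with_outlier))

# multimode returns every most-frequent value, so it reveals bimodal data
print(multimode([1, 1, 2, 2, 3]))
```

Adding the single score of 50 raises the mean from about 4.3 to 10, while the median only shifts from 4 to 4.5.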
measures of variability
range, interquartile range, variance, standard deviation
percentile
percentage of cases that a given score exceeds; the median is the 50th percentile
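A minimal sketch of a percentile rank, assuming the "strictly below" definition used on this card (other conventions also count half of the tied scores). The function name and data are hypothetical.

```python
def percentile_rank(scores, value):
    """Percentage of scores that fall strictly below `value`."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)

# Hypothetical exam scores
scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

print(percentile_rank(scores, 80))
```

Five of the ten scores fall below 80, so a score of 80 sits at the 50th percentile under this definition.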
interquartile range
difference between the upper and lower quartiles (Q3 - Q1); the spread of the middle 50% of scores
range
distance between the highest and lowest scores in the data set (largest number - smallest number)
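The range and interquartile range can be sketched as follows, assuming made-up data and the default "exclusive" quartile method of `statistics.quantiles` (other quartile conventions can give slightly different cut points):

```python
from statistics import quantiles

# Hypothetical scores
data = [2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 20]

# Range: largest score minus smallest score
data_range = max(data) - min(data)

# Quartiles: three cut points that split the sorted data into four parts
q1, q2, q3 = quantiles(data, n=4)

# Interquartile range: spread of the middle 50% of scores
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```

The single high score of 20 stretches the range, while the interquartile range ignores both tails.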
variance
average of the squared deviations of each score from the mean of the data set (for a sample, divide by n - 1 instead of n)
standard deviation
- most commonly used and most important measure of variability
- roughly the typical distance of each score from the mean
- square root of the variance
- higher SD = more spread out, lower SD = less spread out
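A short sketch of variance and standard deviation, computed by hand and checked against Python's `statistics` module. The scores are made up; the `p` functions treat the data as a whole population (divide by n), while `variance` and `stdev` treat it as a sample (divide by n - 1).

```python
from statistics import pvariance, pstdev, variance, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores with mean 5

# By hand: average of the squared deviations from the mean
mean = sum(scores) / len(scores)
squared_deviations = [(x - mean) ** 2 for x in scores]
manual_variance = sum(squared_deviations) / len(scores)
manual_sd = manual_variance ** 0.5

print(manual_variance, manual_sd)            # population versions
print(pvariance(scores), pstdev(scores))     # same values from the library
print(variance(scores), stdev(scores))       # sample versions (n - 1)
```

For these scores the population variance is 4 and the population SD is 2; the sample versions come out slightly larger.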
empirical rule
in a normal distribution, about 68% of the scores are within 1 SD of the mean, about 95% are within 2 SD, and about 99.7% are within 3 SD
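The percentages in the empirical rule can be checked with the standard library's `NormalDist` class. This is only a sketch; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The results are close to 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.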
standardizing distribution
- standardized distribution is composed of scores that have been transformed to create predetermined values for the mean and SD
- used to make dissimilar distributions comparable
- individual scores are converted to standard scores (z scores): z = (score - mean) / SD, giving a mean of 0 and an SD of 1
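** z scores can be computed for any distribution, but converting them to percentages with the normal curve is valid only for normal distributions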
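The conversion to z scores can be sketched as z = (score - mean) / SD. The example below assumes made-up exam scores and uses the population SD; the function name `z_scores` is hypothetical.

```python
from statistics import fmean, pstdev

def z_scores(scores):
    """Convert raw scores to z scores: (score - mean) / SD."""
    mean = fmean(scores)
    sd = pstdev(scores)  # population SD; use stdev() for a sample
    return [(x - mean) / sd for x in scores]

exam_a = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean 5 and SD 2

z = z_scores(exam_a)
print(z)
print(fmean(z), pstdev(z))  # standardized scores have mean 0 and SD 1
```

A raw score of 9 becomes z = 2.0, meaning it lies two SDs above the mean, which makes it comparable with scores from a test on a different scale.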