Statistics Flashcards
Mode
- The most frequently occurring value, the peak of the frequency distribution.
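- The mode can be used at the nominal, ordinal, and interval/ratio levels. It is the only measure of central tendency available for nominal data.
- Most useful in a bimodal distribution. Why? The mean and median can fall in the valley between the two peaks, where few cases actually lie. Reporting both modes describes the data better.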
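The bimodal point can be checked with a short Python sketch. The scores below are hypothetical, chosen to form two clusters. `statistics.multimode` returns every value tied for the highest frequency, and the mean and median both land between the two peaks, on a value no case actually has.

```python
from statistics import mean, median, multimode

# Hypothetical bimodal scores: one cluster around 2, another around 8.
scores = [1, 2, 2, 2, 3, 7, 8, 8, 8, 9]

modes = multimode(scores)        # all values tied for the highest count
center_mean = mean(scores)       # sum of scores / number of cases
center_median = median(scores)   # middle of the ranked scores

print("modes:", modes)
print("mean:", center_mean, "median:", center_median)
print("does any case sit at the mean?", center_mean in scores)
```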
Median
- The middle case when scores are ranked from low to high. For this reason, the median CANNOT be used at the nominal level, because ranking nominal categories is not meaningful. The median is used at the ordinal and interval/ratio levels.
- The median is the 50th percentile. A percentile is the score at or below which a given percentage of cases fall:
50th percentile (median): 50% of cases have scores less than or equal to it.
25th percentile: 25% of cases have scores less than or equal to it.
75th percentile: 75% of cases have scores less than or equal to it.
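The percentile definition here ("the score such that p% of cases have scores less than or equal to it") corresponds to the nearest-rank method. The sketch below assumes that method. The helper name `nearest_rank_percentile` is hypothetical, and other conventions, such as the interpolation used by `statistics.quantiles` or spreadsheets, can give slightly different values. With an even number of cases, for instance, nearest-rank returns the lower middle score, while the usual median averages the two middle scores.

```python
import math

def nearest_rank_percentile(scores, p):
    """Smallest score such that at least p% of cases are <= it (nearest-rank)."""
    if not scores:
        raise ValueError("need at least one score")
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    ranked = sorted(scores)                              # rank low to high
    rank = max(1, math.ceil(p * len(ranked) / 100))      # 1-based position
    return ranked[rank - 1]

scores = [15, 20, 35, 40, 50]  # hypothetical scores
p25 = nearest_rank_percentile(scores, 25)
p50 = nearest_rank_percentile(scores, 50)
p75 = nearest_rank_percentile(scores, 75)
print(p25, p50, p75)
```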
Mean
- Add up your scores and divide by the number of cases to get the average. The mean is the balancing point of the distribution, and it is strongly affected by outlying observations.
- CANNOT be calculated with ordinal data. The distances between ordinal categories are not known to be equal, so adding and dividing them is not meaningful. The mean (like the standard deviation) requires interval or ratio data.
- A poor measure of central tendency when there are outlying observations. A single extreme score can shift the mean dramatically, so it no longer represents a typical case, which weakens any generalization from the sample to the population. The median, by comparison, is resistant to outliers. It depends on the rank order of the scores, not their size, so it changes very little.
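A quick way to see the outlier effect is to add one extreme case to a small hypothetical set of scores and recompute both measures. The mean jumps from about 25 to over 100, while the median moves by only one point.

```python
from statistics import mean, median

scores = [20, 22, 25, 27, 30]        # hypothetical scores
with_outlier = scores + [500]        # the same scores plus one extreme case

mean_before, median_before = mean(scores), median(scores)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("before:", mean_before, median_before)
print("after: ", mean_after, median_after)
```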
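Range
(minimum, maximum)
- The difference between the maximum and minimum values: range = maximum - minimum. Because it uses only the two most extreme scores, a single outlier can inflate it.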
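As a minimal sketch (the helper name `value_range` and the scores are hypothetical), the card's "(minimum, maximum)" pair and the range can be computed together. Adding one extreme value shows how the range depends entirely on the two endpoints.

```python
def value_range(scores):
    """Return ((minimum, maximum), maximum - minimum)."""
    lo, hi = min(scores), max(scores)
    return (lo, hi), hi - lo

scores = [12, 15, 17, 22, 26, 30, 34, 41, 58]  # hypothetical scores
bounds, spread = value_range(scores)
outlier_bounds, outlier_spread = value_range(scores + [250])

print(bounds, spread)
print(outlier_bounds, outlier_spread)
```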
IQR (interquartile range)
(25th percentile, 75th percentile)
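- The IQR is the difference between the 75th and 25th percentile scores, so it spans the middle 50% of cases.
- Example: (17, 34) means the middle 50% of scores fall between $17,000 and $34,000, so IQR = $34,000 - $17,000 = $17,000.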
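The sketch below uses `statistics.quantiles` with `method="inclusive"` to get the quartiles. The incomes (in thousands of dollars) are hypothetical and were chosen so the quartiles reproduce the (17, 34) example. Other quartile conventions can shift Q1 and Q3 slightly on real data.

```python
from statistics import quantiles

# Hypothetical incomes in thousands of dollars.
incomes = [12, 15, 17, 22, 26, 30, 34, 41, 58]

# n=4 returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = quantiles(incomes, n=4, method="inclusive")
iqr = q3 - q1

print(f"(Q1, Q3) = ({q1}, {q3}), IQR = {iqr}")
```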
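Standard Deviation
- Measures how far scores typically fall from the mean (the sum of the scores divided by the number of cases).
- Find each score's deviation from the mean, square each deviation, then average the squared deviations = variance. (For a sample, divide the sum of squared deviations by n - 1 instead of n.)
- Square root of the variance = standard deviation.
What is the minimum value the SD can take on? Zero.
Why? The variance is an average of squared deviations, and a squared number is never negative, so neither the variance nor its square root can be less than zero. The SD equals zero only when every score is identical (no spread at all).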
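The steps for the population variance and SD can be written out directly and cross-checked against `statistics.pstdev`. This is a sketch with hypothetical scores and a hypothetical helper name `population_sd`. `statistics.stdev` would give the sample version that divides by n - 1. Identical scores give an SD of exactly zero.

```python
import math
from statistics import pstdev

def population_sd(scores):
    n = len(scores)
    mu = sum(scores) / n                           # the mean
    squared_devs = [(x - mu) ** 2 for x in scores] # never negative
    variance = sum(squared_devs) / n               # average squared deviation
    return math.sqrt(variance)                     # SD = sqrt(variance)

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
sd = population_sd(scores)
no_spread = population_sd([6, 6, 6, 6])

print(sd, pstdev(scores), no_spread)
```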
Chebyshev’s Theorem
Identify: for any univariate distribution (with a finite SD), at least 1 - 1/k^2 of the cases fall within k standard deviations of the mean, for any k > 1. With k = 2, at least 75% of the cases will be found between (mean - 2SD, mean + 2SD).
ex: mean = 55, SD = 5: (55 - 10, 55 + 10), so at least 75% of the cases will be found between (45, 65).
Significance: the guarantee holds regardless of the shape of the distribution, so a researcher can use Chebyshev’s theorem without assuming normality. For example, values lying unusually far from the mean can be flagged as possible outliers or data errors to check.
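A minimal sketch of the bound and of flagging far-out values follows. The helper names (`chebyshev_bound`, `share_within`, `flag_far_values`) and the skewed data are hypothetical, and the population SD (`pstdev`) is assumed. Even for this very lopsided data set, the share of cases within 2 SD stays above the 75% guarantee. A flagged value is a candidate to check, not automatically an error.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum share of cases within k SDs of the mean (meaningful for k > 1)."""
    return 1 - 1 / k ** 2

def share_within(scores, k):
    mu, sd = mean(scores), pstdev(scores)
    inside = [x for x in scores if abs(x - mu) < k * sd]
    return len(inside) / len(scores)

def flag_far_values(scores, k):
    mu, sd = mean(scores), pstdev(scores)
    return [x for x in scores if abs(x - mu) >= k * sd]

# The example: mean = 55, SD = 5, k = 2.
mu, sd, k = 55, 5, 2
interval = (mu - k * sd, mu + k * sd)

skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 50]  # hypothetical, strongly skewed
print(interval, chebyshev_bound(2), share_within(skewed, 2), flag_far_values(skewed, 2))
```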
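Pearson’s r
- A bivariate statistic that measures the strength and direction of the linear relationship between two variables, x and y. The correlation coefficient is a number between -1 and +1.
0 = no linear correlation, a random blob of dots. (A curved, non-linear relationship can also give r near 0.)
+1 = perfect positive correlation: all points fall on a straight line with a positive slope (not necessarily a slope of 1).
-1 = perfect negative correlation: all points fall on a straight line with a negative slope (not necessarily a slope of -1).
- Used with interval and ratio level data, because the distances between values must be meaningful and equal.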
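The sketch below computes r from its definition (sum of cross-products of deviations divided by the square root of the product of the sums of squared deviations). The helper name `pearson_r` and the data are hypothetical. It shows that any perfect straight line gives r = +1 or -1 whatever its slope, and that a perfect U-shaped relationship gives r = 0. In Python 3.10+, `statistics.correlation` can serve as a cross-check.

```python
import math

def pearson_r(x, y):
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-products
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no spread")
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [3 * a + 2 for a in x])      # straight line, slope 3
r_neg = pearson_r(x, [-0.5 * a + 10 for a in x])  # straight line, slope -0.5
u = [-2, -1, 0, 1, 2]
r_curve = pearson_r(u, [a ** 2 for a in u])       # perfect U-shape

print(r_pos, r_neg, r_curve)
```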