Descriptive Stats Flashcards
What are measures of central tendency?
Methods used to determine the central point of a given distribution
Mode
- Corresponds to the score that has the highest frequency in a frequency distribution (visually, it's the highest point of the graph).
- For a grouped distribution (histogram), it's defined as the most frequently occurring interval (or the midpoint of that interval).
- It is applicable to almost all kinds of data sets, including nominal data.
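A minimal Python sketch of both definitions of the mode. `grouped_mode` is a hypothetical helper, it assumes half-open intervals [lower, lower + width), and all data values are made up for illustration.

```python
from collections import Counter
from statistics import mode

def grouped_mode(scores, width, start=0.0):
    """Return the midpoint of the most frequent class interval.

    Assumes half-open intervals [start + k*width, start + (k+1)*width).
    """
    # Map each score to the index k of the interval it falls in.
    counts = Counter((x - start) // width for x in scores)
    k, _ = counts.most_common(1)[0]  # ties go to the interval seen first
    lower = start + k * width
    return lower + width / 2

# Ungrouped data: the mode is simply the most frequent score.
print(mode([3, 5, 5, 5, 7, 8, 12]))  # 5

# Hypothetical continuous measurements grouped into intervals of width 5.
times = [2.1, 5.4, 6.0, 6.8, 7.2, 9.9, 11.3, 12.0, 14.5]
print(grouped_mode(times, width=5))  # 7.5, midpoint of [5, 10)
```

Textbooks that use real limits for whole-number scores may place interval midpoints slightly differently.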
Disadvantages of mode
- Lack of reliability
- Lack of precision in some cases
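A small sketch of why the mode can be unreliable: with the hypothetical scores below, changing a single score moves the mode from one end of the distribution to the other.

```python
from statistics import mode

# Hypothetical scores: 9 occurs most often, so the mode is 9.
before = [2, 2, 3, 5, 7, 9, 9, 9]
# Change a single 9 into a 2 and the mode jumps to the other end.
after = [2, 2, 2, 3, 5, 7, 9, 9]

print(mode(before))  # 9
print(mode(after))   # 2
```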
Unimodal distribution
Distributions with a single mode (one peak)
Bimodal distribution
Distributions with two modes (two distinct peaks)
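One way to check for more than one mode in Python is `statistics.multimode`, which returns every score tied for the highest frequency. This is a sketch with made-up data; note that it only detects exact ties, so a distribution with two clear peaks of unequal height (often also called bimodal) would be reported as having one mode.

```python
from statistics import multimode

def count_modes(scores):
    """Number of scores tied for the highest frequency (exact ties only)."""
    return len(multimode(scores))

unimodal = [1, 2, 2, 3, 3, 3, 4, 4, 5]
bimodal = [1, 2, 2, 2, 3, 4, 5, 5, 5, 6]

print(multimode(unimodal), count_modes(unimodal))  # [3] 1
print(multimode(bimodal), count_modes(bimodal))    # [2, 5] 2
```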
Properties of mean
- If a constant is added to (or subtracted from) every score in a distribution, the mean is increased (or decreased) by that constant.
- If every score is multiplied (or divided) by the same constant, the mean is multiplied (or divided) by the same constant.
- The sum of deviations from the mean is equal to zero.
- The sum of squared deviations from the mean is smaller than the sum of squared deviations around any other point in the distribution.
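The four properties of the mean can be checked numerically. This sketch uses an arbitrary small data set and a few arbitrary comparison points; it illustrates the properties rather than proving them.

```python
from statistics import fmean

scores = [2, 4, 6, 8, 10]
m = fmean(scores)  # 6.0

# Adding a constant shifts the mean by that constant.
assert fmean([x + 3 for x in scores]) == m + 3
# Multiplying by a constant scales the mean by that constant.
assert fmean([x * 2 for x in scores]) == m * 2
# Deviations from the mean sum to zero.
assert sum(x - m for x in scores) == 0

def sum_sq_dev(point):
    """Sum of squared deviations of the scores around `point`."""
    return sum((x - point) ** 2 for x in scores)

print(sum_sq_dev(m))  # 40.0
# Every other point tried gives a larger sum of squared deviations.
for point in [4, 5, 5.5, 6.5, 7, 8]:
    assert sum_sq_dev(point) > sum_sq_dev(m)
```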
Median
- Corresponds to the middle score of a distribution, after arranging the data in ascending order.
- Corresponds to the 50th percentile of a distribution.
- If a distribution has an odd number of scores, the median is literally the middle value (provided the data are arranged in ascending order).
- If a distribution has an even number of scores, the median is the average of the two middle scores.
- In principle, it divides a distribution into two equal halves.
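The odd/even rule can be written out by hand and compared with Python's `statistics.median`. `median_by_hand` is an illustrative helper and the scores are made up.

```python
from statistics import median

def median_by_hand(scores):
    s = sorted(scores)          # arrange in ascending order first
    n = len(s)
    mid = n // 2
    if n % 2 == 1:              # odd N: the single middle score
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even N: average of the two middle scores

odd = [9, 1, 5, 3, 7]        # sorted: 1 3 5 7 9
even = [8, 2, 6, 4]          # sorted: 2 4 6 8
print(median_by_hand(odd), median(odd))    # 5 5
print(median_by_hand(even), median(even))  # 5.0 5.0
```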
Disadvantages of median
- It's not applicable to all kinds of data sets (e.g., the median cannot be identified for nominal categorical data, because the categories cannot be logically ordered).
- It is less informative when there are many tied scores; it is most useful when there are few ties and the distribution is skewed.
What happens if there is a great deal of variability?
No measure of central tendency is very representative of the scores if the distribution contains a great deal of variability.
Different measures of variability
- Range
- Semi-Interquartile range
- Mean deviation
- Variance
- Standard deviation
Range
- Measures the width of a distribution by subtracting the lowest score (its lower real limit) from the highest score (its upper real limit).
- Its advantage is that it spans the whole distribution, from the lowest score to the highest.
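The real-limits definition of the range can be sketched in Python. This assumes whole-number scores measured in units of 1, so each score's real limits lie 0.5 below and above it; `range_real_limits`, its `unit` parameter and the data are illustrative.

```python
def range_real_limits(scores, unit=1):
    """Upper real limit of the highest score minus lower real limit of the lowest.

    Assumes scores were measured in whole units of size `unit`, so each
    score's real limits lie half a unit below and above it.
    """
    half = unit / 2
    return (max(scores) + half) - (min(scores) - half)

scores = [3, 7, 8, 12, 15]
print(max(scores) - min(scores))     # 12, simple highest minus lowest
print(range_real_limits(scores))     # 13.0, using real limits (15.5 - 2.5)
```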
Disadvantages of range
- The major disadvantage of the range is that, like the mode, it's unreliable.
- The range can change drastically when just one score is added to or removed from the distribution.
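A quick illustration with made-up scores of how a single added score can change the range drastically (using the simple highest-minus-lowest form for brevity).

```python
def score_range(scores):
    """Simple range: highest score minus lowest score."""
    return max(scores) - min(scores)

scores = [1, 2, 3, 4, 5, 6, 7]
print(score_range(scores))            # 6
# Add a single extreme score and the range jumps.
print(score_range(scores + [100]))    # 99
```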
Semi-interquartile range
- This measure of variability can be used for open-ended distributions.
- The interquartile range is obtained by subtracting the 25th percentile from the 75th percentile. The semi-interquartile range is half the interquartile range.
- It is not much affected by adding or removing extreme scores.
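A sketch of the semi-interquartile range using `statistics.quantiles` (Python 3.8+), whose default "exclusive" method interpolates percentiles; other percentile conventions can give slightly different values. The data are made up to show how little one extreme score moves this measure.

```python
from statistics import quantiles

def semi_iqr(scores):
    """Half the distance between the 75th and 25th percentiles."""
    q1, _, q3 = quantiles(scores, n=4)  # 25th, 50th, 75th percentiles
    return (q3 - q1) / 2

scores = [1, 2, 3, 4, 5, 6, 7]
print(semi_iqr(scores))          # 2.0  (Q1 = 2.0, Q3 = 6.0)
# Adding one extreme score barely changes it.
print(semi_iqr(scores + [100]))  # 2.25 (Q1 = 2.25, Q3 = 6.75)
```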
Mean deviation
- It measures the distance of every score from the mean (the midpoint of the distribution) and averages these distances.
Deviation score = ?
Individual score - mean
Mean deviation calculation
- Compute the deviation score for each score.
- Take the mean of all deviation scores, using their absolute values (otherwise they would sum to zero).
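The two steps for the mean deviation, sketched in Python with made-up scores. `mean_deviation` is an illustrative helper that measures deviations from the mean.

```python
from statistics import fmean

def mean_deviation(scores):
    m = fmean(scores)
    deviations = [x - m for x in scores]        # deviation score: X - mean
    return fmean(abs(d) for d in deviations)    # average absolute deviation

scores = [2, 4, 6, 8, 10]
print(mean_deviation(scores))  # 2.4, i.e. (4 + 2 + 0 + 2 + 4) / 5
```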
What is variance also referred to as?
Mean square
SS = ?
Sum of squared deviations: the sum of (individual score - mean)^2 over all scores
Variance = ?
SS / N (for a population of N scores)
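SS and the population variance (SS / N) can be computed directly and cross-checked against `statistics.pvariance`. The helper names and scores are illustrative.

```python
from statistics import fmean, pvariance

def sum_of_squares(scores):
    """SS: sum of squared deviations from the mean."""
    m = fmean(scores)
    return sum((x - m) ** 2 for x in scores)

def population_variance(scores):
    return sum_of_squares(scores) / len(scores)  # variance = SS / N

scores = [2, 4, 6, 8, 10]
print(sum_of_squares(scores))       # 40.0
print(population_variance(scores))  # 8.0
# Cross-check against the standard library's population variance.
assert population_variance(scores) == pvariance(scores)
```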
Standard deviation
- Calculated by taking the square root of the variance
- Also called the root mean square (deviation)
- Strongly affected by scores with large deviations, because the deviations are squared
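A sketch of the standard deviation as the root of the mean square (population form, SS / N), with made-up scores. Replacing one score with an extreme value shows how strongly a single large deviation inflates it.

```python
import math

def standard_deviation(scores):
    """Population SD: square root of the mean of squared deviations."""
    m = sum(scores) / len(scores)
    mean_square = sum((x - m) ** 2 for x in scores) / len(scores)
    return math.sqrt(mean_square)  # root of the mean square

scores = [2, 4, 6, 8, 10]
print(standard_deviation(scores))             # about 2.83 (sqrt of 8)
# Replace one score with an extreme value: the SD grows about fivefold.
print(standard_deviation([2, 4, 6, 8, 40]))   # about 14.14 (sqrt of 200)
```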