Lecture 2 (DESCRIPTIVE STATISTICS II) Flashcards
MEASURES OF CENTRAL TENDENCY
Yield information about “particular places or locations in a group of numbers”.
MODE
The most frequently occurring value in a data set.
Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
Can be used to determine what categories occur most frequently.
BIMODAL: In a tie for the most frequently occurring value, two modes are listed.
MULTIMODAL: Data sets that contain more than two modes.
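As an illustration of the mode cards above, here is a minimal Python sketch using the standard library's `statistics.multimode` (Python 3.8+). The helper name `modes` and the sample data are made up for this example.

```python
from statistics import multimode


def modes(values):
    """Return every most frequently occurring value, in order of first appearance."""
    return multimode(values)


# One mode, two modes (bimodal), and a nominal (category) example.
unimodal = modes([2, 3, 3, 5, 7])                 # [3]
bimodal = modes([1, 1, 2, 4, 4])                  # [1, 4]
nominal = modes(["red", "blue", "red", "green"])  # ['red']

print(unimodal, bimodal, nominal)
```

Because the mode only counts occurrences, the same call works on category labels as well as on numbers.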
MEDIAN
Middle value in an ordered array of numbers.
For an array with an odd number of terms, the median is the middle number.
For an array with an even number of terms, the median is the average of the middle two numbers.
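The odd and even cases above can be sketched in Python as follows. The function name `median` and the sample arrays are illustrative assumptions, not part of the lecture.

```python
def median(values):
    """Median of a non-empty list, following the odd/even rule on the cards."""
    ordered = sorted(values)          # the median is defined on the ordered array
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of the middle two


odd_case = median([7, 1, 5])       # ordered: 1, 5, 7    -> 5
even_case = median([8, 2, 4, 6])   # ordered: 2, 4, 6, 8 -> (4 + 6) / 2 = 5.0

print(odd_case, even_case)
```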
ARITHMETIC MEAN
Mean is the average of a group of numbers.
Applicable for interval and ratio data.
Not applicable for nominal or ordinal data.
Affected by each value in the data set, including extreme values.
Computed by summing all values in the data set and dividing the sum by the number of values in the data set.
Population mean
μ
Sample mean
x̄ (x-bar)
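A small Python sketch of the computation described above (sum the values, divide by the count), with a second data set showing how one extreme value pulls the mean. The function name `mean` and both data sets are made up for illustration.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    return sum(values) / len(values)


typical = mean([10, 12, 14, 16, 18])        # 70 / 5 = 14.0
with_outlier = mean([10, 12, 14, 16, 180])  # 232 / 5 = 46.4

print(typical, with_outlier)
```

Replacing a single value (18 with 180) more than triples the mean, which is the sense in which the mean is affected by extreme values.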
PERCENTILES
Measures of central tendency that divide a group of data into 100 parts.
At least n% of the data lie below the nth percentile, and at most (100-n)% of the data lie above the nth percentile.
How to calculate percentiles
Organise the data into an ascending ordered array.
Calculate the percentile location: i = (P/100) × n, where P is the percentile of interest and n is the number of values.
Determine the percentile’s location and its value.
If i is a whole number, the percentile is the average of the values at the i and (i+1) positions.
If i is not a whole number, the percentile is at the position given by the whole-number part of (i + 1) in the ordered array.
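The location rule above can be sketched in Python as below. This is one reading of the cards, not a definitive implementation: the function name `percentile`, the restriction to 0 < P < 100, and the sample data are assumptions made for this example. Other textbooks and libraries use different interpolation rules and may give slightly different values.

```python
def percentile(values, p):
    """Pth percentile using the location rule i = (P/100) * n (positions are 1-based)."""
    if not values or not 0 < p < 100:
        raise ValueError("need a non-empty data set and 0 < p < 100")
    ordered = sorted(values)          # step 1: ascending ordered array
    n = len(ordered)
    i = p * n / 100                   # step 2: percentile location
    if i.is_integer():
        k = int(i)
        # i is a whole number: average the values at positions i and i + 1
        return (ordered[k - 1] + ordered[k]) / 2
    # i is not whole: take the value at position (whole part of i) + 1,
    # which is index int(i) in 0-based Python indexing
    return ordered[int(i)]


data = [14, 12, 19, 23, 5, 13, 28, 17]   # ordered: 5 12 13 14 17 19 23 28
p30 = percentile(data, 30)   # i = 2.4 -> 3rd value -> 13
p25 = percentile(data, 25)   # i = 2   -> (12 + 13) / 2 = 12.5

print(p30, p25)
```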
QUARTILES
Measures of central tendency that divide a group of data into four subgroups.
Q1: 25% of the data fall below the first quartile.
Q2: 50% of the data set is below the second quartile
Q3: 75% of the data set is below the third quartile.
MEASURES OF VARIABILITY
Tools that describe the spread or the dispersion of a set of data.
RANGE
The difference between the largest and the smallest values in a set of data.
ADVANTAGE: Easy to compute
DISADVANTAGE: Affected by extreme values
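A short Python sketch of the range and of its sensitivity to a single extreme value. The name `value_range` and the data are hypothetical, chosen only for illustration.

```python
def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)


data = [5, 12, 13, 14, 17, 19, 23, 28]
plain = value_range(data)                   # 28 - 5 = 23
with_extreme = value_range(data + [280])    # 280 - 5 = 275

print(plain, with_extreme)
```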
INTERQUARTILE RANGE
Range of values between the first and third quartiles.
Range of the middle half; middle 50%
Useful when researchers are interested in the middle 50% and not the extremes.
Used in the construction of box-and-whisker plots.
IQR = Q3 - Q1
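As a rough sketch, the quartiles can be computed as the 25th, 50th, and 75th percentiles, and the interquartile range as Q3 - Q1. The helper `location_percentile` is a hypothetical name that assumes the percentile location rule i = (P/100) × n from the percentile cards; the data are made up. Statistical software often uses other quartile conventions, so its results may differ.

```python
def location_percentile(values, p):
    """Pth percentile (0 < p < 100) via the location rule i = (P/100) * n."""
    ordered = sorted(values)
    i = p * len(ordered) / 100
    if i.is_integer():
        k = int(i)
        return (ordered[k - 1] + ordered[k]) / 2   # average of positions i and i + 1
    return ordered[int(i)]                         # position (whole part of i) + 1


data = [14, 12, 19, 23, 5, 13, 28, 17]   # ordered: 5 12 13 14 17 19 23 28
q1 = location_percentile(data, 25)   # (12 + 13) / 2 = 12.5
q2 = location_percentile(data, 50)   # (14 + 17) / 2 = 15.5, the median
q3 = location_percentile(data, 75)   # (19 + 23) / 2 = 21.0
iqr = q3 - q1                        # 21.0 - 12.5 = 8.5

print(q1, q2, q3, iqr)
```

The extremes (5 and 28) play no part in the result, which is why the interquartile range suits questions about the middle 50% of the data.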
MEAN ABSOLUTE DEVIATION, VARIANCE, AND STANDARD DEVIATION
These measures are not meaningful unless the data are at least interval-level data.
One way for researchers to look at the spread of the data is to subtract the mean from each data value.
Subtracting the mean from each data value gives the deviation from the mean (X - μ)
An examination of the deviations from the mean can reveal information about the variability of data.
The sum of the deviations from the arithmetic mean is always zero.
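A quick numerical check of the zero-sum property, as a Python sketch. The five data values are made up for illustration and are treated here as a whole population, so the mean is written `mu`.

```python
data = [5, 9, 16, 17, 18]

mu = sum(data) / len(data)             # 65 / 5 = 13.0
deviations = [x - mu for x in data]    # -8.0, -4.0, 3.0, 4.0, 5.0
total = sum(deviations)                # negative and positive deviations cancel

print(deviations, total)
```

With data whose mean is not exactly representable in floating point, the total may come out as a tiny nonzero number rather than exactly 0.0. That is rounding error, not a failure of the property.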
ABSOLUTE DEVIATION
An obvious way to force the sum of deviations to have a nonzero total is to take the absolute value of each deviation around the mean.
Allows one to solve for the mean absolute deviation.
MEAN ABSOLUTE DEVIATION
Average of the absolute deviations from the mean.
MAD = Σ|X - μ| / N, where the sum runs over all N values in the population.
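A minimal Python sketch of the mean absolute deviation for a population, following the definition above. The function name `mean_absolute_deviation` and the data are assumptions made for this example.

```python
def mean_absolute_deviation(values):
    """Average of the absolute deviations from the (population) mean."""
    n = len(values)
    mu = sum(values) / n
    return sum(abs(x - mu) for x in values) / n


data = [5, 9, 16, 17, 18]             # mean = 13
mad = mean_absolute_deviation(data)   # (8 + 4 + 3 + 4 + 5) / 5 = 24 / 5 = 4.8

print(mad)
```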