Chapter 4: Numerical Descriptive Techniques Flashcards
Measures of central location
Arithmetic mean (mean, average)
Median
Mode
Arithmetic mean
Aka mean or average
Sum of all observations / total number of observations
Essentially same calculation for sample and population
(Average function in Excel)
Usually the first choice for central location, but sensitive to extreme (outlier) values
Only valid for interval data
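A minimal Python sketch of the mean and its sensitivity to one extreme value. The salary figures are made up for illustration.

```python
from statistics import mean

# Made-up observations (salaries in $1000s), for illustration only
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [500]   # add one extreme observation

# Mean = sum of all observations / number of observations
print(mean(salaries))
print(mean(with_outlier))         # pulled far upward by the single outlier
```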
Median
The observation that falls in the middle of a list of observations placed in order
If even number of observations then median determined by averaging two middle observations
Same calculation for sample and population
Median function in Excel
Often a better measure of center than the mean if there are a small number of extreme outlier observations
50% of observations are above and 50% below
Useful for interval AND ordinal data
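A short sketch of the median for odd and even counts, using made-up values. With an even count, the two middle observations are averaged, and one outlier moves the mean far more than the median.

```python
from statistics import mean, median

# Made-up observations, for illustration only
odd_count = [3, 1, 7, 5, 9]          # sorted: 1, 3, 5, 7, 9
even_count = [3, 1, 7, 5, 9, 100]    # sorted: 1, 3, 5, 7, 9, 100

print(median(odd_count))    # middle observation: 5
print(median(even_count))   # average of 5 and 7: 6.0
print(mean(even_count))     # the outlier (100) drags the mean well above the median
```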
Mode
The observation (or observations) that occur with the greatest frequency
Sample and population calculated the same way
For larger samples and populations modal class may make more sense than a single mode value
Not great for small samples, potentially not unique
Mode function in Excel
- If there are multiple modes, Excel returns the smallest one without indicating the alternatives
Can be used for any type of data (interval, ordinal, nominal)
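A sketch of the mode in Python, assuming Python 3.8 or later for `multimode`. Unlike a single-value mode function, `multimode` reports every value tied for the greatest frequency. The data are made up.

```python
from statistics import mode, multimode

# Made-up observations, for illustration only
scores = [2, 3, 3, 5, 7, 7, 9]            # 3 and 7 both occur twice
colors = ["red", "blue", "red", "green"]  # nominal data works too

print(multimode(scores))   # all modes: [3, 7]
print(mode(colors))        # single most frequent value: red
```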
Using Excel to calculate multiple statistics
Data > Data Analysis > Descriptive Statistics > select Input Range > check Summary statistics
Measures of variability
Range
Variance
Standard deviation
Coefficient of variation
Range
= largest observation - smallest observation
No information about observations in between
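A small sketch showing why the range says nothing about the observations in between: two made-up data sets with the same range but different spread. The helper name `data_range` is hypothetical.

```python
def data_range(observations):
    """Range = largest observation - smallest observation."""
    return max(observations) - min(observations)

# Made-up data sets, for illustration only
clustered = [1, 5, 5, 5, 9]   # most values sit at the center
spread_out = [1, 1, 5, 9, 9]  # most values sit at the extremes

print(data_range(clustered))   # 8
print(data_range(spread_out))  # 8, even though the data are distributed differently
```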
Variance
Average of the squared deviations from the mean
- calculate the mean
- find the difference (deviation) of each observation from the mean
- square each deviation and sum them together
- divide that sum by 1 less than the number of observations (n - 1 corrects for using the sample mean in place of the population mean)
- the result is the sample variance, s^2
Excel: use VAR function
Mostly useful for comparing multiple sets of data
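The sample variance steps can be sketched in Python as follows. The data are made up, and the result is checked against the standard library's `statistics.variance`, which also divides by n - 1.

```python
from statistics import variance

# Made-up sample, for illustration only
data = [4, 8, 6, 5, 7]
n = len(data)

sample_mean = sum(data) / n                        # step 1: mean = 6.0
deviations = [x - sample_mean for x in data]       # step 2: -2, 2, 0, -1, 1
sum_of_squares = sum(d ** 2 for d in deviations)   # step 3: 4 + 4 + 0 + 1 + 1 = 10
s_squared = sum_of_squares / (n - 1)               # step 4: 10 / 4 = 2.5

print(s_squared)
print(variance(data))   # library result for comparison
```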
Shortcut method for variance
s^2 = [1/(n - 1)] x [sum of squared observations - (sum of all observations)^2 / number of observations]
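A sketch of the shortcut in the form s^2 = [sum of x^2 - (sum of x)^2 / n] / (n - 1), using made-up data. It needs only two running totals and gives the same answer as the deviation-based definition.

```python
# Made-up sample, for illustration only
data = [4, 8, 6, 5, 7]
n = len(data)

sum_x = sum(data)                            # 30
sum_x_squared = sum(x ** 2 for x in data)    # 16 + 64 + 36 + 25 + 49 = 190

# Shortcut: no need to compute each deviation from the mean
shortcut_variance = (sum_x_squared - sum_x ** 2 / n) / (n - 1)   # (190 - 180) / 4

print(shortcut_variance)   # 2.5
```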
Standard deviation
Roughly the typical distance of an observation from the mean
Square root of the variance
Measure of consistency
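A brief sketch showing that the sample standard deviation is the square root of the sample variance, so it is expressed in the same units as the observations. The data are made up.

```python
import math
from statistics import stdev, variance

# Made-up sample, for illustration only
data = [4, 8, 6, 5, 7]

s_squared = variance(data)   # sample variance (divides by n - 1)
s = stdev(data)              # sample standard deviation

print(s)
print(math.sqrt(s_squared))  # same value: the square root of the variance
```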
Empirical rule for interpreting standard deviation
If histogram of observations is bell shaped (symmetrical and unimodal) then:
- approx 68% of all observations fall within one standard deviation of the mean
- approx 95% of all observations fall within two standard deviations of the mean
- approx 99.7% of all observations fall within three standard deviations of the mean
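One way to see the empirical rule is to simulate bell-shaped data and count the observations within 1, 2, and 3 standard deviations of the mean. This is only a sketch: the mean of 100, standard deviation of 15, sample size, and seed are arbitrary assumptions, and the helper `share_within` is hypothetical.

```python
import random
from statistics import mean, stdev

random.seed(0)  # arbitrary seed so the simulation is repeatable
data = [random.gauss(100, 15) for _ in range(10_000)]  # simulated bell-shaped data

m, s = mean(data), stdev(data)

def share_within(k):
    """Proportion of observations within k standard deviations of the mean."""
    return sum(1 for x in data if abs(x - m) <= k * s) / len(data)

for k in (1, 2, 3):
    print(k, round(share_within(k), 3))  # should land near 0.68, 0.95, 0.997
```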
Chebysheff’s theorem
The proportion of observations in any sample or population that lie within k standard deviations of the mean is at least:
1 - (1/k^2) for k>1
Provides the lower bound of proportions in an interval
Can be used when the empirical rule does not apply (non bell shaped histograms)
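A sketch of the Chebysheff lower bound, 1 - 1/k^2, checked against a made-up skewed data set where the empirical rule would not apply. The function name `chebysheff_bound` is hypothetical.

```python
from statistics import mean, pstdev

def chebysheff_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

# Made-up, strongly skewed observations, for illustration only
skewed = [1, 1, 1, 2, 2, 3, 4, 6, 10, 30]
m, s = mean(skewed), pstdev(skewed)

k = 2
actual_share = sum(1 for x in skewed if abs(x - m) <= k * s) / len(skewed)

print(chebysheff_bound(k))   # 0.75: at least 75% within two standard deviations
print(actual_share)          # the observed proportion is at or above the bound
```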
Coefficient of variation
The standard deviation of the observations divided by the mean
Indicates if standard deviation is large or small given the observation set
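A sketch of the coefficient of variation for two made-up samples with the same standard deviation but very different means. The helper name is hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(observations):
    """Sample standard deviation divided by the sample mean."""
    return stdev(observations) / mean(observations)

# Made-up samples: identical spread, different magnitude
small_values = [4, 8, 6, 5, 7]
large_values = [104, 108, 106, 105, 107]

print(stdev(small_values), stdev(large_values))   # same standard deviation
print(coefficient_of_variation(small_values))     # large relative to a mean of 6
print(coefficient_of_variation(large_values))     # small relative to a mean of 106
```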
Measures of relative standing
Provide information about the position of particular values relative to the entire data set.
Percentile
Quartiles
(Quintiles, deciles)
Interquartile range
Percentile
The Pth percentile is the value for which P% of the observations are less than the value and (100 - P)% are greater than the value
Use to describe a single set of interval or ordinal data to communicate relative standing
Quartiles
Describe the 25th, 50th, and 75th percentiles
25th percentile - first/lower quartile, Q1
50th percentile - second quartile, Q2 (median)
75th percentile - third/upper quartile, Q3
Use to describe a single set of interval or ordinal data to communicate relative standing
Excel: use the Descriptive Statistics box
Set Kth Largest to the integer closest to n/4
Set Kth Smallest to the same value
These approximate the third and first quartiles
Gives some idea of histogram shape
Skewed vs symmetric
Location of a percentile
Location of the Pth percentile = (n + 1) * P/100
n = number of observations
A fractional location tells you how far the percentile lies between the two surrounding observations
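A sketch of the location formula with linear interpolation between the surrounding observations. The data values and the function name `percentile` are illustrative assumptions. Python's `statistics.quantiles` with its default "exclusive" method follows the same (n + 1) rule for the quartiles.

```python
def percentile(observations, p):
    """Pth percentile using location = (n + 1) * P / 100 with interpolation."""
    ordered = sorted(observations)
    n = len(ordered)
    location = (n + 1) * p / 100
    position = int(location)          # whole part: which observation we start from
    fraction = location - position    # fractional part: how far toward the next one
    if position < 1:
        return ordered[0]
    if position >= n:
        return ordered[-1]
    lower, upper = ordered[position - 1], ordered[position]
    return lower + fraction * (upper - lower)

# Made-up observations, for illustration only (n = 10)
data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]

print(percentile(data, 25))   # location 2.75 -> 0 + 0.75 * (5 - 0) = 3.75
print(percentile(data, 50))   # location 5.5  -> 8 + 0.5 * (9 - 8) = 8.5
print(percentile(data, 75))   # location 8.25 -> 14 + 0.25 * (22 - 14) = 16.0
```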
Interquartile range
= Q3 - Q1
Measures the spread of the middle 50% of observations
Large values = observations far apart = high variability
Use to describe a single set of interval or ordinal data to communicate variability
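A sketch of the interquartile range using `statistics.quantiles` (Python 3.8 or later), whose default method places quartiles at (n + 1) * P/100. The data are made up.

```python
from statistics import quantiles

# Made-up observations, for illustration only
data = [0, 0, 5, 7, 8, 9, 12, 14, 22, 33]

q1, q2, q3 = quantiles(data, n=4)   # 25th, 50th, and 75th percentiles
iqr = q3 - q1                       # spread of the middle 50% of observations

print(q1, q2, q3)   # 3.75 8.5 16.0
print(iqr)          # 12.25
```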
Measures of linear relationship
Covariance
Coefficient of correlation
Coefficient of determination
Covariance
Covariance of variables x and y = [sum over all observations of (x - mean of x) * (y - mean of y)] / (n - 1)
Covariance is positive number = variables move in the same direction
Negative number: variables move in opposite directions
Large number: strong relationship
Small number: less strong relationship
- hard to judge without additional data
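A sketch of the sample covariance on made-up paired data. Rescaling y by 100 multiplies the covariance by 100, which is why its size alone is hard to judge. The function name `sample_covariance` is hypothetical.

```python
def sample_covariance(xs, ys):
    """Sum of (x - mean of x) * (y - mean of y), divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

# Made-up paired observations, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_rescaled = [100 * value for value in y]   # same relationship, different units

print(sample_covariance(x, y))            # 1.5 (positive: variables move together)
print(sample_covariance(x, y_rescaled))   # 150.0 (bigger only because of the units)
```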
Coefficient of correlation
The covariance divided by the product of the standard deviations of the variables
Always lies between -1 and +1
+1 = perfect positive linear relationship
-1 = perfect negative linear relationship
0 = no linear relationship
Must always judge in relation to other variables
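A sketch of the coefficient of correlation as covariance divided by the product of the standard deviations, on made-up paired data. Unlike covariance, the result does not change when a variable is rescaled. The function name `correlation` is hypothetical.

```python
from statistics import stdev

def correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    return covariance / (stdev(xs) * stdev(ys))

# Made-up paired observations, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(round(correlation(x, y), 4))                     # about 0.7746
print(round(correlation(x, [100 * v for v in y]), 4))  # unchanged by rescaling y
print(round(correlation(x, [2 * v + 1 for v in x]), 4))  # exact line: 1.0
```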
Coefficient of determination
Square of the coefficient of correlation
Measures the proportion of the variation in the dependent variable that is explained by the variation in the independent variable
1 = 100% of the variation explained; 0 = no linear relationship
Excel: trendline, more options, display R-squared value on chart
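A sketch that computes the coefficient of determination two ways on made-up data: as the square of the coefficient of correlation, and as 1 - (unexplained variation / total variation) around a least squares line. The least squares line is assumed here as the kind of trendline being fitted.

```python
from statistics import mean, stdev

# Made-up paired observations, for illustration only
x = [1, 2, 3, 4, 5]   # independent variable
y = [2, 4, 5, 4, 5]   # dependent variable
n = len(x)
mean_x, mean_y = mean(x), mean(y)

covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
r = covariance / (stdev(x) * stdev(y))
r_squared = r ** 2    # square of the coefficient of correlation

# Same quantity from a least squares line y = intercept + slope * x
slope = covariance / stdev(x) ** 2
intercept = mean_y - slope * mean_x
unexplained = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
total = sum((b - mean_y) ** 2 for b in y)
r_squared_from_fit = 1 - unexplained / total

print(round(r_squared, 4))            # 0.6: 60% of the variation in y is explained
print(round(r_squared_from_fit, 4))   # same value
```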