Topic 2 - Numerical Measures Flashcards
What are the 3 measures of central tendency
- Arithmetic Mean
- Median
- Mode
What are the 4 measures of variability
- Range
- Interquartile Range
- Variance
- Standard Deviation
What is a benefit of the median
- Not affected by extreme values
What is the pro and con of the mode
- Pro: Not affected by extreme values
- Con: There may be no mode or several modes
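The three measures can be compared with a minimal sketch using Python's standard `statistics` module. The data values are invented for illustration. It shows the mean shifting when one extreme value is added, while the median and mode barely move, and it shows a data set with two modes.

```python
from statistics import mean, median, multimode

# Illustrative data only (not taken from the course material).
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]

mean_before, mean_after = mean(data), mean(with_outlier)
median_before, median_after = median(data), median(with_outlier)
mode_before, mode_after = multimode(data), multimode(with_outlier)

print(mean_before, mean_after)      # the mean is pulled towards the extreme value
print(median_before, median_after)  # the median changes only slightly
print(mode_before, mode_after)      # the mode is unchanged

# multimode returns every most-frequent value, so ties are visible.
two_modes = multimode([1, 1, 2, 2, 3])
print(two_modes)
```

When no value repeats, `multimode` returns every value, which is one way to see that the data has no single mode.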
What information do measures of variability provide
- Information on the spread or variability of the data values
What is a disadvantage of the range
- Ignores the way in which data is distributed
What is a benefit of the interquartile range
- Covers only the middle 50% of the data (Q3 - Q1), so it is much less affected by outliers than the range
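A short sketch contrasting the range with the interquartile range when a single outlier is present. The data and the helper names `data_range` and `interquartile_range` are made up for illustration. Quartile conventions differ between textbooks and software; this uses the default ("exclusive") method of `statistics.quantiles`, which may not match the method used in a given course.

```python
from statistics import quantiles

def data_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

def interquartile_range(values):
    """IQR: third quartile minus first quartile."""
    q1, _q2, q3 = quantiles(values, n=4)  # default method is 'exclusive'
    return q3 - q1

# Illustrative data: the second list replaces the largest value with an outlier.
clean = list(range(1, 12))                  # 1, 2, ..., 11
with_outlier = list(range(1, 11)) + [1000]  # 1, 2, ..., 10, 1000

print(data_range(clean), data_range(with_outlier))                    # range changes a lot
print(interquartile_range(clean), interquartile_range(with_outlier))  # IQR stays the same
```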
What is the variance
- Average of squared deviations of values from the mean
What is the difference in calculation between population and sample variance and standard deviation
- Population: divide by N
- Sample: divide by n-1
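The two divisors can be seen side by side in a minimal sketch. The function name `variance` and its `sample` flag are hypothetical, and the data is invented so that the arithmetic is easy to check by hand (mean 5, sum of squared deviations 32).

```python
from math import sqrt

def variance(values, sample):
    """Average squared deviation from the mean.

    sample=False divides by N (population); sample=True divides by n - 1.
    """
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    divisor = n - 1 if sample else n
    return squared_deviations / divisor

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values

pop_var = variance(data, sample=False)   # 32 / 8
samp_var = variance(data, sample=True)   # 32 / 7
pop_sd = sqrt(pop_var)                   # standard deviation = square root of variance
samp_sd = sqrt(samp_var)

print(pop_var, pop_sd)
print(samp_var, samp_sd)
```

The standard library's `statistics.pvariance` and `statistics.variance` apply the same two divisors, so the hand-written function is only there to make the calculation visible.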
Why do we divide by n-1 for sample variance
- So that the sample variance is an unbiased estimator of the population variance
What is an unbiased estimator
- An estimator whose expected value equals the parameter it estimates
- For variance: the average of the sample variances over all possible samples equals the population variance
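The "all possible samples" idea can be checked by brute force on a tiny population. This is a sketch under stated assumptions: the population `[2, 4, 6]` is invented, samples have size 2, and they are drawn with replacement with order counted (so there are 9 of them). Fractions are used so the comparison is exact. The `ddof` parameter name is a hypothetical choice, where 1 means "divide by n - 1" and 0 means "divide by n".

```python
from fractions import Fraction
from itertools import product

def variance(values, ddof):
    """Sum of squared deviations divided by (n - ddof), computed exactly."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((Fraction(v) - mean) ** 2 for v in values) / (n - ddof)

population = [2, 4, 6]
population_variance = variance(population, ddof=0)

# Every ordered sample of size 2, drawn with replacement.
samples = list(product(population, repeat=2))

# Average of the sample variances, using each divisor in turn.
avg_with_n_minus_1 = sum(variance(s, ddof=1) for s in samples) / len(samples)
avg_with_n = sum(variance(s, ddof=0) for s in samples) / len(samples)

print(population_variance)  # target value
print(avg_with_n_minus_1)   # matches the population variance
print(avg_with_n)           # too small on average: dividing by n is biased
```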
How can we infer the standard deviation graphically
- Wide base = Large standard deviation
- Narrow base = Small standard deviation
What is the empirical rule
- If the curve is bell shaped
- mu +- 1 s.d contains about 68% of values in the population or sample
- mu +- 2 s.d contains about 95% of values in the population or sample
- mu +- 3 s.d contains about 99.7% of values in the population or sample
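The three percentages can be reproduced from the normal distribution itself. This sketch assumes an exactly normal (bell-shaped) distribution and uses `statistics.NormalDist` from the standard library; the helper name `share_within` is hypothetical. Real data that is only roughly bell shaped will match these figures only approximately.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k s.d of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} s.d of the mean: {share:.4f}")
```

The printed proportions round to roughly 68%, 95% and 99.7%, which is where the rule gets its numbers.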
What is the z-score
- Shows the position of an observation relative to the mean of the distribution
- Indicates the number of s.d a value is from the mean
- z > 0 value greater than mean
- z < 0 value less than mean
- z = 0 equal to mean
How is the z-score calculated
- If the data set is the entire population and the population mean and s.d are known
- z = (x - mu) / s.d
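A minimal sketch of the calculation, with the subtraction done before the division. The function name `z_score` and the population values (mean 70, s.d 10) are hypothetical and chosen only to show the three sign cases.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Hypothetical population parameters.
mu, sigma = 70, 10

z_above = z_score(85, mu, sigma)  # value greater than the mean -> z > 0
z_below = z_score(60, mu, sigma)  # value less than the mean    -> z < 0
z_equal = z_score(70, mu, sigma)  # value equal to the mean     -> z = 0

print(z_above, z_below, z_equal)
```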
What are covariance and correlation coefficient
- Covariance: measure of the direction of a linear relationship between two variables
- Correlation coefficient: measure of both the direction and the strength of a linear relationship between two variables
What do we divide by when calculating population and sample covariance
- Population: N
- Sample: n-1
What do different values of covariance mean
- Cov(x,y) > 0 x and y tend to move in the same direction
- Cov(x,y) < 0 x and y tend to move in opposite directions
- Cov(x,y) = 0 x and y have no linear relationship (this does not by itself mean they are independent)
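The sign cases and the two divisors can be illustrated with a small sketch. The function name `covariance`, its `sample` flag, and all data values are invented for illustration. The third data set has a covariance of exactly zero even though y is computed directly from x (y = x squared), so a zero covariance is best read as "no linear relationship".

```python
def covariance(xs, ys, sample):
    """Average product of deviations from the means.

    sample=False divides by N (population); sample=True divides by n - 1.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    total = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return total / (n - 1 if sample else n)

xs = [1, 2, 3, 4, 5]
ys_same_direction = [2, 4, 6, 8, 10]  # y rises as x rises
ys_opposite = [10, 8, 6, 4, 2]        # y falls as x rises

xs_symmetric = [-2, -1, 0, 1, 2]
ys_squared = [x ** 2 for x in xs_symmetric]  # y depends on x, but not linearly

cov_positive = covariance(xs, ys_same_direction, sample=False)
cov_negative = covariance(xs, ys_opposite, sample=False)
cov_zero = covariance(xs_symmetric, ys_squared, sample=False)
cov_positive_sample = covariance(xs, ys_same_direction, sample=True)

print(cov_positive, cov_negative, cov_zero)
print(cov_positive_sample)  # same sign, slightly larger because of the n - 1 divisor
```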
How is the correlation coefficient calculated
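- r = Cov(x,y) / (s.d(x) * s.d(y)), which is the same as Cov(x,y) / sqrt(Var(x) * Var(y))
- The result always lies between -1 and +1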
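A minimal sketch of the correlation coefficient as covariance divided by the product of the two standard deviations. It uses the sample (n - 1) versions throughout; the function name `correlation` and the data are made up for illustration. Rescaling y changes the covariance but not the correlation, which is why the correlation coefficient can describe strength as well as direction.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation: Cov(x, y) / (s.d(x) * s.d(y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov / (sd_x * sd_y)

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]                 # illustrative values
ys_rescaled = [100 * y for y in ys]  # same pattern in different units
ys_perfect = [2 * x for x in xs]     # exact straight line

r = correlation(xs, ys)
r_rescaled = correlation(xs, ys_rescaled)
r_perfect = correlation(xs, ys_perfect)

print(round(r, 4))           # positive, moderately strong
print(round(r_rescaled, 4))  # unchanged by the change of units
print(round(r_perfect, 4))   # a perfect positive linear relationship
```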