Chapter 3 Flashcards
What are the measures of location
- Mean
- Weighted Mean
- Geometric Mean
- Median
- Mode
- Percentiles
- Quartiles
Describe the mean
- most important measure of location
- also called average
- provides a measure of central location
- most commonly used
Describe the weighted mean
- arithmetic average where some data values contribute more than others
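A minimal Python sketch of the weighted mean, computed as the sum of (weight x value) divided by the sum of the weights. The function name and the cost/quantity figures are made up for illustration and do not come from the chapter.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical data: cost per unit and number of units bought at that cost.
costs = [3.00, 3.40, 2.80]
quantities = [1200, 500, 2750]

# Purchases with larger quantities pull the average toward their cost.
print(round(weighted_mean(costs, quantities), 2))
```

With equal weights the result is the same as the ordinary mean.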
Describe the Geometric Mean
- the nth root of the product of n values
- often used to analyze growth rates in financial data
- in these cases, arithmetic mean will provide misleading results
What format must the numbers be in to calculate the geometric mean?
- can’t be a percent; each value must be a growth factor in decimal form
- convert the percent to a decimal, then combine it with 1
- if the percent is negative: 1 - the decimal
- if the percent is positive: 1 + the decimal
example: -40% converts to -0.40, so the growth factor is 1 - 0.40 = 0.60
then you can use the formula
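A short Python sketch of the conversion and the nth-root step, assuming the inputs are percent returns per period. The function name and the two sample returns are hypothetical.

```python
import math


def geometric_mean_growth(percent_returns):
    """Return (mean growth factor, mean percent return per period)."""
    if not percent_returns:
        raise ValueError("need at least one return")
    # Convert each percent to a growth factor: -40% -> 0.60, +50% -> 1.50.
    factors = [1 + r / 100 for r in percent_returns]
    if any(f <= 0 for f in factors):
        raise ValueError("growth factors must be positive")
    n = len(factors)
    # nth root of the product of the n growth factors.
    mean_factor = math.prod(factors) ** (1 / n)
    return mean_factor, (mean_factor - 1) * 100


# Hypothetical returns: lose 40% in period 1, gain 50% in period 2.
factor, pct = geometric_mean_growth([-40, 50])
print(round(factor, 4), round(pct, 2))
```

For these two returns the arithmetic mean is +5% per period, yet the investment ends at 0.6 x 1.5 = 0.9 of its starting value, which is why the geometric mean (a loss of roughly 5% per period) is the better summary.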
Describe the median
- sometimes preferred over the mean because it is not affected by extreme values (outliers)
- the middle of a sorted list of data values
- arranged in ascending order
- odd # values = median is the middle number
- even # values = median is the mean of 2 central data values
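The odd/even rule can be sketched in Python as follows. The function name and sample values are illustrative only.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)  # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the 2 central values


print(median([3, 1, 2]))          # odd number of values
print(median([4, 1, 3, 2]))       # even number of values
print(median([1, 2, 3, 4, 100]))  # the extreme value 100 does not move the median
```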
Describe the mode
- data value that occurs most often (greatest frequency)
- 2 modes - bimodal
- more than 2 modes - multimodal
- when there are 3 or more modes, the mode is usually not reported b/c listing them is not helpful in describing the location of the data
Describe percentiles
- how data is spread over the interval from smallest value to largest
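3 steps
1. arrange the data in ascending order
2. compute an index i = (p/100) x n, where p is the percentile and n is the # of data values
3. if i is not an integer - round up; the pth percentile is the value in that position
if i is an integer, the pth percentile is the avg. of the values in positions i & i+1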
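A Python sketch of the three steps, assuming the index is i = (p/100) x n (the usual form of this method; other textbooks and software use slightly different percentile conventions). The function name and the data are illustrative.

```python
import math


def percentile(values, p):
    """pth percentile using the index method, for 0 < p < 100."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)  # step 1: ascending order
    n = len(ordered)
    scaled = p * n            # step 2: i = scaled / 100
    if scaled % 100 == 0:
        # i is an integer: average the values in positions i and i+1 (1-based).
        i = int(scaled // 100)
        return (ordered[i - 1] + ordered[i]) / 2
    # i is not an integer: round up and take the value in that position.
    position = math.ceil(scaled / 100)
    return ordered[position - 1]


data = [3310, 3355, 3450, 3480, 3480, 3490, 3520, 3540, 3550, 3650, 3730, 3925]
print(percentile(data, 85))  # i = 10.2, round up to position 11
print(percentile(data, 50))  # i = 6, average of positions 6 and 7
```

The check `scaled % 100 == 0` is used instead of comparing floats so that an integer index is detected exactly when p and n are whole numbers.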
What is the formula to find the percentile of x?
# of data points less than x / total # of data points
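A small Python sketch of this ratio, assuming the numerator counts the data points that are less than x and that the result is multiplied by 100 to read as a percentile. The function name is hypothetical.

```python
def percentile_rank(values, x):
    """Percent of data points that are strictly less than x."""
    if not values:
        raise ValueError("values must not be empty")
    below = sum(1 for v in values if v < x)  # assumption: strict "less than"
    return below / len(values) * 100


scores = [55, 60, 70, 80, 90]
print(percentile_rank(scores, 80))  # 3 of the 5 scores are below 80
```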
What are the measures of variability
- range
- interquartile range
- variance
- standard deviation
- coefficient of variation
What is another name for measures of variability
Measures of dispersion
What do the measures of variability describe
spread of data
Describe Range
- Largest value - smallest value
- seldom used as the only measure b/c it is based on only 2 observations and is highly influenced by extreme values
Describe Interquartile range
IQR = Q3- Q1
- overcomes the dependency on extreme values
- difference b/w the 3rd and 1st quartile
- it is the range for the middle 50% of the data
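A Python sketch comparing the range and the IQR on made-up data. It uses `statistics.quantiles` from the standard library for the quartiles; that function interpolates between data points, so its Q1 and Q3 can differ slightly from the values a textbook index method gives. The helper names and the data are illustrative.

```python
import statistics


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 - Q1, using the standard library's default quartile method."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


clean = [1, 2, 3, 4, 5, 6, 7, 8]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 80]  # same data, one extreme value

# The range jumps when the largest value changes; the IQR does not.
print(data_range(clean), interquartile_range(clean))
print(data_range(with_outlier), interquartile_range(with_outlier))
```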
Describe Variance
- utilizes ALL data
- based on difference b/w EACH observation (xi) and the mean
- called deviation about the mean
Describe Standard deviation
- positive square root of variance
- easier to interpret than the variance b/c SD is measured in the same units as the data
- commonly used measure of risk associated with investing in stock and stock funds
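A Python sketch of the variance and standard deviation, assuming the sample versions (squared deviations about the mean divided by n - 1). The cards do not say whether the sample or population formula is meant, so the divisor is an assumption. Function names and data are illustrative.

```python
import math


def sample_variance(values):
    """Sum of squared deviations about the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    # Every observation contributes its deviation (x_i - mean), squared.
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """Positive square root of the sample variance (same units as the data)."""
    return math.sqrt(sample_variance(values))


sizes = [46, 54, 42, 46, 32]  # mean is 44
print(sample_variance(sizes))  # squared deviations sum to 256; 256 / 4
print(sample_std(sizes))
```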
Describe the Coefficient of Variation
- used when we want a descriptive statistic that indicates how LARGE the SD is relative to the MEAN
- usually expressed as a percent
- tells us that the sample SD is x% of the value of the sample mean
- useful for comparing the variability of variables that have different SDs and different means
What is the formula for Coefficient of Variation
(SD/Mean) x 100
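The formula can be sketched in Python with the standard library, assuming the sample standard deviation is the SD intended. The function name and data are illustrative.

```python
import statistics


def coefficient_of_variation(values):
    """(sample SD / mean) x 100, as a percent."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return statistics.stdev(values) / mean * 100


sizes = [46, 54, 42, 46, 32]  # mean 44, sample SD 8
print(round(coefficient_of_variation(sizes), 1))  # SD as a percent of the mean
```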
What is MAE
Mean absolute error
How is MAE calculated
- sum the absolute values of the deviations of the observations about the mean & divide it by # of observations
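A minimal Python sketch of the calculation described on the card: absolute deviations about the mean, summed and divided by the number of observations. The function name and data are illustrative.

```python
def mean_absolute_error(values):
    """Average absolute deviation of the observations about their mean."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


sizes = [46, 54, 42, 46, 32]  # mean 44; absolute deviations 2, 10, 2, 2, 12
print(mean_absolute_error(sizes))
```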
What are the measures of distribution shape
- Skewness
- z-scores
- Chebyshev’s Theorem
- Empirical Rule
What kind of skewness is there and describe each one
Skewed Left
- skewness is negative
- mean is usually less than the median
Skewed Right
- skewness is positive
- mean is usually more than the median
Symmetrical
- skewness is zero
- mean and median are equal
Describe z-scores
- used to find the relative location of values w/in a data set
- how many SDs a particular value is from the mean
- also called standardized value
- using mean and SD we can find the relative location of any observation
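A Python sketch of the z-score, z = (x - mean) / SD, assuming the sample standard deviation is used. The function name and data are illustrative.

```python
import statistics


def z_scores(values):
    """Number of sample SDs each value lies above (+) or below (-) the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(x - mean) / sd for x in values]


sizes = [46, 54, 42, 46, 32]  # mean 44, sample SD 8
for x, z in zip(sizes, z_scores(sizes)):
    print(x, z)
```

A negative z-score means the value is below the mean; a z-score of 0 means it equals the mean.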
Describe Chebyshev’s Theorem
- allows us to make statements about the proportion of data values that must be w/in a specified # of SD of the mean
- at least 75% of the data values are w/in 2 SD of the mean
- at least 89% of the data values are w/in 3 SD of the mean
- at least 94% of the data values are w/in 4 SD of the mean
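The three percentages come from the bound 1 - 1/z^2, which holds for any z greater than 1. A short Python sketch (the function name is hypothetical):

```python
def chebyshev_lower_bound(z):
    """Minimum proportion of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("the theorem applies only for z > 1")
    return 1 - 1 / z ** 2


for z in (2, 3, 4):
    # Prints the proportions behind the 75%, 89% and 94% figures.
    print(z, round(chebyshev_lower_bound(z), 4))
```

The bound also works for non-integer z, for example z = 1.5.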