Reading 8: Statistical Concepts and Market Returns Flashcards
Descriptive Statistics
Used to summarize the important characteristics of a large data set
Inferential Statistics
Procedures used to make judgments about a larger data set based on the statistical characteristics of a smaller set (a sample)
Population
A set of all possible members of a stated group e.g. all stocks on NYSE
Sample
A subset of the population of interest
Types of Measurement Scales
Nominal
Ordinal
Interval
Ratio
Nominal Scales
Data put into categories that have no particular order (the scale with the least amount of information)
Ordinal Scales
Data is put into categories that can be ordered according to some characteristic
Reveals only relative rank, nothing about the size of performance differences between categories
(Higher level of measurement than nominal)
Interval Scale
Example: temperature
Relative ranking like ordinal, and differences between data values are meaningful; however, ratios (such as twice as much or twice as large) are not meaningful
A measurement of zero does not mean the absence of what we are measuring
Ratio Scale
Most refined level of measurement (Money)
Ratios of values (twice as much etc.) are meaningful, and zero measures the complete absence of the characteristics being measured
Parameter
Numerical measure used to describe a characteristic of a population
E.g. mean or standard deviation of returns
Sample Statistic
Numerical measure used to describe a characteristic of a sample
Frequency Distribution
Groups observations into classes or intervals. An interval is a range of values
Relative Frequency
The percentage of total observations that fall within each interval
Cumulative Relative Frequency
The sum of all relative frequencies up to and including the given interval
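Example (a minimal sketch): absolute, relative, and cumulative relative frequency computed for one data set. The returns, the interval boundaries, and the name frequency_table are made up for illustration; intervals are assumed to include their lower bound and exclude their upper bound.

```python
# Hypothetical annual returns in percent (illustrative data only)
returns = [-8.0, -3.5, 1.2, 2.8, 4.1, 4.9, 6.3, 7.7, 9.4, 12.6]

# Assumed intervals: lower bound inclusive, upper bound exclusive
intervals = [(-10, -5), (-5, 0), (0, 5), (5, 10), (10, 15)]


def frequency_table(data, bins):
    """Return rows of (interval, absolute, relative, cumulative relative)."""
    n = len(data)
    rows = []
    cumulative = 0.0
    for low, high in bins:
        absolute = sum(1 for x in data if low <= x < high)
        relative = absolute / n          # share of all observations
        cumulative += relative           # running total up to this interval
        rows.append(((low, high), absolute, relative, cumulative))
    return rows


table = frequency_table(returns, intervals)
for interval, absolute, relative, cumulative in table:
    print(interval, absolute, f"{relative:.0%}", f"{cumulative:.0%}")
```

The cumulative relative frequency of the last interval should come out at 100%, which is a quick check that every observation landed in some interval.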
Histogram
Graphical presentation of absolute frequency distribution (Bar Chart)
Benefit: Allows us to see where most observations are concentrated
Frequency Polygon
Midpoint of each interval is plotted on the horizontal axis and the absolute frequency is plotted on the vertical axis (Line Chart)
Measures of Central Tendency
Used to identify the center or average of a data set. Can be used to represent the expected value of a dataset
Population Mean
Sum of all values in a population divided by the total number of observations in the population (only one possible mean)
Sample Mean
Sum of all values in a sample divided by the total number of observations in the sample (used to make inferences about the population)
Arithmetic Means (Properties)
- All interval and ratio data sets have an arithmetic mean
- All data values are considered and included
- Only one mean
- Sum of the deviations of each observation from the mean always equals zero
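Example (a small sketch with made-up values): the deviations of the observations from their arithmetic mean cancel out.

```python
# Hypothetical observations (illustrative only)
values = [4.0, 7.0, 10.0, 19.0]

mean = sum(values) / len(values)
deviations = [x - mean for x in values]   # each observation minus the mean

print(mean)             # 10.0
print(deviations)       # [-6.0, -3.0, 0.0, 9.0]
print(sum(deviations))  # 0.0
```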
Arithmetic Mean (Negative)
Outliers can have a disproportionate effect
Arithmetic Mean (Positive)
Uses all information available from observations
Weighted Mean
Recognizes that different observations may have a disproportionate influence on the mean, so each observation is given its own weight
Used to calculate portfolio returns (weighted average return of the individual assets in the portfolio)
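Example (a minimal sketch): a portfolio return as the weighted mean of asset returns. The weights and returns are hypothetical, and the weights are assumed to sum to 1.

```python
# Hypothetical portfolio: weights must sum to 1
weights = [0.5, 0.3, 0.2]
asset_returns = [0.10, 0.04, -0.02]


def weighted_mean(values, weights):
    """Sum of each value multiplied by its weight."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return sum(w * v for w, v in zip(weights, values))


portfolio_return = weighted_mean(asset_returns, weights)
print(f"{portfolio_return:.3f}")  # 0.058
```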
Median
Middle number
Helps eliminate disproportionate effect of outliers
If there is an even number of observations, take the arithmetic mean of the two middle observations
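Example (a short sketch, data made up): the median for odd and even numbers of observations, and how little an outlier moves it compared with the mean.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([5, 1, 3])          # 3
even_case = median([1, 2, 3, 100])    # 2.5, while the mean is 26.5
print(odd_case, even_case)
```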
Mode
Number that occurs most frequently in a dataset.
A data set can be unimodal, bimodal, or trimodal (one, two, or three modes)
Geometric Mean
Used to calculate investment returns over multiple periods
Measures compound growth rates
Always less than or equal to the arithmetic mean
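Example (a minimal sketch): the geometric mean return over several periods, compared with the arithmetic mean. The three period returns are hypothetical.

```python
# Hypothetical period returns: +10%, -5%, +20%
period_returns = [0.10, -0.05, 0.20]


def geometric_mean_return(returns):
    """Compound growth rate per period: (product of (1 + r)) ** (1/n) - 1."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0


geo = geometric_mean_return(period_returns)
arith = sum(period_returns) / len(period_returns)
print(f"geometric {geo:.4f}, arithmetic {arith:.4f}")
```

Compounding the geometric mean over three periods reproduces the actual ending value (1.10 * 0.95 * 1.20); compounding the arithmetic mean would overstate it.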
Harmonic Mean
Average cost of shares purchased over time
Dollar Cost Averaging
Purchasing the same dollar amount of mutual fund shares each month or each week
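Example (a sketch with made-up prices): when the same dollar amount is invested each period, the average cost per share equals the harmonic mean of the purchase prices.

```python
# Hypothetical share prices on three purchase dates
prices = [8.0, 10.0, 12.5]
amount_per_purchase = 1000.0  # same dollar amount each time (assumed)


def harmonic_mean(values):
    """n divided by the sum of reciprocals."""
    return len(values) / sum(1.0 / v for v in values)


shares_bought = sum(amount_per_purchase / p for p in prices)
average_cost = amount_per_purchase * len(prices) / shares_bought

print(f"{harmonic_mean(prices):.4f}")  # 9.8361
print(f"{average_cost:.4f}")           # 9.8361
```

The arithmetic mean of the same prices is higher (about 10.17), because more shares are bought when the price is low.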
Modal Interval
The interval with the greatest frequency
Quantile
A value at or below which a stated portion of the data lies
Quartiles
Distribution of data into quarters
Quintile
Fifths
Decile
Tenths
Percentile
100ths
Quantile (Formula)
L_y = (n + 1) * y / 100
Where:
L_y = position of the y-th percentile in the data sorted in ascending order
n = # of data points
y = given percentile
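Example (a minimal sketch): locating a percentile with the position formula (n + 1) * y / 100. When the position is not a whole number, this sketch interpolates linearly between the two neighbouring observations; that interpolation rule and the function name are assumptions, and the data are made up.

```python
def quantile(data, y):
    """Value at the y-th percentile using position L_y = (n + 1) * y / 100."""
    ordered = sorted(data)
    n = len(ordered)
    position = (n + 1) * y / 100      # 1-based position in the sorted data
    lower = int(position)             # whole part of the position
    fraction = position - lower       # distance toward the next observation
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


eleven = [8, 10, 12, 13, 15, 17, 17, 18, 19, 23, 24]
ten = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

print(quantile(eleven, 75))  # position 9 -> 19
print(quantile(ten, 75))     # position 8.25 -> 16.5
```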
Measures of Location
Quantiles and Measures of central tendency collectively
Dispersion
Variability around the central tendency (mean etc.)
Range
Max - Min
Mean Absolute Deviation (MAD)
Average distance between each data value and the mean (use absolute values, i.e. ignore the sign of each deviation)
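Example (a short sketch, data made up): mean absolute deviation as the average of the absolute deviations from the mean.

```python
def mean_absolute_deviation(values):
    """Average of |x - mean| over all observations."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


data = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical observations, mean = 6
mad = mean_absolute_deviation(data)
print(mad)  # 2.4
```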
[Population] Variance
Average of the squared deviations from the mean
[Population] Standard Deviation
Positive square root of the population variance
NB: Standard Deviation >= MAD (equal only when every observation is the same distance from the mean)
Sample Variance (Difference)
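Uses n - 1 as the denominator instead of n; dividing by n would systematically underestimate the population variance, so n - 1 makes the sample variance an unbiased estimator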
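Example (a minimal sketch, data made up): population variance and standard deviation (divide by n) next to the sample versions (divide by n - 1). Which formula applies depends on whether the data are the whole population or a sample from it.

```python
import math


def variance(values, sample=False):
    """Average squared deviation from the mean; n - 1 denominator if sample=True."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1 if sample else n)


data = [2.0, 4.0, 6.0, 8.0, 10.0]   # hypothetical observations

pop_var = variance(data)                  # 40 / 5 = 8.0
sample_var = variance(data, sample=True)  # 40 / 4 = 10.0
pop_std = math.sqrt(pop_var)
sample_std = math.sqrt(sample_var)

print(pop_var, sample_var)
print(f"{pop_std:.4f} {sample_std:.4f}")
```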
Chebyshev’s Inequality
1 - 1/k^2, for k > 1
- the minimum percentage of observations that will lie within k standard deviations of the mean, whatever the shape of the distribution
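Example (a short sketch): the Chebyshev bound 1 - 1/k^2 for a few values of k. The function name is illustrative; the bound is only informative for k > 1.

```python
def chebyshev_min_fraction(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3, 4):
    print(k, f"{chebyshev_min_fraction(k):.2%}")
```

For k = 2 the bound is 75%, compared with roughly 95% for a normal distribution; Chebyshev gives a floor that holds for any distribution.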
Coefficient of Variation
The ratio of the standard deviation of the sample to its mean, i.e. risk per unit of return
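Example (a minimal sketch): comparing two investments by standard deviation divided by mean return. Both funds and their figures are hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation per unit of mean (risk per unit of return)."""
    return std_dev / mean


# Hypothetical funds: (mean return, standard deviation of returns)
fund_a = coefficient_of_variation(std_dev=0.12, mean=0.08)
fund_b = coefficient_of_variation(std_dev=0.20, mean=0.10)

print(f"A: {fund_a:.2f}, B: {fund_b:.2f}")  # A: 1.50, B: 2.00
```

Fund B has the higher mean return here, but it also carries more risk per unit of return.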
Sharpe Ratio
Excess return per unit of risk
(Rp - Rf)/standard deviation
Larger positive Sharpe ratios are preferred to smaller ones (i.e. more excess return per unit of risk)
Limitations of the Sharpe Ratio
- Negative ratios: when comparing two negative Sharpe ratios, the higher one does not necessarily imply better risk-adjusted performance (e.g. taking more risk moves a negative ratio closer to zero)
- Asymmetric returns: for investment strategies with option characteristics, standard deviation is not a good measure of risk
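Example (a sketch with made-up figures): the Sharpe ratio (Rp - Rf) / standard deviation, including the negative-ratio case where more risk produces a "higher" ratio.

```python
def sharpe_ratio(portfolio_return, risk_free_rate, std_dev):
    """Excess return over the risk-free rate per unit of risk."""
    return (portfolio_return - risk_free_rate) / std_dev


# Hypothetical portfolios; risk-free rate assumed to be 3%
winner = sharpe_ratio(0.10, 0.03, 0.14)
low_risk_loser = sharpe_ratio(-0.02, 0.03, 0.10)
high_risk_loser = sharpe_ratio(-0.02, 0.03, 0.20)

print(f"{winner:.2f}")           # 0.50
print(f"{low_risk_loser:.2f}")   # -0.50
print(f"{high_risk_loser:.2f}")  # -0.25
```

Both losing portfolios earned the same return, yet the riskier one shows the higher (less negative) ratio, so ranking negative Sharpe ratios can mislead.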
Explain skewness
Refers to the extent to which a distribution is not symmetrical
Positive Skew
Many outliers in the upper region (right tail), so the distribution is skewed right and has a longer right tail
Negative Skew
Outliers in the lower region (left tail) so skewed left
Where is the mean/median/mode for a positively skewed unimodal distribution?
Mean is greater than median, which is greater than mode
Where is the mean/median/mode for a negatively skewed unimodal distribution?
Mean is less than median which is less than mode
Kurtosis
Measure of peakedness relative to a normal distribution and the probability of extreme outcomes e.g. thickness of tails
Excess Kurtosis
Sample kurtosis minus 3 (3 is the kurtosis of a normal distribution)
Excess kurtosis with an absolute value greater than 1 is considered significant
Leptokurtic
More peaked
Greater probability of being close to the mean or far from the mean
(Riskier investment)
Positive Excess Kurtosis
Platykurtic
Less peaked
Negative Excess Kurtosis
Mesokurtic
Same peakedness as a normal distribution (excess kurtosis of zero)
Sample Skewness
Sum of the cubed deviations from the mean, divided by the cubed standard deviation and by the number of observations
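Example (a minimal sketch): sample skewness as the sum of cubed deviations divided by n times the cubed standard deviation. Using the sample standard deviation (n - 1 denominator) is an assumption of this sketch, and the data are made up.

```python
import statistics


def sample_skewness(values):
    """(1/n) * sum((x - mean)^3) / s^3, with s the sample standard deviation."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)           # sample standard deviation (n - 1)
    cubed = sum((x - mean) ** 3 for x in values)
    return cubed / (n * s ** 3)


right_tail = [1.0, 2.0, 3.0, 4.0, 10.0]   # one large outlier on the right
left_tail = [-x for x in right_tail]      # mirror image
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]

print(f"{sample_skewness(right_tail):.3f}")  # positive
print(f"{sample_skewness(left_tail):.3f}")   # negative
print(f"{sample_skewness(symmetric):.3f}")   # 0.000
```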
Sample Kurtosis
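Sum of the deviations from the mean raised to the fourth power, divided by the standard deviation raised to the fourth power and by the number of observations
Measured relative to the kurtosis of a normal distribution, which is 3
Excess kurtosis values exceeding 1 in absolute value are considered large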
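Example (a minimal sketch): sample kurtosis from fourth-power deviations, and excess kurtosis as kurtosis minus 3. The sample standard deviation (n - 1 denominator) is an assumption of this sketch, the data are made up, and a sample this small is for arithmetic only.

```python
import statistics


def sample_kurtosis(values):
    """(1/n) * sum((x - mean)^4) / s^4, with s the sample standard deviation."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)            # sample standard deviation (n - 1)
    fourth = sum((x - mean) ** 4 for x in values)
    return fourth / (n * s ** 4)


def excess_kurtosis(values):
    """Kurtosis relative to a normal distribution, whose kurtosis is 3."""
    return sample_kurtosis(values) - 3


data = [1.0, 2.0, 3.0, 4.0, 10.0]   # hypothetical observations
kurt = sample_kurtosis(data)
excess = excess_kurtosis(data)
print(f"kurtosis {kurt:.3f}, excess {excess:.3f}")
```

A positive excess value points to a leptokurtic distribution and a negative one to a platykurtic distribution.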
Geometric Mean / Arithmetic Mean
Arithmetic Mean = Used for forecasting single-period returns in future periods
Geometric Mean = Used for forecasting future compound returns over multiple periods