Reading 7 Statistical Concepts and Market Returns Flashcards
Explain what kurtosis measures, and differentiate between a leptokurtic distribution, a platykurtic distribution, and a mesokurtic distribution.
Kurtosis measures how peaked a distribution is, and how fat its tails are, relative to the normal distribution, which has a kurtosis of 3. Excess kurtosis is kurtosis minus 3.
Leptokurtic: more peaked, fatter tails, kurtosis greater than 3 (excess kurtosis > 0).
Platykurtic: less peaked, thinner tails, kurtosis less than 3 (excess kurtosis < 0).
A mesokurtic (normal) distribution has kurtosis of 3 (excess kurtosis of 0).
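A minimal Python sketch of the classification above. It assumes the simple moment-based definition, kurtosis = m4 / m2², where m2 and m4 are the second and fourth central moments of the data; sample-adjusted versions of the formula differ slightly for small samples. The function names and data sets are hypothetical.

```python
from statistics import fmean


def kurtosis(data):
    """Moment-based kurtosis m4 / m2**2; a normal distribution gives 3."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2**2


def classify_kurtosis(data):
    """Label a data set by the sign of its excess kurtosis (kurtosis - 3)."""
    excess = kurtosis(data) - 3
    if excess > 0:
        return "leptokurtic"
    if excess < 0:
        return "platykurtic"
    return "mesokurtic"


# Hypothetical data: mostly zeros plus two large outliers gives fat tails.
fat_tails = [0] * 8 + [-10, 10]
two_points = [-1, 1]
print(kurtosis(fat_tails), classify_kurtosis(fat_tails))    # 5.0 leptokurtic
print(kurtosis(two_points), classify_kurtosis(two_points))  # 1.0 platykurtic
```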
Give the formula used to compare the relative dispersion of returns on two investments using the coefficient of variation.
CV is the ratio of the standard deviation of a data set to its mean:
$CV = \dfrac{s}{\bar{X}}$
where:
s = sample standard deviation
$\bar{X}$ = the sample mean
Note: CV measures the risk per unit of return.
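A minimal sketch of the CV comparison, assuming two hypothetical investments A and B with annual returns in percent. `statistics.stdev` computes the sample standard deviation s (n − 1 divisor).

```python
from statistics import mean, stdev


def coefficient_of_variation(returns):
    """CV = s / X-bar: sample standard deviation per unit of mean return."""
    return stdev(returns) / mean(returns)


# Hypothetical annual returns in percent for two investments.
returns_a = [10, 12, 8]
returns_b = [5, 15, 10]

cv_a = coefficient_of_variation(returns_a)  # s = 2, mean = 10
cv_b = coefficient_of_variation(returns_b)  # s = 5, mean = 10
print(f"CV(A) = {cv_a:.2f}, CV(B) = {cv_b:.2f}")  # CV(A) = 0.20, CV(B) = 0.50
```

Both investments have the same mean return, but A carries less risk per unit of return. The comparison breaks down when a mean return is zero or negative.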
Explain Chebyshev’s inequality, and describe the advantage of using it.
Chebyshev’s inequality gives the minimum proportion of observations that lie within k standard deviations of the mean of a data set: $1 - 1/k^2$, for any k > 1.
The advantage of using Chebyshev’s inequality is that it holds for samples and populations as well as for discrete and continuous data, regardless of the shape of the distribution.
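A minimal sketch that checks the bound on hypothetical, deliberately skewed (exponential) data. It assumes the population standard deviation of the data set (`statistics.pstdev`); the seed and sample size are arbitrary.

```python
import random
from statistics import fmean, pstdev


def chebyshev_bound(k):
    """Minimum share of observations within k standard deviations: 1 - 1/k**2."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k**2


def share_within(data, k):
    """Observed share of observations within k standard deviations of the mean."""
    mu = fmean(data)
    sigma = pstdev(data)
    return sum(abs(x - mu) <= k * sigma for x in data) / len(data)


# Hypothetical right-skewed data; the bound holds regardless of shape.
rng = random.Random(7)
data = [rng.expovariate(1.0) for _ in range(1_000)]

for k in (1.5, 2, 3):
    print(k, round(chebyshev_bound(k), 3), round(share_within(data, k), 3))
```

Because the inequality is only a floor, the observed share is usually well above it; for k = 2 the bound is 75%.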
Describe when arithmetic mean and geometric mean should be used.
Arithmetic mean – single period; arithmetic mean is the average of one-period returns.
Geometric mean – multiple periods; geometric mean links investment performance over time.
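A minimal sketch contrasting the two means on a hypothetical two-period return series, assuming every 1 + R is positive. $1 invested grows to $1.50 and then falls to $0.75, so the investment lost money even though the arithmetic mean return is 0%; the geometric mean captures that compounding.

```python
import math
from statistics import fmean


def arithmetic_mean_return(returns):
    """Simple average of one-period returns."""
    return fmean(returns)


def geometric_mean_return(returns):
    """Compound average return per period: (prod(1 + R_t))**(1/n) - 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1


# Hypothetical two-period returns: +50%, then -50%.
returns = [0.50, -0.50]
print(arithmetic_mean_return(returns))  # 0.0
print(geometric_mean_return(returns))   # about -0.134
```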
How is the weighted mean calculated?
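Where an arithmetic mean assigns equal weight to each observation in the data set, a weighted mean assigns different weights to different observations: $\bar{X}_w = \sum_{i=1}^{n} w_i X_i$, where the weights $w_i$ sum to 1. (Example: a portfolio's mean return is the weighted mean of its asset class returns, with each asset class weighted by its share of the portfolio.)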
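A minimal sketch of the weighted mean as a sum of weight × value products. The portfolio weights and returns are hypothetical, and the function rejects weights that do not sum to 1.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * X_i, with weights that sum to 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(w * x for w, x in zip(weights, values))


# Hypothetical portfolio: 60% stocks returning 10%, 40% bonds returning 5%.
print(weighted_mean([10, 5], [0.6, 0.4]))  # 8.0 (percent)
```

With equal weights of 1/n the result reduces to the ordinary arithmetic mean.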
List the properties of the arithmetic mean, and identify a potential problem with it.
All observations are used in the computation of the arithmetic mean, and a data set has only one arithmetic mean.
All interval and ratio data sets have an arithmetic mean.
The sum of the deviations from the arithmetic mean is always 0.
A potential problem with the arithmetic mean is that extremely high or low values (outliers) can disproportionately alter it.
Note: The arithmetic mean is the most frequently used measure of central tendency.
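A short sketch, on a hypothetical data set, of two of the properties above: deviations from the arithmetic mean sum to zero, and a single extreme value moves the mean far more than the median.

```python
from statistics import fmean, median

# Hypothetical data set, then the same data with one extreme value added.
data = [4, 5, 5, 6, 7]
with_outlier = data + [60]

xbar = fmean(data)
deviations = [x - xbar for x in data]
print(sum(deviations))  # zero, up to floating-point rounding

print(fmean(data), median(data))                  # 5.4 5
print(fmean(with_outlier), median(with_outlier))  # 14.5 5.5
```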
Give the formula used to calculate sample skewness.
$S_K = \left[\dfrac{n}{(n-1)(n-2)}\right] \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^3}{s^3}$
where:
n = number of observations in the sample
s = sample standard deviation
Note: when a distribution is nonsymmetrical, it is considered skewed.
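A minimal sketch of the sample skewness formula, assuming at least three observations that are not all equal (otherwise s = 0 and the ratio is undefined). The data sets are hypothetical.

```python
from statistics import fmean, stdev


def sample_skewness(data):
    """S_K = [n / ((n - 1)(n - 2))] * sum((X_i - X-bar)**3) / s**3."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    xbar = fmean(data)
    s = stdev(data)  # sample standard deviation (n - 1 divisor)
    cubed = sum((x - xbar) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed / s**3


print(sample_skewness([1, 2, 3, 4, 5]))   # 0.0 for symmetric data
print(sample_skewness([1, 2, 3, 4, 10]))  # positive: long right tail
```

Cubing the deviations keeps their signs, so large deviations above the mean push the result positive and large deviations below it push the result negative.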
Give the formula for the position of a percentile in a data set with n observations sorted in ascending order.
$L_y = (n + 1)\dfrac{y}{100}$
where:
y = percentage point at which we are dividing the distribution
Ly = location (L) of the percentile (Py) in the data set sorted in ascending order
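A minimal sketch of locating a percentile. It assumes a common convention: when $L_y$ is not a whole number, linearly interpolate between the two adjacent observations. Clamping locations below 1 or above n to the smallest or largest value is an additional assumption of this sketch. The function names are hypothetical.

```python
def percentile_location(n, y):
    """L_y = (n + 1) * y / 100: a 1-based position in ascending-sorted data."""
    return (n + 1) * y / 100


def percentile(data, y):
    """Value at the y-th percentile, interpolating between adjacent items."""
    values = sorted(data)
    n = len(values)
    loc = percentile_location(n, y)
    if loc <= 1:
        return values[0]  # assumption: clamp to the smallest observation
    if loc >= n:
        return values[-1]  # assumption: clamp to the largest observation
    lower = int(loc)  # whole-number part of the 1-based location
    frac = loc - lower
    return values[lower - 1] + frac * (values[lower] - values[lower - 1])


print(percentile_location(9, 50))         # 5.0: the 5th item of 9
print(percentile([40, 10, 30, 20], 50))   # 25.0: halfway between 20 and 30
```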
How is the harmonic mean used, and what is the formula used to calculate it?
The harmonic mean is used to determine the average cost of shares bought through dollar cost averaging, where a fixed dollar amount is invested periodically. Dividing each dollar amount by the share price gives the number of shares bought in each purchase; the total amount invested divided by the total number of shares is the average cost per share.
$\bar{X}_H = \dfrac{n}{\sum_{i=1}^{n} (1/X_i)}$, with $X_i > 0$
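A minimal sketch showing that the average cost per share under dollar cost averaging equals the harmonic mean of the purchase prices. The dollar amount and prices are hypothetical; `statistics.harmonic_mean` is used only as a cross-check.

```python
from statistics import harmonic_mean


def average_cost_per_share(amount_per_purchase, prices):
    """Dollar cost averaging: total dollars invested / total shares bought."""
    total_invested = amount_per_purchase * len(prices)
    total_shares = sum(amount_per_purchase / price for price in prices)
    return total_invested / total_shares


# Hypothetical: $1,000 invested at a share price of $10, then at $20.
prices = [10, 20]
print(average_cost_per_share(1_000, prices))  # about 13.33
print(harmonic_mean(prices))                  # same value, up to rounding
```

The average cost is below the $15 arithmetic mean of the prices because the fixed dollar amount buys more shares when the price is low.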
How are the sample and population variance calculated?
$\sigma^2 = \dfrac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$
where:
Xi = observation i; μ = population mean; and N = size of the population
$s^2 = \dfrac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$
where:
n = sample size; $\bar{X}$, the sample mean, is used in place of μ
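A minimal sketch of both formulas on a hypothetical data set, cross-checked against `statistics.pvariance` (divisor N) and `statistics.variance` (divisor n − 1).

```python
from statistics import pvariance, variance


def population_variance(data):
    """sigma^2 = sum((X_i - mu)**2) / N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)


def sample_variance(data):
    """s^2 = sum((X_i - X-bar)**2) / (n - 1)."""
    xbar = sum(data) / len(data)
    return sum((x - xbar) ** 2 for x in data) / (len(data) - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean 5
print(population_variance(data), pvariance(data))  # both equal 4
print(sample_variance(data), variance(data))       # both about 4.571 (32/7)
```

For the same observations, the n − 1 divisor makes s² larger than the N-divisor result.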
Define mean absolute deviation (MAD) and give the calculation.
The mean absolute deviation is the average of the absolute values of deviations of observations from the mean.
$MAD = \dfrac{\sum_{i=1}^{n}\lvert X_i - \bar{X}\rvert}{n}$
where:
n = number of items in the data set
$\bar{X}$ = the arithmetic mean of a sample
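A minimal sketch of the MAD calculation on a hypothetical data set with mean 5.

```python
from statistics import fmean


def mean_absolute_deviation(data):
    """MAD = sum(|X_i - X-bar|) / n."""
    xbar = fmean(data)
    return sum(abs(x - xbar) for x in data) / len(data)


print(mean_absolute_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 1.5
```

Taking absolute values prevents positive and negative deviations from cancelling, which they otherwise always do around the mean.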
Identify the important relationships between the arithmetic mean and the geometric mean of a data set, and identify when a geometric mean is frequently used.
The geometric mean is always less than or equal to the arithmetic mean.
The geometric mean equals the arithmetic mean only when all the observations are identical.
The difference between the geometric and arithmetic mean increases as the dispersion of observed values increases.
A geometric mean is frequently used when calculating average rates of return over multiple periods or to compute the growth rate of a variable.
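A minimal sketch of these relationships using two hypothetical return series with the same 5% arithmetic mean but different dispersion. It assumes every 1 + R is positive and uses `statistics.geometric_mean` on the 1 + R values.

```python
from statistics import fmean, geometric_mean


def geometric_mean_return(returns):
    """Geometric mean of (1 + R_t), minus 1; requires every 1 + R_t > 0."""
    return geometric_mean([1 + r for r in returns]) - 1


# Hypothetical return series with the same 5% arithmetic mean.
low_dispersion = [0.04, 0.05, 0.06]
high_dispersion = [-0.20, 0.05, 0.30]

for returns in (low_dispersion, high_dispersion):
    am = fmean(returns)
    gm = geometric_mean_return(returns)
    print(f"AM = {am:.4f}, GM = {gm:.4f}, gap = {am - gm:.4f}")
```

The gap is tiny for the tightly clustered series and much larger for the dispersed one; with identical returns the two means coincide.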
Describe nominal, ordinal, interval, and ratio scales.
Nominal: weakest level of measurement; categorize or count data but do not rank them
Ordinal: stronger level of measurement than nominal scales; sort data in categories that are ranked according to a certain characteristic
Interval: rank observations such that the differences between scale values are equal so that values can be added and subtracted meaningfully
Ratio: have all the characteristics of interval scales plus a true zero point as the origin, so ratios of values are meaningful; strongest level of measurement
Define median and mode.
The median is the value of the middle item of a data set once it has been arranged in ascending or descending order.
The mode is the most frequently occurring value in a data set.
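A short sketch on hypothetical data. With an even number of observations there is no single middle item; the usual convention, which `statistics.median` follows, averages the two middle values. A data set can also have more than one mode, and `statistics.multimode` (Python 3.8+) returns all of them.

```python
from statistics import median, multimode

odd = [7, 1, 3]       # sorted: 1, 3, 7
even = [7, 1, 3, 5]   # sorted: 1, 3, 5, 7
print(median(odd))    # 3
print(median(even))   # 4.0, the average of 3 and 5
print(multimode([1, 2, 2, 3, 3, 4]))  # [2, 3]: two modes
```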
Distinguish between a return distribution with positive skewness and one with negative skewness.
A return distribution with positive skewness has frequent small losses and few large gains, and mode < median < mean.
A return distribution with negative skewness has frequent small gains and few large losses, and mean < median < mode.