Reading 8 - Statistical Concepts and Market Returns Flashcards
Population
A population is defined as all members of a specified group. A sample is a subset of a population.
Parameter
A parameter is any descriptive measure of a population. A sample statistic (statistic, for short) is a quantity computed from or used to describe a sample.
Four major scales for data measurements
Data measurements are taken using one of four major scales: nominal, ordinal, interval, or ratio. Nominal scales categorize data but do not rank them. Ordinal scales sort data into categories that are ordered with respect to some characteristic. Interval scales provide not only ranking but also assurance that the differences between scale values are equal. Ratio scales have all the characteristics of interval scales as well as a true zero point as the origin. The scale on which data are measured determines the type of analysis that can be performed on the data.
Frequency distribution
A frequency distribution is a tabular display of data summarized into a relatively small number of intervals. Frequency distributions permit us to evaluate how data are distributed.
The relative frequency of observations in an interval
The relative frequency of observations in an interval is the number of observations in the interval divided by the total number of observations. The cumulative relative frequency cumulates (adds up) the relative frequencies as we move from the first interval to the last, thus giving the fraction of the observations that are less than the upper limit of each interval.
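As a rough illustration of the two definitions above, the sketch below computes relative and cumulative relative frequencies. The interval counts are hypothetical and not taken from any data set in the reading.

```python
from itertools import accumulate

# Hypothetical observation counts for four return intervals (illustrative only).
counts = [2, 5, 8, 5]
total = sum(counts)

# Relative frequency: interval count divided by the total number of observations.
relative = [c / total for c in counts]

# Cumulative relative frequency: running sum of the relative frequencies.
cumulative = list(accumulate(relative))

for c, r, cr in zip(counts, relative, cumulative):
    print(f"count={c:2d}  relative={r:.2f}  cumulative={cr:.2f}")
```

The last cumulative value should come out at 1 (up to floating-point rounding), since every observation falls in some interval.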
Histogram
A histogram is a bar chart of data that have been grouped into a frequency distribution. A frequency polygon is a graph of frequency distributions obtained by drawing straight lines joining successive points representing the class frequencies.
List of sample statistics and purpose
Sample statistics such as measures of central tendency, measures of dispersion, skewness, and kurtosis help with investment analysis, particularly in making probabilistic statements about returns.
Measures of central tendency and purpose
Measures of central tendency specify where data are centered and include the (arithmetic) mean, median, and mode (most frequently occurring value). The mean is the sum of the observations divided by the number of observations. The median is the value of the middle item (or the mean of the values of the two middle items) when the items in a set are sorted into ascending or descending order. The mean is the most frequently used measure of central tendency. The median is not influenced by extreme values and is most useful in the case of skewed distributions. The mode is the only measure of central tendency that can be used with nominal data.
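A small sketch using Python's standard `statistics` module can make the three measures concrete. The observations are hypothetical and chosen so that one extreme value pulls the mean above the median.

```python
import statistics

# Hypothetical observations with one extreme value (illustrative only).
observations = [2, 3, 3, 5, 12]

mean = statistics.mean(observations)      # sum / count = 25 / 5
median = statistics.median(observations)  # middle item of the sorted data
mode = statistics.mode(observations)      # most frequently occurring value

print(mean, median, mode)
```

Here the mean is 5 while the median and mode are both 3, which shows how a single large observation moves the mean but not the median.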
A portfolio’s return
A portfolio’s return is a weighted mean return computed from the returns on the individual assets, where the weight applied to each asset’s return is the fraction of the portfolio invested in that asset.
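The weighted mean can be sketched as follows. The function name `portfolio_return`, the weights, and the asset returns are all hypothetical, and the check that weights sum to 1 is an added assumption (a fully invested portfolio).

```python
def portfolio_return(weights, returns):
    """Weighted mean return: sum of weight_i * return_i."""
    if len(weights) != len(returns):
        raise ValueError("weights and returns must have the same length")
    # Assumption: the portfolio is fully invested, so weights sum to 1.
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * r for w, r in zip(weights, returns))

# Hypothetical three-asset portfolio (illustrative only).
weights = [0.5, 0.3, 0.2]
asset_returns = [0.10, 0.04, -0.02]
rp = portfolio_return(weights, asset_returns)
print(f"portfolio return = {rp:.4f}")
```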
Geometric mean
The geometric mean, G, of a set of observations X1, X2, …, Xn is G = (X1*X2*…*Xn)^(1/n) with Xi >= 0 for i = 1, 2, …, n. The geometric mean is especially important in reporting compound growth rates for time series data.
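A minimal sketch of the formula follows. The helper names are hypothetical. The second function assumes the common convention of applying the geometric mean to growth factors (1 + return) and subtracting 1 at the end, which the card itself does not spell out.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n non-negative observations."""
    if any(v < 0 for v in values):
        raise ValueError("geometric mean requires non-negative observations")
    return math.prod(values) ** (1.0 / len(values))

def geometric_mean_return(returns):
    """Assumed convention: compound the growth factors (1 + R), then subtract 1."""
    return geometric_mean([1.0 + r for r in returns]) - 1.0

g = geometric_mean([2, 8])                  # square root of 16
gr = geometric_mean_return([0.50, -0.50])   # hypothetical +50% then -50%
print(g, gr)
```

In the hypothetical two-period case, a 50% gain followed by a 50% loss leaves 75% of the starting value, so the compound rate is negative even though the arithmetic mean return is zero.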
Quantiles and types
Quantiles such as the median, quartiles, quintiles, deciles, and percentiles are location parameters that divide a distribution into halves, quarters, fifths, tenths, and hundredths, respectively.
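As an illustration, Python's `statistics.quantiles` returns the cut points that divide a data set into n equal-probability groups. The data are hypothetical, and the function's default interpolation method is only one of several conventions, so other tools may report slightly different cut points.

```python
import statistics

# Hypothetical sorted observations (illustrative only).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 gives the three quartile cut points; n=5 would give quintiles,
# n=10 deciles, and n=100 percentiles.
quartiles = statistics.quantiles(data, n=4)
deciles = statistics.quantiles(data, n=10)

print(quartiles)
print(len(deciles))  # n groups need n - 1 cut points
```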
Dispersion and types
Dispersion measures such as the variance, standard deviation, and mean absolute deviation (MAD) describe the variability of outcomes around the arithmetic mean.
Range
Range is defined as the maximum value minus the minimum value. Range has only a limited scope because it uses information from only two observations.
MAD
MAD for a sample is MAD = SUM(ABS(Xi-Xmean))/n where Xmean is the sample mean and n is the number of observations in the sample.
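The MAD formula translates directly into code. The function name and the data below are hypothetical.

```python
def mean_absolute_deviation(xs):
    """MAD = sum(|Xi - Xmean|) / n, with Xmean the sample mean."""
    n = len(xs)
    x_mean = sum(xs) / n
    return sum(abs(x - x_mean) for x in xs) / n

# Hypothetical sample (illustrative only): mean is 4.
sample = [1, 2, 3, 4, 10]
mad = mean_absolute_deviation(sample)
print(mad)  # absolute deviations 3, 2, 1, 0, 6 sum to 12; 12 / 5
```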
Variance & Standard deviation
The variance is the average of the squared deviations around the mean, and the standard deviation is the positive square root of variance. In computing sample variance (s^2) and sample standard deviation, the average squared deviation is computed using a divisor equal to the sample size minus 1.
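The difference between the two divisors can be seen with Python's `statistics` module, where `pvariance` and `pstdev` divide by n and `variance` and `stdev` divide by n − 1. The data are hypothetical.

```python
import statistics

# Hypothetical observations (illustrative only): mean is 5,
# squared deviations sum to 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)   # divisor n:     32 / 8
pop_sd = statistics.pstdev(data)       # square root of pop_var
samp_var = statistics.variance(data)   # divisor n - 1: 32 / 7
samp_sd = statistics.stdev(data)       # square root of samp_var

print(pop_var, pop_sd, samp_var, samp_sd)
```

The sample figures are always at least as large as the population figures computed on the same data, because the divisor is smaller.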
Semivariance
The semivariance is the average squared deviation below the mean; semideviation is the positive square root of semivariance. Target semivariance is the average squared deviation below a target level; target semideviation is its positive square root. All these measures quantify downside risk.
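A sketch of target semivariance is given below. The function names are hypothetical, and the divisor is an assumption: this version divides by the total number of observations minus 1, which is one common convention, while other sources divide by n or by the count of below-target observations.

```python
import math

def target_semivariance(xs, target):
    """Sum of squared shortfalls below `target`, divided by (n - 1).

    Assumption: the divisor is the total sample size minus 1; conventions vary.
    """
    n = len(xs)
    shortfalls = [(x - target) ** 2 for x in xs if x < target]
    return sum(shortfalls) / (n - 1)

def semivariance(xs):
    """Semivariance: target semivariance with the sample mean as the target."""
    return target_semivariance(xs, sum(xs) / len(xs))

# Hypothetical returns in percent (illustrative only): mean is 0.
returns_pct = [-4, -2, 0, 2, 4]
sv = semivariance(returns_pct)   # (16 + 4) / 4
semidev = math.sqrt(sv)          # semideviation
print(sv, semidev)
```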
Chebyshev’s inequality
According to Chebyshev’s inequality, the proportion of the observations within k standard deviations of the arithmetic mean is at least 1 − 1/k^2 for all k > 1. Chebyshev’s inequality permits us to make probabilistic statements about the proportion of observations within various intervals around the mean for any distribution with finite variance. As a result of Chebyshev’s inequality, a two-standard-deviation interval around the mean must contain at least 75 percent of the observations, and a three-standard-deviation interval around the mean must contain at least 89 percent of the observations, no matter how the data are distributed.
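The bound can be checked numerically. The sketch below computes the lower bound for a given k and compares it with the observed fraction for a hypothetical, heavily skewed data set. The function names are illustrative, and the standard deviation here uses the divisor n.

```python
import statistics

def chebyshev_bound(k):
    """Minimum proportion of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1.0 - 1.0 / k ** 2

def fraction_within(xs, k):
    """Observed proportion of xs within k standard deviations of the mean."""
    mean = statistics.mean(xs)
    sd = statistics.pstdev(xs)  # divisor n
    return sum(1 for x in xs if abs(x - mean) <= k * sd) / len(xs)

# Hypothetical skewed data with one extreme value (illustrative only).
data = [1, 1, 1, 1, 2, 2, 3, 5, 8, 40]

for k in (2, 3):
    print(k, round(chebyshev_bound(k), 4), fraction_within(data, k))
```

The observed fractions sit above the bounds, as the inequality requires. The bound is a floor, not an estimate, so real data usually exceed it by a wide margin.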
Coefficient of variation
The coefficient of variation, CV, is the ratio of the standard deviation of a set of observations to their mean value. Because it expresses the magnitude of variation among observations relative to their average size, the CV is a scale-free measure of relative dispersion that permits direct comparisons of dispersion across different data sets.
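A short sketch of the CV follows, using the sample standard deviation (divisor n − 1). The function name and both data sets are hypothetical. The second series is the first multiplied by 100, to show that rescaling leaves the CV unchanged.

```python
import statistics

def coefficient_of_variation(xs):
    """CV = sample standard deviation / mean."""
    return statistics.stdev(xs) / statistics.mean(xs)

# Hypothetical series measured on two different scales (illustrative only).
series_a = [10, 20, 30]
series_b = [1000, 2000, 3000]

cv_a = coefficient_of_variation(series_a)  # stdev 10 / mean 20
cv_b = coefficient_of_variation(series_b)  # stdev 1000 / mean 2000
print(cv_a, cv_b)
```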
Sharpe ratio
The Sharpe ratio for a portfolio, p, based on historical returns, is defined as Sh = (Rp − RF)/sp, where Rp is the mean return to the portfolio, RF is the mean return to a risk-free asset, and sp is the standard deviation of return on the portfolio.
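The definition can be sketched as below. The function name and the return series are hypothetical, and the sample standard deviation (divisor n − 1) is assumed for sp.

```python
import statistics

def sharpe_ratio(portfolio_returns, risk_free_returns):
    """(mean portfolio return - mean risk-free return) / stdev of portfolio return."""
    mean_rp = statistics.mean(portfolio_returns)
    mean_rf = statistics.mean(risk_free_returns)
    s_p = statistics.stdev(portfolio_returns)  # assumed: sample standard deviation
    return (mean_rp - mean_rf) / s_p

# Hypothetical annual returns (illustrative only).
rp = [0.10, 0.20, 0.30]   # mean 0.20, sample stdev 0.10
rf = [0.02, 0.02, 0.02]   # mean 0.02
sh = sharpe_ratio(rp, rf)
print(f"Sharpe ratio = {sh:.2f}")
```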
Skew and skewness
Skew describes the degree to which a distribution is not symmetric about its mean. A return distribution with positive skewness has frequent small losses and a few extreme gains. A return distribution with negative skewness has frequent small gains and a few extreme losses. Zero skewness indicates a symmetric distribution of returns.
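To illustrate the sign of skewness, the sketch below uses a simplified moment-based formula: the average cubed deviation divided by the standard deviation cubed, with divisor n throughout. This is an assumption for illustration, since sample skewness formulas usually include a small-sample adjustment. The data and function name are hypothetical.

```python
def skewness(xs):
    """Simplified skewness: m3 / m2**1.5, with moments computed using divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical data (illustrative only).
right_tail = [1, 1, 1, 1, 6]        # frequent small values, one extreme gain
left_tail = [-x for x in right_tail]  # mirror image: one extreme loss
symmetric = [-2, -1, 0, 1, 2]

print(skewness(right_tail), skewness(left_tail), skewness(symmetric))
```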
Kurtosis
Kurtosis measures the peakedness of a distribution and provides information about the probability of extreme outcomes. A distribution that is more peaked than the normal distribution is called leptokurtic; a distribution that is less peaked than the normal distribution is called platykurtic; and a distribution identical to the normal distribution in this respect is called mesokurtic. The calculation for kurtosis involves finding the average of deviations from the mean raised to the fourth power and then standardizing that average by the standard deviation raised to the fourth power. Excess kurtosis is kurtosis minus 3, the value of kurtosis for all normal distributions.
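The calculation described above can be sketched with a simplified moment-based formula that uses divisor n throughout. This is an assumption for illustration, since sample kurtosis formulas typically add small-sample corrections. The function names and data are hypothetical.

```python
def kurtosis(xs):
    """Average fourth-power deviation divided by the standard deviation to the fourth."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # variance, divisor n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2

def excess_kurtosis(xs):
    """Kurtosis minus 3, the kurtosis of a normal distribution."""
    return kurtosis(xs) - 3.0

# Hypothetical evenly spread data (illustrative only): m2 = 2, m4 = 6.8.
flat = [-2, -1, 0, 1, 2]
k = kurtosis(flat)
ek = excess_kurtosis(flat)
print(k, ek)
```

The negative excess kurtosis for this evenly spread sample corresponds to the platykurtic case.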
When analyzing investment returns, how do the geometric and arithmetic means compare?
Uncertainty in cash flows or returns causes the arithmetic mean to be larger than the geometric mean. The more uncertain the returns, the more divergence exists between the arithmetic and geometric means. The geometric mean return approximately equals the arithmetic mean return minus half the variance of return. Zero variance or zero uncertainty in returns would leave the geometric and arithmetic mean returns equal, but real-world uncertainty produces an arithmetic mean return larger than the geometric.
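The approximation can be checked on a small example. The return series is hypothetical, the geometric mean return is computed from growth factors (1 + R), and the variance uses divisor n. These are assumptions made for the sketch.

```python
import math
import statistics

# Hypothetical periodic returns (illustrative only).
returns = [0.10, -0.05, 0.20, 0.02]

arithmetic = statistics.mean(returns)

# Geometric mean return: n-th root of the compounded growth factors, minus 1.
growth = math.prod(1.0 + r for r in returns)
geometric = growth ** (1.0 / len(returns)) - 1.0

# Approximation from the text: arithmetic mean minus half the variance.
approx = arithmetic - statistics.pvariance(returns) / 2.0

print(f"arithmetic  = {arithmetic:.5f}")
print(f"geometric   = {geometric:.5f}")
print(f"approximate = {approx:.5f}")
```

For returns of this size the approximation lands close to the geometric mean return. The gap tends to widen as returns become more volatile.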