Reading 7 LOS's Flashcards
LOS 7a: distinguish between descriptive statistics and inferential statistics, between a population and a sample, and among the types of measurement scales
Descriptive statistics refer to how large volumes of data are converted into useful, readily understood information by summarizing their important characteristics.
Inferential statistics refer to methods used to make forecasts, estimates, or judgments about a larger set of data based on a smaller, representative set.
A population includes all the members of a particular group. It is usually very costly and time consuming to obtain data for each member of the population, so information is collected about a small subset of the population, called a sample, and conclusions about the population are drawn from it.
Measurement Scales
Nominal scales represent the weakest level of measurement. They categorize or count data but do not rank them.
Ordinal scales represent a stronger level of measurement than nominal scales. They sort data into categories that are ranked according to certain characteristics. However, the scale tells us nothing about the magnitude of the difference between categories.
Interval scales rank observations in such a manner that the differences between scale values are equal, so values can be added and subtracted meaningfully. Ratios calculated using this scale are meaningless because the scale has no true zero point. For example, 20°C is not twice as warm as 10°C.
Ratio scales represent the strongest level of measurement. They have all the characteristics of interval scales and have a true zero point as the origin. Therefore, meaningful ratios can also be computed with ratio scales.
LOS 7b: Define a parameter, a sample statistic, and a frequency distribution.
A descriptive measure of a population characteristic is known as a parameter. Investment analysts are usually interested in only a few parameters, including mean and variance of asset returns.
A sample statistic is a descriptive measure of a sample characteristic. It is computed from sample data and used to estimate the corresponding population parameter.
A frequency distribution is a tabular illustration of data categorized into a relatively small number of intervals or classes. The intervals must not overlap, so that each observation falls into exactly one interval, and the set of intervals must cover the entire range of values represented in the data.
LOS 7c: Calculate and interpret relative frequencies and cumulative relative frequencies, given a frequency distribution
LOS 7d: Describe the properties of a data set presented as a histogram or a frequency polygon
The relative frequency for an interval is the proportion or fraction of total observations that lies in that particular interval. Each interval’s relative frequency is calculated by dividing its absolute frequency by the total number of observations.
The cumulative absolute frequency, or cumulative frequency, for an interval is the number of observations that are less than the upper bound of the interval.
The cumulative relative frequency for an interval is the proportion of total observations that is less than the upper bound of the interval. It is calculated by adding the relative frequencies of all intervals lower than and including the said interval.
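The three definitions above can be sketched in a few lines of Python. The intervals and counts below are hypothetical, chosen only to illustrate the arithmetic.

```python
from itertools import accumulate

# Hypothetical absolute frequencies for four return intervals (illustrative data).
intervals = ["-10% to 0%", "0% to 10%", "10% to 20%", "20% to 30%"]
absolute = [3, 7, 8, 2]

total = sum(absolute)

# Relative frequency: absolute frequency divided by the total number of observations.
relative = [f / total for f in absolute]

# Cumulative absolute frequency: running total of the absolute frequencies.
cumulative_absolute = list(accumulate(absolute))

# Cumulative relative frequency: running total expressed as a share of all observations.
cumulative_relative = [c / total for c in cumulative_absolute]

for row in zip(intervals, absolute, relative, cumulative_absolute, cumulative_relative):
    print("{:<12} {:>3} {:>6.2f} {:>4} {:>6.2f}".format(*row))
```

The last cumulative relative frequency is always 1, which is a quick check that every observation has been assigned to an interval.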
Histograms and Frequency Polygons
A histogram is used to graphically represent the data contained in a frequency distribution. To construct a histogram, the intervals are listed on the horizontal axis, while the frequencies are scaled on the vertical axis.
A frequency polygon also graphically illustrates the data in a frequency distribution. Each coordinate or point on the frequency polygon is the frequency of each interval plotted against the midpoint of the interval.
LOS 7e: Calculate and interpret measures of central tendency, including the population mean, sample mean, arithmetic mean, weighted average or mean, geometric mean, harmonic mean, median, and mode.
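The arithmetic mean is simply the sum of all observations in a data set divided by the total number of observations. It can be calculated as the population mean µ or as the sample mean X-bar.
- $\mu = \frac{\sum_{i=1}^{N} X_i}{N}$ for a population, and $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ for a sample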
Properties of Arithmetic Mean
- All observations are used in the computation of the arithmetic mean
- All interval and ratio data sets have an arithmetic mean
- The sum of the deviations from the arithmetic mean is always 0
- An arithmetic mean is unique: there can only be one per data set.
A potential problem with the arithmetic mean is its sensitivity to extreme values.
The median is the value of the middle item of a data set once it has been arranged in ascending or descending order. With an even number of observations, it is the average of the two middle items. Its advantage is that, unlike the mean, it is not sensitive to extreme values. However, it does not use all the information about the size and magnitude of observations.
The mode of a data set is its most frequently occurring value. A data set with one mode is unimodal, while one with two modes is bimodal. For grouped data, the modal interval is the interval with the highest frequency.
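A short sketch, using Python's standard `statistics` module and made-up return figures, shows the zero-sum property of deviations and how a single extreme value moves the mean far more than the median.

```python
import statistics

returns = [2, 3, 3, 4, 5]        # hypothetical returns, in percent
with_outlier = returns + [40]    # the same data plus one extreme value

mean = statistics.mean(returns)
median = statistics.median(returns)
mode = statistics.mode(returns)

# Deviations from the arithmetic mean always sum to zero (up to rounding).
deviations_sum = sum(x - mean for x in returns)

# Adding one extreme observation pulls the mean up sharply; the median barely moves.
mean_out = statistics.mean(with_outlier)
median_out = statistics.median(with_outlier)

print(mean, median, mode)
print(mean_out, median_out)
```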
The weighted mean is calculated by assigning different weights to observations in the data set to account for the disproportionate effect of certain observations on the arithmetic mean. This helps deal with extreme values by assigning them low weights.
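As an illustration, the weighted mean can be computed as the sum of each observation times its weight. The portfolio weights and returns below are assumptions made up for this sketch, not figures from the reading.

```python
# Hypothetical portfolio: weights sum to 1, returns are in decimal form.
weights = [0.5, 0.3, 0.2]
returns = [0.10, 0.04, -0.02]

# A weighted mean is only meaningful here if the weights sum to 1.
if abs(sum(weights) - 1.0) > 1e-12:
    raise ValueError("weights must sum to 1")

# Each observation contributes in proportion to its weight.
weighted_mean = sum(w * r for w, r in zip(weights, returns))

# The simple arithmetic mean treats every observation equally.
simple_mean = sum(returns) / len(returns)

print(round(weighted_mean, 4), round(simple_mean, 4))
```

With equal weights of 1/n, the weighted mean reduces to the arithmetic mean.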
The geometric mean is frequently used to average rates of change over time or to calculate the growth rate of a variable over a period. In the investment arena, it is used to average rates of return over multiple periods or to compute the growth rate of a variable. It is calculated as:
- $G = \sqrt[n]{X_1 \times X_2 \times \cdots \times X_n}$, with $X_i \geq 0$
To calculate the geometric mean for investment returns data, we add 1 to each return observation, take the geometric mean of the results, and then subtract 1:
- $R_G = \sqrt[n]{(1 + R_1)(1 + R_2) \cdots (1 + R_n)} - 1$
Remember the following important relationships between the arithmetic mean and geometric mean of a particular data set:
- The geometric mean is always less than or equal to the arithmetic mean
- The geometric mean equals the arithmetic mean only when all observations are identical
- The difference between the geometric and arithmetic mean increases as the dispersion in observed values increases
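The relationships above can be checked numerically. This is a minimal sketch with hypothetical annual returns; `geometric_mean_return` is an illustrative helper name, not a library function.

```python
import math

def geometric_mean_return(returns):
    """Compound (1 + r) over all periods, take the n-th root, then subtract 1."""
    growth = math.prod(1 + r for r in returns)
    return growth ** (1 / len(returns)) - 1

returns = [0.10, -0.05, 0.20]    # hypothetical annual returns

arithmetic = sum(returns) / len(returns)
geometric = geometric_mean_return(returns)

# The geometric mean is below the arithmetic mean because the returns vary.
print(round(arithmetic, 4), round(geometric, 4))

# With identical returns the two means coincide.
print(round(geometric_mean_return([0.05, 0.05, 0.05]), 4))
```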
The harmonic mean is a relatively specialized concept that is used in the investment management arena to determine the average cost of shares purchased over time when an equal amount of money is invested each period. It is calculated as:
- $H = \frac{N}{\sum_{i=1}^{N} \frac{1}{X_i}}$
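To see why the harmonic mean gives the average cost per share, consider investing the same amount at several prices. The prices and the amount invested below are hypothetical.

```python
import statistics

prices = [10.0, 20.0, 25.0]      # hypothetical share prices in three periods
invested_per_period = 1000.0     # the same amount is invested each period

# Fewer shares are bought when the price is high, more when it is low.
shares_bought = [invested_per_period / p for p in prices]

# Average cost per share = total money spent / total shares acquired.
average_cost = invested_per_period * len(prices) / sum(shares_bought)

# Harmonic mean of the prices, following H = N / sum(1 / X_i).
harmonic = len(prices) / sum(1 / p for p in prices)

arithmetic = sum(prices) / len(prices)

print(round(average_cost, 2), round(harmonic, 2), round(arithmetic, 2))
```

The average cost matches the harmonic mean of the prices and is lower than their arithmetic mean. The standard library's `statistics.harmonic_mean` returns the same value.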
LOS 7f: Calculate and interpret quartiles, quintiles, deciles, and percentiles.
Quartiles divide a distribution into quarters, quintiles into fifths, deciles into tenths, and percentiles into hundredths.
The formula for the position of the yth percentile in a data set with n observations sorted in ascending order is:
- $L_y = (n + 1)\,\frac{y}{100}$
So if we wanted to find where the first quartile ended with 7 data observations we would find that by:
- (7+1)(25/100)= 2
If we were to do it for 8 data observations we would get:
- (8+1)(25/100)= 2.25
This means that the first quartile is the second observation from the left plus 0.25 times the difference between the second and third observations. So if the 2nd observation was 5% and the 3rd was 10%, we would get a 1st quartile of:
- 5% + 0.25(10% - 5%) = 6.25%
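The position formula and the linear interpolation step can be sketched as a small function. The name `percentile` and the data are illustrative assumptions; statistical libraries often use slightly different position conventions, so their results may not match this one exactly.

```python
import math

def percentile(sorted_data, y):
    """Value at the y-th percentile using the position L_y = (n + 1) * y / 100."""
    n = len(sorted_data)
    position = (n + 1) * y / 100      # 1-based position, possibly fractional
    if position < 1 or position > n:
        raise ValueError("position falls outside the data for this sketch")
    lower = math.floor(position)
    fraction = position - lower
    if fraction == 0:
        return sorted_data[lower - 1]
    # Interpolate linearly between the two neighbouring observations.
    below, above = sorted_data[lower - 1], sorted_data[lower]
    return below + fraction * (above - below)

# Hypothetical returns in percent, already sorted in ascending order.
eight_obs = [2, 5, 10, 12, 15, 18, 20, 25]
seven_obs = [2, 5, 10, 12, 15, 18, 20]

print(percentile(eight_obs, 25))   # position 2.25: interpolate between 5 and 10
print(percentile(seven_obs, 25))   # position 2: the second observation itself
```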
LOS 7g: Calculate and interpret 1) a range and a mean absolute deviation and 2) the variance and standard deviation of a population and of a sample
The range is simply the difference between the highest and lowest values in a data set:
- Range = maximum value - minimum value
The mean absolute deviation (MAD) is the average of the absolute values of deviations of observations in a data set from its mean.
- $MAD = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$
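Variance is the average of the squared deviations from the mean.
Standard deviation is the positive square root of the variance.
Variance is expressed in squared units of the random variable, while the standard deviation is expressed in the same units as the random variable itself. The population variance is calculated as:
- $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$, and the population standard deviation $\sigma$ is the square root of this
- For a sample, the sum of squared deviations from the sample mean is divided by n - 1 instead: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$, and the sample standard deviation $s$ is the square root of this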
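The dispersion measures in this LOS can be computed side by side. The data are hypothetical. The sample versions divide by n - 1, which is the standard convention and is included here because the LOS asks for both population and sample measures.

```python
import math
import statistics

data = [4, 6, 8, 10, 12]         # hypothetical observations
n = len(data)
mean = sum(data) / n

data_range = max(data) - min(data)

# Mean absolute deviation: average distance from the mean, ignoring sign.
mad = sum(abs(x - mean) for x in data) / n

squared_deviations = sum((x - mean) ** 2 for x in data)

# Population measures divide by N; sample measures divide by n - 1.
population_variance = squared_deviations / n
sample_variance = squared_deviations / (n - 1)
population_std = math.sqrt(population_variance)
sample_std = math.sqrt(sample_variance)

print(data_range, mad)
print(population_variance, round(population_std, 4))
print(sample_variance, round(sample_std, 4))
```

The results agree with `statistics.pvariance` and `statistics.variance` from the standard library.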
LOS 7h: Calculate and interpret the proportion of observations falling within a specified number of standard deviations of the mean using Chebyshev’s inequality
Chebyshev’s inequality gives the minimum proportion of observations in a data set that lie within k standard deviations of the mean, for any k > 1 and regardless of the shape of the distribution.
- Proportion of observations within k standard deviations of the mean ≥ $1 - \frac{1}{k^2}$
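A minimal sketch of the bound follows. The function name `chebyshev_min_proportion` is illustrative, and the restriction to k > 1 reflects that the bound is uninformative otherwise.

```python
def chebyshev_min_proportion(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k ** 2

# The bound holds for any distribution, so it is conservative.
for k in (1.25, 1.5, 2, 3, 4):
    print(f"k = {k}: at least {chebyshev_min_proportion(k):.2%}")
```

For k = 2 the bound is 75%, and for k = 3 it is about 89%.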
LOS 7i: Calculate and interpret the coefficient of variation and the Sharpe ratio
It can be difficult to interpret standard deviation as a measure of relative dispersion when the data sets being compared have different means or use different units of measurement.
We can compare the relative dispersion of returns on two investments using the coefficient of variation, which is the ratio of the standard deviation of the data set to its mean
- $CV = \frac{\sigma}{\mu}$
CV is used to measure the risk per unit of return in various investments
The Sharpe ratio is the ratio of an investment's excess return over the risk-free rate to its standard deviation of returns. It measures excess return per unit of risk.
- Sharpe ratio = $\frac{r_p - r_f}{s_p}$
where $r_p$ = mean portfolio return, $r_f$ = risk-free rate, and $s_p$ = standard deviation of portfolio returns
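Both ratios are simple quotients, as the following sketch shows. The two funds and the risk-free rate are hypothetical numbers chosen for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Risk per unit of return: standard deviation divided by the mean."""
    return std_dev / mean

def sharpe_ratio(mean_return, risk_free_rate, std_dev):
    """Excess return per unit of risk. Note the parentheses around the excess return."""
    return (mean_return - risk_free_rate) / std_dev

# Hypothetical funds: mean annual return and standard deviation of returns.
fund_a = {"mean": 0.08, "std": 0.12}
fund_b = {"mean": 0.12, "std": 0.24}
risk_free = 0.03                 # assumed risk-free rate

cv_a = coefficient_of_variation(fund_a["std"], fund_a["mean"])
cv_b = coefficient_of_variation(fund_b["std"], fund_b["mean"])
sharpe_a = sharpe_ratio(fund_a["mean"], risk_free, fund_a["std"])
sharpe_b = sharpe_ratio(fund_b["mean"], risk_free, fund_b["std"])

print(round(cv_a, 3), round(cv_b, 3))
print(round(sharpe_a, 3), round(sharpe_b, 3))
```

In this example fund A has both the lower CV (less risk per unit of return) and the higher Sharpe ratio (more excess return per unit of risk).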
Issues with the Sharpe ratio
- All other factors remaining the same, for portfolios with positive Sharpe ratios, the Sharpe ratio decreases as risk increases, so a portfolio with a higher positive Sharpe ratio offers a better risk-adjusted return. For portfolios with negative Sharpe ratios, however, the ratio increases (moves closer to zero) as risk increases, so we cannot assume that the portfolio with the higher Sharpe ratio offers better risk-adjusted performance. Only if two portfolios with negative Sharpe ratios have the same standard deviation does the higher ratio indicate better risk-adjusted performance.
- The standard deviation is an appropriate measure of risk only for investments and strategies that have approximately symmetric distributions
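The first issue can be seen by doubling the risk of a portfolio while holding its mean return fixed. All figures below are hypothetical.

```python
def sharpe_ratio(mean_return, risk_free_rate, std_dev):
    """Excess return over the risk-free rate per unit of standard deviation."""
    return (mean_return - risk_free_rate) / std_dev

risk_free = 0.03                 # assumed risk-free rate

# Positive excess return: doubling risk halves the ratio, as expected.
positive_low_risk = sharpe_ratio(0.08, risk_free, 0.10)
positive_high_risk = sharpe_ratio(0.08, risk_free, 0.20)

# Negative excess return: doubling risk moves the ratio closer to zero,
# so the riskier portfolio shows the "higher" Sharpe ratio.
negative_low_risk = sharpe_ratio(0.01, risk_free, 0.10)
negative_high_risk = sharpe_ratio(0.01, risk_free, 0.20)

print(round(positive_low_risk, 2), round(positive_high_risk, 2))
print(round(negative_low_risk, 2), round(negative_high_risk, 2))
```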
LOS 7j: Explain skewness and the meaning of a positively or negatively skewed return distribution
LOS 7k: Describe the relative locations of the mean, median, and mode for a unimodal, nonsymmetrical distribution
When a distribution is nonsymmetrical it is said to be skewed.
A positively skewed distribution has a long tail on the right side, which suggests that there are certain observations that are much larger in value than most of the observations in the data set. For a positive skew: Mean > Median > Mode.
A negatively skewed distribution has a long tail on the left, which suggests there are outliers that are much smaller in value than the majority of the observations. For a negative skew: Mean < Median < Mode.
LOS 7l: Explain measures of sample skewness and kurtosis
Sample Skewness is calculated as:
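- $S_K = \left[\frac{n}{(n-1)(n-2)}\right] \times \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3}$
As n becomes large, the expression reduces to the mean cubed deviation divided by the cubed standard deviation:
- $S_K \approx \frac{1}{n} \times \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{s^3}$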
- When the distribution is positively skewed, sample skewness is positive
- Sample skewness of 0 indicates the data follows a symmetrical distribution
- Absolute values of skewness greater than 0.5 suggest that the data set is significantly skewed.
Kurtosis measures the extent to which a distribution is more or less peaked than a normal distribution. A normal distribution has a kurtosis of 3, so excess kurtosis is kurtosis minus 3.
- A leptokurtic distribution is more peaked and has fatter tails than a normal distribution and has excess kurtosis greater than 0
- A platykurtic distribution is less peaked and has thinner tails than a normal distribution and has excess kurtosis of less than 0
- A mesokurtic distribution has the same kurtosis as a normal distribution and has excess kurtosis of zero
As n becomes large, sample excess kurtosis reduces to:
- $K \approx \frac{1}{n} \times \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{s^4} - 3$
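The skewness formula and the large-sample kurtosis approximation can be sketched as follows. The function names and data sets are illustrative assumptions, and with samples this small the large-n kurtosis approximation is only rough.

```python
import math

def sample_std(data):
    """Sample standard deviation, dividing by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

def sample_skewness(data):
    """S_K = [n / ((n - 1)(n - 2))] * sum((X_i - X_bar)^3) / s^3."""
    n = len(data)
    mean = sum(data) / n
    cubed = sum((x - mean) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed / sample_std(data) ** 3

def approx_excess_kurtosis(data):
    """Large-n approximation: (1 / n) * sum((X_i - X_bar)^4) / s^4 - 3."""
    n = len(data)
    mean = sum(data) / n
    fourth = sum((x - mean) ** 4 for x in data)
    return fourth / (n * sample_std(data) ** 4) - 3

symmetric = [1, 2, 3, 4, 5]                  # hypothetical, symmetric about 3
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]     # one large value creates a right tail
left_skewed = [-x for x in right_skewed]     # mirror image: long left tail

print(sample_skewness(symmetric))
print(sample_skewness(right_skewed) > 0, sample_skewness(left_skewed) < 0)
print(approx_excess_kurtosis(symmetric) < 0)
```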
LOS 7m: Compare the use of arithmetic and geometric means when analyzing investment returns
- If we want to gauge performance over a single period, the arithmetic mean should be used because the arithmetic mean is the average of one-period returns
- If we want to estimate returns over more than one period, we should use the geometric mean as it measures how investment returns are linked over time
Arithmetic vs Geometric
- Uncertainty in cash flows or returns causes the arithmetic mean to be larger than the geometric mean. The more uncertainty, the greater the divergence between the two
- Zero variance or zero uncertainty would result in the geometric and arithmetic mean returns being equal
- Studies have shown that the geometric mean return approximately equals the arithmetic mean minus half the variance of returns
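The approximation in the last bullet can be checked on a small set of made-up returns. This sketch uses the population variance; the result is only approximate and the gap depends on the data.

```python
import math
import statistics

returns = [0.12, -0.08, 0.15, 0.03, -0.02, 0.10]   # hypothetical periodic returns

arithmetic = statistics.mean(returns)
variance = statistics.pvariance(returns)

# Geometric mean return: compound, take the n-th root, subtract 1.
geometric = math.prod(1 + r for r in returns) ** (1 / len(returns)) - 1

# Rule of thumb: geometric mean is roughly arithmetic mean minus half the variance.
approximation = arithmetic - variance / 2

print(round(arithmetic, 4), round(geometric, 4), round(approximation, 4))

# With zero variance the two means are equal.
constant = [0.04] * 5
constant_geometric = math.prod(1 + r for r in constant) ** (1 / len(constant)) - 1
print(round(constant_geometric, 4))
```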