L7. Statistical Concepts and Market Returns Flashcards
Learning outcomes
- Distinguish between descriptive stats and inferential stats, between population and a sample and among the types of measurement scales
- Define parameter, sample statistic and frequency distribution
- Calculate and interpret relative frequencies and cumulative relative frequencies, given a frequency distribution
- Describe properties of data set presented as a histogram or frequency polygon
- Calculate and interpret measures of central tendency, including population mean, sample mean, arithmetic mean, weighted average or mean, geometric mean, harmonic mean, median and mode
- Calculate and interpret quartiles, quintiles, deciles and percentiles
- Calculate and interpret 1) range and a mean absolute deviation and 2) variance and standard deviation of a population and of a sample
- Calculate and interpret the proportion of observations falling within a specified number of standard deviations of the mean using Chebyshev’s inequality
- Calculate and interpret the coefficient of variation
- Explain skewness and the meaning of a positively or negatively skewed return distribution
- Describe the relative locations of the mean, median and mode for a unimodal, nonsymmetrical distribution
- Explain measures of sample skewness and kurtosis
- Compare the use of arithmetic and geometric means when analysing investment returns
Demonstration of statistical methods that allow us to summarise return distributions
4 properties of return distributions:
- Where the returns are centered (central tendency)
- How far returns are dispersed from their center (dispersion)
- Whether the distribution of returns is symmetrically shaped or lopsided (skewness) and
- Whether extreme outcomes are likely (kurtosis)
Nature of statistics
2 broad meanings; 1) data and 2) method
Statistical methods include
- Descriptive statistics
- study of how data can be summarised effectively to describe the important aspects of large data sets
- consolidating mass of numerical details
- Statistical inference
- involves making forecasts, estimates, or judgements about a larger group from the smaller group actually observed
Populations and samples
Population
- defined as all members of a specified group
Parameter
- a descriptive measure of a population is called a parameter
Sample
- a subset of a population
Sample statistic
- a quantity computed from or used to describe a sample
Measurement scales
All data measurements are taken on one of the 4 major scales; nominal, ordinal, interval or ratio
- Nominal scales
- represent weakest level of measurement
- categorise data but do not rank them
EG: Hedge fund classification types
- Ordinal scales
- reflects stronger level of measurement
- sort data into categories that are ordered with respect to some characteristics
- eg, Morningstar and S&P star ratings for mutual funds represent an ordinal scale in which one star represents a group of funds judged to have had relatively the worst performance, with 2, 3, 4 and 5 stars representing groups with increasingly better performance
- although performance is ranked, the scale does not tell us the size of the difference in performance between funds
EG: Credit ratings for bond issues
- Interval scales
- not only ranking but also assurance that the differences between scale values are equal
- eg Celsius scale; the difference in temperature between 10°C and 11°C is the same amount as the difference between 40°C and 41°C
- zero point of an interval scale does not reflect complete absence of what is being measured; it is not a true zero point or natural zero. Eg. 0°C does not mean absence of temperature; it is the freezing point of water
- Ratio scales
- represents strongest level of measurement
- all characteristics of interval measurement scales and true zero point as origin
EG: Cash dividends per share and bond maturity in years
Summarising data using frequency distribution
- frequency distribution is a tabular display of data summarised into a relatively small number of intervals
- help in the analysis of large amounts of statistical data and work with all types of measurement scales
- as rates of returns are fundamental units used for making investment decisions, when we analyse, our starting point is the holding period return (also called the total return)
R_t = (P_t - P_{t-1} + D_t) / P_{t-1}
where P_t = price per share at end of time period t
P_{t-1} = price per share at end of time period t-1, the time period immediately preceding time period t
D_t = cash distributions received during time period t
Thus holding period return for time period t is the capital gain/loss plus distribution divided by beginning period price
2 characteristics for holding period return
- has element of time (if monthly time interval is used, rate of return is a monthly figure)
- no currency unit attached to it
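As a minimal sketch, the holding period return formula above can be written as a small Python function. The function name and the sample prices are hypothetical and chosen only for illustration.

```python
def holding_period_return(price_begin, price_end, distribution=0.0):
    """Return (P_t - P_{t-1} + D_t) / P_{t-1} as a decimal."""
    if price_begin <= 0:
        raise ValueError("beginning price must be positive")
    return (price_end - price_begin + distribution) / price_begin


# Hypothetical example: bought at 100, now worth 105, with a dividend of 2.
hpr = holding_period_return(100.0, 105.0, 2.0)
print(f"{hpr:.2%}")  # 7.00%
```

The result is a pure number with no currency unit, and it applies to whatever time interval the two prices span.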
Construction of frequency distribution
- sort data in ascending order
- calculate range of data, defined as range = maximum value - minimum value
- decide on number of intervals in frequency distribution, k
- determine interval width as range/k
- determine intervals by successively adding the interval width to minimum value, to determine ending points of interval, stopping after reaching an interval that includes max value
- count the number of observations falling in each interval
- construct a table of the intervals, listed from smallest to largest, that shows the number of observations falling in each interval
EG. -4.57, -4.04, -1.64, 0.28, 1.34, 2.35, 2.38, 4.28, 4.42, 4.68, 7.16, 11.43
Step 1: -4.57, -4.04, -1.64, 0.28, 1.34, 2.35, 2.38, 4.28, 4.42, 4.68, 7.16, 11.43
Step 2: Maximum value = +11.43, minimum value = -4.57. Therefore range is +11.43 - (-4.57) = 16
Step 3: number of intervals 4
Step 4: 16/4 = 4
Step 5:
-4.57 + 4 = -0.57
-0.57 + 4 = 3.43
3.43 + 4 = 7.43
7.43 + 4 = 11.43
Step 6 and 7
Interval A -4.57 ≤ observations < -0.57 AF (3)
Interval B -0.57 ≤ observations < 3.43 AF (4)
Interval C 3.43 ≤ observations < 7.43 AF (4)
Interval D 7.43 ≤ observations ≤ 11.43 AF (1)
- interval is set of values within which an observation falls
- actual number of observations in a given interval is called absolute frequency (AF)
- relative frequency is the absolute frequency of each interval divided by the total number of observations
- cumulative relative frequency cumulates (adds up) the relative frequencies as we move from the first to the last interval. It tells us the fraction of observations that are less than the upper limit of each interval
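The construction steps and the 12-observation example above can be sketched in Python as follows. The function name is hypothetical, and the choice to close the last interval on the right (so that the maximum value is counted) is an assumption made here to match the example.

```python
def frequency_distribution(observations, k):
    """Group observations into k equal-width intervals.

    Returns one tuple per interval:
    (lower, upper, absolute_freq, relative_freq, cumulative_relative_freq).
    """
    data = sorted(observations)                  # step 1: sort ascending
    lo, hi = data[0], data[-1]
    width = (hi - lo) / k                        # steps 2 to 4: range / k
    if width == 0:
        raise ValueError("all observations are identical")
    counts = [0] * k
    for x in data:                               # step 6: count per interval
        # The last interval is closed on the right so the maximum is included.
        index = min(int((x - lo) / width), k - 1)
        counts[index] += 1
    n = len(data)
    rows, cumulative = [], 0
    for i, af in enumerate(counts):              # step 7: build the table
        cumulative += af
        rows.append((lo + i * width, lo + (i + 1) * width,
                     af, af / n, cumulative / n))
    return rows


returns = [-4.57, -4.04, -1.64, 0.28, 1.34, 2.35,
           2.38, 4.28, 4.42, 4.68, 7.16, 11.43]
for lower, upper, af, rf, crf in frequency_distribution(returns, 4):
    print(f"{lower:6.2f} to {upper:6.2f}  AF={af}  RF={rf:.2%}  CRF={crf:.2%}")
```

With k = 4 the absolute frequencies come out as 3, 4, 4 and 1, matching intervals A to D.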
Example of frequency distribution
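Return interval (%)   AF   RF (%)   CAF   CRF (%)
5.0 to 6.0            3    15.79    3     15.79
6.0 to 7.0            2    10.53    5     26.32
7.0 to 8.0            6    31.58    11    57.89
8.0 to 9.0            6    31.58    17    89.47
9.0 to 10.0           2    10.53    19    100.00
(AF = absolute frequency, RF = relative frequency, CAF = cumulative absolute frequency, CRF = cumulative relative frequency)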
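The RF, CAF and CRF columns follow mechanically from the AF column. A short sketch, using only the five absolute frequencies from the table above:

```python
from itertools import accumulate

absolute = [3, 2, 6, 6, 2]            # AF column from the table above
total = sum(absolute)                 # 19 observations

cumulative_absolute = list(accumulate(absolute))             # CAF
relative = [100 * af / total for af in absolute]             # RF in percent
cumulative_relative = [100 * caf / total                     # CRF in percent
                       for caf in cumulative_absolute]

for af, rf, caf, crf in zip(absolute, relative,
                            cumulative_absolute, cumulative_relative):
    print(f"AF={af}  RF={rf:.2f}  CAF={caf:2d}  CRF={crf:.2f}")
```

Computing CRF from the cumulative counts, rather than adding up already-rounded RF values, avoids small rounding differences: 11/19 is 57.89%, whereas 26.32 + 31.58 gives 57.90.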
Graphic presentation of data
Histogram
- bar chart of data that have been grouped into a frequency distribution
- x axis = return intervals
- y axis = frequency
Frequency polygon
- x axis = return interval midpoints
- y axis = frequency
Cumulative frequency distribution
- a steep slope reflects that most observations lie in the neighbourhood of those interval limits
- x axis = return interval upper limits
- y axis = cumulative frequency
The use of quantitative measures that explain characteristics of data
- Central tendency specifies where data are centered
a) Arithmetic mean
- sum of observations divided by the number of observations
- used to compute for both population mean and sample mean
- -> population mean
- arithmetic mean value of a population
- eg profits for 3 companies are 0%, 2.1% and 2.0% respectively; the population mean is then
- (0 + 2.1 + 2.0)/3 = 1.37%
- -> sample mean
- arithmetic mean value of a sample
- eg. P/Es for 6 companies are 35, 30, 22, 18, 15, 12; the sample mean P/E is (35+30+22+18+15+12)/6 = 22
- also called the arithmetic average
- if we examine data across 100 units at one point in time, it is called cross-sectional data and the mean of these observations is called the cross-sectional mean
- if we examine a sample of historical monthly returns from 1 unit, it is called time-series data and the mean of these observations is called the time-series mean
Advantage of mean
- mean uses all information about size and magnitude of observations
Disadvantage of mean
- sensitive to extreme values
- as all observations are used to compute mean, mean can be pulled sharply upward or downward by extremely large or small observations
- eg for the observations 1, 2, 3, 4, 5, 6 and 1000, the arithmetic mean is about 146, much larger than the bulk of the observations (the first 6)
Deviations from mean
We typically use the mean return as a measure of the typical outcome. However, some outcomes are above the mean and some are below it. The distance between the mean and each outcome is called the deviation from the mean
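A short sketch of deviations from the mean, reusing the six sample P/Es from the arithmetic mean example. It also shows a standard property that the notes do not state explicitly: the deviations around the arithmetic mean sum to zero.

```python
pe_ratios = [35, 30, 22, 18, 15, 12]          # sample P/Es from the example above

mean_pe = sum(pe_ratios) / len(pe_ratios)     # arithmetic mean
deviations = [x - mean_pe for x in pe_ratios]

print(mean_pe)          # 22.0
print(deviations)       # [13.0, 8.0, 0.0, -4.0, -7.0, -10.0]
print(sum(deviations))  # 0.0
```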
b) Median
- value of the middle item of a set of items that has been sorted into ascending or descending order
- in an odd-numbered sample of n items, the median occupies the (n+1)/2 position
- in an even-numbered sample, the median is the mean of the values of the items occupying the n/2 and (n+2)/2 positions
- whether n is even or odd, an equal number of observations lie above and below the median
- eg 0.0, 2.0, 2.1 is odd numbered (n = 3). Median position = (3+1)/2 = 2nd position, so the median is 2.0 percent
Advantage
- extreme values do not affect it
Disadvantage
- does not use all information about size and magnitude of observations; focus only on the relative position of the ranked observations
- need to sort the observations and determine whether n is odd or even before calculation
Practice Qns on mean and median
P/Es of 7 companies
23.44, 17.62, 5.65, 17.46, 25.95, 143.11, 22.95
a) What is the arithmetic mean P/E?
Sum of all P/Es / 7 = 256.18 / 7 = 36.60
b) What is the median P/E?
Rank in ascending order, then use position (7+1)/2 = 4 since n is odd. The 4th ranked P/E is 22.95
c) Evaluate the mean and median P/E.
The median P/E is the more appropriate measure for this portfolio. The mean of 36.60 is larger than 6 of the 7 P/Es because it is pulled upward by the extreme value 143.11. Hence 22.95 better represents the typical P/E
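The practice question can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the last step, dropping the outlier, is an extra illustration that is not part of the original question.

```python
import statistics

pe_ratios = [23.44, 17.62, 5.65, 17.46, 25.95, 143.11, 22.95]

mean_pe = statistics.mean(pe_ratios)
median_pe = statistics.median(pe_ratios)

print(f"mean   = {mean_pe:.2f}")
print(f"median = {median_pe:.2f}")

# Dropping the outlier shows how strongly it pulls the mean upward.
without_outlier = [x for x in pe_ratios if x != 143.11]
print(f"mean without 143.11 = {statistics.mean(without_outlier):.2f}")
```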
c) Mode
- the most frequently occurring value in a distribution
- can have 1 or more modes, and even no mode
- a distribution with 1 mode is unimodal
- with 2 modes, it is bimodal
- with 3 modes, it is trimodal
- modal interval = the most frequently occurring interval in grouped data
Practice Qns for Mode and Median
Credit ratings of 6 US department stores
Baa3, Baa2, Baa3, Caa3, Baa1, Caa2
- What is the mode credit rating? (which has highest frequency?)
Baa3 occurs twice, the rest once each, so the mode is Baa3
- What is the median credit rating?
n = 6 is even, so the median lies between the 6/2 = 3rd and (6+2)/2 = 4th positions
Sorting in ascending order, Baa3 is in both the 3rd and 4th positions, so the median is Baa3
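Because credit ratings are ordinal, the mode and median can be found by counting and ranking, without any arithmetic on the ratings themselves. In the sketch below, the `scale` list is an assumption: a partial best-to-worst ordering that covers only the grades needed for this example.

```python
from collections import Counter

# Assumed ordering, best to worst, covering only the grades in this example.
scale = ["Baa1", "Baa2", "Baa3", "Caa1", "Caa2", "Caa3"]
ratings = ["Baa3", "Baa2", "Baa3", "Caa3", "Baa1", "Caa2"]

# Mode: the rating with the highest count.
mode_rating, mode_count = Counter(ratings).most_common(1)[0]

# Median: sort by rank, then take the middle position(s).
ranked = sorted(ratings, key=scale.index)
n = len(ranked)
middle = ranked[(n - 1) // 2 : n // 2 + 1]   # two items when n is even

print(mode_rating, mode_count)   # Baa3 2
print(middle)                    # ['Baa3', 'Baa3']
```

Both middle items are Baa3, so the median rating is Baa3 even though an average of two ratings is not defined on an ordinal scale.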
The weighted mean
- for arithmetic mean, all observations are equally weighted by factor 1/n
- eg with 70 million allocated to equities and 30 million to bonds, the portfolio has a weight of 0.7 on equities and 0.3 on bonds
- multiply the return on the stock investment by 0.7 and the return on the bond investment by 0.3; the sum of the two is the weighted mean
eg. Year 1, -33.1% equity fund, -0.1% bond fund
allocation = 60% on stock fund, 40% on bonds
The weighted mean
= (0.6 x -33.1) + (0.4 x -0.1) = -19.86 + (-0.04) = -19.90%
If the manager maintains constant weights of 60% shares and 40% bonds for all 5 years, the method is called a constant-proportions strategy
Year 1, -33.1% equity fund, -0.1% bond fund
Year 2, 34.1% equity fund, 11.0% bond fund
Year 3, 16.8% equity fund, 6.4% bond fund
Year 4, -9.2% equity fund, 8.4% bond fund
Year 5, 6.4% equity fund, 3.8% bond fund
If using same method for all 5 years
Y1 = (0.6 x -33.1) + (0.4 x -0.1) = -19.9%
Y2 = (0.6 x 34.1) + (0.4 x 11.0) = 24.9%
Y3 = (0.6 x 16.8) + (0.4 x 6.4) = 12.6%
Y4 = (0.6 x -9.2) + (0.4 x 8.4) = -2.2%
Y5 = (0.6 x 6.4) + (0.4 x 3.8) = 5.4%
So the time series mean of returns for 5 years = (-19.9 + 24.9 + 12.6 - 2.2 + 5.4)/5 = 20.8/5 = 4.2%
ALTERNATIVE METHOD
Finding arithmetic mean for stock fund
= (-33.1 + 34.1 + 16.8 - 9.2 + 6.4)/5 = 3.0%
Finding arithmetic mean for bond fund
= (-0.1 + 11.0 + 6.4 + 8.4 + 3.8)/5 = 5.9%
Weighted mean = (0.6 x 3.0) + (0.4 x 5.9) = 4.2%
When we take a weighted average of forward-looking data, the weighted mean is called the expected value
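The two routes to the 5-year figure can be compared side by side. A minimal sketch using the fund returns and the 60/40 weights from the example; the variable names are illustrative only.

```python
equity = [-33.1, 34.1, 16.8, -9.2, 6.4]   # annual equity fund returns, %
bonds = [-0.1, 11.0, 6.4, 8.4, 3.8]       # annual bond fund returns, %
w_equity, w_bonds = 0.6, 0.4              # constant-proportions weights

# Method 1: weighted portfolio return each year, then the time-series mean.
portfolio = [w_equity * e + w_bonds * b for e, b in zip(equity, bonds)]
mean_of_portfolio = sum(portfolio) / len(portfolio)

# Method 2: mean of each fund first, then weight the two means.
mean_equity = sum(equity) / len(equity)
mean_bonds = sum(bonds) / len(bonds)
weighted_of_means = w_equity * mean_equity + w_bonds * mean_bonds

print([round(r, 1) for r in portfolio])
print(round(mean_of_portfolio, 2), round(weighted_of_means, 2))
```

Both methods give about 4.2% (4.16% before rounding) because the weights are the same in every year; with weights that change from year to year the two would generally differ.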
Practice Qns for weighted mean
Asset class 1, 4.7% asset allocation, 1.2% class return
Asset class 2, 29.0% asset allocation, 8.0% class return
Asset class 3, 11.8% asset allocation, 1.2% class return
Asset class 4, 10.5% asset allocation, 8.2% class return
Asset class 5, 24.8% asset allocation, 15.4% class return
Asset class 6, 19.0% asset allocation, 15.6% class return
Asset class 7, 0.2% asset allocation, 5.7% class return
What is the mean return?
(0.047x1.2) + (0.29 x 8) + (0.118 x 1.2) + (0.105 x 8.2) + (0.248 x 15.4) + (0.19 x 15.6) + (0.002 x 5.7) = 10.2%
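The same calculation as a reusable function, offered as a sketch. The function name is hypothetical, and dividing by the sum of the weights is a design choice made here so that allocations can be passed in as percentages or as decimals.

```python
def weighted_mean(values, weights):
    """Weighted mean of values; weights are normalised by their sum."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


allocation = [4.7, 29.0, 11.8, 10.5, 24.8, 19.0, 0.2]     # % of portfolio
class_return = [1.2, 8.0, 1.2, 8.2, 15.4, 15.6, 5.7]      # % return

print(f"{weighted_mean(class_return, allocation):.2f}%")  # 10.17%
```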
Geometric mean
- used most frequently to average rates of change over time or to compute growth rate of a variable
- the value under the root cannot be negative, so we add 1 to each return expressed as a decimal (eg -7.56% becomes 1 - 0.0756 = 0.9244)
- after finding the geometric mean of the (1 + return) terms, we subtract 1 to get the geometric mean return
- to type the root symbol (√) on a Mac, press Option+V
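The add-1, take-the-root, subtract-1 recipe corresponds to the standard formula: geometric mean return = [(1 + R1)(1 + R2)...(1 + RT)]^(1/T) - 1. A minimal sketch follows; the function name and the two-period returns are hypothetical and chosen only because the answer is easy to check by hand.

```python
import math


def geometric_mean_return(returns):
    """Geometric mean of periodic returns given as decimals (0.05 = 5%)."""
    if not returns:
        raise ValueError("need at least one return")
    if any(r < -1 for r in returns):
        raise ValueError("a return below -100% is not meaningful here")
    growth = math.prod(1 + r for r in returns)   # add 1 to each return
    return growth ** (1 / len(returns)) - 1      # take the root, subtract 1


# Hypothetical two-period example: +100% followed by -50%.
two_periods = [1.00, -0.50]
print(geometric_mean_return(two_periods))        # 0.0
print(sum(two_periods) / len(two_periods))       # 0.25
```

An investment that doubles and then halves ends where it started, which the geometric mean of 0% reflects, while the arithmetic mean of 25% does not.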
Practice Qns for geometric mean and arithmetic mean
Y1 - SLASX 34.90%, PRFDX 31.69%
Y2 - SLASX 6.13%, PRFDX 7.75%
Y3 - SLASX 2.69%, PRFDX -7.56%
Y4 - SLASX 11.66%, PRFDX 18.25%
Y5 - SLASX 21.77%, PRFDX 16.18%
a) Find geometric mean return of SLASX
= ⁵√[(1.3490)(1.0613)(1.0269)(1.1166)(1.2177)] - 1 = ⁵√1.9990157 - 1 = 1.1485853 - 1 = 0.1486 = 14.86%
b) Find arithmetic mean return of SLASX and contrast with geometric mean
= (0.3490 + 0.0613 + 0.0269 + 0.1166 + 0.2177)/5
= 0.1543 = 15.43%
Contrast = 15.43% - 14.86% = 0.57% or 57 basis points
c) Find geometric mean return of PRFDX
= ⁵√[(1.3169)(1.0775)(1 - 0.0756)(1.1825)(1.1618)] - 1 = ⁵√1.8020321 - 1 = 1.1250 - 1 = 0.1250 = 12.50%
d) Find arithmetic mean return of PRFDX and contrast with geometric mean
= (0.3169 + 0.0775 - 0.0756 + 0.1825 + 0.1618)/5
= 0.13262 = 13.26%
Contrast = 13.26% - 12.50% = 0.76% or 76 basis points
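The four answers can be reproduced in a few lines. This is a verification sketch only; it uses the annual returns listed in the question, expressed as decimals.

```python
import math

funds = {
    "SLASX": [0.3490, 0.0613, 0.0269, 0.1166, 0.2177],
    "PRFDX": [0.3169, 0.0775, -0.0756, 0.1825, 0.1618],
}

results = {}
for name, returns in funds.items():
    n = len(returns)
    arithmetic = sum(returns) / n
    geometric = math.prod(1 + r for r in returns) ** (1 / n) - 1
    gap_bp = (arithmetic - geometric) * 10_000      # difference in basis points
    results[name] = (arithmetic, geometric, gap_bp)
    print(f"{name}: arithmetic {arithmetic:.2%}, "
          f"geometric {geometric:.2%}, gap {gap_bp:.0f} bp")
```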
Summary
- Geometric mean is always less than or equal to the arithmetic mean
- Difference between arithmetic and geometric means increases with variability in the period-by-period observations
- SLASX has higher means than PRFDX, and a smaller difference between its arithmetic and geometric means (57 vs 76 basis points)
- -> over the 5 years, a dollar invested in SLASX compounds to $1.9990 and a dollar invested in PRFDX compounds to $1.8020
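The compounding of one dollar can be checked directly from the annual returns. In this sketch the helper name `terminal_value` is hypothetical. It also illustrates why the geometric mean is the right average for growth over time: compounding at the geometric mean reproduces the ending value, while compounding at the arithmetic mean overstates it.

```python
import math

slasx = [0.3490, 0.0613, 0.0269, 0.1166, 0.2177]
prfdx = [0.3169, 0.0775, -0.0756, 0.1825, 0.1618]


def terminal_value(returns, start=1.0):
    """Value of `start` after compounding through each period's return."""
    return start * math.prod(1 + r for r in returns)


for name, returns in (("SLASX", slasx), ("PRFDX", prfdx)):
    n = len(returns)
    end_value = terminal_value(returns)
    geometric = end_value ** (1 / n) - 1
    arithmetic = sum(returns) / n
    print(f"{name}: $1 grows to ${end_value:.4f}; "
          f"(1+geometric)^{n} = {(1 + geometric) ** n:.4f}, "
          f"(1+arithmetic)^{n} = {(1 + arithmetic) ** n:.4f}")
```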