L7. Statistical Concepts and Market Returns Flashcards


1
Q

Learning outcomes

A
  • Distinguish between descriptive stats and inferential stats, between population and a sample and among the types of measurement scales
  • Define parameter, sample statistic and frequency distribution
  • Calculate and interpret relative frequencies and cumulative relative frequencies, given a frequency distribution
  • Describe properties of data set presented as a histogram or frequency polygon
  • Calculate and interpret measures of central tendency, including population mean, sample mean, arithmetic mean, weighted average or mean, geometric mean, harmonic mean, median and mode
  • Calculate and interpret quartiles, quintiles, deciles and percentiles
  • Calculate and interpret 1) range and a mean absolute deviation and 2) variance and standard deviation of a population and of a sample
  • Calculate and interpret the proportion of observations falling within a specified number of standard deviations of the mean using Chebyshev’s inequality
  • Calculate and interpret the coefficient of variation
  • Explain skewness and meaning of a positively or negatively skewed return distribution
  • Describe the relative locations of the mean, median and mode for a unimodal, non symmetrical distribution
  • Explain measures of sample skewness and kurtosis
  • Compare the use of arithmetic and geometric means when analysing investment returns
2
Q

Demonstration of statistical methods that allow us to summarise return distributions

A

4 properties of return distributions:

  1. Where the returns are centered (central tendency)
  2. How far returns are dispersed from their center (dispersion)
  3. Whether the distribution of returns is symmetrically shaped or lopsided (skewness) and
  4. Whether extreme outcomes are likely (kurtosis)
3
Q

Nature of statistics

A

2 broad meanings; 1) data and 2) method

4
Q

Statistical methods include

A
  1. Descriptive statistics
    - study of how data can be summarised effectively to describe the important aspects of large data sets
    - consolidating mass of numerical details
  2. Statistical inference
    - involves making forecasts, estimates, or judgements about a larger group from the smaller group actually observed
5
Q

Populations and samples

A

Population
- defined as all members of a specified group

Parameter
- a descriptive measure of a population characteristic is called a parameter

Sample
- a subset of a population

Sample statistic
- a quantity computed from or used to describe a sample

6
Q

Measurement scales

A

All data measurements are taken on one of the 4 major scales; nominal, ordinal, interval or ratio

  1. Nominal scales
    - represent weakest level of measurement
    - categorise data but do not rank them
    EG: Hedge fund classification types
  2. Ordinal scales
    - reflects stronger level of measurement
    - sort data into categories that are ordered with respect to some characteristics
    - eg, Morningstar and S&P star ratings for mutual funds represent an ordinal scale in which one star represents a group of funds judged to have had the relatively worst performance, with 2, 3, 4 and 5 stars representing groups with increasingly better performance
    - although performance is ranked, the scale does not tell us the size of the difference in performance between funds
    EG: Credit ratings for bond issues
  3. Interval scales
    - provide not only ranking but also assurance that the differences between scale values are equal
    - eg Celsius scale; the difference in temperature between 10°C and 11°C is the same amount as the difference between 40°C and 41°C
    - zero point of an interval scale does not reflect complete absence of what is being measured; it is not a true zero point or natural zero. Eg. 0°C does not mean absence of temperature but the freezing point of water
  4. Ratio scales
    - represents strongest level of measurement
    - all characteristics of interval measurement scales and true zero point as origin
    EG: Cash dividends per share and bond maturity in years
7
Q

Summarising data using frequency distribution

A
  • frequency distribution is a tabular display of data summarised into a relatively small number of intervals
  • help in the analysis of large amounts of statistical data and work with all types of measurement scales
  • as rates of returns are fundamental units used for making investment decisions, when we analyse, our starting point is the holding period return (also called the total return)

R_t = (P_t - P_(t-1) + D_t) / P_(t-1)
where P_t = price per share at end of time period t
P_(t-1) = price per share at end of time period t-1, the time period immediately preceding time period t
D_t = cash distributions received during time period t

Thus holding period return for time period t is the capital gain/loss plus distribution divided by beginning period price

2 characteristics for holding period return

  1. has element of time (if monthly time interval is used, rate of return is a monthly figure)
  2. no currency unit attached to it
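
The holding period return formula above can be sketched in a few lines of Python. This is only an illustration: the function name and the prices (buy at 50, end at 53, distribution of 1) are made-up assumptions, not figures from the card.

```python
def holding_period_return(p_begin: float, p_end: float, distribution: float = 0.0) -> float:
    """Return (P_t - P_(t-1) + D_t) / P_(t-1) as a decimal."""
    if p_begin <= 0:
        raise ValueError("beginning price must be positive")
    return (p_end - p_begin + distribution) / p_begin


# Hypothetical inputs: beginning price 50, ending price 53, distribution 1
r = holding_period_return(50.0, 53.0, 1.0)
print(f"holding period return = {r:.2%}")
```

The result is a pure number for the chosen period (monthly prices give a monthly return), which matches the 2 characteristics listed above.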
8
Q

Construction of frequency distribution

A
  1. sort data in ascending order
  2. calculate range of data, defined as range = maximum value - minimum value
  3. decide on number of intervals in frequency distribution, k
  4. determine interval width as range/k
  5. determine intervals by successively adding the interval width to minimum value, to determine ending points of interval, stopping after reaching an interval that includes max value
  6. count the number of observations falling in each interval
  7. construct table of interval listed from smallest to largest that shows number of observations falling in each interval

EG. -4.57, -4.04, -1.64, 0.28, 1.34, 2.35, 2.38, 4.28, 4.42, 4.68, 7.16, 11.43

Step 1: -4.57, -4.04, -1.64, 0.28, 1.34, 2.35, 2.38, 4.28, 4.42, 4.68, 7.16, 11.43

Step 2: Maximum value = +11.43, minimum value = -4.57. Therefore range is +11.43 - (-4.57) = 16

Step 3: number of intervals 4

Step 4: 16/4 = 4

Step 5:

  • -4.57 + 4 = -0.57
  • -0.57 + 4 = 3.43
  • 3.43 + 4 = 7.43
  • 7.43 + 4 = 11.43

Step 6 and 7

Interval A -4.57 ≤ observation < -0.57 AF (3)
Interval B -0.57 ≤ observation < 3.43 AF (4)
Interval C 3.43 ≤ observation < 7.43 AF (4)
Interval D 7.43 ≤ observation ≤ 11.43 AF (1)

  • interval is set of values within which an observation falls
  • actual number of observations in a given interval is called absolute frequency (AF)
  • relative frequency is the absolute frequency of each interval divided by the total number of observations
  • cumulative relative frequency cumulates (adds up) the relative frequencies as we move from the first to the last interval. It tells us the fraction of observations that are less than the upper limit of each interval
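
The 7 construction steps can be sketched in Python using the 12 observations from the example and k = 4. The function name and the row layout are assumptions made for illustration; the last interval is treated as closed on the right so that the maximum value is counted.

```python
returns = [-4.57, -4.04, -1.64, 0.28, 1.34, 2.35, 2.38, 4.28, 4.42, 4.68, 7.16, 11.43]


def frequency_distribution(data, k):
    """Return rows of (lower, upper, AF, RF, CRF) for k equal-width intervals."""
    data = sorted(data)                      # step 1
    lo, hi = data[0], data[-1]
    width = (hi - lo) / k                    # steps 2 to 4
    counts = [0] * k
    for x in data:                           # step 6
        idx = min(int((x - lo) / width), k - 1)  # the maximum goes in the last interval
        counts[idx] += 1
    rows, cumulative, n = [], 0, len(data)
    for i, af in enumerate(counts):          # step 7
        cumulative += af
        rows.append((lo + i * width, lo + (i + 1) * width, af, af / n, cumulative / n))
    return rows


table = frequency_distribution(returns, 4)
for lower, upper, af, rf, crf in table:
    print(f"{lower:6.2f} to {upper:6.2f}  AF={af}  RF={rf:.2%}  CRF={crf:.2%}")
```

The absolute frequencies agree with intervals A to D above (3, 4, 4, 1).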
9
Q

Example of frequency distribution

A

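Return interval   AF   RF (%)   CAF   CRF (%)

5.0 - 6.0          3   15.79      3    15.79
6.0 - 7.0          2   10.53      5    26.32
7.0 - 8.0          6   31.58     11    57.90
8.0 - 9.0          6   31.58     17    89.47
9.0 - 10.0         2   10.53     19   100.00

(AF = absolute frequency, RF = relative frequency, CAF = cumulative absolute frequency, CRF = cumulative relative frequency)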
10
Q

Graphic presentation of data

A

Histogram

  • bar chart of data that have been grouped into a frequency distribution
  • x axis = return intervals
  • y axis = frequency (the height of each bar)

Frequency polygon

  • x axis = return interval midpoints
  • y axis = frequency; the plotted points are joined by straight lines

Cumulative frequency distribution

  • a steep slope reflects that most observations lie in the neighbourhood of those interval limits
  • x axis = return interval upper limits
  • y axis = cumulative frequency (absolute or relative)
11
Q

The use of quantitative measures that explain characteristics of data

A
  1. Central tendency specifies where data are centered

a) Arithmetic mean
- sum of observations divided by the number of observations
- used to compute for both population mean and sample mean

  • -> population mean
  • arithmetic mean value of a population
  • eg profit as a percentage of revenue for 3 coys is 0, 2.1 and 2.0 respectively, so the population mean is
  • (0 + 2.1 + 2.0)/3 = 1.37%
  • -> sample mean
  • arithmetic mean value of a sample
  • eg. P/E for 6 coys are 35, 30, 22, 18, 15, 12, so the sample mean P/E is (35+30+22+18+15+12)/6 = 22
  • also called the arithmetic average
  • if we examine data across many units (eg 100 coys) at one point in time, it is called cross-sectional data and the mean of these observations is called a cross-sectional mean
  • if we examine data from 1 unit over time (eg historical monthly returns), it is called time-series data and the mean of these observations is called a time-series mean

Advantage of mean
- mean uses all information about size and magnitude of observations

Disadvantage of mean

  • sensitive to extreme values
  • as all observations are used to compute mean, mean can be pulled sharply upward or downward by extremely large or small observations
  • eg for 1, 2, 3, 4, 5, 6 and 1000 the arithmetic mean is about 146, much larger than the bulk of the observations (the first 6)
12
Q

Deviations from mean

A

Typically use mean return as a measure of the typical outcome. However, some outcomes are above mean, some below. We can calculate distance between mean and each outcome and call it deviation

13
Q

The use of quantitative measures that explain characteristics of data

A

b) Median
- value of the middle item of a set of items that has been sorted into ascending or descending order
- in an odd-numbered sample of n items, the median occupies the (n+1)/2 position
- in an even-numbered sample, the median is the mean of the values of the items occupying the n/2 and (n+2)/2 positions
- whether n is even or odd, an equal number of observations lie above and below the median

  • eg 0.0, 2.0, 2.1 = odd numbered. Median position = (3+1)/2 = 2nd position, so median = 2.0 percent

Advantage
- extreme values do not affect it

Disadvantage

  • does not use all information about size and magnitude of observations; focus only on the relative position of the ranked observations
  • need to order the observations in order then determine if odd or even before calculation
14
Q

Practice Qns on mean and median

A

7 coys P/E

23.44, 17.62, 5.65, 17.46, 25.95, 143.11, 22.95

a) What is the arithmetic mean P/E?
Sum of all/ 7 = 36.60

b) What is the median P/E?
Rank in ascending order (5.65, 17.46, 17.62, 22.95, 23.44, 25.95, 143.11), then use (7+1)/2 = 4th position since n is odd. The 4th ranked value is 22.95

c) Evaluate the mean and median P/E.
The median P/E is more appropriate for this portfolio because the mean of 36.60 is pulled up by the outlier 143.11 and is larger than 6 of the 7 P/Es. Hence, 22.95 is a more representative measure of central tendency
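
As a quick check of this practice question, the mean and median can be computed with Python's standard `statistics` module. The variable names are illustrative; the 7 P/E values are the ones given above.

```python
import statistics

pe = [23.44, 17.62, 5.65, 17.46, 25.95, 143.11, 22.95]

mean_pe = statistics.mean(pe)
median_pe = statistics.median(pe)   # sorts internally, n is odd so it picks the 4th value

# Dropping the outlier shows how strongly one extreme value pulls the mean
mean_without_outlier = statistics.mean(x for x in pe if x != 143.11)

print(f"mean = {mean_pe:.2f}, median = {median_pe:.2f}")
print(f"mean without 143.11 = {mean_without_outlier:.2f}")
```

The median does not move much when the outlier is removed, while the mean drops sharply, which is the point made in part c).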

15
Q

The use of quantitative measures that explain characteristics of data

A

c) Mode
- the most frequently occurring value in a distribution
- can have 1 or more modes, and even no mode
- when have 1, is called unimodal
- when have 2, it is bimodal
- when have 3, is trimodal
- modal intervals = most frequently occurring interval with reference to a grouped data

16
Q

Practice Qns for Mode and Median

A

Credit ratings of 6 US departmental stores
Baa3, Baa2, Baa3, Caa3, Baa1, Caa2

  1. What is the mode credit rating? (which has highest frequency?)
    Baa3 - 2, the rest 1
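  2. What is the median credit rating?
    n = 6 is even, so the median lies between the n/2 = 3rd and (n+2)/2 = 4th positions
    Sorting from highest to lowest rating (Baa1, Baa2, Baa3, Baa3, Caa2, Caa3), both the 3rd and 4th positions are Baa3, so the median is Baa3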
17
Q

The use of quantitative measures that explain characteristics of data

A

The weighted mean

  • for arithmetic mean, all observations are equally weighted by factor 1/n; the weighted mean allows different weights that sum to 1
  • eg with 70 million in equities and 30 million in bonds, the portfolio has a weight of 0.7 on equities and 0.3 on bonds
  • multiply the return on the stock investment by 0.7 and the return on the bond investment by 0.3; the sum of the two is the weighted mean return

eg. Year 1, -33.1% equity fund, -0.1 bond fund
allocation = 60% on stock fund, 40% on bonds
The weighted mean
= (0.6x-33.1) + (0.4x-0.1) = -19.86 + -0.04 = -19.90%

If manager maintains constant weights of 60% shares and 40% bonds for all 5 years, method is called constant-proportions strategy

Year 1, -33.1% equity fund, -0.1 bond fund
Year 2, 34.1% equity fund, 11.0 bond fund
Year 3, 16.8% equity fund, 6.4 bond fund
Year 4, -9.2% equity fund, 8.4 bond fund
Year 5, 6.4% equity fund, 3.8 bond fund

If using same method for all 5 years
Y1= (0.6x-33.1) + (0.4x-0.1) = -19.9%
Y2 = (0.6 x34.1) + (0.4x11.0) = 24.9%
Y3 = (0.6 x16.8) + (0.4x6.4) = 12.6%
Y4 = (0.6 x-9.2) + (0.4x8.4) = -2.2%
Y5 = (0.6 x6.4) + (0.4x3.8) = 5.4%

So the time series mean of returns for 5 years = (-19.9 + 24.9 + 12.6 - 2.2 + 5.4)/ 5 = 4.2%

ALTERNATIVE METHOD
Finding arithmetic mean for stock fund
= (-33.1 + 34.1 + 16.8 - 9.2 + 6.4)/ 5 = 3
Finding arithmetic mean for bond fund
= (-0.1 + 11 + 6.4 + 8.4 + 3.8)/ 5 = 5.9

Weighted mean = (0.6x3) + (0.4x5.9) = 4.2%

When we take a weighted average of forward-looking data (possible outcomes weighted by their probabilities), the weighted mean is called the expected value
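
The two routes to the 4.2% figure (averaging the yearly portfolio returns, or weighting the two fund means) can be verified with a short Python sketch. The variable names are assumptions; the returns and the 60/40 weights come from the card.

```python
equity = [-33.1, 34.1, 16.8, -9.2, 6.4]   # equity fund returns, percent
bond = [-0.1, 11.0, 6.4, 8.4, 3.8]        # bond fund returns, percent
w_equity, w_bond = 0.6, 0.4               # constant-proportions weights

# Method 1: weighted mean each year, then the time-series mean
yearly = [w_equity * e + w_bond * b for e, b in zip(equity, bond)]
mean_of_yearly = sum(yearly) / len(yearly)

# Method 2: arithmetic mean of each fund, then the weighted mean
mean_of_means = w_equity * sum(equity) / len(equity) + w_bond * sum(bond) / len(bond)

print([round(y, 1) for y in yearly])
print(round(mean_of_yearly, 2), round(mean_of_means, 2))
```

Both methods agree because the weights are held constant across all 5 years.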

18
Q

Practice Qns for weighted mean

A
Asset class 1, 4.7% asset allocation, 1.2% class return
Asset class 2, 29.0% asset allocation, 8.0% class return
Asset class 3, 11.8% asset allocation, 1.2% class return
Asset class 4, 10.5% asset allocation, 8.2% class return
Asset class 5, 24.8% asset allocation, 15.4% class return
Asset class 6, 19.0% asset allocation, 15.6% class return
Asset class 7, 0.2% asset allocation, 5.7% class return

What is the mean return?

(0.047x1.2) + (0.29 x 8) + (0.118 x 1.2) + (0.105 x 8.2) + (0.248 x 15.4) + (0.19 x 15.6) + (0.002 x 5.7) = 10.2%

19
Q

The use of quantitative measures that explain characteristics of data

A

Geometric mean

  • used most frequently to average rates of change over time or to compute growth rate of a variable
  • the geometric mean requires non-negative values, and returns can be negative, so we add 1 to each return expressed as a decimal
  • after taking the nth root of the product of the (1 + return) terms, we subtract 1 to convert back to a return
  • to key sq root click OPTION V on mac
20
Q

Practice Qns for geometric mean and arithmetic mean

A
Y1 - SLASX 34.90%, PRFDX 31.69% 
Y2 - SLASX 6.13%, PRFDX 7.75% 
Y3 - SLASX 2.69%, PRFDX -7.56% 
Y4 - SLASX 11.66%, PRFDX 18.25% 
Y5 - SLASX 21.77%, PRFDX 16.18% 
a) Find geometric mean return of SLASX
= ⁵√[(1.3490)(1.0613)(1.0269)(1.1166)(1.2177)] - 1
= ⁵√1.9990157 - 1
= 1.1485853 - 1
= 0.1486 = 14.86%

b) Find arithmetic mean return of SLASX and contrast with geometric mean
= (0.3490 + 0.0613 + 0.0269 + 0.1166 + 0.2177)/5
= 0.1543 = 15.43%
Contrast = 15.43% - 14.86% = 0.57% or 57 basis points

c) Find geometric mean return of PRFDX
= ⁵√[(1.3169)(1.0775)(1 - 0.0756)(1.1825)(1.1618)] - 1
= ⁵√1.8020321 - 1
= 1.125 - 1
= 0.125 = 12.5%

d) Find arithmetic mean return of PRFDX and contrast with geometric mean
= (0.3169 + 0.0775 - 0.0756 + 0.1825 + 0.1618)/5
= 0.13262 = 13.26%

Contrast = 13.26% - 12.5% = 0.76% or 76 basis points

Summary

  • Geometric mean is always less than or equal to the arithmetic mean
  • Difference between arithmetic and geometric means increases with the variability of the period-by-period observations
  • SLASX has higher means than PRFDX, and the gap between its arithmetic and geometric means is smaller (57 vs 76 basis points)
  • -> over the 5 years, a dollar invested in SLASX compounds to $1.9990 and a dollar invested in PRFDX compounds to $1.8020
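
The geometric and arithmetic means for both funds can be recomputed with a minimal Python sketch. The helper names are assumptions chosen for this example; `math.prod` requires Python 3.8 or later.

```python
import math

slasx = [34.90, 6.13, 2.69, 11.66, 21.77]    # annual returns, percent
prfdx = [31.69, 7.75, -7.56, 18.25, 16.18]


def growth_of_one(returns_pct):
    """Terminal value of 1 unit invested: product of (1 + R_t)."""
    return math.prod(1 + r / 100 for r in returns_pct)


def geometric_mean_return(returns_pct):
    """nth root of the compounded growth, minus 1, in percent."""
    return (growth_of_one(returns_pct) ** (1 / len(returns_pct)) - 1) * 100


def arithmetic_mean_return(returns_pct):
    return sum(returns_pct) / len(returns_pct)


for name, fund in (("SLASX", slasx), ("PRFDX", prfdx)):
    print(name,
          round(growth_of_one(fund), 4),
          round(geometric_mean_return(fund), 2),
          round(arithmetic_mean_return(fund), 2))
```

The compounded value is the quantity under the root sign, so it is the product of the (1 + return) terms and not the geometric mean itself.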
21
Q

Practice Qns for geometric mean

A

Investment in single stock initially cost $100. One year later, stock is trading at $200. At end of year 2, stock price falls back to original. Calculate GM and AM.

Arithmetic mean
Return in Y1 = (200-100)/100 = 100%
Return in Y2 = (100-200)/200 = -50%

(100-50)/2 = 25%

Geometric mean
√[(1 + 1.00) × (1 + (-0.50))] - 1
= √(2 × 0.5) - 1
= 1 - 1 = 0

GM = 0 reflects that the ending value of the investment in Y2 equals the starting value in Y1
AM = 25% reflects the average of the 1-year returns
22
Q

The use of quantitative measures that explain characteristics of data

A

Harmonic Mean

  • obtained by summing the reciprocals of the observations, dividing the sum by the number of observations, and then taking the reciprocal of that average
    eg. investor purchases $1000 of a security each month for n = 2 months. Share prices are $10 and $15 at the 2 purchase dates. What is the average price paid for the security?

1st month = 1000/10 = 100 shares
2nd month = 1000/15 = 66.67 shares
Total = 166.67 shares
Average paid = 2000/166.67 = $12 per share

Formula = 2/((1/10)+(1/15)) = 12

Relationship among harmonic, geometric and arithmetic means: unless all the observations in a data set have the same value, the harmonic mean is less than the geometric mean, which is less than the arithmetic mean

23
Q

Practice Qns for harmonic mean

A

Manager invests 5000 annually in security for 4 years. What is the average price of the security?

Y1 = $62/share
Y2 = $76/share
Y3 = $84/share
Y4 = $90/share 
Solution 
Y1 = $62/share = 5000/62 = 80.65 shares 
Y2 = $76/share = 5000/76 = 65.79 shares 
Y3 = $84/share = 5000/84 = 59.52 shares 
Y4 = $90/share = 5000/90 = 55.56 shares 

Total shares = 261.52 shares
Total amount invested = 5000 x 4 = 20,000

Average share price = 20,000 / 261.52 = $76.48
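
The same answer falls out of the harmonic mean formula directly. The sketch below uses Python's `statistics` module (`harmonic_mean` and `geometric_mean`, the latter available from Python 3.8) and compares it with the share-count method used in the solution; the variable names are illustrative.

```python
import statistics

prices = [62, 76, 84, 90]     # price per share at each purchase date
amount = 5000                 # equal amount invested each year

# Share-count method, as in the solution above
total_shares = sum(amount / p for p in prices)
average_cost = amount * len(prices) / total_shares

# Harmonic mean of the prices gives the same average cost
hm = statistics.harmonic_mean(prices)
gm = statistics.geometric_mean(prices)
am = statistics.mean(prices)

print(round(average_cost, 2), round(hm, 2), round(gm, 2), round(am, 2))
```

Because the prices are not all equal, the ordering harmonic < geometric < arithmetic holds here as well.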

24
Q

Quantiles

A
  • Value at or below which a stated fraction of the data lies, aka fractile
  • -> eg stating that 25%, 50% and 75% of the annual returns on a portfolio are at or below the values -0.05, 0.16 and 0.25 respectively provides concise information about the distribution of portfolio returns
  • data need to be sorted in ascending order
  1. Quartiles divide distribution into quarters
  2. Quintiles divide distribution into fifths
  3. Deciles divide distribution into tenths
  4. Percentiles divide distribution into hundredths

–> given a set of observations, the yth percentile (Py) is the value at or below which y percent of observations lie

Ly = (n+1) × y/100
where Ly is the location (position) of the percentile in the sorted data
y is the percentage point at which we are dividing the distribution
n is the number of entries

With a big sample size, the percentile location calculation becomes more accurate

Summary
- Where L is a whole number: eg when determining the 3rd quartile with 15 data points, L = (15+1)(75/100) = 12, so the third quartile P75 = X12 (the value in the 12th position)

  • Where L is not an integer, L lies between the 2 closest integers (one above and one below) and we use linear interpolation between the values in those 2 positions to determine Py
  • -> linear interpolation = estimation of an unknown value on the basis of 2 known values, using a straight line between the 2 known values

EG. If Ly = 9.60, the next lower whole number is 9 and the next higher whole number is 10. Using linear interpolation, P60 = X9 + (9.60 - 9)(X10 - X9)

9th position = 6.81
10th positions = 7.20

Thus estimate = 6.81 + (0.6)(7.20-6.81)
= 7.04 percent
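
The location formula and the interpolation step can be sketched as one small Python function. The data set below is hypothetical: only the 9th and 10th values (6.81 and 7.20) come from the card, and the other 13 values are placeholders chosen so that n = 15.

```python
def percentile(sorted_data, y):
    """Value at location L = (n + 1) * y / 100, interpolating linearly if L is not whole."""
    n = len(sorted_data)
    loc = (n + 1) * y / 100
    if loc < 1 or loc > n:
        raise ValueError("location falls outside the data; use a larger sample")
    lower = int(loc)              # 1-based position just below (or at) L
    frac = loc - lower
    if frac == 0:
        return sorted_data[lower - 1]
    return sorted_data[lower - 1] + frac * (sorted_data[lower] - sorted_data[lower - 1])


# Placeholder data, already sorted; positions 9 and 10 match the example above
data = [1.0, 2.0, 3.0, 4.0, 4.5, 5.0, 5.5, 6.0, 6.81, 7.20, 8.0, 9.0, 10.0, 11.0, 12.0]

p60 = percentile(data, 60)   # L = 9.6, interpolates between X9 and X10
p75 = percentile(data, 75)   # L = 12, a whole number, so P75 = X12
print(round(p60, 2), p75)
```

Quartiles, quintiles and deciles are special cases: the 3rd quartile is `percentile(data, 75)`, the 2nd quintile is `percentile(data, 40)`, and so on.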

25
Q

Using quantiles in investments

A

The study by Ibbotson et al proposes an investment style based on liquidity: buying less liquid stocks and selling more liquid stocks

26
Q

Measures of dispersion

A

Dispersion is the variability around the central tendency

  • mean returns = rewards
  • dispersion = risks

4 most common measures of dispersion (absolute dispersion)

  1. Range
  2. Mean absolute deviation
  3. variance (commonly used)
  4. standard deviation (commonly used)

Absolute dispersion = amount of variability present without comparison to any reference point or benchmark

27
Q

Range

A

The difference between the maximum and minimum value in a data set

range = max. value - min. value

Eg max return = 42.56
Min. return = -29.73
Range = 42.56 - (-29.73) = 72.29

Advantage
- ease of computation

Disadvantage

  • only uses 2 pieces of information from distribution
  • cannot tell how data are distributed
  • may reflect extremely large or small outcomes that are not representative of the distribution
28
Q

Mean absolute deviation

A
  • ignores the signs of the deviations around the mean by taking absolute values
Formula: MAD = Σ|Xi - X̄| / n
X̄ = sample mean and n is number of observations

If Xi = -11.0 and X̄ = 4.5
Absolute value of the deviation is |-11.0 - 4.5| = |-15.5| = 15.5

Advantage
- uses all observations in the sample, thus superior to the range

Disadvantage
- difficult to manipulate mathematically compared to variance

29
Q

Practice Qns for range and mean absolute deviation

A
Y1 - SLASX 34.90%, PRFDX 31.69% 
Y2 - SLASX 6.13%, PRFDX 7.75% 
Y3 - SLASX 2.69%, PRFDX -7.56% 
Y4 - SLASX 11.66%, PRFDX 18.25% 
Y5 - SLASX 21.77%, PRFDX 16.18% 
  1. Calculate range for each fund and which is riskier?
  2. Calculate MAD for each fund and which is riskier?

Solution
Range = max - min
SLASX = 34.9 - 2.69 = 32.21%
PRFDX = 31.69 + 7.56 = 39.25% (riskier)

MAD = Σ|Xi - X̄| / n
AM for SLASX = 15.43% (calculated above in card 20)
MAD = (|34.9 - 15.43| + |6.13 - 15.43| + |2.69 - 15.43| + |11.66 - 15.43| + |21.77 - 15.43|) / 5
= (19.47 + 9.3 + 12.74 + 3.77 + 6.34) / 5 = 10.32%

AM for PRFDX = 13.26% (calculated above in card 20)
MAD = (|31.69 - 13.26| + |7.75 - 13.26| + |-7.56 - 13.26| + |18.25 - 13.26| + |16.18 - 13.26|) / 5
= (18.43 + 5.51 + 20.82 + 4.99 + 2.92) / 5 = 10.53% (riskier)

  • REMEMBER THAT NEGATIVE SIGNS DON’T MATTER FOR MAD: TAKE THE ABSOLUTE VALUE OF EACH DEVIATION *
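
A short Python sketch of the range and MAD calculations for both funds follows. The function names are assumptions for illustration; the returns are the ones listed in the question.

```python
slasx = [34.90, 6.13, 2.69, 11.66, 21.77]    # annual returns, percent
prfdx = [31.69, 7.75, -7.56, 18.25, 16.18]


def value_range(xs):
    return max(xs) - min(xs)


def mean_absolute_deviation(xs):
    """Average of |Xi - mean|; abs() is what removes the sign of each deviation."""
    mean = sum(xs) / len(xs)
    return sum(abs(x - mean) for x in xs) / len(xs)


range_slasx, range_prfdx = value_range(slasx), value_range(prfdx)
mad_slasx, mad_prfdx = mean_absolute_deviation(slasx), mean_absolute_deviation(prfdx)

print(round(range_slasx, 2), round(range_prfdx, 2))
print(round(mad_slasx, 2), round(mad_prfdx, 2))
```

Without `abs()` the deviations would sum to zero, which is why the signs have to be dropped.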
30
Q

Variance and standard deviation

A

Variance
- the average of the squared deviations around the mean

Standard deviation
- positive square root of the variance

  • both measure an asset’s risk
  • variance and SD of returns take account of returns both above and below the mean
31
Q

Population variance

A

EG. For 3 coys, profit as a percentage of revenue was 0.0, 2.1 and 2.0. We found the mean to be 1.37 in card 11.

Population variance
= (1/3)((0-1.37)^2 + (2.1 - 1.37)^2 + (2 - 1.37)^2)
= (1/3)(1.88+0.53+0.40)
= (1/3)(2.81) = 0.94

32
Q

Population standard deviation

A

Positive square root of the population variance

Eg. Based on (31), if variance is 0.94, then SD = √0.94
= 0.97

33
Q

Practice Qns for population mean, population variance and population standard deviation

A

EG. 10 different stocks with 10 P/E

  1. 10.7
  2. 24.9
  3. 17.3
  4. 13
  5. 24.6
  6. 9.7
  7. 16.7
  8. 50.6
  9. 18.1
  10. 6.5
  1. What is the population mean?
  2. What is the population variance and population standard deviation?

Solution
Population mean = sum of all observations/10
= 192.1/10 = 19.21

  1. Population variance
    = [(10.7 - 19.21)^2 + (24.9 - 19.21)^2 + (17.3 - 19.21)^2 + …] / 10
    = 1420.9/10
    = 142.09
  2. Population standard deviation
    = √142.09
    = 11.92
ALTERNATIVELY, USING A CALCULATOR TO FIND THE SD AND POPULATION MEAN (X̄)
- 2nd 7
- type x1 = first value and press enter
- enter all 10 values
- 2nd 8 to view stats
- σx = population standard deviation
- X̄ = population mean
(when you have SD, just square it to find the variance)
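
The same three population measures can be checked in Python with the standard `statistics` module, where `pvariance` and `pstdev` divide by n. The variable names are illustrative; the 10 P/E values are those listed in the question.

```python
import statistics

pe = [10.7, 24.9, 17.3, 13, 24.6, 9.7, 16.7, 50.6, 18.1, 6.5]

pop_mean = statistics.mean(pe)
pop_variance = statistics.pvariance(pe)   # sum of squared deviations divided by n
pop_sd = statistics.pstdev(pe)            # positive square root of the population variance

print(round(pop_mean, 2), round(pop_variance, 2), round(pop_sd, 2))
```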
34
Q

Sample variance and sample standard deviation

A

Sample variance = statistic that measures dispersion in a sample
- the difference from the population variance formula is that the sum of squared deviations is divided by n - 1 instead of n

Sample standard deviation is the positive square root of the sample variance

35
Q

Practice Qns for sample variance and sample standard deviation

A
Y1 - SLASX 34.90%, PRFDX 31.69% 
Y2 - SLASX 6.13%, PRFDX 7.75% 
Y3 - SLASX 2.69%, PRFDX -7.56% 
Y4 - SLASX 11.66%, PRFDX 18.25% 
Y5 - SLASX 21.77%, PRFDX 16.18% 
  1. Calculate sample variance for both funds and sample standard deviation for both funds
Solution
Using calculator 
2nd 7
Key in all observations 
2nd 8 
Find Sx = sample standard deviation 

For SLASX = 13.1
For PRFDX = 14.5

To find variance, square the SD (using the unrounded calculator value)
For SLASX = 170.57
For PRFDX = 209.23

Note that MAD is always less than or equal to SD
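
For comparison with the calculator route, here is a minimal Python sketch using `statistics.variance` and `statistics.stdev`, which divide by n - 1. The variable names are assumptions for this example.

```python
import statistics

slasx = [34.90, 6.13, 2.69, 11.66, 21.77]    # annual returns, percent
prfdx = [31.69, 7.75, -7.56, 18.25, 16.18]

var_slasx, sd_slasx = statistics.variance(slasx), statistics.stdev(slasx)
var_prfdx, sd_prfdx = statistics.variance(prfdx), statistics.stdev(prfdx)

print(round(var_slasx, 2), round(sd_slasx, 1))
print(round(var_prfdx, 2), round(sd_prfdx, 1))
```

Squaring the rounded SD of 13.1 gives about 171.6 rather than 170.57, so the variance should be taken from the unrounded SD.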

36
Q

Semivariance, semideviation and related concepts

A

Variance and SD of returns take into account returns both above and below the mean, but investors are often concerned only with downside risk (ie returns below the mean)

  • hence the use of semivariance and semideviation

Semivariance

  • average squared deviation below the mean
  • steps taken to compute
    1. calculate sample mean
    2. identify observations smaller than or equal to the mean
    3. compute the sum of squared negative deviations from the mean
    4. divide the sum of squared negative deviations from step 3 by total sample size - 1 (n-1)

Semideviation
- positive square root of semivariance

37
Q

Example of Semivariance and semideviation

A

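5 annual returns (the SLASX returns from card 20)

  1. 34.90
  2. 6.13
  3. 2.69
  4. 11.66
  5. 21.77

Step 1: Sample mean = 15.43
Step 2: Identify observations below mean: 6.13, 2.69, 11.66
Step 3: Sum of squared negative deviations = (6.13 - 15.43)^2 + (2.69 - 15.43)^2 + (11.66 - 15.43)^2 = 263.01
Step 4: Semivariance = 263.01/(5-1) = 65.75
Semideviation = √65.75 = 8.1%

SD = 13.1, Semideviation = 8.1
As the semideviation is less than the standard deviation, SD overstates risk from a downside-risk perspective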

38
Q

Target semivariance and target semideviation

A

Target semivariance = average squared deviation below a stated target

Target semideviation = positive square root of target semivariance

Step 1: Identify target
Step 2: Identify observations below target
Step 3: Compute sum of squared negative deviations from target
Step 4: Divide the sum of squared negative deviations from step 3 by total observations - 1 (n-1)

based on example in 37,
if target return is 11.5, 2 observations are below target (2.69 and 6.13)

Target semivariance
= ((2.69 - 11.5)^2 + (6.13 - 11.5)^2)/(5-1) = 26.61
and Target semideviation
= √26.61 = 5.2%
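
Semivariance and target semivariance differ only in the reference level, so one function can cover both. This is a sketch under that assumption, with illustrative names; it uses the 5 returns from card 37 and the n - 1 divisor described in the steps.

```python
import math

returns = [34.90, 6.13, 2.69, 11.66, 21.77]   # percent


def downside_variance(xs, reference):
    """Sum of squared deviations of observations below `reference`, divided by n - 1."""
    return sum((x - reference) ** 2 for x in xs if x < reference) / (len(xs) - 1)


mean = sum(returns) / len(returns)

semivariance = downside_variance(returns, mean)          # reference = sample mean
semideviation = math.sqrt(semivariance)

target_semivariance = downside_variance(returns, 11.5)   # reference = target of 11.5
target_semideviation = math.sqrt(target_semivariance)

print(round(semivariance, 2), round(semideviation, 1))
print(round(target_semivariance, 2), round(target_semideviation, 1))
```

An observation exactly equal to the reference contributes zero, so using `<` or `<=` in the filter gives the same result.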

39
Q

Chebyshev’s inequality

A

For any distribution with finite variance, the proportion of the observations within k standard deviations of the arithmetic mean is at least 1 - 1/k^2 for all k > 1

Eg.
k      Interval around sample mean   Minimum proportion (%)
1.25   X̄ ± 1.25s                     36
1.5    X̄ ± 1.50s                     56
2      X̄ ± 2s                        75
2.5    X̄ ± 2.5s                      84
3      X̄ ± 3s                        89
4      X̄ ± 4s                        94

when k = 1.25, the inequality states that the min. proportion of the observations that lie within ± 1.25s is
1-1/(1.25)^2 = 1 - 0.64 = 0.36 or 36%

  • a 2SD interval around the mean must contain at least 75% of the observations and a 3SD interval around the mean must contain at least 89% of the observations
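
The table above can be regenerated, and the inequality inverted to find k for a required proportion, with a few lines of Python. Both function names are assumptions made for this sketch.

```python
import math


def chebyshev_min_proportion(k):
    """Minimum proportion of observations within k standard deviations, for k > 1."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is informative only for k > 1")
    return 1 - 1 / k ** 2


def k_for_min_proportion(p):
    """Solve p = 1 - 1/k^2 for k, given a required proportion 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return math.sqrt(1 / (1 - p))


for k in (1.25, 1.5, 2, 2.5, 3, 4):
    print(k, f"{chebyshev_min_proportion(k):.0%}")

print(round(k_for_min_proportion(0.8889), 2))
```

The inverse function is handy for questions that give the proportion and ask for the interval end points.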
40
Q

Practice Qns for Chebyshev’s inequality

A

AM of monthly returns = 0.95 percent and SD of monthly returns = 5.39 percent, based on 1104 observations.

  1. Calculate end points of the interval that must contain at least 75% of monthly returns according to Chebyshev’s inequality.

At least 75% corresponds to k = 2, since 1 - 1/2^2 = 0.75
X̄ ± 2s
0.95 ± 2(5.39) = 0.95 ± 10.78
Lower end point = 0.95 - 10.78 = -9.83%
Higher end point = 0.95 + 10.78 = 11.73%

  2. What is the min. number of observations that must lie in the interval computed in 1?

If at least 75% must lie in the interval, then 0.75(1104) = 828 observations at min.

41
Q

Practice Qns for Chebyshev’s inequality

A

N = 240 months
mean monthly return = 0.79%
SD monthly return = 1.16%

What is the min. number of the 240 monthly returns that fall into range of -0.95% to 2.53%?

Solution
By Chebyshev’s inequality, the proportion within k SD of the mean is at least 1 - 1/k^2 for all k > 1

Upper limit of range = 2.53%, which is 2.53 - 0.79 = 1.74% above the mean. The lower limit is -0.95%, which is 0.79 - (-0.95) = 1.74% below the mean.
As a result, k = 1.74/1.16 = 1.5 SD

Because k = 1.5, the proportion of observations within the interval is at least 1 - 1/1.5^2 = 1 - 0.444 = 0.556 or 55.6%

0.556 × 240 ≈ 133 observations

42
Q

Practice Qns for Chebyshev’s inequality

A

For a distribution of 2000 observations with finite variance, sample mean of 10% and SD of 4%, what is the min. number of observations that lie within 8% around the mean?

Solution
To find how many observations will lie within 8% of mean, we need to find how many SD intervals first.

So, 8% / 4% = 2, ie the interval is ± 2SD around the mean (2% to 18%)

Based on Chebyshev’s theory, 1 - 1/k^2 = P
So if K = 2, P = 1-1/2^2
P = 1-1/4 = 75%

So 75% of 2000 observations = 1500 observations

43
Q

Practice Qns for Chebyshev’s inequality

A

A sample of 438 observations. Sample mean is 382 and SD is 14. The end points of the interval that must contain at least 88.89% of the observations are closest to?

Solution
88.89% corresponds to k = 3 (see table in card 39)
Therefore, we have 382 ± 3(14)
Lower end point = 382 - 42 = 340
Higher end point = 382 + 42 = 424
How to find k without memorising the table?
Remember the formula P = 1 - 1/k^2
So if P = 0.8889
Then 0.8889 = 1 - 1/k^2
1/k^2 = 1 - 0.8889
1/k^2 = 0.1111
cross-multiplying
0.1111k^2 = 1
k^2 = 1/0.1111 = 9
k = 3
44
Q

Coefficient of variation

A

Relative dispersion is the amount of dispersion relative to a reference value or benchmark

Coefficient of variation (CV) = ratio of the standard deviation of a set of observations to their mean value

CV = s/X̄ where s = sample SD and X̄ = sample mean

When the observations are returns, the CV measures the amount of risk (SD) per unit of mean return

Eg. 2 sets of data
Coy A - SD = 16.8, Mean = 70, CV = 0.24
Coy B - SD = 16.8, Mean = 820, CV = 0.02

This means that the first sample has greater relative variability in sales, even though the two SDs are equal

45
Q

Practice Qns for CV

A

4 major stock indexes with its AM and SD

Australia - 5.3 (AM), 11.9 (SD)
HK - 5.6 (AM), 15.8 (SD)
Japan - 15.7 (AM), 16.3 (SD)
South Korea - 4.2 (AM), 8.6(SD)

  1. What is the CV for each market?
  2. Rank markets from riskier to least risky using CV as measure
Solution
  1. CV = SD/AM
    Aust = 11.9/5.3 = 2.25
    HK = 15.8/5.6 = 2.82
    Japan = 16.3/15.7 = 1.04
    SK = 8.6/4.2 = 2.05
  2. HK, Aust, SK, Japan
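
The CV calculation and the ranking can be reproduced with a dictionary and a sort. The structure below is an illustrative assumption; the means and SDs are the ones given in the question.

```python
# market: (arithmetic mean return, standard deviation), both in percent
markets = {
    "Australia": (5.3, 11.9),
    "Hong Kong": (5.6, 15.8),
    "Japan": (15.7, 16.3),
    "South Korea": (4.2, 8.6),
}

# CV = SD / mean: risk per unit of mean return
cv = {name: sd / mean for name, (mean, sd) in markets.items()}

# Rank from riskiest (highest CV) to least risky
ranking = sorted(cv, key=cv.get, reverse=True)

for name in ranking:
    print(f"{name}: CV = {cv[name]:.2f}")
```

Note that Japan has the highest SD but the lowest CV, because its mean return is much larger.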
46
Q

Symmetry in return distributions

A

Mean and variance may not adequately describe an investment’s distribution of returns therefore we need to analyse the degree of symmetry in return distributions

  • if a return distribution is symmetrical about its mean, each side of the distribution is a mirror image of the other = equal loss and gain intervals exhibit the same frequency
  • normal distribution is a symmetrical bell-shaped distribution and its characteristics include:
    1. mean and median are equal
    2. completely described by 2 parameters; mean and variance
    3. roughly 68% of its observations lie between +/- 1SD from the mean, 95% lie between +/- 2SD from the mean and 99% lie between +/- 3SD
47
Q

Skewness in return distributions (important)

A
  • a distribution that is not symmetrical is called skewed
  • a return distribution with a positive skew has frequent small losses and a few extreme gains, long tail on the right side
  • a return distribution with a negative skew has frequent small gains and a few extreme losses, long tail on the left side
  • for a continuous positively skewed unimodal distribution, the mode is less than the median, which is less than the mean
  • for a continuous negatively skewed unimodal distribution, the mean is less than the median, which is less than the mode

Investors should be attracted by a positive skew because the mean return falls above the median

  • skewness is computed as the average cubed deviation from the mean, standardised by dividing by the SD cubed to make the measure free of scale
  • a symmetrical distribution has skewness = 0, a positively skewed distribution has positive skewness, a negatively skewed distribution has negative skewness
  • if a distribution is positively skewed with a mean greater than its median, then more than half of the deviations from the mean are negative and less than half are positive.
  • if skewness is positive, the average magnitude of positive deviations is larger than the average magnitude of negative deviations

sample skewness formula
SK = [n / ((n-1)(n-2))] × Σ(Xi - X̄)^3 / s^3

48
Q

Practice Qns for Skewness

A
10 annual rates of returns 
Y1 = -35.75
y2 = 25.62
y3 = 15.15
y4 = -0.72
y5 = 17.25
y6 = 31.69
y7 = 7.75
y8 = -7.76
y9 = 18.25
y10 = 16.18 
  1. Calculate the skewness showing 2 decimal places

Solution
Using calculator to find what is the sample mean and sample SD

Sample mean = 8.77
Sample SD = 19.474

Using the skewness formula
n/((n-1)(n-2)) = 10/((9)(8)) = 0.1389
s^3 = 19.47^3 = 7380.705

y1 = (-35.75 - 8.77)^3 = -88239.99
y2 = (25.62 - 8.77)^3 = 4784.094
y3 … continue till y10 and find the summation
Summation = -74659.487
Sum/s^3 = -10.1155
Skewness = 0.1389 × -10.1155 = -1.405 (negatively skewed)
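
The sample skewness formula can be coded directly. The function name is an assumption for this sketch; it uses the sample SD (n - 1 divisor) as in the formula on card 47. Working at full precision gives roughly -1.40, while the hand calculation above lands at -1.405 because the mean and SD were rounded first.

```python
import statistics


def sample_skewness(xs):
    """[n / ((n-1)(n-2))] * sum((Xi - mean)^3) / s^3, with s the sample SD."""
    n = len(xs)
    if n < 3:
        raise ValueError("sample skewness needs at least 3 observations")
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)
    cubed = sum((x - mean) ** 3 for x in xs)
    return n / ((n - 1) * (n - 2)) * cubed / s ** 3


annual_returns = [-35.75, 25.62, 15.15, -0.72, 17.25, 31.69, 7.75, -7.76, 18.25, 16.18]

skew = sample_skewness(annual_returns)
print(round(skew, 2))
```

The negative sign comes almost entirely from the single large loss of -35.75, whose cubed deviation dominates the sum.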

49
Q

Kurtosis in return distributions

A

Kurtosis is a measure of the combined weight of the tails of a distribution relative to the rest of the distribution; the proportion of the total probability that is in the tails

Distribution that has fatter tails than normal distribution is called Leptokurtic

  • tends to generate more frequent extremely large deviations from the mean than the normal distribution
  • the comparison is with a normal distribution that has the same mean, SD and skewness
  • more peaked than normal

Distribution that has thinner tails than normal distribution is called Platykurtic
- less peaked than normal

Distribution identical to the normal distribution as concerns relative weight in the tails is called Mesokurtic

  • for normal distribution, kurtosis = 3
  • excess kurtosis = kurtosis - 3
  • normal kurtosis or mesokurtic as excess = 0
  • leptokurtic distribution has excess greater than 0
  • platykurtic distribution has excess less than 0

Sample excess kurtosis formula

KE = [n(n+1) / ((n-1)(n-2)(n-3))] × [Σ(Xi - X̄)^4 / s^4] - 3(n-1)^2 / ((n-2)(n-3))

50
Q

Practice Qns of Kurtosis

A
10 annual rates of returns 
Y1 = -35.75
y2 = 25.62
y3 = 15.15
y4 = -0.72
y5 = 17.25
y6 = 31.69
y7 = 7.75
y8 = -7.76
y9 = 18.25
y10 = 16.18 

Calculate the sample excess kurtosis showing 2 decimal places.

Solution

Using calculator
Sample mean = 8.77
Sample SD = 19.474
SD^4 = 143702.329

Using formula
KE = [n(n+1) / ((n-1)(n-2)(n-3))] × [Σ(Xi - X̄)^4 / s^4] - 3(n-1)^2 / ((n-2)(n-3))

First coefficient: n(n+1)/((n-1)(n-2)(n-3)) = 10(11)/((9)(8)(7))
= 110/504 = 0.2183

Subtracted term: 3(n-1)^2/((n-2)(n-3)) = 3(9)^2/((8)(7))
= 243/56 = 4.34
Y1 = (-35.75 - 8.77)^4 = 3928444.507
Y2 = (25.62 - 8.77)^4 = 80611.986
Y3 … continue to y10
Summation = 4385716.355
Sum/s^4 = 30.519

First term = 0.2183 × 30.519 = 6.661

Excess kurtosis = 6.661 - 4.34 = 2.32
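
The sample excess kurtosis formula translates to Python in the same way. The function name is an assumption; full-precision arithmetic gives a value of about 2.31 to 2.32, in line with the hand calculation once rounding of the intermediate figures is allowed for.

```python
import statistics


def sample_excess_kurtosis(xs):
    """First term scales the standardised 4th moment; second term removes the normal benchmark."""
    n = len(xs)
    if n < 4:
        raise ValueError("sample excess kurtosis needs at least 4 observations")
    mean = statistics.mean(xs)
    s = statistics.stdev(xs)
    fourth = sum((x - mean) ** 4 for x in xs)
    first_term = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth / s ** 4
    second_term = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first_term - second_term


annual_returns = [-35.75, 25.62, 15.15, -0.72, 17.25, 31.69, 7.75, -7.76, 18.25, 16.18]

excess = sample_excess_kurtosis(annual_returns)
print(round(excess, 2))
```

A positive result indicates fatter tails than the normal distribution (leptokurtic), although with only 10 observations the estimate is very sensitive to the one extreme return.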

51
Q

Using geometric and arithmetic means when analysing investment returns

A
  • geometric means = study past performance
  • arithmetic means = appropriate for forward looking context (always greater than or equal to geometric mean)
  • semilogarithmic: scale constructed so that equal intervals on the vertical scale represent equal rates of change, equal intervals on the horizontal scale represent equal amounts of change
  • more appropriate when graphing past performance
  • arithmetic scale on horizontal axis and logarithmic scales on vertical axis for the value of investment
  • a plot curving upward reflects increasing growth rates over time
  • uncertainty in cash flows or returns causes the arithmetic mean to be larger than geometric mean
  • geometric mean return approximately equals the arithmetic return minus half the variance of return
  • when zero variance, geometric mean = arithmetic mean