Quantitative Methods - Summary Data Flashcards
What is the purpose of location measures to an investment manager?
Indicate the central/long run average value/return achieved
What does the dispersion measure indicate to the investment manager?
The variability or spread of values around this point
What are the three methods of presenting data?
Raw data
Tabulated data
Grouped data
What is raw data?
Data presented as a simple list of figures or values that may or may not be ordered
What is tabulating data good for?
Summarising large volumes of discrete data
What is grouped data often used for?
To summarise continuous data
What are the four location measures?
Arithmetic mean
Median
Mode
Geometric mean
What is the arithmetic mean?
Simple average calculated by adding up the observed values and dividing by the number of observed values
How do you calculate the arithmetic mean from tabulated data?
Multiply each observed value by the number of times it appears, add the results together, then divide by the total number of observed values
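A minimal Python sketch of the tabulated calculation. The values and frequencies are made up for illustration, and `tabulated_mean` is a hypothetical helper name.

```python
# Hypothetical tabulated data: observed value -> number of times it appears
table = {2: 3, 5: 4, 8: 3}

def tabulated_mean(freq_table):
    # Multiply each value by its frequency and add the results together
    total = sum(value * freq for value, freq in freq_table.items())
    # The number of observed values is the sum of the frequencies
    count = sum(freq_table.values())
    return total / count

print(tabulated_mean(table))
```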
What is a weighted average useful for?
Determining the return on a portfolio of securities
How do you calculate the weighted average?
Work out what proportion each return represents, then multiply the return by the proportion. Add the results together
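A short sketch of the weighted average applied to a portfolio. It assumes the proportions are based on the amount invested in each holding; the holdings, amounts and returns are invented for illustration.

```python
# Hypothetical portfolio: holding -> (amount invested, return in %)
portfolio = {"A": (6000, 10.0), "B": (4000, 5.0)}

def weighted_average_return(holdings):
    total_invested = sum(amount for amount, _ in holdings.values())
    # Proportion of the portfolio in each holding, multiplied by its return
    return sum((amount / total_invested) * ret
               for amount, ret in holdings.values())

print(weighted_average_return(portfolio))
```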
What must be done in the calculation of the arithmetic mean for grouped data?
One must make the assumption that values are evenly spread within each range so we can, therefore, evaluate each range based on its midpoint
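The midpoint assumption can be sketched as follows. The bands and frequencies are hypothetical, and each band is written as (start, end, frequency).

```python
# Hypothetical grouped data: (band start, band end, frequency)
bands = [(0, 5, 2), (5, 10, 6), (10, 15, 2)]

def grouped_mean(bands):
    # Treat every item in a band as if it sat at the band's midpoint
    total = sum(((start + end) / 2) * freq for start, end, freq in bands)
    count = sum(freq for _, _, freq in bands)
    return total / count

print(grouped_mean(bands))
```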
What is a problem with the arithmetic mean?
Since all observed values are considered, this measure can be severely distorted by extreme values
What is the median?
The value of the mid-item in an ordered arrangement of the observed data
What occurs if there are an even number of items in the calculation of the median?
An average would be taken from the value above and the value below
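A small sketch covering both the odd and the even case; `median` here is a hand-written illustration rather than a library call.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of items: the mid-item itself
        return ordered[mid]
    # Even number of items: average the values either side of the middle
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([7, 1, 3]))
print(median([1, 3, 5, 7]))
```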
How do you work out the median for grouped data?
Assuming the items within the band are evenly spread, work out how far through the band the median item falls: the number of items into the band divided by the frequency of the band
What is the general expression for working out the median of grouped data?
Median = start value of band + fraction of distance through band * width of band
What is the formula for the median of grouped data?
Median = start value of band + ((median item number - cumulative number to start of band) / number of items in band) * width of band
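A sketch of the grouped median formula. It assumes the median item number is (n+1)/2, in line with the quartile formulas later in this deck; some texts use n/2 for grouped data instead. The bands are hypothetical.

```python
# Hypothetical grouped data: (band start, band end, frequency), ascending
bands = [(0, 10, 4), (10, 20, 5), (20, 30, 2)]

def grouped_median(bands):
    n = sum(freq for _, _, freq in bands)
    median_item = (n + 1) / 2  # assumed convention for the median item number
    cumulative = 0             # items counted before the current band
    for start, end, freq in bands:
        if cumulative + freq >= median_item:
            # Fraction of the distance through the band
            fraction = (median_item - cumulative) / freq
            return start + fraction * (end - start)
        cumulative += freq
    raise ValueError("no bands supplied")

print(grouped_median(bands))
```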
What is different about the median compared to the mean?
It is not affected by the size of extreme values. However, it is affected by the addition of extreme items, since the total number of observed values changes
What is the mode?
The most frequently occurring item in the observed data on the basis that the more central items should occur most frequently in a Normal distribution
What happens when determining the mode for grouped data?
One cannot establish a single value within this range since we assume that the items are evenly distributed throughout the range, hence all are equally likely. This means we have a modal range over 5-9.9 rather than a single figure
What are some key differences between the mode and the median and mean?
The mode must be an actually occurring number.
There may be several modes meaning the mode may be of limited value as a central measure since, in extreme cases, it gives little or no indication of central tendency
Since it is the most frequently occurring item, it is unaffected by extreme values
How is the geometric mean calculated?
It is calculated by taking the nth root of the product of the n observed values
When may geometric mean be most appropriate?
When considering growth or inflation which compounds each year
What is the compound factor?
The general term used to describe compounding up the values each period
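A sketch of the geometric mean applied to growth. It assumes each period's compound factor is 1 plus the growth rate; the rates are invented, and `math.prod` needs Python 3.8 or later.

```python
import math

def geometric_mean(values):
    # nth root of the product of the n observed values
    return math.prod(values) ** (1 / len(values))

# Hypothetical annual growth rates: +10%, +20%, -5%
growth_rates = [0.10, 0.20, -0.05]
compound_factors = [1 + g for g in growth_rates]

# Average compound growth per year
average_growth = geometric_mean(compound_factors) - 1
print(average_growth)
```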
What are the differences between the arithmetic and geometric mean?
Geometric mean will understate average growth/returns etc compared to the arithmetic mean
Geometric mean will be zero if there is an observed value of zero, as it works by multiplying the values together. Arithmetic mean, so long as there are other numbers, will still work
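Both differences can be checked with a few made-up numbers. The helper names are hypothetical and the compound factors are illustrative only.

```python
import math

def arithmetic_mean(values):
    return sum(values) / len(values)

def geometric_mean(values):
    return math.prod(values) ** (1 / len(values))

# Hypothetical compound factors: +50% one year, -50% the next
factors = [1.50, 0.50]
print(arithmetic_mean(factors), geometric_mean(factors))  # geometric is lower

# A single zero forces the geometric mean to zero
with_zero = [4, 0, 9]
print(arithmetic_mean(with_zero), geometric_mean(with_zero))
```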
What is a perfectly symmetrical distribution?
In a perfectly symmetrical distribution where the mean is the most commonly occurring item, the mean, median and mode are all the same value: the mean is the middle item (the median) and also the most frequently occurring item (the mode)
What would happen to the three averages if one extremely high value was added to a perfectly symmetrical distribution?
The mode would be unaffected
The median would be slightly higher
The mean could be significantly higher
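A quick check of the effect on the mean and median using the standard `statistics` module. The values are illustrative; the mode is left out because these values do not repeat.

```python
import statistics

symmetrical = [1, 2, 3, 4, 5]
with_extreme = symmetrical + [100]  # add one extremely high value

# Median moves only slightly, to the midpoint of the two central items
print(statistics.median(symmetrical), statistics.median(with_extreme))

# Mean is pulled up sharply by the extreme value
print(statistics.mean(symmetrical), statistics.mean(with_extreme))
```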
What is a positively skewed distribution?
Where the more extreme items lie above the mode
What is a negatively skewed distribution?
Where the more extreme items fall below the mode
What is the purpose of dispersion measures?
To tell us how broadly a range of values is spread around the observed central point
What are the related dispersion measures for the various location measures?
Arithmetic mean = standard deviation, variance
Median = range, deciles, interquartile range, percentiles
Mode = n/a
Geometric mean = mean
What is standard deviation?
A dispersion measure that is related to the arithmetic mean.
What is the purpose of standard deviation?
To establish how far each observed value falls from the mean. The standard deviation is a function of this divergence, and the variance is the square of the standard deviation. The greater the divergence of the observed values from the mean, the greater the standard deviation.
What is the outline for the calculation of standard deviation?
- Calculate the arithmetic mean
- Calculate the difference between each observed value and the arithmetic mean, the sum of which must be zero.
- Square the differences to remove the negative signs from those below the mean
- Add these squared numbers together
- Calculate the average by dividing by the number of observed values
- Take the square root of this average
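The steps above can be sketched in Python for a full population. The data set is made up for illustration and `population_std` is a hypothetical helper name.

```python
import math

def population_std(values):
    n = len(values)
    mean = sum(values) / n                         # arithmetic mean
    differences = [v - mean for v in values]       # these sum to zero
    squared = [d ** 2 for d in differences]        # removes negative signs
    variance = sum(squared) / n                    # average squared difference
    return math.sqrt(variance)                     # back to the mean's units

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_std(data))
```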
In what units will standard deviation be?
The same as that of the mean
What is the variance?
The square of the standard deviation
When calculating standard deviation for tabulated data what must be done?
One must take account of the frequency of each observed value within each band, and hence the frequency of each observed difference, by multiplying the square of the difference for the band by the number of items in that band
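A sketch of the frequency-weighted version for a full population. The table is hypothetical and maps each observed value to its frequency.

```python
import math

# Hypothetical tabulated data: observed value -> frequency
table = {2: 1, 4: 3, 5: 2, 7: 1, 9: 1}

def tabulated_std(freq_table):
    n = sum(freq_table.values())
    mean = sum(value * freq for value, freq in freq_table.items()) / n
    # Each squared difference is multiplied by how often the value occurs
    weighted_squares = sum(freq * (value - mean) ** 2
                           for value, freq in freq_table.items())
    return math.sqrt(weighted_squares / n)

print(tabulated_std(table))
```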
What is important when calculating the standard deviation of a sample?
The calculation used for the population is unlikely to give a realistic measure since the sample may not contain a representative range of variations. The smaller the sample, the less likely it is to contain the true extremes that may exist within any population. Standard deviation cannot be calculated if the sample contains only one item
What is Bessel's approximation?
The method used when calculating the standard deviation of a sample. It divides by n-1 rather than n
What does Bessel's approximation ensure?
Standard deviation can’t be calculated when the sample size is one.
Calculated standard deviations are slightly enlarged, since division by n-1 produces a larger value than just n
As the size of the sample increases towards the size of the full population, deducting one from the denominator will have an increasingly insignificant effect until the standard deviation is approximately identical to that calculated using the above equations for the full population.
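A sketch of the n-1 divisor, compared against the standard library's `statistics.stdev` (sample) and `statistics.pstdev` (population). The sample values are invented and `sample_std` is a hypothetical helper.

```python
import math
import statistics

def sample_std(values):
    n = len(values)
    if n < 2:
        # Dividing by n-1 is impossible when the sample size is one
        raise ValueError("sample standard deviation needs at least two items")
    mean = sum(values) / n
    squared_diffs = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_diffs / (n - 1))

sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_std(sample), statistics.stdev(sample))  # n-1 divisor
print(statistics.pstdev(sample))                     # n divisor, smaller
```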
In gap-fill and multiple-choice questions, if I am unsure whether the question involves a population or a sample, how should I calculate it?
As a sample
The range and interquartile range are measures of dispersion most frequently associated with?
The median
What is the range?
The distance between the highest and lowest observed value.
Expressed as
Range = Highest observed value - lowest observed value
What is important about the range?
It is completely dependent on the two most extreme values and takes no account of the frequency of occurrence of any items or the values of any of the other items. As a result it may be of little use in determining most likely variations
When calculating the range for grouped data what must be done?
Assume the values are spread evenly within each band, so the bottom of the lowest band marks the lowest value and the top of the highest band marks the highest value.
What does the interquartile range do?
Tries to give a measure of spread that is more representative of the observed values.
It does this by measuring the range over the central 50% of the population and should, therefore, be more representative since extreme values are excluded.
How is the interquartile range calculated?
The values are placed in ascending order and divided into four quarters (quartiles), each containing the same number of items. The interquartile range is the difference between the top of the first quartile and the top of the third.
In an interquartile range, where is the median?
As it is the mid item in an ordered list, it marks the top of the second quartile
What are the formulas for the interquartile range?
Item marking the top of quartile one - N1 = 1/4 (n+1)
Item marking the top of quartile two (median) - N2 = 2/4 (n+1)
Item marking the top of quartile three - N3 = 3/4 (n+1)
n is the number of observed values
When calculating the interquartile range, if one of the figures is a half number what must be done?
An average will be taken of the items immediately below and above
How are percentiles calculated?
In the same fashion as the interquartile range. The fraction must merely be adjusted to give the % you want
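A sketch of the (n+1) position rule for quartiles and percentiles. Averaging at half positions follows the cards; handling other fractional positions by linear interpolation is an assumption added here. The data and helper names are hypothetical.

```python
import math

def item_at(ordered, position):
    # position is 1-based, e.g. 2.5 means halfway between items 2 and 3
    if not 1 <= position <= len(ordered):
        raise ValueError("position falls outside the ordered list")
    lower, upper = math.floor(position), math.ceil(position)
    low_value, high_value = ordered[lower - 1], ordered[upper - 1]
    # For a half position this is the average of the two neighbours
    return low_value + (position - lower) * (high_value - low_value)

def percentile_item(values, percent):
    ordered = sorted(values)
    return item_at(ordered, percent / 100 * (len(ordered) + 1))

data = [1, 3, 5, 7, 9, 11, 13, 15, 17]  # n = 9, so N1 = 2.5 and N3 = 7.5
q1 = percentile_item(data, 25)
q2 = percentile_item(data, 50)  # the median
q3 = percentile_item(data, 75)
print(q1, q2, q3, "interquartile range:", q3 - q1)
print("range:", max(data) - min(data))
```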
What is the particularly important type of probability distribution applicable to continuous data and why is it important?
Normal distribution. It is applicable to many continuous variables observed in nature such as height and weight but is equally important in finance and investment as many investment management theories are based on the assumption that security returns are:
Normally distributed
Independent through time, i.e. the return in any period is completely unconnected with that of another period
What is the correlation of the returns from one period to the next called?
Autocorrelation
What is normal distribution in the context of finance?
A symmetrical distribution of, say, possible security returns that is uniquely defined by a mean and a standard deviation. These statistics may be assessed through technical analysis of past price movements from which, as we saw earlier, we can calculate
An expected return - mean
A measure of risk - the standard deviation
What are the important characteristics of a normal distribution in the context of security and portfolio returns?
- The distribution of returns is symmetrical
- The possible returns follow a bell-shaped distribution where more central values are most likely to be observed and the more extreme the movement the less likely it is to occur
- The distribution has a single standard deviation across the entire range of possible values and through time.
- The total area under the distribution is 1 or 100%, the essential probability distribution characteristic.
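These characteristics can be illustrated with `statistics.NormalDist` from the Python standard library (3.8 or later). The 8% expected return and 10% standard deviation are assumed figures for illustration only.

```python
from statistics import NormalDist

# Hypothetical security: expected return 8%, standard deviation 10%
returns = NormalDist(mu=8.0, sigma=10.0)

# Symmetry: equal probability one standard deviation below and above the mean
below = returns.cdf(8.0 - 10.0)
above = 1 - returns.cdf(8.0 + 10.0)
print(below, above)

# Central values are more likely than extreme ones (bell shape)
print(returns.pdf(8.0), returns.pdf(28.0))

# Probability of a return within one standard deviation of the mean
within_one_sd = returns.cdf(18.0) - returns.cdf(-2.0)
print(within_one_sd)
```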
What is a hypothesis test?
Taking a sample of measurements to see if they are consistent or inconsistent with the assumed value.
Testing the hypothesis that the value of a given variable could be a certain figure or whether there is evidence to show it is not.
What is the significance test?
A variation of the hypothesis test, where we are using a sample to test the hypothesis that the value of a given variable could be zero or whether there is evidence to show that it is not, i.e. it has a significant non-zero value. If the observed value is not significant then it may simply have occurred by chance.
To what is the idea underlying hypothesis and significance testing related?
Normal distribution