Part 5. Sampling & Estimation Flashcards
Simple Random Sampling
A method of selecting a sample in such a way that each item or person in the population being studied has the same likelihood of being included in the sample.
e.g. picking random numbers out of a bag.
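A minimal Python sketch of simple random sampling. It assumes a hypothetical population of 1,000 numbered items (the "numbers in a bag"). The helper name `simple_random_sample` is illustrative only. It relies on the standard library's `random.Random.sample`, which draws without replacement, so every item has the same chance of being selected.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; each item is equally likely to be included."""
    rng = random.Random(seed)  # seeded generator so results are reproducible
    return rng.sample(population, n)

# Hypothetical population: 1,000 numbered slips in a bag
population = list(range(1, 1001))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```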
Systematic sampling
Another way to form an approximately random sample: select every nth member of the population.
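A sketch of systematic sampling in Python. It assumes the sampling interval is k = N // sample_size and that the starting position is chosen at random within the first interval. That is a common convention, but the card does not specify it. `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Select every k-th member after a random start, where k = len(population) // sample_size."""
    k = len(population) // sample_size       # sampling interval (the card's "every nth member")
    if k < 1:
        raise ValueError("sample size cannot exceed population size")
    start = random.Random(seed).randrange(k)  # random starting position in [0, k)
    # Slicing with step k takes every k-th member; trim in case N is not a multiple of sample_size
    return population[start::k][:sample_size]

population = list(range(1, 1001))   # hypothetical population of 1,000 members
picked = systematic_sample(population, 10, seed=1)
print(picked)                       # ten members spaced 100 apart
```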
Sampling error
The difference between a sample statistic (the mean, variance, or standard deviation of the sample) and its corresponding population parameter (the true mean, variance or standard deviation of the population).
Sampling error of the mean = sample mean ($\bar{x}$) - population mean ($\mu$) = $\bar{x} - \mu$
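The formula as a short Python function. The return figures are made-up numbers used only to show the arithmetic. A population mean of 6% and a sample mean of 7% give a sampling error of +1 percentage point. `sampling_error_of_mean` is a hypothetical helper name.

```python
from statistics import mean

def sampling_error_of_mean(sample, population):
    """Sample mean minus population mean (x-bar minus mu)."""
    return mean(sample) - mean(population)

population_returns = [2, 4, 6, 8, 10]   # hypothetical returns in %, mean 6
sample_returns = [4, 10]                # hypothetical sample drawn from it, mean 7
print(f"{sampling_error_of_mean(sample_returns, population_returns):+.1f}")  # +1.0
```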
Sampling distribution (of a sample statistic)
A probability distribution of all possible sample statistics computed from a set of equal-size samples that were randomly drawn from the same population.
Sampling distribution of the mean
Suppose a random sample of 100 bonds is selected from the population of a major municipal bond index consisting of 1,000 bonds, and the mean return of the 100-bond sample is calculated.
Repeating this process many times will result in many different estimates of the population mean return. The distribution of these sample means is the sampling distribution of the mean.
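A simulation sketch of the bond example in Python. The 1,000 "bond returns" are randomly generated stand-ins (mean 4%, standard deviation 1.5%), not real index data. `sample_means` is a hypothetical helper. Each trial draws a random sample of 100 bonds and records its mean. The resulting list approximates the sampling distribution of the mean.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical index of 1,000 bond returns (in %), generated for illustration only
population = [rng.gauss(4.0, 1.5) for _ in range(1000)]
mu = statistics.fmean(population)

def sample_means(population, n, trials, rng):
    """Repeatedly draw a random sample of size n (without replacement) and record its mean."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(trials)]

means = sample_means(population, n=100, trials=5000, rng=rng)
print(f"population mean:         {mu:.3f}")
print(f"mean of sample means:    {statistics.fmean(means):.3f}")
print(f"std dev of sample means: {statistics.stdev(means):.3f}")
print(f"std dev of population:   {statistics.pstdev(population):.3f}")
```

Each sample mean differs from the population mean by its own sampling error. Even so, the sample means cluster around the population mean and are much less dispersed than the individual bond returns.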
Stratified random sampling
Uses a classification system to separate the population into smaller groups based on one or more distinguishing characteristics.
From each subgroup (stratum), a random sample is taken, and the results are pooled. The size of the sample drawn from each stratum is based on that stratum's size relative to the population.
Stratified Sampling Example
Used often in bond indexing, due to the difficulty and cost of completely replicating the entire population of bonds.
The bonds in a population are categorised (stratified) according to major bond risk factors such as duration, maturity, coupon rate, and the like.
The samples are drawn from each separate category and combined to form a final sample.
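A Python sketch of proportional stratified sampling for the bond example. The bonds, their duration buckets, and the total sample size of 50 are hypothetical. `stratified_sample` is an illustrative helper, not a library function. Each stratum's sample size is its share of the population multiplied by the total sample size and then rounded. When the shares do not divide evenly, the pooled total can therefore differ slightly from n.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, n, seed=None):
    """Proportional stratified random sample of roughly n items.

    stratum_of maps each item to its stratum (e.g. a duration bucket).
    Each stratum contributes round(n * stratum_size / population_size) items.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:                      # classify the population into strata
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():         # random sample within each stratum, then pool
        k = round(n * len(members) / len(items))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical bond universe: (bond_id, duration bucket)
bonds = ([(i, "short") for i in range(500)]
         + [(i, "intermediate") for i in range(500, 800)]
         + [(i, "long") for i in range(800, 1000)])
picked = stratified_sample(bonds, stratum_of=lambda b: b[1], n=50, seed=7)
counts = {s: sum(1 for b in picked if b[1] == s) for s in ("short", "intermediate", "long")}
print(counts)  # {'short': 25, 'intermediate': 15, 'long': 10}
```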
Time series data
This consists of observations taken over a period of time at specific and equally spaced time intervals.
e.g. the set of monthly returns on Microsoft stock from January 1994 to January 2004.
Cross-sectional data
A sample of observations taken at a single point in time.
e.g. the sample of reported earnings per share of all Nasdaq companies as of Dec 31, 2004.
Longitudinal data
Observations over time of multiple characteristics of the same entity, such as unemployment, inflation and GDP growth rates for a country over 10 years.
Panel data
This contains observations over time of the same characteristic for multiple entities, such as debt/equity ratios for 20 companies over the most recent 24 quarters.
Central Limit Theorem
For simple random samples of size n from a population with mean $\mu$ and finite variance $\sigma^2$, the sampling distribution of the sample mean $\bar{x}$ approaches a normal probability distribution with mean $\mu$ and variance $\sigma^2/n$ as the sample size becomes large.
This is useful because the normal distribution is relatively easy to apply to hypothesis testing and to the construction of confidence intervals.
Inferences about the population mean can be made from the sample mean, regardless of the population's distribution, as long as the sample size is “sufficiently large”, which usually means $n \geq 30$.
Important properties of the central limit theorem:
- If the sample size n is sufficiently large ($n \geq 30$), the sampling distribution of the sample means will be approximately normal.
- In other words, when random samples of size n are repeatedly drawn from a larger population, each sample's mean is itself a random variable, and the distribution of these sample means is approximately normal.
- The mean of the population ($\mu$) and the mean of the distribution of all possible sample means are equal.
- The variance of the distribution of sample means is $\sigma^2/n$, the population variance divided by the sample size.
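A small Python simulation of these properties. It assumes an exponential population with mean 1 and variance 1, chosen because it is clearly skewed rather than normal. With n = 30, the mean of the simulated sample means should be close to $\mu = 1$ and their variance close to $\sigma^2/n = 1/30$. `simulate_sample_means` is a hypothetical helper.

```python
import random
import statistics

def simulate_sample_means(n, trials, seed=0):
    """Means of `trials` independent samples of size n drawn from an Exponential(1) population.

    Exponential(1) is skewed (not normal), with mu = 1 and sigma^2 = 1.
    """
    rng = random.Random(seed)
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(trials)]

n = 30
means = simulate_sample_means(n, trials=20000)
print(f"mean of sample means:     {statistics.fmean(means):.4f}  (CLT: mu = 1)")
print(f"variance of sample means: {statistics.variance(means):.4f}  (CLT: sigma^2/n = {1 / n:.4f})")
```

A histogram of `means` would look roughly bell-shaped, even though the individual draws are strongly right-skewed.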
Standard deviation of the sample mean (standard error):
This is less than the standard deviation of a single observation.
If the standard deviation of monthly stock returns is 2%, the standard error of the average monthly return over the next six months is $2\%/\sqrt{6} \approx 0.82\%$ (assuming the monthly returns are independent).
The average of several observations of a random variable will be less widely dispersed (lower standard deviation) around the expected value than a single observation of the random variable.
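The six-month example as a one-line Python function. It assumes independent observations, and `standard_error` is a hypothetical helper name. It divides the standard deviation of a single observation by the square root of the number of observations averaged.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean of n independent observations: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

se = standard_error(2.0, 6)   # 2% monthly standard deviation, six monthly returns
print(f"{se:.2f}%")           # 0.82%
```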
Desirable properties of an estimator:
- Unbiasedness
- Efficiency
- Consistency