Sampling And Estimation Flashcards
Sampling plan
The set of rules used to select a sample.
Parameter
A descriptive measure computed from or used to describe a population of data, conventionally represented by Greek letters.
Simple random sample
A subset of a larger population created in such a way that each element of the population has an equal probability of being selected to the subset.
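As a minimal sketch of the definition above, the standard-library call random.sample draws without replacement so that every element is equally likely to be chosen. The function name simple_random_sample, the seed argument, and the population of 100 labeled items are illustrative assumptions, not part of the card.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; each element is equally likely to be selected."""
    rng = random.Random(seed)  # a seed is used only to make the example repeatable
    return rng.sample(list(population), n)

# Hypothetical population: items labeled 1 through 100.
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
```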
Systematic sampling
A procedure of selecting every kth member until reaching a sample of the desired size. The sample that results from this procedure should be approximately random.
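The procedure above can be sketched as follows. The card does not say where to start counting, so the random starting point within the first k members is an assumption, as are the function name systematic_sample and the choice k = population size // n.

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every kth member, starting from a random offset, until n are chosen."""
    items = list(population)
    k = len(items) // n  # sampling interval (assumed choice of k)
    if k < 1:
        raise ValueError("sample size exceeds population size")
    start = random.Random(seed).randrange(k)  # assumed: random start in [0, k)
    return items[start::k][:n]

# Hypothetical population: items labeled 0 through 99, so k = 10.
population = list(range(100))
sample = systematic_sample(population, 10, seed=1)
```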
Sampling error
The difference between the observed value of a statistic and the quantity it is intended to estimate.
Stratified random sampling
In stratified random sampling, the population is divided into subpopulations (strata) based on one or more classification criteria. Simple random samples are then drawn from each stratum in sizes proportional to the relative size of each stratum in the population. These samples are then pooled to form a stratified random sample.
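A minimal sketch of proportional allocation, assuming the strata are given as a dictionary from stratum name to its members. The names stratified_sample and strata, and the three hypothetical size-based strata, are illustrative. Stratum sample sizes are rounded, so the pooled size can differ slightly from n when the proportions do not divide evenly.

```python
import random

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample from each stratum, sized in proportion
    to the stratum's share of the population, then pool the draws."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    pooled = []
    for members in strata.values():
        size = round(n * len(members) / total)  # proportional allocation
        pooled.extend(rng.sample(members, size))
    return pooled

# Hypothetical population of 100 items split into three strata.
strata = {
    "large": [("large", i) for i in range(60)],
    "mid": [("mid", i) for i in range(30)],
    "small": [("small", i) for i in range(10)],
}
sample = stratified_sample(strata, 20, seed=7)
```

With these stratum sizes, a sample of 20 takes 12, 6, and 2 members from the three strata.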
Indexing
An investment strategy in which an investor constructs a portfolio to mirror the performance of a specified index.
Monetary policy
Actions taken by a nation’s central bank to affect aggregate output and prices through changes in bank reserves, reserve requirements, or its target interest rate.
Sharpe ratio
The average return in excess of the risk-free rate divided by the standard deviation of return; a measure of the average excess return earned per unit of standard deviation of return.
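The definition translates directly into a short calculation. This sketch assumes a constant risk-free rate per period and uses the sample standard deviation (divisor n − 1); the returns are hypothetical numbers chosen for easy arithmetic.

```python
import statistics

def sharpe_ratio(returns, risk_free_rate):
    """Average return in excess of the risk-free rate per unit of standard deviation."""
    mean_excess = statistics.mean(returns) - risk_free_rate
    return mean_excess / statistics.stdev(returns)  # sample standard deviation

# Hypothetical periodic returns and risk-free rate.
returns = [0.02, 0.04, 0.06]
ratio = sharpe_ratio(returns, risk_free_rate=0.01)  # (0.04 - 0.01) / 0.02
```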
Central limit theorem
Given a population described by any probability distribution having mean $\mu$ and finite variance $\sigma^2$, the sampling distribution of the sample mean $\bar{X}$ computed from samples of size n from this population will be approximately normal with mean $\mu$ (the population mean) and variance $\sigma^2/n$ (the population variance divided by n) when the sample size n is large.
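A small simulation can illustrate the theorem. This is a sketch under assumed settings: an exponential population with rate 1 (mean 1, variance 1, strongly skewed), samples of size 50, and 2,000 repetitions. The helper name sample_means is hypothetical.

```python
import random
import statistics

def sample_means(draw, n, trials, seed=0):
    """Return `trials` sample means, each computed from n independent draws."""
    rng = random.Random(seed)
    return [sum(draw(rng) for _ in range(n)) / n for _ in range(trials)]

# Exponential population with rate 1: mean 1 and variance 1.
means = sample_means(lambda rng: rng.expovariate(1.0), n=50, trials=2000)

mean_of_means = statistics.fmean(means)         # expected to be near 1
variance_of_means = statistics.variance(means)  # expected to be near 1 / 50 = 0.02
```

A histogram of means would look roughly bell-shaped even though the population itself is far from normal.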
Standard error of the sample mean
For a sample mean $\bar{X}$ calculated from a sample generated by a population with standard deviation $\sigma$, the standard error of the sample mean is given by one of two expressions:
Equation (1)
$\sigma_{\bar{X}} = \sigma / \sqrt{n}$
when we know σ, the population standard deviation, or by
Equation (2)
$s_{\bar{X}} = s / \sqrt{n}$
when we do not know the population standard deviation and need to use the sample standard deviation, s, to estimate it.
In practice, we almost always need to use Equation 2. The sample standard deviation s is the square root of the sample variance, $s^2$, calculated as follows:
Equation (3)
$s^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2 / (n - 1)$
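Equations 2 and 3 can be sketched in a few lines. The function names and the eight-observation sample are illustrative assumptions; the sample is chosen so that the mean is 5 and the squared deviations sum to 32.

```python
import math

def sample_variance(xs):
    """Equation 3: sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def standard_error(xs):
    """Equation 2: sample standard deviation divided by the square root of n."""
    return math.sqrt(sample_variance(xs)) / math.sqrt(len(xs))

# Hypothetical sample: mean 5, squared deviations sum to 32.
xs = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance(xs)  # 32 / 7
se = standard_error(xs)   # sqrt(32 / 7) / sqrt(8)
```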
Properties of the distribution of the sample mean
The distribution of the sample mean $\bar{X}$ will be approximately normal.
The mean of the distribution of $\bar{X}$ will be equal to the mean of the population from which the samples are drawn.
The variance of the distribution of $\bar{X}$ will be equal to the variance of the population divided by the sample size.
Estimator
An estimation formula; the formulas used to compute the sample mean and other sample statistics are examples of estimators.
Point estimate
A single numerical estimate of an unknown quantity, such as a population parameter.
Unbiased estimator
An unbiased estimator is one whose expected value (the mean of its sampling distribution) equals the parameter it is intended to estimate.
Efficiency of an unbiased estimator
An unbiased estimator is efficient if no other unbiased estimator of the same parameter has a sampling distribution with smaller variance.
Consistency of an estimator
A consistent estimator is one for which the probability of estimates close to the value of the population parameter increases as sample size increases.
Confidence interval
A confidence interval is a range for which one can assert with a given probability 1 − α, called the degree of confidence, that it will contain the parameter it is intended to estimate. This interval is often referred to as the 100(1 − α)% confidence interval for the parameter.
Construction of confidence intervals
A 100(1 − α)% confidence interval for a parameter has the following structure.
Point estimate ± Reliability factor × Standard error
where
Point estimate = a point estimate of the parameter (a value of a sample statistic)
Reliability factor = a number based on the assumed distribution of the point estimate and the degree of confidence (1 − α) for the confidence interval
Standard error = the standard error of the sample statistic providing the point estimate
Confidence Intervals for the Population Mean (Normally Distributed Population with Known Variance)
A 100(1 − α)% confidence interval for population mean μ when we are sampling from a normal distribution with known variance $\sigma^2$ is given by
$\bar{X} \pm z_{\alpha/2} \, \sigma / \sqrt{n}$
Reliability Factors for Confidence Intervals Based on the Standard Normal Distribution
We use the following reliability factors when we construct confidence intervals based on the standard normal distribution:
90 percent confidence intervals: Use $z_{0.05}$ = 1.65
95 percent confidence intervals: Use $z_{0.025}$ = 1.96
99 percent confidence intervals: Use $z_{0.005}$ = 2.58
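The structure point estimate ± reliability factor × standard error can be sketched with the three reliability factors listed above. The function name z_confidence_interval and the input values (sample mean 100, known σ = 15, n = 36) are hypothetical.

```python
import math

# Reliability factors for the standard normal distribution, as listed above.
Z_FACTORS = {0.90: 1.65, 0.95: 1.96, 0.99: 2.58}

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Point estimate +/- reliability factor * standard error (known sigma)."""
    z = Z_FACTORS[confidence]
    half_width = z * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical inputs: standard error = 15 / 6 = 2.5, half-width = 1.96 * 2.5 = 4.9.
low, high = z_confidence_interval(100.0, 15.0, 36)
```

For these inputs, the 95 percent interval runs from about 95.1 to 104.9.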
Confidence Intervals for the Population Mean—The z-Alternative (Large Sample, Population Variance Unknown)
A 100(1 − α)% confidence interval for population mean μ when sampling from any distribution with unknown variance and when sample size is large is given by
$\bar{X} \pm z_{\alpha/2} \, s / \sqrt{n}$
Degrees of freedom (df)
The number of independent observations used.
Confidence Intervals for the Population Mean (Population Variance Unknown)—t-Distribution.
If we are sampling from a population with unknown variance and either of the conditions below holds:
the sample is large, or
the sample is small but the population is normally distributed, or approximately normally distributed,
then a 100(1 − α)% confidence interval for the population mean μ is given by
$\bar{X} \pm t_{\alpha/2} \, s / \sqrt{n}$
where the number of degrees of freedom for $t_{\alpha/2}$ is n − 1 and n is the sample size.
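A minimal sketch of the t-based interval. Python's standard library has no t-distribution, so this version assumes the caller looks up the reliability factor in a t-table for n − 1 degrees of freedom; the function name and the ten-observation sample are hypothetical. For 9 degrees of freedom, the tabulated value of t at α/2 = 0.025 is 2.262.

```python
import math
import statistics

def t_confidence_interval(sample, t_critical):
    """Sample mean +/- t_critical * s / sqrt(n); t_critical must match df = n - 1."""
    n = len(sample)
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)  # s uses divisor n - 1
    half_width = t_critical * std_error
    return mean - half_width, mean + half_width

# Hypothetical sample of 10 observations (df = 9): mean 11, s^2 = 20 / 9.
sample = [10, 12, 9, 11, 13, 10, 12, 11, 9, 13]
low, high = t_confidence_interval(sample, t_critical=2.262)
```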
Data mining
The practice of determining a model by extensive searching through a dataset for statistically significant patterns. Also called data snooping.
Out-of-sample test
A test of a strategy or model using a sample outside the time period on which the strategy or model was developed.
Intergenerational data mining
A form of data mining that applies information developed by previous researchers using a dataset to guide current research using the same or a related dataset.
Sample selection bias
Bias introduced by systematically excluding some members of the population according to a particular attribute—for example, the bias introduced when data availability leads to certain observations being excluded from the analysis.
Survivorship bias
The bias resulting from a test design that fails to account for companies that have gone bankrupt, merged, or are otherwise no longer reported in a database.
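A toy calculation shows the direction of the bias. The fund returns and the reporting flags below are hypothetical numbers invented for illustration, not real data, and the helper name mean_return is an assumption.

```python
import statistics

# Hypothetical toy data: (annual return, still reported in the database?)
funds = [(0.08, True), (0.12, True), (-0.20, False), (0.05, True), (-0.15, False)]

def mean_return(records, survivors_only):
    """Average return over all funds, or over surviving funds only."""
    chosen = [r for r, alive in records if alive or not survivors_only]
    return statistics.mean(chosen)

all_funds = mean_return(funds, survivors_only=False)  # includes funds that dropped out
survivors = mean_return(funds, survivors_only=True)   # what a survivor-only database shows
```

In this toy example the survivor-only average is positive while the average over all funds is negative, so a test that ignores the dropped funds overstates performance.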
Look-ahead bias
A bias caused by using information that was unavailable on the test date.
Time period bias
The possibility that when we use a time-series sample, our statistical conclusion may be sensitive to the starting and ending dates of the sample.