(10) Sampling and Estimation Flashcards
LOS 11. a: Define simple random sampling and a sampling distribution.
Simple random sampling is a method of selecting a sample so that each item or person in the population being studied has the same probability of being included in the sample. One way to do this is to number every item and pick numbers with a random number generator. A practical approximation is systematic sampling, which selects every kth element.
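A minimal sketch of the random-number approach, assuming a hypothetical population of 100 numbered items. The function name `simple_random_sample` and the seed are illustrative choices, not part of the flashcard.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item is equally likely to be picked."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: items numbered 1..100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```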
LOS 11. a: Define simple random sampling and a sampling distribution.
A sampling distribution is the distribution of all values that a sample statistic can take on when computed from samples of identical size randomly drawn from the same population.
LOS 11. b: Explain sampling error.
Sampling error is the difference between a sample statistic and its corresponding population parameter (e.g., the sample mean minus the population mean).
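As a rough illustration, sampling error can be computed directly when the whole population is available. The population below is simulated (normal, mean 50, standard deviation 10), which is an assumption made only for this example.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of 10,000 values
population = [rng.gauss(50, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

# One simple random sample of 50 observations
sample = rng.sample(population, 50)
sample_mean = statistics.fmean(sample)

# Sampling error = sample statistic minus population parameter
sampling_error = sample_mean - population_mean
print(f"sampling error of the mean: {sampling_error:.3f}")
```

A different sample would give a different sampling error, which is why the sample mean has a sampling distribution.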
LOS 11. c: Distinguish between simple random and stratified random sampling.
Also, what are the steps to create a stratified random sampling?
Stratified random sampling involves randomly selecting samples proportionally from subgroups that are formed based on one or more distinguishing characteristics, so that the sample will have the same distribution of these characteristics as the overall population.
Stratified random sampling generally reduces sampling error relative to simple random sampling.
Step 1: The population is divided into subpopulations (strata).
Step 2: Simple random samples are drawn from each stratum in proportion to its size relative to the population.
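The two steps can be sketched as follows. The bond data, the maturity buckets, and the helper name `stratified_sample` are hypothetical and chosen only so that the proportions work out evenly.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, total_n, seed=None):
    """Proportional stratified sample: group items, then sample each group."""
    rng = random.Random(seed)
    # Step 1: divide the population into strata
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    # Step 2: simple random sample from each stratum, in proportion to its size
    sample = []
    for members in strata.values():
        k = round(total_n * len(members) / len(items))  # rounding may shift the total slightly
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 short, 30 medium, 10 long maturity bonds
bonds = ([("short", i) for i in range(60)]
         + [("medium", i) for i in range(30)]
         + [("long", i) for i in range(10)])

sample = stratified_sample(bonds, lambda bond: bond[0], total_n=20, seed=1)
counts = {k: sum(1 for b in sample if b[0] == k) for k in ("short", "medium", "long")}
print(counts)
```

With these sizes the sample of 20 holds 12 short, 6 medium, and 2 long bonds, mirroring the 60/30/10 split of the population.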
LOS 11. d: Distinguish between time-series and cross-sectional data.
Time-series data consists of observations taken at specific and equally spaced points in time, typically for a single observational unit.
Ex of time series: ABC daily stock prices
Cross-sectional data consists of observations taken at a single point in time. This includes many observational units.
Ex of cross-sectional: Free cash flow/debt ratio for U.S. industrials
LOS 11. e: Explain the central limit theorem and its importance.
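The central limit theorem states that for a population with a mean µ and a finite variance σ², the sampling distribution of the sample mean for all possible samples of size n (for n ≥ 30) will be approximately normally distributed with a mean equal to µ and a variance equal to σ²/n. This matters because it allows inferences about the population mean using the normal distribution even when the population itself is not normally distributed.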
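A small simulation can make the theorem concrete. This sketch assumes a skewed exponential population with µ = 1 and σ² = 1, so the sample means for n = 30 should centre near 1 with variance near 1/30. The seed and the number of samples are arbitrary.

```python
import random
import statistics

rng = random.Random(7)
n = 30               # size of each sample
num_samples = 5_000  # number of samples drawn

# Exponential population with rate 1: mean 1, variance 1, strongly skewed
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

print(f"mean of sample means:     {mean_of_means:.4f}  (theory: 1.0)")
print(f"variance of sample means: {var_of_means:.4f}  (theory: {1 / n:.4f})")
```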
LOS 11. f: Calculate and interpret the standard error of the sample mean.
The standard error of the sample mean is the standard deviation of the distribution of the sample means and is calculated as:
σ_x̄ = σ/√n, where σ, the population standard deviation, is known
s_x̄ = s/√n, where s, the sample standard deviation, is used because the population standard deviation is unknown.
As n increases, the standard error decreases.
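A short sketch of the calculation, dividing a standard deviation by the square root of n. The return figures are made up for illustration and `standard_error` is a hypothetical helper name.

```python
import math
import statistics

def standard_error(std_dev, n):
    """Standard error of the sample mean: standard deviation divided by sqrt(n)."""
    return std_dev / math.sqrt(n)

# Known population standard deviation: quadrupling n halves the standard error
print(standard_error(10, 25))   # 2.0
print(standard_error(10, 100))  # 1.0

# Unknown population standard deviation: use the sample standard deviation s
returns = [0.02, -0.01, 0.03, 0.015, -0.005, 0.01, 0.025, 0.0, 0.005]  # hypothetical data
s = statistics.stdev(returns)
se = standard_error(s, len(returns))
print(f"s = {s:.4f}, standard error = {se:.4f}")
```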
LOS 11. g: Identify and describe desirable properties of an estimator.
Desirable statistical properties of an estimator include:
- Unbiasedness (sign of estimation error is random; the expected value of the estimator equals the parameter being estimated),
- Efficiency (the variance of its sampling distribution is smaller than that of any other unbiased estimator),
- Consistency (the accuracy of the estimate increases as the sample size increases, because the standard error falls).
LOS 11. h: Distinguish between a point estimate and a confidence interval estimate of a population parameter.
Point estimates are single value estimates of population parameters. An estimator is a formula used to compute a point estimate.
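A confidence interval for the population mean is built around the point estimate: sample mean ± (reliability factor × standard error), where the reliability factor is z(α/2) when the z-statistic applies.
z(α/2) = 1.65 for a 90% CI (1.645 to three decimals); 1.96 for a 95% CI; 2.58 for a 99% CI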
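The interval formula on this card (sample mean ± reliability factor × standard error) can be sketched with the standard library's `statistics.NormalDist`. The inputs (mean 80, standard deviation 15, n = 36) are hypothetical, and the function name is an illustrative choice.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, std_dev, n, confidence=0.95):
    """Two-sided z-based confidence interval for a population mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # reliability factor z(alpha/2)
    half_width = z * std_dev / math.sqrt(n)      # reliability factor x standard error
    return sample_mean - half_width, sample_mean + half_width

# Hypothetical inputs: mean 80, standard deviation 15, n = 36
low, high = z_confidence_interval(sample_mean=80, std_dev=15, n=36, confidence=0.95)
print(f"95% CI: {low:.2f} to {high:.2f}")

# Reliability factors for the three common confidence levels
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    print(f"{level:.0%}: z = {z:.3f}")
```

Here the standard error is 15/6 = 2.5, so the 95% interval is roughly 80 ± 4.9.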
LOS 11. h: Distinguish between a point estimate and a confidence interval estimate of a population parameter.
A confidence interval estimate is a range of values within which the population parameter is expected to fall with probability 1 − α, where 1 − α is called the degree of confidence.
LOS 11. h: Distinguish between a point estimate and a confidence interval estimate of a population parameter. The reliability factor.
The reliability factor is a number that depends on the sampling distribution of the point estimate and on the probability (1 − α) that the confidence interval contains the population parameter.
LOS 11. i: Describe properties of Student’s t-distribution and calculate and interpret its degrees of freedom.
Use Student's t-distribution to build confidence intervals when the sample is small (n < 30), the population is normally (or approximately normally) distributed, and the population variance is unknown.
Defined by a single parameter => degrees of freedom = n - 1
Lower peak than normal, fatter tails
Degrees of freedom for the t-distribution are equal to n − 1. Student's t-distribution is closer to the normal distribution when df is greater, and, all else equal, confidence intervals are narrower when df is greater.
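To illustrate how the t-distribution approaches the normal as degrees of freedom grow, the sketch below compares two-tailed 95% critical values. The t values are hard-coded from a standard t-table (an assumption of this example, since the Python standard library has no t-distribution), while the z value comes from `statistics.NormalDist`.

```python
from statistics import NormalDist

# Two-tailed 95% critical t-values, copied from a standard t-table (df -> t)
T_CRIT_95 = {5: 2.571, 10: 2.228, 20: 2.086, 30: 2.042}

z_crit = NormalDist().inv_cdf(0.975)  # about 1.960

for df, t_crit in T_CRIT_95.items():
    # A larger critical value means a wider interval for the same standard error
    print(f"df={df:>3}  t={t_crit:.3f}  excess over z={t_crit - z_crit:.3f}")
```

The critical value shrinks toward 1.96 as df rises, which is why intervals narrow with more degrees of freedom.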
LOS 11. j: Calculate and interpret a confidence interval for a population mean, given a normal distribution with 1) a known population variance, 2) an unknown population variance, or 3) an unknown variance and a large sample size.
For a normally distributed population, a confidence interval for its mean can be constructed using a z-statistic when the variance is known, and a t-statistic when the variance is unknown. The z-statistic is acceptable in the case of a normal population with an unknown variance if the sample size is large (n ≥ 30).
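The rule above can be written as a small decision helper. This is only a sketch of the three cases stated on this card for a normally distributed population. The function name and return labels are hypothetical.

```python
def reliability_statistic(variance_known, n, large_sample_cutoff=30):
    """Pick the test statistic for a CI on the mean of a normal population."""
    if variance_known:
        return "z"                    # case 1: known population variance
    if n >= large_sample_cutoff:
        return "t (z acceptable)"     # case 3: unknown variance, large sample
    return "t"                        # case 2: unknown variance, small sample

print(reliability_statistic(variance_known=True, n=15))
print(reliability_statistic(variance_known=False, n=15))
print(reliability_statistic(variance_known=False, n=50))
```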
LOS 11. j: Calculate and interpret a confidence interval for a population mean, given a normal distribution with 1) a known population variance, 2) an unknown population variance, or 3) an unknown variance and a large sample size. Chart.
LOS 11. k: Describe the issues regarding selection of the appropriate sample size, data-mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias.
Increasing the sample size will generally improve parameter estimates and narrow confidence intervals. The cost of more data must be weighed against these benefits, and adding data that is not generated by the same distribution will not necessarily improve accuracy or narrow confidence intervals.