Sampling and Estimation Flashcards
Sampling error formula
sampling error of the mean = x̄ − μ
i.e., sample mean − population mean
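A minimal numeric sketch of this definition (the population and sample values below are made up for illustration):

```python
import statistics

# Hypothetical population and a sample drawn from it (illustrative values only)
population = [2, 4, 4, 4, 5, 5, 7, 9]
sample = [4, 5, 7, 9]

population_mean = statistics.mean(population)  # mu
sample_mean = statistics.mean(sample)          # x-bar

# Sampling error of the mean = sample mean - population mean
sampling_error = sample_mean - population_mean
print(sampling_error)  # sample mean 6.25 - population mean 5 = 1.25
```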
Systematic sampling
- select every kth member of the population until we have a sample of the desired size
- k is found by dividing the population size by the desired sample size
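A minimal sketch of systematic selection, assuming the population is already held in a list (the function name and values are illustrative):

```python
def systematic_sample(population, sample_size):
    """Select every kth member, where k = len(population) // sample_size."""
    k = len(population) // sample_size
    # Take elements at positions 0, k, 2k, ... until we have sample_size items
    return [population[i * k] for i in range(sample_size)]

members = list(range(1, 101))          # population of 100 members
print(systematic_sample(members, 10))  # every 10th member: [1, 11, 21, ..., 91]
```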
Stratified Random Sampling
- population is divided into subgroups (strata) based on one or more distinguishing characteristics
- samples are then drawn from each subgroup, with each subgroup's sample size proportional to its share of the population
- the sample will have the same distribution of key characteristics as the overall population
- more precise than simple random sampling
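A minimal sketch of proportional stratified sampling, assuming each observation is tagged with a stratum label (the data, labels, and function name are made up):

```python
import random
from collections import defaultdict

def stratified_sample(data, strata, sample_size, seed=0):
    """data: list of observations; strata: parallel list of stratum labels."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for obs, label in zip(data, strata):
        groups[label].append(obs)

    sample = []
    for members in groups.values():
        # Allocate to each stratum in proportion to its share of the population
        n_stratum = round(sample_size * len(members) / len(data))
        sample.extend(rng.sample(members, n_stratum))
    return sample

values = list(range(100))
labels = ["small"] * 70 + ["large"] * 30
print(len(stratified_sample(values, labels, 10)))  # 7 from "small", 3 from "large"
```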
Cluster sampling
- similar to stratified random sampling, but requires the population to be divided into subpopulations, called clusters
- each cluster is essentially a mini-representation of the entire population
- typical applications: voting districts, school districts, market surveys
- less accurate than stratified random sampling, but more time- and cost-efficient
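A minimal sketch of one-stage cluster sampling, where whole clusters are chosen at random and every member of a chosen cluster is included (the cluster data and names are illustrative):

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """clusters: dict mapping cluster name -> list of members."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    # One-stage cluster sampling: include every member of each chosen cluster
    return [member for name in chosen for member in clusters[name]]

districts = {
    "district_a": [1, 2, 3],
    "district_b": [4, 5, 6],
    "district_c": [7, 8, 9],
    "district_d": [10, 11, 12],
}
print(cluster_sample(districts, 2))  # all members of 2 randomly chosen districts
```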
Standard error
SE of the sample mean = population standard deviation σ / √n (use the sample standard deviation s when σ is unknown)
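A minimal sketch of the standard error calculation using the sample standard deviation (the sample values are made up):

```python
import statistics

sample = [10.2, 9.8, 11.1, 10.5, 9.9, 10.7]

s = statistics.stdev(sample)   # sample standard deviation
n = len(sample)
standard_error = s / n ** 0.5  # SE = s / sqrt(n)
print(round(standard_error, 4))  # ≈ 0.2028
```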
The three desirable properties of an estimator are:
- unbiasedness
- efficiency
- consistency
Unbiasedness
- its expected value is equal to the parameter being estimated
Efficiency
- has the lowest variance among all unbiased estimators of the same parameter
Consistency
- as sample size increases, the sampling error decreases and the estimates get closer to the actual value
Z score and “reliability factor” for:
90% CI
95% CI
99% CI
CI level   α/2     z ("reliability factor")
90%        0.05    1.65
95%        0.025   1.96
99%        0.005   2.58
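These reliability factors are just standard normal quantiles; a minimal check, assuming scipy is available:

```python
from scipy.stats import norm

for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    # Two-tailed reliability factor: z such that P(Z > z) = alpha/2
    z = norm.ppf(1 - alpha / 2)
    print(f"{confidence:.0%} CI: alpha/2 = {alpha / 2:.3f}, z = {z:.3f}")
# 90% -> 1.645, 95% -> 1.960, 99% -> 2.576
```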
Confidence Interval formula
CI = point estimate ± (reliability factor × standard error)
point estimate, e.g., the sample mean x̄
known population variance: CI = x̄ ± z_(α/2) × (σ / √n)
z_(α/2): 1.65, 1.96, 2.58 (90%, 95%, 99%)
unknown population variance: CI = x̄ ± t_(α/2) × (s / √n)
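A minimal sketch of both forms, assuming scipy for the quantiles (the sample data and the assumed σ are made up):

```python
import statistics
from scipy.stats import norm, t

sample = [10.2, 9.8, 11.1, 10.5, 9.9, 10.7]
n = len(sample)
x_bar = statistics.mean(sample)
alpha = 0.05  # 95% confidence

# z-based CI: population standard deviation assumed known (hypothetical sigma)
sigma = 0.5
z = norm.ppf(1 - alpha / 2)
z_ci = (x_bar - z * sigma / n ** 0.5, x_bar + z * sigma / n ** 0.5)

# t-based CI: population standard deviation unknown, use sample std dev s
s = statistics.stdev(sample)
t_crit = t.ppf(1 - alpha / 2, df=n - 1)
t_ci = (x_bar - t_crit * s / n ** 0.5, x_bar + t_crit * s / n ** 0.5)

print(z_ci, t_ci)
```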
Significance level:
α (alpha)
α = 0.10 → 90% CI, α/2 = 0.05, reliability factor z_0.05 = 1.65
α = 0.05 → 95% CI, α/2 = 0.025, reliability factor z_0.025 = 1.96
α = 0.01 → 99% CI, α/2 = 0.005, reliability factor z_0.005 = 2.58
Which distribution (z or t) is used for:
- For a normal distribution, variance is known, n < 30
- For a normal distribution, variance is known, n >= 30
- For a normal distribution, variance is unknown, n < 30
- For a normal distribution, variance is unknown, n >= 30
- normal, variance known, n < 30 → z distribution
- normal, variance known, n >= 30 → z distribution
- normal, variance unknown, n < 30 → t distribution
- normal, variance unknown, n >= 30 → t or z distribution
Which distribution (z or t) is used for:
- For a non-normal distribution, variance is known, n < 30
- For a non-normal distribution, variance is known, n >= 30
- For a non-normal distribution, variance is unknown, n < 30
- For a non-normal distribution, variance is unknown, n >= 30
- non-normal, variance known, n < 30 → not available
- non-normal, variance known, n >= 30 → z distribution
- non-normal, variance unknown, n < 30 → not available
- non-normal, variance unknown, n >= 30 → t or z distribution
A larger sample size will…
- decreases the CI width and improves precision
When sample size (n) increases…
- the standard error (s / √n) decreases
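A minimal illustration of this relationship, holding the sample standard deviation fixed at a made-up value:

```python
s = 2.0  # assumed sample standard deviation
for n in (25, 100, 400):
    print(n, s / n ** 0.5)  # SE falls as n rises: 0.4, 0.2, 0.1
```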
Bootstrap
- treat the randomly drawn sample as if it were the population
- take smaller samples from that sample one at a time, returning each observation before the next draw ("sampling with replacement")
- the researcher has to decide how many repetitions are appropriate
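A minimal sketch of bootstrapping the standard error of the mean, with the number of repetitions chosen by the researcher (here 1,000; the data and function name are made up):

```python
import random
import statistics

def bootstrap_se_of_mean(sample, repetitions=1000, seed=0):
    """Resample with replacement and measure the spread of the resampled means."""
    rng = random.Random(seed)
    means = []
    for _ in range(repetitions):
        resample = rng.choices(sample, k=len(sample))  # sampling with replacement
        means.append(statistics.mean(resample))
    return statistics.stdev(means)

data = [10.2, 9.8, 11.1, 10.5, 9.9, 10.7]
print(bootstrap_se_of_mean(data))
```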
Jackknife
- start with a sample of data
- subsequent samples are created by leaving out one observation at a time from the set, no replacement
- for a sample size of n, jackknife usually requires n repetitions
- used to reduce the bias of an estimator
- will produce similar results with every run
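A minimal sketch of the jackknife: n leave-one-out samples, no replacement, and the same result on every run (the data and function name are made up):

```python
import statistics

def jackknife_means(sample):
    """Create n leave-one-out samples and return the mean of each."""
    return [
        statistics.mean(sample[:i] + sample[i + 1:])  # drop observation i
        for i in range(len(sample))
    ]

data = [10.2, 9.8, 11.1, 10.5, 9.9, 10.7]
print(jackknife_means(data))  # n = 6 repetitions, identical on every run
```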
List the sampling biases
- data-snooping
- sample selection
- look-ahead
- time period
Data-snooping Bias
- the practice of analyzing the same data again and again until a pattern that works is identified
- warning signs:
  - too much digging / too little confidence
  - no story / no future
Sample Selection Bias
- survivorship bias
- backfill bias
Look-ahead Bias
- occurs when a test uses information that was not available to market participants at the time the participants would have acted in the model
- e.g., using year-end P/B values when the decision would have been made in Q2 of that fiscal year
Time-period Bias
- arises from using a time series that is too short or too long