Sampling and Estimation Flashcards
Sampling error formula
sampling error of the mean = x̄ - μ
i.e., sample mean - population mean
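A minimal sketch of the formula above. The population and sample values are made up for illustration, and the function name is hypothetical.

```python
from statistics import mean

def sampling_error_of_mean(sample, population):
    """Sample mean minus population mean."""
    return mean(sample) - mean(population)

# Made-up data: the population mean is 6, the sample mean is 4.
population = [2, 4, 6, 8, 10]
sample = [2, 4, 6]

error = sampling_error_of_mean(sample, population)
print(error)
```

A negative result means this particular sample underestimates the population mean.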
Systematic sampling
- select every kth member of the population until we have a sample of the desired size
- k = population size divided by the desired sample size
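The two steps above can be sketched as follows. The random starting point within the first k members is an assumption (the card does not say where to start), and the function name is hypothetical.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick every kth member, where k = population size // sample size."""
    k = len(population) // sample_size
    if k < 1:
        raise ValueError("sample_size cannot exceed the population size")
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumption: random start among the first k members
    return population[start::k][:sample_size]

# Hypothetical population of 100 numbered members, sample of 10 -> k = 10.
members = list(range(1, 101))
picked = systematic_sample(members, 10, seed=42)
print(picked)
```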
Stratified Random Sampling
- population is divided into subgroups based on one or more distinguishing characteristics.
- samples are then drawn from each subgroup, with their sample size proportional to the size of the subgroup relative to the population
- the sample will have the same distribution of key characteristics as the overall population
- more precise than simple random sampling
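A rough sketch of proportional allocation across subgroups. The strata names and sizes are invented, and rounding each subgroup's share is a simplification that may make the total differ slightly from the requested size in other cases.

```python
import random

def stratified_sample(strata, total_sample_size, seed=None):
    """Draw from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Proportional allocation; rounding is a simplification.
        n = round(total_sample_size * len(members) / population_size)
        sample[name] = rng.sample(members, n)
    return sample

# Hypothetical population of 1,000 split 60% / 30% / 10%.
strata = {
    "large_cap": list(range(600)),
    "mid_cap": list(range(300)),
    "small_cap": list(range(100)),
}
drawn = stratified_sample(strata, 50, seed=7)
sizes = {name: len(members) for name, members in drawn.items()}
print(sizes)
```

With these inputs the sample keeps the same 60/30/10 split as the population.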
Cluster sampling
- similar to stratified random sampling in that the population is divided into subpopulations, here called clusters; a random sample of clusters is then selected
- each cluster is essentially a mini-representation of the entire population
- used, for example, for voting districts, school districts, or market surveys
- less accurate, but more time- and cost-efficient
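A minimal sketch of one-stage cluster sampling, assuming every member of each selected cluster is included (the card does not specify this). The cluster names and members are hypothetical.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly select whole clusters and keep all of their members."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical school districts, each treated as one cluster.
districts = {
    "district_a": ["a1", "a2", "a3"],
    "district_b": ["b1", "b2"],
    "district_c": ["c1", "c2", "c3", "c4"],
    "district_d": ["d1", "d2"],
}
chosen = cluster_sample(districts, 2, seed=1)
print(sorted(chosen))
```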
Standard error
SE = σ / √n when the population standard deviation σ is known
SE = s / √n using the sample standard deviation s when σ is unknown
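A short sketch of both versions of the formula. The numbers are made up, and `standard_error` is a hypothetical helper name.

```python
from math import sqrt
from statistics import stdev

def standard_error(std_dev, n):
    """Standard error of the sample mean: standard deviation / sqrt(n)."""
    return std_dev / sqrt(n)

# Population standard deviation assumed known (made-up value of 20, n = 100).
se_known = standard_error(20.0, 100)

# Population standard deviation unknown: use the sample standard deviation.
sample = [10, 12, 14, 16]
se_sample = standard_error(stdev(sample), len(sample))

print(se_known, se_sample)
```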
The three desirable properties of an estimator are:
- unbiasedness
- efficiency
- consistency
Unbiasedness
- its expected value is equal to the parameter being estimated
Efficiency
- has the lowest variance among all unbiased estimators of the same parameter
Consistency
- as sample size increases, the sampling error decreases and the estimates get closer to the actual value
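One rough way to see consistency numerically: for a fixed standard deviation, the standard error of the sample mean (standard deviation / √n) shrinks as n grows. The standard deviation of 10 and the sample sizes are made-up inputs.

```python
from math import sqrt

std_dev = 10.0  # hypothetical standard deviation

# Standard error of the mean for increasing sample sizes.
standard_errors = {n: std_dev / sqrt(n) for n in (25, 100, 400)}

for n, se in standard_errors.items():
    print(n, se)
```

Quadrupling the sample size halves the standard error in this sketch.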
Z score and “reliability factor” for:
90% CI
95% CI
99% CI
CI level: α/2, z (“reliability factor”)
90% CI: α/2 = .05, z = 1.65
95% CI: α/2 = .025, z = 1.96
99% CI: α/2 = .005, z = 2.58
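The reliability factors can be reproduced, as a sketch, from the standard normal distribution in Python's standard library. `reliability_factor` is a hypothetical helper name. To three decimals the values are 1.645, 1.96 and 2.576, which the card rounds to 1.65, 1.96 and 2.58.

```python
from statistics import NormalDist

def reliability_factor(confidence_level):
    """Two-sided z value that leaves alpha/2 in each tail."""
    alpha = 1 - confidence_level
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(reliability_factor(level), 3))
```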
Confidence Interval formula
CI = point estimate ± (reliability factor × standard error)
point estimate, e.g., the sample mean x̄
Population variance known: CI = x̄ ± z(α/2) × (σ / √n)
z(α/2): 1.65, 1.96, 2.58 for 90%, 95%, 99% CIs
Population variance unknown: CI = x̄ ± t(α/2) × (s / √n)
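A minimal sketch of the general formula, with the reliability factor passed in directly so that it works for either a z or a t value. All inputs are made up. The t value 2.064 is the two-tailed 95% value for 24 degrees of freedom from a standard t table.

```python
from math import sqrt

def confidence_interval(point_estimate, reliability_factor, std_error):
    """point estimate +/- (reliability factor * standard error)."""
    margin = reliability_factor * std_error
    return point_estimate - margin, point_estimate + margin

# z case: sample mean 80, known population std dev 15, n = 36, 95% CI.
z_low, z_high = confidence_interval(80.0, 1.96, 15.0 / sqrt(36))

# t case: sample mean 80, sample std dev 15, n = 25 (24 degrees of freedom).
t_low, t_high = confidence_interval(80.0, 2.064, 15.0 / sqrt(25))

print(round(z_low, 2), round(z_high, 2))
print(round(t_low, 2), round(t_high, 2))
```

The t interval comes out wider here, both because the t reliability factor is larger and because the smaller n gives a larger standard error.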
Significance level:
α (alpha)
α = .10
90% CI, α/2 = .05, reliability factor z(.05) = 1.65
α = .05
95% CI, α/2 = .025, reliability factor z(.025) = 1.96
α = .01
99% CI, α/2 = .005, reliability factor z(.005) = 2.58
Which distribution (z or t) is used for:
- For a normal distribution, variance is known, n < 30
- For a normal distribution, variance is known, n >= 30
- For a normal distribution, variance is unknown, n < 30
- For a normal distribution, variance is unknown, n >= 30
- For a normal distribution, variance is known, n < 30: z distribution
- For a normal distribution, variance is known, n >= 30: z distribution
- For a normal distribution, variance is unknown, n < 30: t distribution
- For a normal distribution, variance is unknown, n >= 30: t or z distribution
Which distribution (z or t) is used for:
- For a non-normal distribution, variance is known, n < 30
- For a non-normal distribution, variance is known, n >= 30
- For a non-normal distribution, variance is unknown, n < 30
- For a non-normal distribution, variance is unknown, n >= 30
- For a non-normal distribution, variance is known, n < 30: NA (not available)
- For a non-normal distribution, variance is known, n >= 30: z distribution
- For a non-normal distribution, variance is unknown, n < 30: NA (not available)
- For a non-normal distribution, variance is unknown, n >= 30: t or z distribution
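The eight cases from the normal and non-normal cards can be condensed into one decision function. This is only a sketch of the flashcard rules: the function name and the returned strings are hypothetical.

```python
def choose_distribution(is_normal, variance_known, n):
    """Return which distribution the flashcard rules call for."""
    large_sample = n >= 30
    if variance_known:
        # Known variance: z applies unless the data are non-normal with n < 30.
        return "z" if (is_normal or large_sample) else "NA"
    if large_sample:
        # Unknown variance, large sample: t, with z as an acceptable alternative.
        return "t or z"
    # Unknown variance, small sample: t only if the population is normal.
    return "t" if is_normal else "NA"

for is_normal in (True, False):
    for variance_known in (True, False):
        for n in (20, 50):
            print(is_normal, variance_known, n,
                  choose_distribution(is_normal, variance_known, n))
```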