Week 8 - Sampling w/ unequal probabilities Flashcards
For stratified sampling, which allocation method results in an equal probability of selection?
Proportional allocation only.
Equal, Neyman and optimal allocation do not.
Sometimes estimates can be improved by unequal probabilities. What are the 3 main reasons for unequal probabilities?
- Disproportionate stratification
- Multi-stage sampling
- Probability proportional to size (PPS) sampling
Disproportionate stratification
- Size of sample drawn from a particular stratum is NOT proportional to the relative size of that stratum
- 2 or more strata will have DIFF. SAMPLING FRACTIONS, f=n/N
- May be desirable b/c want to achieve precision requirements (which would not be achieved by equal prob. sampling b/c of the small size of some parts)
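A minimal sketch of how the allocation choice drives the sampling fractions f = n/N. The stratum names and sizes are hypothetical, chosen only for illustration; only proportional and equal allocation are compared.

```python
# Hypothetical stratum sizes N_h (illustrative numbers, not from the notes).
strata_sizes = {"urban": 8000, "rural": 2000}
n_total = 100
N = sum(strata_sizes.values())

def sampling_fractions(alloc):
    """Return the sampling fraction f_h = n_h / N_h for each stratum."""
    return {h: alloc[h] / strata_sizes[h] for h in strata_sizes}

# Proportional allocation: n_h = n * N_h / N
proportional = {h: n_total * N_h / N for h, N_h in strata_sizes.items()}
# Equal allocation: n_h = n / (number of strata)
equal = {h: n_total / len(strata_sizes) for h in strata_sizes}

f_prop = sampling_fractions(proportional)
f_equal = sampling_fractions(equal)
print(f_prop)   # same fraction in every stratum
print(f_equal)  # the smaller stratum gets the larger fraction
```

Under proportional allocation every unit has the same chance of selection; under equal allocation the small stratum is over-sampled, which is the disproportionate case.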
Multi-stage cluster sampling + 1 example
[5m, 2019]
- We take a SRS of clusters
- then a SRS of elements within those clusters
- then a SRS of smaller elements within those elements
- and so on until the final sample elements are reached.
Prob. of selection of person k in household j in cluster i
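$\pi_{ijk} = \frac{n}{N} \cdot \frac{m_i}{M_i} \cdot \frac{l_{ij}}{L_{ij}}$
where n of the N clusters, m_i of the M_i households in cluster i, and l_ij of the L_ij persons in household j are sampled.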
eg. 4-stage cluster sample. We may seek a sample of European children for educational testing.
- Take a SRS of countries.
- Take a SRS of education authorities (or counties) in each country.
- Take a SRS of schools in each authority or county.
- Take a SRS of children in each school.
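A rough sketch of the selection probability for the 4-stage example: with an SRS at every stage, the overall probability is the product of the stage-wise sampling fractions. All counts below are hypothetical, and `selection_probability` is an illustrative helper name.

```python
from math import prod

def selection_probability(stages):
    """stages: list of (number sampled, number available) pairs, one per stage."""
    return prod(n / N for n, N in stages)

# Hypothetical counts along one branch of the four-stage design.
stages = [
    (5, 40),    # countries sampled / countries available
    (3, 30),    # education authorities in the chosen country
    (4, 20),    # schools in the chosen authority
    (10, 200),  # children in the chosen school
]
pi = selection_probability(stages)
weight = 1 / pi  # each sampled child "represents" this many children
print(pi, weight)
```

Because the stage fractions can differ between branches (e.g. schools of different sizes), different children can end up with different selection probabilities.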
Explain what is meant by probability proportional to size (PPS) sampling and explain briefly the purpose of its use. [6m, 2017]
*gives unbiased estimation also
- The PPS sampling scheme can be employed when {sampling} units VARY BY SIZE, which is measured by z, and the y variables of main concern are roughly PROPORTIONAL to z. [3m]
- PPS sampling scheme improves PRECISION compared to SRS
- by giving larger units a greater chance of INCLUSION in the survey
- Can also have PPS with replacement sampling, which is easy to implement, or w/o replacement sampling, which can be done in various ways.
[3m]
- the probability of inclusion in the sample will be proportional to size,
- so a village of 1500 residents will have 1/100th the chance of selection of a town of 150000.
PPS sampling with replacement: Hansen-Hurwitz estimator (unbiased)
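$\hat{t}_{pps} = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{\pi_i}$, where πi = zi/tz is the probability of selecting unit i on a single draw
$\hat{\bar{y}}_{pps} = \hat{t}_{pps} / N$
$\widehat{\mathrm{Var}}(\hat{t}_{pps}) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \left( \frac{y_i}{\pi_i} - \hat{t}_{pps} \right)^2$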
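A minimal sketch of the Hansen-Hurwitz formulas above. The function name and the sample values are hypothetical; `p` holds the single-draw selection probability of each sampled unit (a unit drawn twice appears twice).

```python
def hansen_hurwitz(y, p):
    """Estimated total and its estimated variance under PPS with replacement.

    y: values of the n sampled units (repeats allowed)
    p: single-draw selection probability of each sampled unit
    """
    n = len(y)
    ratios = [yi / pi for yi, pi in zip(y, p)]
    t_hat = sum(ratios) / n
    var_hat = sum((r - t_hat) ** 2 for r in ratios) / (n * (n - 1))
    return t_hat, var_hat

# Hypothetical sample of n = 3 draws.
y = [10, 30, 60]
p = [0.1, 0.2, 0.5]
t_hat, var_hat = hansen_hurwitz(y, p)
print(t_hat, var_hat)

# If y is exactly proportional to the draw probabilities, every ratio y/p
# is the same and the estimated variance is (numerically) zero.
t_exact, var_exact = hansen_hurwitz([10, 20], [0.1, 0.2])
```

The second call shows why PPS helps when y is roughly proportional to z: the closer the ratios yi/πi are to each other, the smaller the variance.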
PPS sampling without replacement: Horvitz-Thompson estimator (unbiased)
$\hat{t}_{pps,wor} = \sum_{i=1}^{n} \frac{y_i}{\pi_i}$, where πi is now the probability that unit i is included in the sample
Complicated to estimate std error, usually use PPS w/ replacement as an approximation
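A small check of the unbiasedness of the Horvitz-Thompson total, as a sketch only. To keep the enumeration simple it uses SRS without replacement (every inclusion probability equals n/N) rather than a true PPS design; the population values are hypothetical.

```python
from itertools import combinations

def horvitz_thompson_total(y, pi):
    """Sum of sampled values, each weighted by 1 / inclusion probability."""
    return sum(yi / p for yi, p in zip(y, pi))

# Hypothetical population; SRS without replacement of n = 2 from N = 4,
# so every unit has inclusion probability n / N = 0.5.
population = [1, 2, 3, 6]
n = 2
incl_prob = n / len(population)

estimates = [
    horvitz_thompson_total(sample, [incl_prob] * n)
    for sample in combinations(population, n)
]
mean_estimate = sum(estimates) / len(estimates)  # average over all equally likely samples
print(mean_estimate, sum(population))
```

Averaged over every possible sample, the estimator reproduces the true population total, which is what "unbiased" means here.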
Steps for selecting a PPS sample
- Find the total of zi, the size of the N units in pop, & the values of probability πi based on proportionality
- Let tz = sum of zi over the N units and let πi = zi/tz
- Generate RANDOM INTEGERS between 1 and the total tz
- For each random no. drawn, select the unit whose cumulative-size range contains it, and record the corresponding yi
{I guess for selecting sample alone we don’t actually need πi but it’s needed for estimating the mean and total}
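The steps above can be sketched with cumulative totals, assuming with-replacement draws and positive integer sizes. The helper names (`select_unit`, `pps_sample`) and the sizes are hypothetical.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def select_unit(cum_sizes, r):
    """Index of the unit whose cumulative-size range contains r (1 <= r <= t_z)."""
    # Unit i is chosen when cum_sizes[i-1] < r <= cum_sizes[i].
    return bisect_left(cum_sizes, r)

def pps_sample(sizes, n, rng):
    """Draw n unit indices with replacement, with probability proportional to size."""
    cum_sizes = list(accumulate(sizes))  # running totals; last entry is t_z
    t_z = cum_sizes[-1]
    draws = [rng.randint(1, t_z) for _ in range(n)]  # randint includes both ends
    return [select_unit(cum_sizes, r) for r in draws]

# Hypothetical sizes z_i for N = 3 units; t_z = 10.
sizes = [3, 5, 2]
probs = [z / sum(sizes) for z in sizes]  # pi_i = z_i / t_z, needed later for estimation
sample = pps_sample(sizes, n=4, rng=random.Random(0))
print(sample, probs)
```

With sizes 3, 5, 2 the integers 1-3 map to unit 0, 4-8 to unit 1 and 9-10 to unit 2, so each unit is hit with probability zi/tz.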
Explain clearly why sampling with probability proportional to size may sometimes be preferred to simple random sampling.
[3m, 2012]
- Say we want to estimate the total of the Y -values.
- If larger units tend to have larger Y values, it makes sense to assign a higher probability to sampling them, as they contribute more to the total.
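A rough numerical illustration of this point. It compares the exact variance of the with-replacement estimator of the total under PPS draw probabilities and under equal draw probabilities. The population is hypothetical, and equal-probability sampling WITH replacement stands in for SRS to keep the formula the same in both cases.

```python
def hh_variance(y, p, n):
    """Exact variance of the Hansen-Hurwitz total for n draws with replacement.

    y: values of all N population units
    p: single-draw selection probabilities of all N units (sum to 1)
    """
    t = sum(y)
    return sum(pi * (yi / pi - t) ** 2 for yi, pi in zip(y, p)) / n

# Hypothetical population where y is roughly proportional to size z.
y = [12, 48, 95, 210]
z = [10, 50, 100, 200]

p_pps = [zi / sum(z) for zi in z]   # probability proportional to size
p_equal = [1 / len(y)] * len(y)     # equal probabilities

var_pps = hh_variance(y, p_pps, n=2)
var_equal = hh_variance(y, p_equal, n=2)
print(var_pps, var_equal)
```

In this made-up population the PPS variance is far smaller, because the ratios yi/πi are nearly constant when y tracks z.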
Give an intuitive reason that Cov(Yi, Yj) is negative.
[2m, 2012]
- If 1 observation is larger than the mean, the remaining ones are on average smaller than the mean (and vice versa),
- since we sample W/O replacement from a finite population, hence a negative covariance.
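A quick enumeration that backs up the intuition, as a sketch: for SRS without replacement from a small hypothetical population, the covariance between two different draws comes out negative and matches the standard result -σ²/(N-1), where σ² is the population variance with divisor N.

```python
from itertools import permutations

# Hypothetical finite population.
population = [2, 4, 7, 11]
N = len(population)
mu = sum(population) / N
sigma2 = sum((v - mu) ** 2 for v in population) / N  # population variance (divisor N)

# Under SRS without replacement, every ordered pair of distinct units
# is equally likely to be (Y_i, Y_j).
pairs = list(permutations(population, 2))
cov = sum((a - mu) * (b - mu) for a, b in pairs) / len(pairs)

print(cov, -sigma2 / (N - 1))
```

The negative value shrinks towards zero as N grows, which matches the idea that removing one unit matters less in a large population.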