Weeks 4 & 5 - Stratified & cluster sampling Flashcards

1
Q

Remarks about the 3 types of allocation for stratified sampling (+ general optimal allocation)

Define proportional and Neyman allocation [2011]

A

Proportional allocation
- Sample size in stratum proportional to stratum size (nh proportional to Nh)
1. Sample ‘mirrors’ population, representative
2. Simple estimation

Neyman allocation
- Sample size in stratum proportional to stratum size x stratum standard deviation (nh proportional to NhSh)
1. a special case of optimal allocation when costs for all strata are the same.
2. If stratum standard deviations vary a lot, Neyman allocation gives the greatest advantage
- Variance is minimised when nh is proportional to NhSh

Equal allocation
- Same sample size in every stratum (nh = n/k for k strata)
1. If stratum variances are similar, this implies that stratum standard errors are similar.
Therefore, stratum point estimates are comparable
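
A minimal sketch comparing the three allocations on a made-up population. The stratum sizes, standard deviations and total sample size n = 100 are illustrative assumptions, not from the course notes, and the results are left unrounded.

```python
def allocate(n, sizes, sds):
    """Return (proportional, neyman, equal) stratum sample sizes, unrounded."""
    total_size = sum(sizes)
    # Proportional: nh proportional to Nh
    proportional = [n * N_h / total_size for N_h in sizes]
    # Neyman: nh proportional to Nh * Sh
    products = [N_h * S_h for N_h, S_h in zip(sizes, sds)]
    neyman = [n * p / sum(products) for p in products]
    # Equal: same nh in every stratum
    equal = [n / len(sizes)] * len(sizes)
    return proportional, neyman, equal


# Hypothetical strata: sizes Nh and standard deviations Sh
sizes = [500, 300, 200]
sds = [2.0, 5.0, 10.0]
prop, ney, eq = allocate(100, sizes, sds)
print(prop)  # proportional allocation follows stratum size only
print(ney)   # Neyman shifts sample towards the high-variance stratum
print(eq)
```

In this toy case the smallest stratum has the largest standard deviation, so Neyman allocation gives it more sample than proportional allocation does.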

2
Q

4 reasons for stratified sampling

A
  1. To study specific SUBPOPULATIONS
    eg. F vs M, geographic regions
  2. To assist in implementing OPERATIONAL ASPECTS of survey
    eg. big and small farms
  3. To improve representativeness
  4. To improve precision
    ^with homogeneous strata
3
Q

Why is SRS possibly problematic? And why could stratified sampling be advantageous?

A

SRS can’t guarantee small variance b/c every possible sample of size n is equally likely -> estimates can vary a lot from sample to sample.
+ could also lead to extreme samples, eg. all men

Stratified sampling can have smaller variance (if strata are homogeneous within)

4
Q

What are the primary sampling units & secondary sampling units in cluster sampling?

*we want clusters to have high heterogeneity within -> mini version of pop, good representation

A

PSU = clusters, eg. blocks, schools
SSU = eg. households in the blocks

5
Q

Cluster sampling - estimated population size formula

A

M hat = N x (m bar), where N = no. of clusters in the pop & m bar = mean size of the sampled clusters
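
A quick numeric sketch of the formula, assuming N = 40 clusters in the population and four sampled clusters with the made-up sizes shown (none of these numbers come from the notes).

```python
def estimate_population_size(N, sampled_cluster_sizes):
    """M hat = N x (mean size of the sampled clusters)."""
    m_bar = sum(sampled_cluster_sizes) / len(sampled_cluster_sizes)
    return N * m_bar


# Hypothetical sample: 4 clusters out of N = 40
M_hat = estimate_population_size(N=40, sampled_cluster_sizes=[12, 8, 10, 10])
print(M_hat)
```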

6
Q

Cluster sampling - design effect formula

Comment on how you would decide between the unbiased estimator under simple random sampling and the ratio estimator under cluster sampling. [6m, 2018]

A

Var(t cl hat) / Var(t SRS hat) ~= [1 + (K-1)ICC ]

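where K = no. of elements per cluster & ICC = intra-cluster correlation coefficient, which measures homogeneity within clusters

  1. When ICC=1, perfectly homogeneous & Var(t cl hat) > Var(t SRS hat)
  2. When ICC=0, elements within a cluster are no more alike than randomly chosen elements (clusters as heterogeneous as the pop) & Var(t cl hat) = Var(t SRS hat)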
  3. The PRECISION of the ratio estimator of t under cluster sampling may be expected to be WORSE than the precision of the unbiased estimator of t under simple random sampling…
  4. b/c of a TENDENCY FOR HOMOGENEITY of cow weight on farms.
  5. The RELEVANT FACTOR is the INTRA-farm CORRELATION of weight of obese cows.

*Bear in mind that the goal is for the clusters to be just as heterogeneous as the whole pop, so that the selection of a given cluster will yield the same information as the random selection of individuals from the entire population.
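
A small sketch of the design effect approximation above. The cluster size K = 20 and the ICC values are illustrative assumptions. The `effective_sample_size` helper is not on the card; it is a common way to read the design effect (sample size divided by deff), added here only for illustration.

```python
def design_effect(K, icc):
    """Approximate Var(t cl hat) / Var(t SRS hat) = 1 + (K - 1) * ICC."""
    return 1 + (K - 1) * icc


def effective_sample_size(n_elements, K, icc):
    """Hypothetical helper: SRS sample size giving about the same precision."""
    return n_elements / design_effect(K, icc)


for icc in (0.0, 0.05, 1.0):
    print(icc, design_effect(20, icc), effective_sample_size(400, 20, icc))
```

Even a small ICC matters when clusters are large: with K = 20 the design effect is already close to 2 at ICC = 0.05.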

7
Q

Describe briefly 1-stage cluster sampling. [2m]

A

The population is divided into N clusters, where cluster i is of size Mi
for i = 1, 2, …N
and where elements in cluster i are labelled j = 1, 2, …, Mi
- Sampling units are clusters, which are groups of elements. We select a simple random sample of n clusters, although any design can be used to select the clusters.
- Regarding data collection, information should be collected on all elements in the cluster.
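
The steps above can be sketched as follows, assuming a toy population of 5 clusters with made-up values. The function names are hypothetical, and the estimator used, (N/n) x (sum of sampled cluster totals), is the standard unbiased estimator of the total under an SRS of clusters rather than anything stated on this card.

```python
import random


def one_stage_cluster_sample(clusters, n, seed=None):
    """SRS of n clusters; every element of a chosen cluster is observed."""
    rng = random.Random(seed)
    chosen = rng.sample(range(len(clusters)), n)
    return {i: clusters[i] for i in chosen}


def estimate_total(sample, N):
    """Unbiased estimate of the population total: (N / n) * sum of cluster totals."""
    n = len(sample)
    cluster_totals = [sum(values) for values in sample.values()]
    return N / n * sum(cluster_totals)


# Hypothetical population: N = 5 clusters of unequal size Mi
clusters = [[3, 4, 5], [10, 12], [7, 7, 8, 6], [1, 2], [9, 9, 9]]
sample = one_stage_cluster_sample(clusters, n=2, seed=1)
t_hat = estimate_total(sample, N=len(clusters))
print(sorted(sample), t_hat)
```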

8
Q

2 reasons why cluster sampling is used

A
  1. May not have a list of elements for a SAMPLING FRAME, but a list of clusters may be available
  2. May be CHEAPER to conduct the study if elements are CLUSTERED
9
Q

Explain how cluster sampling may lead to less precise estimates. [2m]

A
  • Elements within clusters tend to be similar, ie. clusters tend to be homogeneous
  • Homogeneous clusters give less information than the same no. of unrelated elements would
10
Q

4 factors that should be considered in the choice of stratification scheme [4m, 2016]

A
  1. whether any SUBPOPULATIONS are of interest (in which case these might form strata);
  2. to improve PRECISION would like to stratify by variables STRONGLY RELATED to principal variables of interest (to achieve HOMOGENEOUS strata);
  3. whether COST of data collection varies by some factor in which case might wish to stratify by this factor;
  4. whether there are reasons why different MODES of data collection would be used for different kinds of firms (in which case these might define strata).
11
Q

If the Q asks if the difference between 2 estimated means is significant, what should I do?

(from 2012)

A

Derive the standard error of the difference, compare the difference with it (eg. |difference| / SE against 1.96 at the 5% level), then say significant or not
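
A minimal sketch of that check, assuming the two estimated means are independent (eg. from different strata), so that the variances add, and using a normal approximation at the 5% level. The means and standard errors are made-up numbers.

```python
from math import sqrt


def difference_z(mean1, se1, mean2, se2):
    """SE of (mean1 - mean2) for independent estimates, and the z statistic."""
    se_diff = sqrt(se1**2 + se2**2)  # variances add under independence
    z = (mean1 - mean2) / se_diff
    return se_diff, z


# Hypothetical estimates: 52.0 (SE 3.0) vs 45.0 (SE 4.0)
se_diff, z = difference_z(52.0, 3.0, 45.0, 4.0)
significant = abs(z) > 1.96
print(se_diff, z, significant)
```

If the two estimates come from the same sampled units, a covariance term would be needed as well, which this sketch leaves out.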

12
Q

Explain why stratified sampling can be seen as an extreme form of cluster sampling.

[4m, 2011]

A

A special case of two-stage cluster sampling: all ‘clusters’ (= strata) are sampled at stage 1, then a sample of elements is taken from each cluster at stage 2.

{I guess b/c there is a tendency for homogeneity within clusters?}

13
Q

Prove the difference in variance between proportional and Neyman allocation is (1/n) x summation (i = 1 to k) of Wi(Si - Sbar)^2

What is Sbar? Explain when the difference is greatest between the precision of the estimators.

[2011]

A

Sbar = summation(WiSi), ie. the weighted mean of the stratum std devs, with Wi = Ni/N

See 2011 paper for workings

Difference is greatest when the stratum std devs Si vary a lot from each other.
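
A sketch of one standard route to this result, not necessarily the workings in the 2011 paper. It assumes the usual variance of the stratified mean, with W_i = N_i/N, and plugs in n_i = n W_i for proportional allocation and n_i = n W_i S_i / Sbar for Neyman allocation.

```latex
\begin{align*}
\operatorname{Var}(\bar{y}_{st})
  &= \sum_{i=1}^{k} W_i^2 \left(1 - \frac{n_i}{N_i}\right) \frac{S_i^2}{n_i}
   = \sum_{i=1}^{k} \frac{W_i^2 S_i^2}{n_i} - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2 \\[4pt]
V_{\text{prop}}
  &= \frac{1}{n}\sum_{i=1}^{k} W_i S_i^2 - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2
  && (n_i = n W_i) \\[4pt]
V_{\text{Ney}}
  &= \frac{1}{n}\Big(\sum_{i=1}^{k} W_i S_i\Big)^{2} - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2
   = \frac{\bar{S}^2}{n} - \frac{1}{N}\sum_{i=1}^{k} W_i S_i^2
  && \Big(n_i = \frac{n W_i S_i}{\bar{S}}\Big) \\[4pt]
V_{\text{prop}} - V_{\text{Ney}}
  &= \frac{1}{n}\Big(\sum_{i=1}^{k} W_i S_i^2 - \bar{S}^2\Big)
   = \frac{1}{n}\sum_{i=1}^{k} W_i \,(S_i - \bar{S})^2
\end{align*}
```

The last step uses summation(Wi) = 1 and Sbar = summation(WiSi). The finite population correction terms cancel, so the difference depends only on how much the Si vary around Sbar.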

14
Q

What information would it be helpful to have in judging the suitability of a stratifying variable?

[2014]

A

Need information on the distribution of __ BETWEEN & WITHIN stratification categories to judge

15
Q

How to describe if it makes sense to use stratified sampling given a BOXPLOT?

A

If suitable,
- strata are homogeneous
- within stratum variances clearly lower than overall variance
- so stratified sampling will improve precision (reduce variance) compared to SRS
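
A rough numeric version of what the boxplot shows, using made-up data for three hypothetical strata. The ratio of the size-weighted within-stratum variance to the overall variance roughly approximates the variance of a proportionally allocated stratified mean relative to SRS, ignoring finite population corrections. That reading is an assumption of this sketch.

```python
from statistics import variance

# Hypothetical data: tight boxes, well separated between strata
strata = {
    "small": [10, 12, 11, 9, 13],
    "medium": [30, 28, 32, 31, 29],
    "large": [60, 62, 58, 61, 59],
}

all_values = [y for values in strata.values() for y in values]
overall_var = variance(all_values)
within_vars = {name: variance(values) for name, values in strata.items()}

# Weight each within-stratum variance by the stratum's share of the data
N = len(all_values)
pooled_within = sum(len(v) / N * within_vars[name] for name, v in strata.items())
ratio = pooled_within / overall_var

print(within_vars, overall_var, ratio)
```

A ratio well below 1, as here, matches boxes that are short relative to the overall spread, which is the case where stratifying helps.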
