Week 9 - Weighting Flashcards
4 purposes of applying sampling weights
*Use weighting w/ caution b/c it can INCREASE VARIANCE of estimators
- To allow UNBIASED ESTIMATES for designs w/ unequal probability of selection
- To combine the state & national samples efficiently
- To use auxiliary data on known population characteristics to reduce sampling errors (PPS)
- To MINIMISE BIASES arising from differences between RESPONDENTS & NON-respondents in sample
> ie. weightings may reduce non-response bias (from 2019 paper)
2 types of weights + examples
- Unit-level weights
eg. Horvitz-Thompson weighting = Nh/nh
- Class-level weights
eg. stratum weights = Nh/N
Unit-level weights for estimated total (& est. mean)
t hat = summation to n (wi^tot * yi)
If we replace yi with 1 and add up all the sampling weights, we get an ESTIMATOR of pop. SIZE, N hat
Estimated mean, y bar = t hat / N hat
*use N hat if don’t know pop. size N
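A minimal Python sketch of the total and mean formulas above. The data are made up: 2 units sampled from a stratum of 100 and 3 from a stratum of 30, so each unit weight is wi = Nh/nh. Function names are illustrative only.

```python
def estimated_total(weights, y):
    """t hat = sum over the sample of wi * yi."""
    return sum(w * yi for w, yi in zip(weights, y))


def estimated_pop_size(weights):
    """N hat = sum of the weights (yi replaced by 1)."""
    return sum(weights)


def estimated_mean(weights, y):
    """y bar = t hat / N hat, for when the true N is unknown."""
    return estimated_total(weights, y) / estimated_pop_size(weights)


# Hypothetical sample: weights are Nh/nh = 100/2 and 30/3
weights = [50, 50, 10, 10, 10]
y = [4, 6, 1, 2, 3]

print(estimated_total(weights, y))     # 560
print(estimated_pop_size(weights))     # 130
print(estimated_mean(weights, y))      # 560 / 130
```

Here N hat happens to equal the true N = 130 because the weights are exact Nh/nh; with nonresponse-adjusted weights that need not hold.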
4 applications of weighting + response rate
- Quota sampling
- Nonresponse
*rmb that weighting may reduce non-response bias!
- making a strong assumption that respondents & NON-respondents are similar
- response rate = no. of respondents / sample size
- can also do post stratification
- For lack of representativeness
- Opinion polls in UK
Conclusions for weighting
- May not help adjust for opinion poll inaccuracies at recent elections
- Don’t always work
- Use weighting w/ caution b/c it can INCREASE VARIANCE of estimators
Weighting by PAST VOTE is sometimes used in OPINION POLLS concerned with voting intention.
1. Explain briefly how this is done and…
2. why this may not succeed in removing bias. [4m, 2016]
- Past vote is categorical variable, e.g. Conservative, Labour, Other, Did not vote.
- Population proportions known from {past} election.
- Need to ask question in survey on past vote and use this in WEIGHTING CLASS ESTIMATOR. [2m]
- Problem that past vote is subject to measurement error ('FALSE RECALL').
- Empirical evidence that this is the case, eg. people say they did vote when they did not.
- Also usual problems of DIFFICULTY in REMOVING BIAS using weighting class methods.
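A rough Python sketch of past-vote weighting as described above, assuming each respondent gets weight = population share of their past-vote class / sample share of that class. All shares and responses below are invented for illustration, and the function names are not from any library.

```python
from collections import Counter


def past_vote_weights(past_votes, pop_shares):
    """Weight per respondent = population share / sample share of their class."""
    n = len(past_votes)
    counts = Counter(past_votes)
    return [pop_shares[v] / (counts[v] / n) for v in past_votes]


def weighted_proportion(weights, indicator):
    """Weighted share of respondents with indicator == 1."""
    return sum(w * x for w, x in zip(weights, indicator)) / sum(weights)


# Hypothetical shares from the past election (assumed known)
pop_shares = {"Con": 0.3, "Lab": 0.3, "Other": 0.1, "DNV": 0.3}

# Hypothetical survey: recalled past vote, and 1 if intending to vote for party A
past_votes = ["Con"] * 4 + ["Lab"] * 3 + ["Other"] + ["DNV"] * 2
intends_a = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

w = past_vote_weights(past_votes, pop_shares)
print(sum(intends_a) / len(intends_a))    # unweighted proportion
print(weighted_proportion(w, intends_a))  # past-vote weighted proportion
```

If respondents misreport their past vote (false recall), the sample shares in the denominator are wrong, so the weights themselves are biased.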
Explain how the weight for each class would be used for estimation. [1m, 2016]
The weights for classes Nh/N would be used to weight together MEANS calculated from data within each class.
Suppose that the cluster sampling scheme is adapted by subsampling of 1 in 2 pigs within sampled villages which are found to contain over 50 pigs and selecting all pigs in the remaining sampled villages. Explain how the principle of Horvitz-Thompson estimation could then be used to estimate t. [3m, 2015]
Suppose that we use RATIO ESTIMATOR, then with SUBSAMPLING replace (summation to n=10) ti
by summation(i∈s1) ti/πi, where s1 is SUBSAMPLE and πi is INCLUSION PROBABILITY in cluster i, which is 1 if no more than 50 pigs and 0.5 if more than 50.
- Similar approach should be followed for (summation to n=10) Mi
Explain how survey weights may be used to obtain unbiased estimation for sampling with unequal probabilities. [1m, 2015]
Survey weights were explained both for…
1. WITH REPLACEMENT sampling where wi = 1/(n pi), n is the number of draws and pi is DRAW PROBABILITY and for…
2. WITHOUT replacement sampling where wi = 1/πi and πi is INCLUSION PROBABILITY.
Suppose that the veterinary scientist decided to modify this cluster sampling scheme by subsampling 1 in 3 cows within sampled farms which are found to contain over 17 obese cows and selecting all obese cows in the remaining sampled farms. Comment on how the Horvitz-Thompson estimation could be used to estimate t. [6m, 2018]
Since the RATIO ESTIMATOR t hat rat = [(summation to n) ti / (summation to n) Mi] * (summation to N) Mi is used, then with SUBSAMPLING replace (summation to n) ti
by summation(i∈s1) ti/πi, where s1 is SUBSAMPLE and πi is INCLUSION PROBABILITY in cluster i.
- πi = 1 if 17 or fewer obese cows live on a farm
- πi = 1/3 if more than 17 obese cows live on a farm
- Similar approach should be followed for (summation to n) Mi
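One possible reading of this answer as a Python sketch: within each sampled farm, the observed y-sum and the observed cow count are both inflated by 1/πi before being plugged into the ratio estimator. The farm data and the population count (summation to N) Mi = 100 are invented, and the function name is illustrative.

```python
def ht_ratio_total(farms, total_units):
    """Ratio estimator of t with Horvitz-Thompson inflation inside each farm.

    farms: list of (observed y-sum, observed unit count, pi) per sampled farm.
    total_units: (summation to N) Mi, assumed known for the whole population.
    """
    t_sum = sum(y_obs / pi for y_obs, m_obs, pi in farms)
    m_sum = sum(m_obs / pi for y_obs, m_obs, pi in farms)
    return t_sum / m_sum * total_units


# Hypothetical sampled farms
farms = [
    (12, 6, 1 / 3),  # over 17 obese cows: 1 in 3 subsampled, 6 observed
    (20, 10, 1.0),   # 17 or fewer obese cows: all 10 observed
]

print(ht_ratio_total(farms, total_units=100))  # about 200
```

Dividing by πi gives each subsampled cow a weight of 3, so the big farm contributes an estimated total of 36 over an estimated 18 cows.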
Show how weights can be used to estimate a POPULATION TOTAL when PPS with replacement sampling is employed & show that the weighted estimator is UNBIASED. [5m, 2013]
In PPS with replacement sampling, units are selected with probability pi = zi/summation(zj). {just use zi}
The population total T = (summation to N) yi is estimated by T hat = (1/n) (summation to n) yi/pi,
where the weight is (proportional to) 1/pi.
Unbiasedness follows from E(T hat) = (1/n) (summation to n) E(yi/pi) and, for each draw, E(yi/pi) = (summation j=1 to N) (yj/pj) pj = (summation to N) yj = T, so E(T hat) = (1/n) * n * T = T
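A small Python check of the unbiasedness argument, assuming a made-up population of 3 units. It enumerates every ordered with-replacement sample of size n, weights each estimate by its probability, and uses exact fractions so the expectation can be compared with T directly. Function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def pps_wr_estimate(sample_y, sample_p):
    """T hat = (1/n) * sum over draws of yi / pi."""
    n = len(sample_y)
    return sum(y / p for y, p in zip(sample_y, sample_p)) / n


def expected_estimate(y, z, n):
    """Exact E(T hat) over all ordered samples of n draws with pi = zi / sum(z)."""
    total_z = sum(z)
    p = [Fraction(zi, total_z) for zi in z]
    expectation = Fraction(0)
    for draw in product(range(len(y)), repeat=n):
        prob = Fraction(1)
        for i in draw:
            prob *= p[i]  # draws are independent under with-replacement sampling
        estimate = pps_wr_estimate([y[i] for i in draw], [p[i] for i in draw])
        expectation += prob * estimate
    return expectation


# Hypothetical population: y values and size measures z
y = [3, 10, 7]
z = [1, 4, 2]

print(expected_estimate(y, z, n=2))  # 20, equal to sum(y)
```

Individual samples can still be far from T; unbiasedness only says the estimates average out to T over repeated sampling.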
3 possible reasons for the failure of opinion polls correctly to predict the outcomes of several recent elections
- Simple lying
- Last minute shift in opinions
- Dubious methodology, incl. online/phone/face to face discrepancies
What can the unit-level & class-level survey/sampling weights be used for? [2021]
For estimating the population mean & population total