Topic 8: Sample Surveys Flashcards
What is the population?
The full set of subjects (units) being studied. Information on every unit of the population is collected through a census.
What is a sample?
A sample is part of the population (subset of original population)
What are the limitations of a census?
Collecting data on every unit of a population:
is hard
costs a lot of money
takes a lot of time
requires a lot of resources
Thus, we use samples to learn about the population without running a full census
What is a parameter?
A parameter is a numerical fact about the population which we are interested in. It is a fixed number, but usually unknown.
I.e. population mean
What is an estimate (or statistic)?
It is a value calculated from the sample which is used to estimate the parameter
I.e. sample mean
What are 4 common types of bias?
Selection bias
Non response bias
Interviewer’s bias
Measurement bias
What is selection bias?
A systematic tendency to exclude (or include) one type of person in the sample. If that type of person differs from the rest of the population, the survey results are distorted
What is non response bias?
Caused by participants who fail to complete the survey. Non respondents can be very different from respondents, which affects the survey results
What is interviewer’s bias?
When the interviewer has to choose the participants in the survey, or when the characteristics of the interviewer have an effect on the answers given by the participants
What is measurement bias?
When the form of the question in the survey affects the responses to the question
What are some examples of measurement bias?
Bias in question wording and order, which impacts responses
Recall bias : people forget details
Sensitive questions: People may not tell the truth
Lack of clarity in the question
Attributes of interview process may cause bias
Will increasing the sample size account for the biases present in survey data collection?
No. A larger sample does not reduce bias, it just repeats the same mistake on a larger scale
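A minimal simulation sketch of this point. The true proportion (0.5) and the selection rule (a "yes" person is twice as likely to end up in the sample) are made-up assumptions for illustration, not values from the course.

```python
import random

def biased_sample_proportion(n, true_p=0.5, reach_ratio=2.0, seed=0):
    """Proportion of 'yes' in a sample where 'yes' people are over-selected.

    true_p and reach_ratio are hypothetical: a 'yes' person is reach_ratio
    times as likely to end up in the sample as a 'no' person.
    """
    rng = random.Random(seed)
    yes_weight = reach_ratio * true_p
    no_weight = 1 - true_p
    # Chance that a sampled person is a 'yes' under the biased selection.
    p_yes_in_sample = yes_weight / (yes_weight + no_weight)
    hits = sum(rng.random() < p_yes_in_sample for _ in range(n))
    return hits / n

# The estimate settles near 2/3, not near the true value 0.5, as n grows.
for n in (100, 10_000, 1_000_000):
    print(n, round(biased_sample_proportion(n), 3))
```

The chance error shrinks as n grows, but the gap between the estimate and the parameter (the bias) stays.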
How do we pick a good sample?
We use a probability method to pick the sample so that:
The interviewer isn't involved in the selection and the method of selection is impartial
The chance of any particular individual being chosen can be computed, i.e. there is a defined procedure for selecting the sample, which uses chance
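A small sketch of the simplest probability method, a simple random sample. The population of ID numbers and the sample size are hypothetical. The selection is made by chance alone, and the chance of any individual being chosen can be computed as sample size / population size.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Draw n distinct individuals purely by chance (no interviewer choice)."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: people identified by ID numbers 1..1000.
population = list(range(1, 1001))
sample = simple_random_sample(population, 50)

# Every individual has the same, computable chance of being selected.
chance_of_selection = len(sample) / len(population)
print(chance_of_selection)
```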
What are 3 ways of picking a sample?
Multi stage cluster sampling
Quota sampling
Convenience sampling
What is multi stage cluster sampling?
As simple random sampling often isn't practical, organisations often use this instead. The sample is taken in stages, and individuals or clusters are chosen at random at each stage
E.g. divide the population into clusters –> randomly select some clusters –> randomly select individuals from each chosen cluster
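A two-stage sketch of the idea, assuming the population has already been divided into clusters. The suburb names, cluster sizes and function name are hypothetical.

```python
import random

def multi_stage_sample(clusters, n_clusters, n_per_cluster, seed=0):
    """Two-stage cluster sample: pick clusters at random, then individuals.

    `clusters` maps a cluster name to a list of individuals (hypothetical data).
    """
    rng = random.Random(seed)
    # Stage 1: choose clusters at random.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        k = min(n_per_cluster, len(members))
        # Stage 2: choose individuals at random within each chosen cluster.
        sample.extend(rng.sample(members, k))
    return chosen, sample

# Hypothetical population: 5 suburbs with 20 residents each.
suburbs = {f"suburb_{i}": [f"s{i}_person_{j}" for j in range(20)] for i in range(5)}
chosen, people = multi_stage_sample(suburbs, n_clusters=2, n_per_cluster=3)
print(chosen, people)
```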
What is quota sampling?
A non probability sampling technique where the sample is chosen to have the same proportions of individuals as the entire population, with respect to known characteristics or traits.
Within each quota the interviewer is free to choose the subjects to survey, which can result in unintentional bias
What is convenience sampling?
A non probability sampling technique where subjects are selected because they are convenient to access. Not recommended except to test the survey (pilot)
E.g. for psych experiments, uni students often make up the samples because it's easy –> the sample is not demographically representative
Can we avoid all bias?
No, there is some level of unavoidable bias. Even with a probability method determining the sample, bias (e.g. non response) can easily come in
In addition, because the sample is only part of the population, we have chance error
What is the equation for estimates?
Estimate = parameter + bias + chance error
OR
Estimate = parameter + non sampling error + sampling error
What are some common methods of surveys?
Mail
Face to face interviews
Phone interviews
Online
Self administered surveys
Can chance error occur in sample surveys?
Sample surveys involve chance error because each sample is just one possible draw from the population
Here we use the box model to quantify the likely size of chance error when estimating a proportion using simple random sampling. Standard errors (SE) measure variability across different samples from some population
Each different sample will give a different estimate
What does proportion of a sample survey mean?
If a question asks about the proportion in a sample survey, it means the mean of the 0-1 sample values (not the sum)
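A tiny illustration with a made-up sample of 0s and 1s (1 = "yes"): the sum counts the 1s, while the mean gives the proportion of 1s.

```python
# Hypothetical sample of 10 responses, coded 1 for "yes" and 0 for "no".
sample = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]

total = sum(sample)                # number of "yes" responses (the sum)
proportion = total / len(sample)   # proportion of "yes" responses (the mean)

print(total, proportion)
```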
What affects the accuracy of samples?
When sampling with replacement, the SE is determined by the absolute size of the sample
When sampling without replacement, the SE decreases as the ratio of sample size to population size increases, because when a higher proportion of the population is sampled, the variability decreases
When the sample is only a small part of the population, the population size has almost no effect on the SE of the estimate
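A short sketch of the first point, using the standard SE formula for a sample proportion from a 0-1 box drawn with replacement, sqrt(p x (1 - p) / n). The box proportion p = 0.5 is an assumed value. Note that the population size does not appear in the formula at all.

```python
import math

def se_proportion(p, n):
    """SE of a sample proportion for n draws with replacement from a 0-1 box."""
    return math.sqrt(p * (1 - p) / n)

# p = 0.5 is a hypothetical box proportion.
# Quadrupling the sample size halves the SE.
for n in (100, 400, 1600):
    print(n, se_proportion(0.5, n))
```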
Can we apply drawing without replacement to the box model?
Not directly. The box model assumes draws with replacement, so drawing without replacement is a different context
Do we need to adjust SE of the box model to account for without replacement?
Yes, we have to do it through a correction factor
What are the formulas for SE without replacement?
SE without replacement = correction factor x SE with replacement
What is the equation for correction factor?
sqrt ( (number of tickets - number of draws) / (number of tickets - 1) )
OR
sqrt ( (population size - sample size) / (population size - 1) )
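The two formulas above can be sketched as follows. The function names and the example numbers (p = 0.5, sample of 1000) are assumptions for illustration.

```python
import math

def correction_factor(population_size, sample_size):
    """sqrt((population size - sample size) / (population size - 1))."""
    return math.sqrt((population_size - sample_size) / (population_size - 1))

def se_without_replacement(p, population_size, sample_size):
    """SE without replacement = correction factor x SE with replacement."""
    se_with = math.sqrt(p * (1 - p) / sample_size)
    return correction_factor(population_size, sample_size) * se_with

# Hypothetical example: sample of 1000 from populations of different sizes.
for population_size in (2_000, 100_000, 1_000_000):
    print(population_size,
          correction_factor(population_size, 1000),
          se_without_replacement(0.5, population_size, 1000))
```

When the sample is a small part of the population the correction factor is very close to 1, so the two SEs are almost the same.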
What is bootstrapping?
It helps address the problem of estimating the population proportion from the sample proportion
Bootstrapping is estimating the properties of the population by using the properties of a particular sample
When sampling from a 0-1 box, we replace the unknown proportion of 1’s in the box (population) with the known proportion of 1’s in a particular sample
What are the steps involved in bootstrapping?
1) create an approximate box which has the same proportions of 0’s and 1’s as the sample
2) use the box model
OR as internet says:
Choose a number of bootstrap samples to perform
Choose a sample size
For each bootstrap sample
Draw a sample with replacement with the chosen size
Calculate the statistic on the sample
Calculate the mean of the calculated sample statistics.
And then apply to the population
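A minimal sketch of the resampling steps for a 0-1 sample, compared with the box-model shortcut (plugging the sample proportion into the SE formula). The sample of 60 ones and 40 zeros, the number of bootstrap samples and the function name are all hypothetical.

```python
import math
import random
import statistics

def bootstrap_proportion(sample, n_boot=2000, seed=0):
    """Resample the sample with replacement and summarise the proportions.

    Returns the mean and the SD of the bootstrap proportions; the SD is
    the bootstrap estimate of the SE of the sample proportion.
    """
    rng = random.Random(seed)
    n = len(sample)
    stats = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)   # draw with replacement
        stats.append(sum(resample) / n)       # statistic on this resample
    return statistics.mean(stats), statistics.stdev(stats)

# Hypothetical sample: 60 ones and 40 zeros.
sample = [1] * 60 + [0] * 40
boot_mean, boot_se = bootstrap_proportion(sample)

# Box-model version: approximate box with the sample's proportion of 1's.
p_hat = sum(sample) / len(sample)
plug_in_se = math.sqrt(p_hat * (1 - p_hat) / len(sample))

print(boot_mean, boot_se, plug_in_se)
```

The two SE estimates should come out close to each other, which is why the two descriptions of bootstrapping above lead to the same place for a proportion.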
What are confidence intervals?
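A confidence interval is a range of values, calculated from the sample, that is likely to contain the unknown population parameter. The confidence level (e.g. 95%) describes how often intervals built this way capture the parameter.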
What are the confidence intervals equal to? (68%, 95%, 99.7%)
68% –> sample proportion +- 1 x SE
95% –> sample proportion +- 2 x SE
99.7% –> sample proportion +- 3 x SE
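A sketch of these intervals in code, assuming draws with replacement and using the bootstrap SE (the sample proportion stands in for the unknown box proportion). The sample proportion 0.6 and sample size 400 are made-up numbers.

```python
import math

def confidence_interval(p_hat, n, multiplier=2):
    """Interval: sample proportion +- multiplier x SE.

    multiplier 1, 2, 3 gives the (approximate) 68%, 95%, 99.7% intervals.
    """
    se = math.sqrt(p_hat * (1 - p_hat) / n)   # bootstrap SE of the proportion
    return p_hat - multiplier * se, p_hat + multiplier * se

# Hypothetical survey: 400 people, sample proportion 0.6.
for level, multiplier in (("68%", 1), ("95%", 2), ("99.7%", 3)):
    low, high = confidence_interval(0.6, 400, multiplier)
    print(level, round(low, 3), round(high, 3))
```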
What do we say when we have a 95% confidence interval?
It is a mistake to say that the probability that the interval contains the unknown parameter is 0.95.
Instead, we say that if we work out a series of CIs for a series of samples, then about 95% of the CIs would contain the unknown parameter
How can we simulate a series of CIs? (using a random example)
E.g. create a population of size 1000000 in which the proportion of 1s (“yes” votes) is 0.67
Draw a sample of size 1000 from the population, and calculate a 95% CI for the population proportion. Repeat the sampling 100 times, forming 100 CIs. Graph the 100 CIs
Draw a red line to represent the true population proportion (0.67) and count how many CIs cover the red line and how many miss it. We expect about 95% of the CIs to cover the true population proportion
If we draw without replacement, the fpc (finite population correction) applies to the SE
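A sketch of this simulation in plain Python. It counts the covering intervals instead of graphing them, samples without replacement (so the correction factor is applied to the SE), and uses a fixed seed. The function name and the seed are assumptions.

```python
import math
import random

def simulate_cis(pop_size=1_000_000, true_p=0.67, n=1000, reps=100, seed=0):
    """Build `reps` 95% CIs from repeated samples and count how many cover true_p."""
    rng = random.Random(seed)
    n_ones = round(pop_size * true_p)
    population = [1] * n_ones + [0] * (pop_size - n_ones)
    # Finite population correction for sampling without replacement.
    fpc = math.sqrt((pop_size - n) / (pop_size - 1))
    intervals = []
    for _ in range(reps):
        sample = rng.sample(population, n)          # without replacement
        p_hat = sum(sample) / n
        se = fpc * math.sqrt(p_hat * (1 - p_hat) / n)
        intervals.append((p_hat - 2 * se, p_hat + 2 * se))
    covered = sum(low <= true_p <= high for low, high in intervals)
    return intervals, covered

intervals, covered = simulate_cis()
print(f"{covered} of {len(intervals)} intervals cover the true proportion")
```

With a sample of 1000 from a population of 1000000 the correction factor is almost exactly 1, so it barely changes the intervals here.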
How can we justify the CI formulas?
We assume the sample proportion (which estimates the population proportion) approximately follows a normal distribution
Recall all normal distributions satisfy the “68% - 95% - 99.7%” rule
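The rule can be checked numerically with the standard normal curve. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `coverage` is just for illustration.

```python
from statistics import NormalDist

def coverage(k):
    """Area under the standard normal curve within k SDs of the mean."""
    z = NormalDist()  # mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

# Roughly 0.68, 0.95 and 0.997 for k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, round(coverage(k), 4))
```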