Lecture 5: Sampling Flashcards
Why should we sample? The number of all ______ observations is often too great, or even infinite. Samples can be used to infer population parameters, based on the CLT.
POSSIBLE
T/F: samples must be drawn with care to ensure that they are an unbiased representation of the population
true
This explains WHY many statistical models work, even when the data doesn’t form a normal distribution.
The Central Limit Theorem
There are three parts to the explanation of the CLT. Can you tell me what they are?
1) n = 1000
2) n ≥ 30
3) normal distribution
1) start with any population (n = 1000)
2) take repeated samples (n ≥ 30)
3) as the # of samples taken increases, the distribution of the sample means will start to resemble a normal distribution, no matter the shape of the original distribution
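The three CLT steps above can be tried out with a short simulation. This is only a sketch: the skewed population, the seeds, and the helper name `sample_means` are assumptions made for illustration, not part of the lecture.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(num_samples)
    ]

# Step 1: a hypothetical, strongly right-skewed population of 1000 values.
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1.0) for _ in range(1000)]

# Step 2: take repeated samples of size 30 and keep each sample's mean.
means = sample_means(population, sample_size=30, num_samples=2000)

# Step 3: the sample means centre on the population mean and are far less
# spread out than the population, even though the population is skewed.
print("population mean:", round(statistics.mean(population), 2))
print("mean of sample means:", round(statistics.mean(means), 2))
print("population st. dev.:", round(statistics.stdev(population), 2))
print("st. dev. of sample means:", round(statistics.stdev(means), 2))
```

Plotting a histogram of `means` next to a histogram of `population` would show the bell shape emerging from the skewed original.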
What are the five steps to the sampling process?
1) define the population
2) construct a sampling frame
3) select a sampling design
4) specify the information to be collected
5) collect data
“Population” is a collection of ___ _______ units of observation (who you want to later generalize about)
a collection of all possible units of observation
Definition: This is a representative list of units/individuals in the population to be sampled from
sampling frame
T/F: the sampling frame should try to include all units in the population (it should try to be EXHAUSTIVE)
True!
T/F: each unit/individual should only appear once in the sampling frame
true
What is a sampling design? (procedure to….)
procedure to select units from the sampling frame to be in the sample, ideally as many as possible (to satisfy the CLT)
How do we specify information to be collected? (Step 4)
Using questionnaires, loggers, instruments, etc. Usually involves a pre/pilot test.
Match this description to the correct step of the sampling process:
“All possible units of observation that you will later generalize about”
Step 1: define the population
Match this description to the correct step of the sampling process:
“Representative list of units/individuals in the population to be sampled from”
Step 2: Construct a sampling frame
Match this description to the correct step of the sampling process:
“Procedure to select units from sampling frame to be in sample”
Step 3: Select a sampling design
Match this description to the correct step of the sampling process:
“using questionnaires, loggers, etc. usually involving a pre/pilot test”
Step 4: Specify information to be collected
Match this description to the correct step of the sampling process:
“Go get it! Limit bias as much as possible”
Step 5: Collect data
What are the 4 main categories of PROBABILITY SAMPLING?
Simple random
Systematic
Stratified
Cluster
What are the 4 main categories of NON-PROBABILITY SAMPLING?
Convenience
Volunteer
Snowball
Judgemental
Match the description to the correct sampling method category:
“A sample drawn such that every member of the population has a purely equal chance of being included”
Probability - simple random
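A simple random sample can be sketched in a few lines. The frame of 52 people and the fixed seed are assumptions for illustration; `random.sample` draws without replacement, so every unit in the frame has the same chance of being picked.

```python
import random

# Hypothetical sampling frame: 52 people, each listed exactly once.
frame = [f"person_{i}" for i in range(1, 53)]

rng = random.Random(7)          # fixed seed only so the draw is repeatable
sample = rng.sample(frame, 10)  # draw 10 without replacement, like names from a hat
print(sample)
```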
Match the description to the correct sampling method category:
“A sample that is selected with numerical or spatial REGULARITY – every nth”
Probability - systematic
Match the description to the correct sampling method category:
“Used when a key characteristic of the population is of interest (program, age, etc.) and you want to ensure a proportionately sized sub-sample of each characteristic group”
Probability - stratified
Match the description to the correct sampling method category:
“Samples drawn from selected categories or quadrats from a very large study area … done to reduce cost and time”
Probability sampling - cluster
Match the example to the correct sampling method category:
“Picking 10 from a hat”
Probability - simple random
Match the example to the correct sampling method category:
“table of random numbers to select the whole sample”
probability - simple random
Match the example to the correct sampling method category:
“selecting every 5th person from a list of 52 people, giving me 10 people (52/10 = 5.2, round down to 5)”
Probability - systematic
Match the example to the correct sampling method category:
“I need 10 participants in my sampling frame, but I want to make sure I represent all the majors in this class - that’s a key characteristic that I am interested in”
Probability: stratified
Match the example to the correct sampling method category:
“I will break the room into 4 sections, choose 2 sections, then systematically choose every 2nd person in those sectors”
Probability - cluster (since you divided the group into sections and selected at random which clusters to survey)
Match the example to the correct sampling method category:
“find the 30 people closest to you, then break them into males and females, then get 5 respondents from each.”
Non-probability: convenience - technically stratified convenience since you’re breaking it up with a characteristic of interest
Match the example to the correct sampling method category:
“a flyer goes up for participation on campus, but only the arts students respond based on available time to complete it and location of the flyer.”
Non-probability - volunteer
Match the example to the correct sampling method category:
“the immigrant food consumption study”
Non-probability - snowball
Match the example to the correct sampling method category:
“I know what a moustache looks like, so I’m going to pick three people who I think have what closely resembles MY version of a moustache”
Non-probability - judgemental
Match the description to the correct sampling method category:
“A sample in which only _______ or accessible members of the population are selected
You can have stratified convenience sampling too”
Non-probability - convenience
Match the description to the correct sampling method category:
“Individuals who self-select from a population
(frequently biased)”
Non-probability - volunteer
Match the description to the correct sampling method category:
“Referral sampling – asking for subjects to refer others. Good for hard-to-reach groups but can lead to bias”
Non-probability - snowball
Match the description to the correct sampling method category:
“Personal judgement is used to decide which units of a population are to be included in the sample.”
Non-probability - judgemental
Systematic sampling has a 4-part procedure. What are those 4 parts?
- Estimate size of sampling frame
- Divide by desired sample size and round DOWN to get interval (n)
- Select a random start point
- pick every nth unit
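The four steps above might look like this in code. It is a sketch under assumptions: the function name `systematic_sample` and the numbered frame of 52 people are made up for illustration, and the desired sample size is assumed to be no larger than the frame.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Pick every nth unit from the frame after a random start point."""
    # Steps 1-2: frame size divided by desired sample size, rounded DOWN.
    interval = len(frame) // sample_size
    # Step 3: random start point somewhere in the first interval.
    start = random.Random(seed).randrange(interval)
    # Step 4: take every nth unit, stopping at the desired sample size.
    return frame[start::interval][:sample_size]

frame = list(range(1, 53))  # hypothetical list of 52 people, numbered 1 to 52
sample = systematic_sample(frame, sample_size=10, seed=1)
print(sample)
```

With 52 people and a desired sample of 10, the interval is 52 // 10 = 5, so the result is 10 people spaced 5 apart on the list.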
T/F: systematic sampling works well when no apparent irregularities exist
True
Stratified sampling has a three-part procedure. What are those three parts?
- determine the sub-sample %’s in the population
- break desired sample into sub-sample groups by %
- sample within each group until you reach quota
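A rough sketch of the three steps above. Assumptions for illustration: the function name `stratified_sample`, the class of 50 students, and the choice to fill each quota by simple random selection within the group. Rounding the quotas can make the total differ slightly from the desired sample size when the percentages do not divide evenly.

```python
import random
from collections import defaultdict

def stratified_sample(units, key, sample_size, seed=None):
    """Sample each group in proportion to its share of the population."""
    rng = random.Random(seed)
    # Step 1: group the units by the characteristic of interest.
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)

    sample = []
    for group in strata.values():
        # Step 2: quota = desired sample size x this group's share.
        quota = round(sample_size * len(group) / len(units))
        # Step 3: sample within the group until the quota is reached.
        sample.extend(rng.sample(group, quota))
    return sample

# Hypothetical class of 50: 25 geography, 15 biology, 10 history majors.
students = (
    [("geography", i) for i in range(25)]
    + [("biology", i) for i in range(15)]
    + [("history", i) for i in range(10)]
)
sample = stratified_sample(students, key=lambda s: s[0], sample_size=10, seed=3)
print(sample)
```

Here the population is 50% / 30% / 20%, so a sample of 10 is split into quotas of 5, 3 and 2.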
When do you use stratified sampling?
When a key characteristic of the population is of interest.
T/F: stratified sampling helps improve representativeness
True
Cluster sampling has a 3 part procedure. How do you cluster sample?
- divide the study area into smaller units
- randomly select a set of units
- sample only within units using appropriate technique (random, systematic, etc).
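The cluster procedure above can be sketched as follows. The function name `cluster_sample`, the four room sections, and the choice of systematic sampling (every 2nd person) inside each chosen cluster are assumptions for illustration; any appropriate within-cluster technique could be swapped in.

```python
import random

def cluster_sample(clusters, num_clusters, step=2, seed=None):
    """Randomly choose clusters, then sample every `step`-th unit in each."""
    rng = random.Random(seed)
    # Step 2: randomly select a set of units (clusters).
    chosen = rng.sample(sorted(clusters), num_clusters)
    result = {}
    for name in chosen:
        # Step 3: sample only within the chosen clusters (systematic here).
        start = rng.randrange(step)
        result[name] = clusters[name][start::step]
    return result

# Step 1: a hypothetical room divided into 4 sections of 6 people each.
sections = {name: [f"{name}{i}" for i in range(1, 7)] for name in "ABCD"}
picked = cluster_sample(sections, num_clusters=2, seed=5)
print(picked)
```

Only two of the four sections are ever visited, which is where the cost and time savings come from.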
T/F: cluster sampling is used to reduce travel costs
true
T/F: it is difficult to know if the sample obtained via convenience sampling is representative of the larger population
true
Why is volunteer sampling so frequently biased?
people who volunteer have an interest in what you’re doing and are often less busy than others
T/F: snowball (referral) sampling is good for harder to reach groups, or groups with no listings.
True
T/F: judgemental sampling involves selecting the ‘mode’ or the most ‘typical’ or ‘illustrative’ cases, and is appropriate for case studies or difficult-to-reach cases.
true
What are the three sources of bias? STS
Spatial biases
Temporal biases
Selection bias
Which bias is this?
“only sampling students in the concourse”
spatial bias
Which bias is this?
“only sampling trees on the edges of the forest”
spatial bias
which bias is this?
“surveying traffic flows only at 8am”
temporal bias
Which bias is this?
“only selecting middle aged white people at the mall”
selection bias
how do you calculate the response rate?
RR% = (#responded / #asked) × 100%
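As a quick worked example of the formula (the function name `response_rate` and the numbers 45 and 60 are made up for illustration):

```python
def response_rate(responded, asked):
    """Return the response rate as a percentage of the people asked."""
    if asked <= 0:
        raise ValueError("asked must be a positive number")
    return responded / asked * 100

# Hypothetical survey: 60 people asked, 45 responded.
rate = response_rate(45, 60)
print(f"{rate:.1f}%")  # prints "75.0%"
```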