Topic 14: Sampling Flashcards
Learning outcomes:
- Identify the similarities between non-spatial and spatial sampling
- Recognize the different elements considered when selecting the number of samples
- Apply different methods for sampling and recognize their relative merits and drawbacks
Population
The total set of individuals or potential observations in a defined group
- e.g., all the residents of Calgary
Sample
A subset of individuals or observations in the population
- Ideally, the sample is representative of the population
The Role of Sampling: Sampling helps us answer several difficult questions
- How large should the sample be?
- How/where should the samples be chosen?
- How reliable will results based on this sample be?
(All of these questions arise because we usually cannot conduct a census of the entire population.)
Sampling Units
The individual items in a sample, and the basic entity upon which observations are made
- May be discrete entities (e.g., people, households, cities), points, or areas (e.g., quadrats, strips, plots, pixels)
- Must be explicitly defined!
Sampling units must be selected to match the scale of the information desired
- e.g., household income: households
- e.g., personal income: individual people
Steps for Sampling
Step 1: Conceptually define the target population and target area
Step 2: Designate the sampled population and sampled area from the sampling frame
Step 3: Select sampling design
Step 4: Design research and operational plan
Step 5: Conduct pretest
Step 6: Collect sample data
One important question in any sample design: how large should the sample be to be representative?
- Less certainty with smaller samples
- More certainty with larger samples
- Larger samples = more cost
What are the two commonly used strategies for sample-size determination?
Rules of thumb and formulas
*Be careful with rules of thumb: you need to know why their authors made those decisions
Formulas
The precision of the estimate of a population parameter is a function of the variance of the population, the sample size, and the allowable error
For determining the sample size necessary to estimate the population mean: n = (Z*s/E)^2
n = number of samples
Z = Z score for the desired level of confidence (e.g., 1.96 for 95%)
s = standard deviation of a pilot sample
E = tolerable error
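Tolerable error is inversely related to n: a smaller tolerable error requires more samples (because E is squared, halving E roughly quadruples n)
Confidence level is directly related to n: a higher confidence level requires more samples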
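A minimal Python sketch of the first formula, assuming Z is taken from the standard normal distribution for a two-sided confidence level and that s comes from a pilot sample. The function name and the pilot values (s = 12, E = 2) are hypothetical, chosen only to show how n responds to E and to the confidence level.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(s: float, e: float, confidence: float = 0.95) -> int:
    """Return n = (Z*s/E)^2, rounded up to a whole number of samples.

    s: standard deviation of a pilot sample
    e: tolerable error, in the same units as s
    confidence: desired two-sided confidence level (e.g., 0.95)
    """
    # Z score that leaves (1 - confidence)/2 in each tail, about 1.96 for 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # A fractional sample is impossible, so round up
    return math.ceil((z * s / e) ** 2)


print(sample_size_for_mean(s=12, e=2))                   # 139
print(sample_size_for_mean(s=12, e=1))                   # 554: half the error, about 4x the samples
print(sample_size_for_mean(s=12, e=2, confidence=0.99))  # 239: more confidence, more samples
```

The three calls mirror the two relationships above: shrinking E and raising the confidence level both push n up.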
Method #2: n = (t^2 * CV^2) / E^2
n = sample size
t = Student's t value for the specified probability
CV = coefficient of variation (standard deviation / mean, expressed as %)
E = tolerable error, expressed as % of the mean
Student's t value: critical value from Student's t-distribution, used as the threshold in statistical tests when the number of observations is small
Rule of thumb: if you want statistical validity, aim for 200+ observations
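The method #2 formula can be sketched in Python as below. This assumes the t value has already been looked up in a table, and that CV and E are both given as percentages (they must be on the same scale, since only their ratio matters). The function name and the inputs t = 2.0, CV = 30%, E = 10% are illustrative, not taken from a real study.

```python
import math


def sample_size_from_cv(t: float, cv: float, e: float) -> int:
    """Return n = t^2 * CV^2 / E^2, rounded up to a whole number of samples.

    t:  critical Student's t value for the chosen probability (from a table)
    cv: coefficient of variation (standard deviation / mean), as a percent
    e:  tolerable error, as a percent of the mean
    """
    return math.ceil((t ** 2) * (cv ** 2) / (e ** 2))


print(sample_size_from_cv(t=2.0, cv=30, e=10))  # 36
```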
What is the sample size determination procedure?
- Make a reasonable guess at the value of n
  - How much time do you have? What resources?
  - The guess may come from previous studies
- Look up the critical Student's t-value
  - Two-tailed probability of obtaining a larger value
- Select a value for E (allowable error)
  - 10-20% is a reasonable place to start
  - How much error will you allow?
- Select a value for CV (coefficient of variation)
  - Needs a prior estimate of variation: preliminary (pilot) sample?
  - Most of the CV estimate comes from the pretest
- Calculate n
- Proceed iteratively until n is reasonable
  - Things to change: n and E
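The iterative procedure above can be sketched in Python as follows. Assumptions: a two-tailed probability of 0.05, a partial t-table holding a subset of standard table rows, and a conservative lookup that uses the nearest tabulated degrees of freedom at or below n - 1 (a larger t, so a slightly larger n). The names `T_TABLE_05`, `t_critical`, and `iterate_sample_size` are hypothetical; if the final n is unreasonable, the notes suggest changing E and running it again.

```python
import math

# Two-tailed 5% critical values of Student's t, keyed by degrees of freedom
# (standard table values; only a subset of rows is included here).
T_TABLE_05 = {
    1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
    6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228,
    15: 2.131, 20: 2.086, 30: 2.042, 40: 2.021, 60: 2.000, 120: 1.980,
}


def t_critical(df: int) -> float:
    """Look up t using the nearest tabulated df at or below df (conservative)."""
    usable = [k for k in T_TABLE_05 if k <= df]
    return T_TABLE_05[max(usable)]


def iterate_sample_size(cv: float, e: float, n_guess: int, max_iter: int = 50) -> int:
    """Look up t for df = n - 1, recompute n, and repeat until n stops changing."""
    n = max(n_guess, 2)  # need at least 1 degree of freedom
    seen = {n}
    for _ in range(max_iter):
        t = t_critical(n - 1)
        n_new = max(math.ceil(t ** 2 * cv ** 2 / e ** 2), 2)
        if n_new == n:
            return n
        if n_new in seen:
            # Values are cycling; keep the larger of the last two (more conservative)
            return max(n, n_new)
        seen.add(n_new)
        n = n_new
    raise RuntimeError("n did not settle; try a different E or CV")


# CV = 30% from a pilot sample, E = 10%, starting guess n = 10
print(iterate_sample_size(cv=30, e=10, n_guess=10))  # 38
```

Starting from a guess of 10, the sketch moves through n = 47 and n = 37 before settling at 38, showing why the first guess only needs to be reasonable.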
Where/how to choose samples?
- Now that we know we need n samples, where or how do we choose them?
- There are many techniques designed to help achieve a sample that is ‘representative’ of the population
- The major issue to avoid is bias
- Under-representing or over-representing elements of the population because of inappropriate sample design
Sampling methods/designs:
Non-probability methods
Judgemental
- Personal judgement
- Personal knowledge or knowledge of other people who have done similar studies
Quota
Based on economics of a sample
Snowball
Meandering: e.g., you ask one person, who refers you to another person to talk to
Gives the opportunity for rich information
Probability methods
Systematic, simple random, stratified random, clustered random
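The snowball method listed above can be sketched as a breadth-first walk over referrals, assuming each respondent names the people they refer you to. The referral network, the names, and the function `snowball_sample` are all hypothetical. Because who ends up in the sample depends on the network rather than on chance, this stays a non-probability method.

```python
from collections import deque


def snowball_sample(referrals: dict[str, list[str]], seeds: list[str], max_size: int) -> list[str]:
    """Follow referrals outward from the seed respondents until max_size people are reached."""
    sampled: list[str] = []
    queue = deque(seeds)
    visited = set(seeds)
    while queue and len(sampled) < max_size:
        person = queue.popleft()
        sampled.append(person)  # interview this person
        for referred in referrals.get(person, []):
            if referred not in visited:  # do not contact anyone twice
                visited.add(referred)
                queue.append(referred)
    return sampled


# Hypothetical referral network: each respondent names others to contact
referrals = {"Ana": ["Ben", "Cai"], "Ben": ["Dee"], "Cai": ["Ana", "Eli"], "Dee": [], "Eli": ["Fay"]}
print(snowball_sample(referrals, seeds=["Ana"], max_size=4))  # ['Ana', 'Ben', 'Cai', 'Dee']
```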
What is selective sampling?
Observer manually selects sampling unit locations in areas that appear to be representative
What are the advantages and disadvantages to selective sampling?
Advantages: many would argue it has no place in well-designed sampling strategies BUT can bail you out in some practical situations
Disadvantages: Relies on human choice, which is prejudiced by individual opinion, and may produce a sample that is not representative of the population
What is simple random sampling?
- The fundamental sampling method
- Each sampling unit in the population has an equal chance of being selected
- The selection of any individual should not affect the chance of selecting another individual
- The first unit sampled has the same probability of being selected again (sampling with replacement)
What are the advantages and disadvantages of simple random sampling?
Advantages:
- Given sufficient samples, produces an unbiased estimate of the population mean and the information needed to assess the sampling error
- Computers assist greatly through random number generators or random point generators
Disadvantages:
- The two criteria (equal chance and independence) are harder to achieve than you might realize
- Requires developing a system that is consistent with the sampling unit: how would you conduct a random sample of all the trees in a forest?
- Individuals must be sampled with replacement in order to be a true random sample. This means that an individual can get selected more than once, even if it is unlikely
- Cost and difficulty in accessing widely dispersed locations, if field work is involved (locating them, traveling between them)
- May miss small groups in the population, or produce estimates that are biased towards larger groups
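A minimal Python sketch of how a computer's random number generator supports simple random sampling, assuming the sampling frame is available as a list and the study area is a rectangle in planar coordinates. `random.Random.choices` draws with replacement, matching the definition above; `random.Random.sample` would draw without replacement instead. The household frame, the bounding box, and the function names are hypothetical, and the seed is only there to make a run reproducible.

```python
import random
from typing import Optional


def simple_random_sample(population: list, n: int, seed: Optional[int] = None) -> list:
    """Draw n units with replacement: every unit has an equal chance on every draw."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)


def random_points(n: int, xmin: float, xmax: float, ymin: float, ymax: float,
                  seed: Optional[int] = None) -> list[tuple[float, float]]:
    """Generate n random point locations inside a rectangular study area."""
    rng = random.Random(seed)
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]


households = [f"household_{i}" for i in range(1, 101)]  # hypothetical sampling frame
print(simple_random_sample(households, n=5, seed=42))
print(random_points(3, 0, 1000, 0, 500, seed=42))
```

The point generator handles spatial units, but it says nothing about reaching those points in the field, which is the cost disadvantage listed above.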
What is systematic sampling?
Samples are selected in a systematic or regular fashion, though the starting point is random
- e.g., every fourth address in the phone book, or every 10 meters along a transect
What are the advantages and disadvantages of systematic sampling?
Advantages:
- Can provide reliable population estimates by spreading the sample over the entire population
- Simple to execute in practice, because sample units are located at regular intervals and travel between them is straightforward
Disadvantages:
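- Not completely unbiased, because selections are not truly equal and independent (once the random start is chosen, every other unit is fixed)
- Can fare poorly if systematic patterns occur within the population (e.g., a periodic pattern that matches the sampling interval)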
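Both examples of systematic sampling (every k-th address, fixed spacing along a transect) can be sketched in Python, assuming the frame is an ordered list and the transect is measured in meters from one end. The interval k = N // n and the random start inside the first interval follow the definition above; the function names and inputs are hypothetical.

```python
import random
from typing import Optional


def systematic_sample(population: list, n: int, seed: Optional[int] = None) -> list:
    """Pick every k-th unit after a random start, where k = N // n."""
    k = len(population) // n  # sampling interval
    if k < 1:
        raise ValueError("n cannot exceed the population size")
    start = random.Random(seed).randrange(k)  # random start within the first interval
    return population[start::k][:n]


def transect_positions(length_m: float, spacing_m: float, seed: Optional[int] = None) -> list[float]:
    """Positions every spacing_m meters along a transect, after a random start."""
    start = random.Random(seed).uniform(0, spacing_m)
    count = int((length_m - start) // spacing_m) + 1
    return [start + i * spacing_m for i in range(count)]


addresses = list(range(1, 101))  # hypothetical ordered list of 100 addresses
print(systematic_sample(addresses, 25, seed=1))  # every fourth address
print(transect_positions(100, 10, seed=1))       # every 10 m along a 100 m transect
```

Only the start is random; every later unit is fixed by it, which is why the method is not completely unbiased.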
What is stratified random sampling?
- The population is stratified, or divided, into groups with reduced variability, and sampled randomly within each stratum
What is proportional allocation and optimum allocation?
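Proportional: sampling intensity proportional to the sizes of the strata (e.g., a stratum covering 60% of the area receives 60% of the samples)
Optimum: sampling intensity also reflects the standard deviation of the variable within each stratum, so more variable strata receive more samples (in Neyman allocation, the number of samples in a stratum is proportional to its size times its standard deviation)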
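The two allocation rules can be compared with a short Python sketch. Assumptions: optimum allocation is implemented as Neyman allocation (samples proportional to stratum size times stratum standard deviation), and fractional allocations are rounded with the largest-remainder method so the totals add up to n. The stratum sizes, pilot standard deviations, and function names are hypothetical.

```python
def largest_remainder(weights: dict[str, float], n: int) -> dict[str, int]:
    """Split n samples in proportion to weights, rounding so the total is exactly n."""
    total = sum(weights.values())
    raw = {h: n * w / total for h, w in weights.items()}
    alloc = {h: int(r) for h, r in raw.items()}  # round down first
    leftover = n - sum(alloc.values())
    # Give the remaining samples to the strata with the largest fractional parts
    for h in sorted(raw, key=lambda s: raw[s] - alloc[s], reverse=True)[:leftover]:
        alloc[h] += 1
    return alloc


def proportional_allocation(sizes: dict[str, float], n: int) -> dict[str, int]:
    """Samples proportional to stratum size."""
    return largest_remainder(sizes, n)


def optimum_allocation(sizes: dict[str, float], sds: dict[str, float], n: int) -> dict[str, int]:
    """Neyman allocation: samples proportional to stratum size times stratum standard deviation."""
    return largest_remainder({h: sizes[h] * sds[h] for h in sizes}, n)


sizes = {"A": 600, "B": 300, "C": 100}  # hypothetical stratum sizes
sds = {"A": 2.0, "B": 4.0, "C": 6.0}    # hypothetical pilot standard deviations
print(proportional_allocation(sizes, 50))  # {'A': 30, 'B': 15, 'C': 5}
print(optimum_allocation(sizes, sds, 50))  # {'A': 20, 'B': 20, 'C': 10}
```

Stratum C is the smallest but the most variable, so optimum allocation doubles its share compared with proportional allocation.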
What are the advantages and disadvantages of stratified random sampling?
Advantages:
- May produce more accurate results because small groups are not missed and large ones are not over-represented
- Can generate separate estimates for each stratum
Disadvantages:
- A basis for stratification is required
- Sampling estimates are subject to errors in the stratification criteria
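A minimal Python sketch of stratified random sampling that produces a separate estimate for each stratum and an overall estimate weighted by each stratum's share of the population. Assumptions: strata are given as lists of values, the number of samples per stratum has already been chosen, and sampling within each stratum is with replacement to match the simple random sampling definition in these notes. The function name and the synthetic urban/rural data are hypothetical.

```python
import random
from statistics import mean
from typing import Optional


def stratified_estimates(strata: dict[str, list[float]], alloc: dict[str, int],
                         seed: Optional[int] = None) -> tuple[dict[str, float], float]:
    """Randomly sample within each stratum; return per-stratum means and the weighted overall mean."""
    rng = random.Random(seed)
    total_size = sum(len(units) for units in strata.values())
    per_stratum: dict[str, float] = {}
    overall = 0.0
    for h, units in strata.items():
        sample = rng.choices(units, k=alloc[h])  # simple random sample within stratum h
        per_stratum[h] = mean(sample)
        overall += (len(units) / total_size) * per_stratum[h]  # weight by stratum share
    return per_stratum, overall


# Hypothetical synthetic population with two strata
gen = random.Random(1)
strata = {
    "urban": [gen.gauss(60, 10) for _ in range(600)],
    "rural": [gen.gauss(45, 5) for _ in range(400)],
}
print(stratified_estimates(strata, {"urban": 30, "rural": 20}, seed=2))
```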