Week 5: Sampling Strategies Flashcards
Target Populations…
- population parameters, census, sampling frame
- a group about which social scientists attempt to make generalizations
- do not necessarily refer to groups of individuals; they might refer to groups of nations, corporations, written documents or legal cases, for example
Sample & Sampling
- Sample = a subset of the population selected for a study
- Sampling = the process of deciding what or whom to include in the sample
Unit of analysis…
- I will need to collect data from _______ to answer my research question
Population Parameter
- represents the ‘true value’ or ‘true measurement’ of the population
- measuring them directly (through a census) is often not feasible in social research… why?
- time, resources, frequency (censuses are held only every 5, 6, 10 or 15 years)
- limited number of questions
Census
- a study that includes data on every member of a population
- more common in social research when the population in question is not composed of people
- rare because they are often not feasible
1936 election (Landon vs Roosevelt)
- Literary Digest survey (2 million completed responses)
- survey (poll) result… Landon wins & Roosevelt loses
- actual result… Roosevelt wins & Landon loses
- What went wrong?… poor sampling strategy: the sample did not represent the population
Observed value
true value + systematic error + random error = observed value
Errors in sampling
- Systematic error = a flaw built into the design of the study; its size cannot be estimated, only the direction of the bias can be discussed
- Random error = unbiased, can be estimated using statistics
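A small simulation can make the observed-value equation concrete. This is a minimal Python sketch in which the true value, the size of the systematic error and the spread of the random error are all hypothetical numbers chosen for illustration: averaging many observations cancels the random error but leaves the systematic error untouched.

```python
import random
import statistics

# Hypothetical values, chosen only for illustration.
TRUE_VALUE = 50.0        # the true population value
SYSTEMATIC_ERROR = 4.0   # a design flaw that shifts every observation the same way
RANDOM_SD = 10.0         # spread of the random error, which averages to zero

def observe(rng: random.Random) -> float:
    """observed value = true value + systematic error + random error"""
    random_error = rng.gauss(0.0, RANDOM_SD)
    return TRUE_VALUE + SYSTEMATIC_ERROR + random_error

rng = random.Random(42)  # fixed seed so the run is reproducible
observations = [observe(rng) for _ in range(100_000)]
average = statistics.fmean(observations)

# With many observations the random error cancels out, but the systematic error remains.
print(f"average observed value:        {average:.2f}")
print(f"true value + systematic error: {TRUE_VALUE + SYSTEMATIC_ERROR:.2f}")
print(f"true value:                    {TRUE_VALUE:.2f}")
```

The average lands close to 54 rather than 50: more data shrinks the random error, but only a better design removes the systematic error.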
Probability samples
- samples that are based on random selection are called probability samples
- a probability sample is one in which (a) random choice is used to select participants for the sample and (b) each individual has a probability of being selected that can be calculated
- reduces systematic error in the selection of cases
A probability sample has 2 key advantages over a nonprobability sample
- estimates based on a probability sample are unbiased = to whatever extent estimates differ from the true population parameter, they are equally likely to overestimate it as underestimate it
- the only difference between the estimates and the true parameter is due to random chance = this difference is called sampling error (NOT a systematic error)
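Both advantages above can be checked with a quick simulation. The sketch assumes a hypothetical population of 10,000 people in which exactly 43% approve, then draws many simple random samples of 500 and compares the estimates with the true parameter; the population, sample size and number of repetitions are illustrative assumptions, not data from any real poll.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical population of 10,000 people: 1 = approves, 0 = does not approve.
population = [1] * 4_300 + [0] * 5_700
true_parameter = statistics.fmean(population)  # 0.43

def srs_estimate(sample_size: int) -> float:
    """Approval rate estimated from one simple random sample."""
    return statistics.fmean(rng.sample(population, sample_size))

estimates = [srs_estimate(500) for _ in range(2_000)]
errors = [e - true_parameter for e in estimates]

overestimates = sum(err > 0 for err in errors)
underestimates = sum(err < 0 for err in errors)

print(f"true parameter:    {true_parameter:.3f}")
print(f"mean of estimates: {statistics.fmean(estimates):.3f}")
print(f"overestimates: {overestimates}, underestimates: {underestimates}")
```

Individual estimates miss the true value by sampling error, but they fall on both sides of it about equally often, and their average sits very close to 0.43.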
Margin of error
- the amount of uncertainty in an estimate
- equals the distance between the estimate and either boundary of the confidence interval
- levels of confidence… 95%, 99%, etc.
Example of margin of error
- according to a Gallup poll, 43% of Americans approve of the job the president is doing. This estimate has a margin of error of 3 percentage points at the 95% confidence level
- Translation… we can be 95% sure that the true level of presidential approval is between 40% and 46%…
- Calculating the confidence interval… Lower bound = estimate - margin of error = 43 - 3 = 40 … Upper bound = estimate + margin of error = 43 + 3 = 46
2 things to emphasize about margin of error…
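The interval arithmetic above can be written as a small function. The second function is an added assumption, not something stated in these notes: it is the common textbook approximation z * sqrt(p(1 - p) / n) for a proportion from a simple random sample, with z = 1.96 for 95% confidence. The sample size of 1,000 is hypothetical; the Gallup poll's actual sample size is not given here.

```python
import math

def confidence_interval(estimate, margin_of_error):
    """Return (lower bound, upper bound) = estimate -/+ margin of error."""
    return estimate - margin_of_error, estimate + margin_of_error

def margin_of_error_for_proportion(p, n, z=1.96):
    """Textbook approximation for a proportion p from a simple random sample of size n.

    Assumption: z = 1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Gallup example from the notes: 43% approval, margin of error of 3 points.
lower, upper = confidence_interval(43, 3)
print(lower, upper)  # 40 46

# Hypothetical sample of 1,000 respondents with 43% approval.
moe_points = 100 * margin_of_error_for_proportion(0.43, 1000)
print(round(moe_points, 1))  # 3.1
```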
- margin of error pertains only to sampling error (so only applies to probability samples)
- margin of error has a specific relationship with sample size… as the sample size gets larger, the sampling error gets smaller & so does the margin of error (margin of error is inversely proportional to the square root of the sample size)
Example of margin of error & sample size relationship
- Study A has a sample of 100 people & a margin of error of 3%
- If we want to reduce the margin of error to 1%, how many people do we need to include in the sample?
○ Reduction in margin of error = 3/1 = 3 times
○ Increase in sample size = 3^2 = 9 times
○ So, in Study A we need 9 x 100 = 900 respondents in the sample
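The same reasoning fits in a short function. It is a sketch that relies only on the rule stated above (the margin of error shrinks with the square root of the sample size); the function name and the second example are hypothetical.

```python
import math

def required_sample_size(current_n, current_moe, target_moe):
    """Sample size needed to go from current_moe to target_moe.

    Because the margin of error is inversely proportional to sqrt(n),
    n must grow by the square of the desired reduction factor.
    """
    reduction = current_moe / target_moe
    # round() guards against tiny floating-point excesses before taking the ceiling
    return math.ceil(round(current_n * reduction ** 2, 6))

print(required_sample_size(100, 3, 1))    # 900 (Study A)
print(required_sample_size(100, 3, 1.5))  # 400: halving the margin of error needs 4x the sample
```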
Simple random sampling
- sampling frame = a list of population members from which a probability sample is drawn
- the most straightforward type of probability sample
- each individual has the same probability of being selected into the sample
- each pair of individuals has the same probability of being selected (everyone’s chance of being selected into the sample is completely independent of everyone else’s)
- obtain the sampling frame, then generate a set of random numbers & select the individuals corresponding to those numbers
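The steps above can be sketched with Python's standard `random` module. The sampling frame of 1,000 labelled members and the function name are hypothetical; `random.Random.sample` draws without replacement, so every member and every pair of members is equally likely to be chosen.

```python
import random
from typing import List, Optional

def simple_random_sample(frame: List[str], n: int, seed: Optional[int] = None) -> List[str]:
    """Draw n members from the sampling frame without replacement.

    Mirrors the steps above: generate n distinct random positions in the
    frame, then select the members at those positions.
    """
    rng = random.Random(seed)
    positions = rng.sample(range(len(frame)), n)  # the "random numbers"
    return [frame[i] for i in positions]

# Hypothetical sampling frame of 1,000 population members.
sampling_frame = [f"member_{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(sampling_frame, 50, seed=1)
print(len(sample), len(set(sample)))  # 50 50
```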
Systematic simple random sampling
- use a systematic sample to draw a sample that is 1/n the size of the total population…
- first randomly select one of the first n individuals on the list of members in the sampling frame, then select every nth member on the list after that
- not all pairs are equally likely! (e.g. two members next to each other on the list can never both be selected)
- important in political exit polls
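Systematic selection can be sketched in a few lines of Python. The frame, the interval of 10 and the seed are hypothetical; the point is that once the random start is fixed, the whole sample is determined, which is why not all pairs are equally likely.

```python
import random

def systematic_sample(frame, interval, seed=None):
    """Systematic sample of roughly 1/interval of the frame.

    A random start is chosen among the first `interval` members, then
    every interval-th member after it is taken.
    """
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random position in 0 .. interval-1
    return frame[start::interval]

# Hypothetical frame of 1,000 members; take 1/10 of them.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 10, seed=4)
print(len(sample))  # 100
# Selected members are always exactly 10 places apart, so neighbours never appear together.
print(all(b - a == 10 for a, b in zip(sample, sample[1:])))  # True
```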
Cluster sampling
- no available sampling frame
- 1) divide the target population into clusters (e.g. cities within Canada, classrooms in a high school)
- 2) select clusters randomly, then get the sampling frame for the selected clusters
- 3) select individuals randomly from the selected clusters
- enhances feasibility, lower costs and works when a sampling frame doesn’t exist
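A two-stage cluster sample following steps 1 to 3 can be sketched as below. The classrooms, their sizes and the numbers drawn at each stage are hypothetical; note that a roster (sampling frame) is consulted only for the clusters that were selected.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage cluster sample.

    clusters maps a cluster name to the list of its members; the member
    list is only needed for clusters that end up selected.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # step 2: pick clusters at random
    sample = []
    for name in chosen:
        roster = clusters[name]  # sampling frame for this selected cluster only
        sample.extend(rng.sample(roster, n_per_cluster))  # step 3: pick individuals
    return chosen, sample

# Step 1 (hypothetical): 20 classrooms of 30 students each.
classrooms = {
    f"class_{c:02d}": [f"class_{c:02d}_student_{s:02d}" for s in range(1, 31)]
    for c in range(1, 21)
}
chosen, sample = cluster_sample(classrooms, n_clusters=4, n_per_cluster=10, seed=3)
print(chosen)
print(len(sample))  # 40
```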
Stratified sampling
- 1) obtain the sampling frame
- 2) divide the target population into strata (e.g. gender, social class)
- 3) select individuals randomly from all strata
- 4) the number of individuals selected from each stratum reflects that stratum's proportion of the population
- prevents samples from becoming non-representative due to pure chance & makes it possible to oversample small groups
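Proportionate stratified sampling (steps 1 to 4) can be sketched as below. The frame of 600 women and 400 men is hypothetical, and rounding each stratum's share is a simplifying assumption: with less convenient numbers the stratum sizes may not add up exactly to the requested total.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, total_n, seed=None):
    """Proportionate stratified random sample.

    Each stratum contributes round(total_n * stratum share) members,
    drawn by simple random sampling within the stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in frame:  # step 2: divide the frame into strata
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        n_h = round(total_n * len(members) / len(frame))  # step 4: proportional allocation
        sample.extend(rng.sample(members, n_h))  # step 3: random selection within the stratum
    return sample

# Step 1 (hypothetical frame): 600 women and 400 men, as (id, gender) pairs.
frame = [(i, "woman" if i < 600 else "man") for i in range(1000)]
sample = stratified_sample(frame, stratum_of=lambda m: m[1], total_n=100, seed=2)
print(sum(g == "woman" for _, g in sample), sum(g == "man" for _, g in sample))  # 60 40
```

To oversample a small group, the proportional allocation line would be replaced by a larger fixed number for that stratum, and the results would then need weighting.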
Weighting
- if a probability sample is conducted in such a way that different people have different probabilities of being selected, then the results must be weighted for estimates to be accurate
- when a sample is weighted, some observations count more than others
- the more a particular group is overrepresented in the sample, the less weight each individual from that group should receive
- if Person A is x times as likely to be in our sample as Person B (x = A's selection probability / B's selection probability), then we give Person A 1/x times as much weight as Person B when computing our estimates
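The weighting rule can be checked with a toy example. The selection probabilities (0.10 and 0.02), group sizes and outcomes are hypothetical: members of group A are 5 times as likely to be sampled as members of group B, so each A gets 1/5 the weight of each B. Setting each weight to 1 divided by the selection probability is a common convention, assumed here.

```python
def weighted_mean(values, weights):
    """Mean in which each observation counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical sample: (outcome, probability that this person was selected).
# Group A: 500 people in the population, sampled with probability 0.10 -> 50 in sample, outcome 1.
# Group B: 1,000 people in the population, sampled with probability 0.02 -> 20 in sample, outcome 0.
sample = [(1, 0.10)] * 50 + [(0, 0.02)] * 20
values = [v for v, _ in sample]
weights = [1 / p for _, p in sample]  # inverse of selection probability

print(round(weights[0] / weights[-1], 3))        # 0.2 -> A gets 1/5 the weight of B
print(round(sum(values) / len(values), 3))       # 0.714 unweighted (A over-represented)
print(round(weighted_mean(values, weights), 3))  # 0.333 = 500 / 1,500 in the population
```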
Post-survey weighting
- Response rate = (number of valid responses / number of invitations sent) x 100%
- the gap between the desired sample & the actual sample
- nonresponse may create systematic differences between the sample and the population
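The response-rate formula as a tiny function; the counts in the example are hypothetical.

```python
def response_rate(valid_responses, invitations_sent):
    """Response rate in percent: valid responses / invitations sent x 100%."""
    return 100 * valid_responses / invitations_sent

rate = response_rate(350, 1000)  # hypothetical: 350 valid replies to 1,000 invitations
print(rate)  # 35.0
```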
Example of postsurvey weighting
- based on the census, we know that 20% of the population are older adults. However, due to nonresponse, only 10% of the respondents in our sample are older adults
- the older adults in our sample should have more weight than younger adults
- how much more? weight for each older adult = population % / sample % = 20/10 = 2 (by the same rule, each younger adult gets 80/90 ≈ 0.89)
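The post-survey weights for this example can be computed for both groups at once. This sketch uses the population and sample percentages from the example; the function name is hypothetical. Applying the weights restores the census share of older adults.

```python
def poststratification_weights(population_pct, sample_pct):
    """Weight for each group = population % / sample %."""
    return {group: population_pct[group] / sample_pct[group] for group in population_pct}

population_pct = {"older": 20, "younger": 80}  # from the census
sample_pct = {"older": 10, "younger": 90}      # observed after nonresponse

weights = poststratification_weights(population_pct, sample_pct)
print(weights["older"], round(weights["younger"], 3))  # 2.0 0.889

# Weighted share of older adults in the sample matches the census again.
weighted = {g: sample_pct[g] * weights[g] for g in sample_pct}
print(round(100 * weighted["older"] / sum(weighted.values()), 1))  # 20.0
```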
Nonprobability sample
- a sample that is not drawn using a method of random sampling
- the key issue here is representativeness
- there may be systematic differences between our sample & the target population
Nonprobability sample: systematic error
- a flaw built into the design of the study
- impossible to quantify the size of the bias
- only possible to predict the direction of bias
nonprobability samples are not representative of the population
- researchers cannot conclude that a hypothesis holds true in the same way throughout all population subgroups (i.e. low generalizability)
2 benefits of nonrepresentative samples
- often better for initial tests of hypotheses than representative samples
1. the diversity of representative samples makes detecting cause-and-effect relationships more difficult (it is easier to identify causes when the cases we study are very similar to one another)
2. we can often gather more/better information on nonrepresentative samples than we can on representative samples
Nonrepresentative sampling 1: convenience sampling
- cheapest, easiest & most convenient method
- select any subjects who are willing to participate
- prone to systematic errors
Nonprobability sample 2: purposive sampling
- selecting cases based on key features…
- access to & quality of data
- typicality
- extremity
- importance
- deviant case
- contrasting outcomes
- key differences
- past experience & intuition
Nonprobability sample 3: sequential sampling
- researchers collect additional data based on their findings from the data they have already collected
- key informants = people whom a researcher interviews intensively, typically multiple times, over the course of a fieldwork project
- sampling for range = trying to interview people who occupy a wide range of different roles within the organization
- saturation = the point at which new interviews no longer generate new insights about the project
Nonprobability sample 4: snowball sampling
- starts with one respondent who meets the requirement for inclusion
- asks them to recommend other people to contact
- useful for studying ‘hidden populations’ such as drug dealers & computer hackers
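Snowball recruitment can be pictured as following referrals outward from the first respondent. The referral network and respondent labels below are hypothetical, and recruiting referrals in the order they are received (breadth-first) is a simplifying assumption made for this sketch.

```python
from collections import deque

def snowball_sample(first_respondent, referrals, target_size):
    """Recruit by following referrals until target_size people are reached
    or no new referrals remain."""
    sample = [first_respondent]
    seen = {first_respondent}
    queue = deque([first_respondent])
    while queue and len(sample) < target_size:
        current = queue.popleft()
        for contact in referrals.get(current, []):  # people this respondent recommends
            if contact not in seen and len(sample) < target_size:
                seen.add(contact)
                sample.append(contact)
                queue.append(contact)
    return sample

# Hypothetical referral network: respondent -> people they recommend.
referrals = {
    "R1": ["R2", "R3"],
    "R2": ["R4"],
    "R3": ["R4", "R5"],
    "R5": ["R6"],
}
print(snowball_sample("R1", referrals, 5))   # ['R1', 'R2', 'R3', 'R4', 'R5']
print(snowball_sample("R1", referrals, 10))  # stops at 6: no one else is referred
```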
Big data
- ‘found data’
- electronic traces
- administrative records
- sample or population? probably a sample