section 1.3: sampling Flashcards
what is the first step in conducting research?
identify topics or questions that are to be investigated
what is a sample?
a subset of the cases in the population, often only a small fraction of it
when selecting samples by hand we run the risk of ______________________
bias, even if it is not intended
Almost all statistical methods are based on the notion of __________
implied randomness
what are the four random sampling techniques?
simple, stratified, cluster, and multistage sampling
what is anecdotal evidence?
evidence based on a sample of only one or two cases; it may not be representative of the entire population, so it is not considered strong evidence
what is a simple random sample (SRS)?
the most basic random sample, like a raffle, each case has an equal chance of being selected
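A minimal sketch of the raffle idea in Python, using only the standard library. The population of 1,000 case IDs, the sample size, and the function name `simple_random_sample` are all made up for illustration.

```python
import random

def simple_random_sample(cases, n, seed=None):
    """Pick n distinct cases so that every case is equally likely to be chosen."""
    rng = random.Random(seed)
    # Sampling without replacement, like drawing raffle tickets from a bowl.
    return rng.sample(cases, n)

# Hypothetical population of 1,000 case IDs.
population = list(range(1000))
sample = simple_random_sample(population, 50, seed=1)
print(len(sample), len(set(sample)))
```

The seed is only there so the draw can be repeated; leaving it as `None` gives a fresh draw each time.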
what does a high non-response rate tell us?
if the rate at which people do not respond to your survey is high, the results of the survey may not be representative of the population
what is a convenience sample?
a sample in which individuals who are easily accessible are more likely to be included; it may not be representative of the population
what is observational data?
data where no treatment has been explicitly applied (or explicitly withheld)
what is a confounding / lurking variable?
a third variable that is correlated with both the explanatory and response variables and may be causing a change in both of them
what is stratified sampling?
The population is divided into groups called strata. The strata are chosen so that similar cases are grouped together, then a second sampling method, usually simple random sampling, is employed within each stratum.
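One way to sketch stratified sampling in Python: group the cases by stratum, then take a simple random sample inside each group. The student data, the class-year strata, and the helper name `stratified_sample` are hypothetical, and taking the same number of cases from every stratum is just one possible allocation.

```python
import random
from collections import defaultdict

def stratified_sample(cases, stratum_of, n_per_stratum, seed=None):
    """Group cases into strata, then draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[stratum_of(case)].append(case)
    sample = []
    for label in sorted(strata):  # fixed order so a seeded run is repeatable
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical population: 400 students tagged with a class year (the strata).
years = ["first-year", "sophomore", "junior", "senior"]
students = [(i, years[i % 4]) for i in range(400)]
picked = stratified_sample(students, lambda s: s[1], 10, seed=2)
print(len(picked))
```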
what is cluster sampling?
we break up the population into many groups, called clusters. Then we sample a fixed number of clusters and include all observations from each of those clusters in the sample.
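A rough Python sketch of cluster sampling: pick whole clusters at random and keep everyone in them. The city blocks, household labels, and the function name `cluster_sample` are invented for the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """clusters maps a cluster name to the list of observations in that cluster."""
    rng = random.Random(seed)
    # Randomly choose which clusters to visit.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Keep every observation from each chosen cluster.
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical population: 20 city blocks with 5 households each.
blocks = {
    f"block-{b:02d}": [f"household-{b:02d}-{h}" for h in range(5)]
    for b in range(20)
}
sampled = cluster_sample(blocks, 4, seed=3)
n_households = sum(len(members) for members in sampled.values())
print(sorted(sampled), n_households)
```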
what is multistage sampling?
like a cluster sample, but rather than keeping all observations in each cluster, we collect a random sample within each selected cluster
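A possible two-stage version in Python: first pick clusters at random, then pick a random sample inside each chosen cluster. The schools, student labels, and the name `multistage_sample` are assumptions made for illustration.

```python
import random

def multistage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: pick clusters at random. Stage 2: pick cases at random within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Hypothetical population: 12 schools with 40 students each.
schools = {
    f"school-{s:02d}": [f"student-{s:02d}-{i:02d}" for i in range(40)]
    for s in range(12)
}
sampled = multistage_sample(schools, 4, 5, seed=4)
print({name: len(members) for name, members in sampled.items()})
```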
what two things do we consider when asking research questions?
the population and the sample
we want our sample to be ______________
representative of the population
how do we reduce the chance of bias?
by using random sampling
_______________ alone may not be enough to prevent bias
randomness
randomized experiments are necessary to show ___________
a causal connection
what is the gold standard of sampling?
the simple random sample (SRS)
observations can only show ______________
associations, or help us form hypotheses that can be checked using an experiment
what is the difference between stratified and multistage sampling?
the case-to-case variability within each stratum or cluster. stratified sampling is more useful when the cases within each stratum are similar to one another. multistage sampling is more useful when the clusters are similar to one another, but there is a lot of variability within each cluster
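A small simulation can illustrate the stratified half of this comparison. It is only a sketch under made-up assumptions: three equal-sized strata whose cases are very similar within a stratum (standard deviation 1) but very different across strata (means 10, 50, 90). It compares how much the sample mean bounces around under simple random sampling versus stratified sampling with the same total sample size of 30.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical strata: similar cases inside each stratum, big gaps between strata.
strata = {
    "low": [rng.gauss(10, 1) for _ in range(1000)],
    "mid": [rng.gauss(50, 1) for _ in range(1000)],
    "high": [rng.gauss(90, 1) for _ in range(1000)],
}
population = [x for values in strata.values() for x in values]

def srs_mean(n):
    """Mean of a simple random sample of n cases from the whole population."""
    return statistics.mean(rng.sample(population, n))

def stratified_mean(n_per_stratum):
    """Mean of a stratified sample; strata are equal-sized, so no weighting is needed."""
    picks = []
    for values in strata.values():
        picks.extend(rng.sample(values, n_per_stratum))
    return statistics.mean(picks)

# Repeat each design 500 times and measure the spread of the estimates.
srs_spread = statistics.stdev(srs_mean(30) for _ in range(500))
strat_spread = statistics.stdev(stratified_mean(10) for _ in range(500))
print(srs_spread, strat_spread)
```

Under these assumptions the stratified estimates vary far less from sample to sample, which is the sense in which stratifying pays off when cases within a stratum are alike.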
In the research question "What is the average mercury content in swordfish in the Atlantic Ocean?", what is the population and what is the sample?
Population: all swordfish in the Atlantic Ocean
Sample: the swordfish actually caught and tested, for example 60 swordfish whose mercury levels are measured
Suppose we ask a student who happens to be majoring in nutrition to select several graduates for the study. What type of students do you think she might select? Would her sample be representative of the population?
She would most likely choose people who are also majoring in nutrition because she knows them. This would not be representative of the population because the students come from one major only.
If 50% of online reviews for a product are negative, do you think this means that 50% of buyers are dissatisfied with the product?
Not necessarily. People who have negative experiences with the product are more likely to leave a review than people who had neutral or positive experiences, and in general only people who feel strongly about the product leave a review. Reviewers are therefore not a representative sample of buyers, and the results are biased.
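A quick arithmetic check in Python shows how this can happen. Every number here is hypothetical and chosen only to make the point: 10% of buyers are dissatisfied, 30% of dissatisfied buyers post a review, and 1 in 30 satisfied buyers posts a review.

```python
# All numbers below are hypothetical, chosen only to illustrate the arithmetic.
buyers = 1000
dissatisfied = 100                 # 10% of buyers
satisfied = buyers - dissatisfied  # the other 90%

# Assumed review habits: unhappy buyers are much more likely to post.
negative_reviews = dissatisfied * 30 // 100  # 30% of dissatisfied buyers review
positive_reviews = satisfied // 30           # 1 in 30 satisfied buyers reviews

negative_share_of_reviews = negative_reviews / (negative_reviews + positive_reviews)
dissatisfied_share_of_buyers = dissatisfied / buyers
print(negative_share_of_reviews, dissatisfied_share_of_buyers)
```

With these assumed rates, half of the reviews are negative even though only a tenth of buyers are dissatisfied.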
Suppose an observational study tracked sunscreen use and skin cancer, and it was found that the more sunscreen someone used, the more likely the person was to have skin cancer. Does this mean sunscreen causes skin cancer?
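Not necessarily. This is only observational data, so we don’t know whether a third (confounding) variable is causing a change in both of the other variables. For example, people who spend more time in the sun may both use more sunscreen and be more likely to develop skin cancer.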
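A toy simulation can show how a confounder produces this kind of association. It is a sketch built on invented assumptions, not real data and not a medical claim: sun exposure raises both the chance of using sunscreen and the chance of skin cancer, and sunscreen is given no effect at all on cancer risk.

```python
import random

rng = random.Random(0)
users, nonusers = [], []

for _ in range(20000):
    sun = rng.random()                     # hypothetical sun exposure, from 0 to 1
    uses_sunscreen = rng.random() < sun    # more sun -> more likely to use sunscreen
    # Assumed risk depends on sun exposure only, NOT on sunscreen use.
    cancer = rng.random() < 0.05 + 0.20 * sun
    (users if uses_sunscreen else nonusers).append(cancer)

rate_users = sum(users) / len(users)
rate_nonusers = sum(nonusers) / len(nonusers)
print(round(rate_users, 3), round(rate_nonusers, 3))
```

In this made-up world sunscreen users still show a higher cancer rate, purely because they get more sun. An observational study would see the association without being able to tell it apart from a causal effect.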
We learn that there are 30 villages in that part of the Indonesian jungle, each more or less similar to the next. Our goal is to test 150 individuals for malaria. What sampling method should be employed?
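Since the villages (clusters) don’t look different from one another, a cluster sample or a multistage sample would work well. For example, we could randomly select 15 of the 30 villages and then randomly select 10 people from each, giving 150 individuals.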
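The numbers in this question can be worked through in a short Python sketch. The 30 villages and the target of 150 individuals come from the question. The village rosters of 200 residents each and the choice of testing the same number of people in every selected village are assumptions added for illustration.

```python
import random

N_VILLAGES = 30   # from the question
TARGET = 150      # from the question

# Plans (villages visited, people tested per village) that hit the target exactly.
plans = [(k, TARGET // k) for k in range(1, N_VILLAGES + 1) if TARGET % k == 0]

# Hypothetical rosters: 200 residents per village (the question gives no sizes).
rng = random.Random(5)
villages = {
    v: [f"village-{v}-person-{p}" for p in range(200)]
    for v in range(1, N_VILLAGES + 1)
}

# One reasonable plan: half of the villages, 10 people in each.
n_villages, n_people = 15, 10
chosen = rng.sample(sorted(villages), n_villages)
tested = [person for v in chosen for person in rng.sample(villages[v], n_people)]
print(plans)
print(len(chosen), len(tested))
```

Visiting fewer villages cuts travel, while testing fewer people per village spreads the sample over more of the region; which plan is best depends on costs the question does not give.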