Targets 3A-3G Flashcards
Population
The larger group we hope to learn something about
Census
Collect data from all members of a population
Sample
Subset of the population that actually gets examined or measured
Inference
Drawing a conclusion about a population based on data from a sample
Sampling Variability
We don’t expect the statistic from a sample to be the same as if we calculated it from the population; each sample produces a different result.
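A minimal sketch of sampling variability, assuming a made-up population of the whole numbers 1 to 100 and an illustrative helper named sample_means (neither comes from the flashcards). It draws several random samples of the same size and shows that each one gives a somewhat different mean.

```python
import random
import statistics

# Hypothetical population: the whole numbers 1..100 (true mean 50.5).
population = list(range(1, 101))

def sample_means(population, n, repeats, seed=0):
    """Return the mean of each of `repeats` random samples of size n."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]

means = sample_means(population, n=10, repeats=5)
print("population mean:", statistics.mean(population))
print("sample means:   ", means)
```

Each sample mean is a statistic; the spread among them is the sampling variability.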
Random Sampling
typically produces a sample that is representative of the population
allows us to use the laws of probability to construct a margin of error
Larger random samples give better information about a population than smaller ones
Sample survey
A study that uses an organized plan to choose a sample that represents some specific population
Good sample
representative of the population and will provide a good estimate of the value of interest
Biased sample
Systematically over- or underrepresents a portion of the population, and so would consistently overestimate or underestimate the value of interest
Random Sampling Process
Use a chance process to determine which members of a population are included in the sample
Sampling without replacement: a member of the population can only be selected once
Sampling with replacement: a member of the population can be selected more than once
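A small sketch of the two kinds of sampling using Python's standard random module. The five names are hypothetical. random.sample draws without replacement and random.choices draws with replacement.

```python
import random

rng = random.Random(1)  # seeded so the run is reproducible
population = ["Ana", "Ben", "Cy", "Dee", "Eli"]  # hypothetical members

# Without replacement: no member can appear twice.
without_replacement = rng.sample(population, k=3)

# With replacement: the same member may be drawn more than once.
with_replacement = rng.choices(population, k=3)

print(without_replacement)
print(with_replacement)
```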
Sampling frame
the list of all the potential subjects in a population
Chance processes
flipping a coin, rolling dice, drawing names out of a well-mixed hat, random number generator, random digit table
Random Sampling Benefits
is the number 1 way to fight bias. It avoids favoritism by the sampler and self-selection by the respondents.
Simple Random Sampling (SRS) Characteristics
Every group of n individuals in the population has an equal chance of being selected for the sample.
Equivalent to throwing all members of a population into a hat, mixing them up, and then drawing without replacement
NOT the only legitimate method of random sampling
Simple Random Sampling (SRS) Process
Choosing a sample of size n from a population of size N
- Assign a unique number from 1 to N to each member of the population
- Use randomness (calculator or table) to select n unique numbers. Be sure to state: ignore unused numbers; ignore repeats (for sampling without replacement)
- The subjects corresponding to the chosen numbers are in the sample.
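The steps above can be sketched in Python. This is one possible illustration, not the only way to take an SRS: the function name simple_random_sample and the 15-student roster are assumptions, and the loop imitates reading numbers from a random digit table.

```python
import random

def simple_random_sample(members, n, seed=None):
    """Select an SRS of size n by labeling members 1..N and drawing numbers."""
    rng = random.Random(seed)
    N = len(members)
    if n > N:
        raise ValueError("sample size cannot exceed population size")
    # Step 1: label each member with a unique number from 1 to N.
    labels = {i: member for i, member in enumerate(members, start=1)}
    # Step 2: draw numbers with as many digits as N has, as with a digit table.
    digits = len(str(N))
    chosen = []
    while len(chosen) < n:
        number = rng.randrange(10 ** digits)  # 0 .. 10**digits - 1
        if number not in labels:
            continue  # ignore unused numbers
        if number in chosen:
            continue  # ignore repeats (sampling without replacement)
        chosen.append(number)
    # Step 3: the members matching the chosen numbers form the sample.
    return [labels[number] for number in chosen]

roster = [f"Student {i}" for i in range(1, 16)]  # hypothetical population, N = 15
sample = simple_random_sample(roster, n=4, seed=2)
print(sample)
```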
Stratified Random Sampling
Split the sampling frame into homogeneous groups, then pick a random sample from each group.
Benefit 1: Reduces variability in the sample statistic (Field of Dreams Lab)
Benefit 2: Guarantees that all subgroups are represented in the sample; allows for subgroup comparisons
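A minimal sketch of stratified random sampling, assuming a hypothetical sampling frame of students tagged by grade level. The helper name stratified_sample and the choice of two students per stratum are illustrative only.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, per_stratum, seed=None):
    """Split the frame into strata, then take a random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in frame:
        strata[stratum_of(member)].append(member)
    sample = []
    for name in sorted(strata):  # fixed order keeps the run reproducible
        sample.extend(rng.sample(strata[name], per_stratum))
    return sample

# Hypothetical frame: 5 students in each of grades 9-12, stored as (name, grade).
frame = [(f"G{grade}-S{i}", grade) for grade in (9, 10, 11, 12) for i in range(1, 6)]
sample = stratified_sample(frame, stratum_of=lambda m: m[1], per_stratum=2, seed=3)
print(sample)
```

Every grade is guaranteed to appear in the sample, which is what allows the subgroup comparisons.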
Cluster Random Sampling
Split the sampling frame into heterogeneous clusters (members that are in close proximity to each other), then randomly choose several clusters.
Sample ALL subjects in the randomly chosen clusters.
Benefit 1: Can make sampling less expensive by reducing travel time
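A minimal sketch of cluster random sampling, assuming hypothetical classrooms as the clusters. The helper name cluster_sample is illustrative. Clusters are chosen at random, and then every member of each chosen cluster is sampled.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters, then sample ALL members of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical clusters: students grouped by classroom.
clusters = {
    "Room 101": ["Ava", "Bo", "Cal"],
    "Room 102": ["Dot", "Eve"],
    "Room 103": ["Fay", "Gus", "Hal"],
    "Room 104": ["Ida", "Jay"],
}
chosen, sample = cluster_sample(clusters, num_clusters=2, seed=5)
print(chosen)
print(sample)
```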
Systematic Random Sampling
Begin with a randomly selected individual; every nth member after that will be in the sample.
Larger values of n generate smaller samples. Smaller values of n generate larger samples.
Benefit: Can be easier to use than an SRS especially if the population is lined up (or listed) already
Example: With a population of 15 and a desired sample of size 3, we choose every 5th member of the population (15 ÷ 3 = 5). To begin, we choose a random number between 1 and 5. Our random number was 4, so the 4th member is the first one in the sample. We also include every 5th member after that (the 9th and 14th).
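The worked example can be checked with a short sketch. The function name systematic_sample is an assumption; it returns the positions (1 to N) of the selected members, and the starting position is random unless one is supplied.

```python
import random

def systematic_sample(population_size, step, start=None, seed=None):
    """Return positions: a random start in 1..step, then every step-th member."""
    if start is None:
        start = random.Random(seed).randint(1, step)
    return list(range(start, population_size + 1, step))

# Population of 15, sample of size 3, so the step is 15 // 3 = 5.
print(systematic_sample(15, 5, start=4))  # [4, 9, 14]
```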
Biased Sample
Systematically over- or underrepresents a portion of the population, and so would consistently overestimate or underestimate the value of interest.
Convenience Sample
select individuals from the population who are easy to reach
Example: A farmer brings a juice company several crates of oranges each week. A company inspector looks at 10 oranges from the top of each crate before deciding whether to buy all the oranges.
Voluntary Response Bias
happens when the sample is made up of subjects who chose to participate
Example: In 2016, Britain’s Natural Environment Research Council used an online poll to choose the name of its new $300 million ship. The winning name “Boaty McBoatface” received 124,000 votes, far more than more serious candidates “Shackleton”, “Endeavor”, and “Falcon”.
Undercoverage Bias
happens when a subgroup of the population has little to no chance of being chosen
Example: In 1936, legitimate polling organizations such as Gallup were just getting started. That same year, the magazine Literary Digest used the phone book to send out a presidential poll. 2.4 million of the 10 million mailed ballots were returned. 57% said they would vote for Alf Landon (Rep) and 43% said they would vote for Franklin Roosevelt (Dem). In the actual election, Landon won electoral votes from only two states.
Response Bias / Question Wording bias
happens when the method of the survey (or the wording of the questions) influences the responses given by subjects
Example: How do the results from the Pierre, SD survey illustrate response bias?
Nonresponse bias
happens when a portion of the chosen sample does not respond to the survey; the non-respondents may share a common characteristic such as being stressed or busy
Example: With modern caller ID, many people do not answer their phone when a polling organization calls.
Observational Study
researchers observe their subjects
no treatment is imposed on the subjects
there is no control over other variables
vulnerable to confounding variables
Retrospective observational study
data are collected from past events or records; could be unreliable if subjects must recall information
Prospective observational study
subjects are identified at the beginning of the study and measurements are taken as the study unfolds; this could get expensive
Explanatory variable
a variable that we think explains or causes changes in the response variable
Response variable
the outcome being measured
Confounding Variable
a variable that is related to the explanatory variable and influences the response variable. It may create a false perception of an association between the explanatory and response variables.
Writing about a confounding variable
- Link the confounding variable to the explanatory variable.
- Link the confounding variable to the response variable.
Experiment
researchers randomly assign treatments to the subjects
Nature
Was a treatment randomly assigned to the subjects?
• No! We can only draw the conclusion that the two variables are associated.
• Yes! We can draw a cause-and-effect conclusion.
Scope
Was the sample randomly chosen?
• No! We may not generalize our conclusions to the larger population.
• Yes! We may generalize our conclusions to the larger population.
experimental unit / subject
the individuals to which treatments are applied
explanatory variables / factor
variables whose levels are manipulated intentionally
levels
the specific values of a factor, such as dosages or durations
treatments
the specific combinations of levels that are imposed on the groups of subjects
placebo
a treatment that has no active ingredient but is otherwise like other treatments
placebo effect
subjects show an improvement after receiving a placebo simply because they expect an improvement
blinding
keeping human participants unaware of the treatment applied in order to reduce response bias
single blind study
subjects do not know which treatment they are receiving, but members of the research team who interact with them do, or vice versa
double blind study
neither the subjects nor the members of the research team who interact with them know which treatment a subject is receiving
inference
using the results from a sample to draw a conclusion about the population
The results of a study are considered to be statistically significant when the observed results are too unusual to be explained by chance alone.
Comparison
compare two or more treatment groups
Random Assignment Benefits
creates roughly equivalent groups of subjects to balance the effects of unknown variables; also reduces bias
Control
control the levels of the explanatory factor and keep all other variables as similar as possible; control helps prevent confounding
Control Group
group that receives the currently accepted treatment or no treatment at all; this counts as one of the levels
Replication
apply the treatments to many subjects; also, run the study multiple times (in another location)
Random Chance Experiments
We rely on random chance to create groups that resemble each other so that the only differences between the groups are the treatments imposed.
If we find that one group’s results are better than the other’s, we can be reasonably certain that the explanatory variable caused the difference.
Random Assignment of Treatments
If group sizes don’t need to be equal, we can flip a coin, roll dice, or use a random digit table, allowing repeats.
If group sizes must be equal, we can write treatments on slips of paper or use a random digit table, ignoring repeats.
Don’t flip a coin until one group is filled. Putting the rest of the subjects in the other group could unintentionally create a confounding commonality in that group.
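A minimal sketch of both approaches to random assignment. The function names, subject labels, and treatment names are all hypothetical. Shuffling the subjects and dealing them out plays the role of drawing slips of paper, so every subject's group is decided by chance and the group sizes come out equal.

```python
import random

def assign_equal_groups(subjects, treatments, seed=None):
    """Shuffle all subjects, then deal them out so group sizes stay equal."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

def assign_by_coin_flip(subjects, treatments, seed=None):
    """Give each subject an independent random treatment; sizes may differ."""
    rng = random.Random(seed)
    return {subject: rng.choice(treatments) for subject in subjects}

subjects = [f"Subject {i}" for i in range(1, 13)]   # hypothetical volunteers
treatments = ["new drug", "placebo"]                # hypothetical treatments
groups = assign_equal_groups(subjects, treatments, seed=6)
flips = assign_by_coin_flip(subjects, treatments, seed=6)
print({t: len(members) for t, members in groups.items()})
```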
Describing a Completely Randomized Experiment
- Begin with a group of subjects.
- Randomly assign subjects to the treatment groups. State the treatments in context and explicitly link each random outcome to a treatment.
- State the response variable in context that will be compared at the end of the experiment.
Randomized Block Experiment
The pool of subjects is purposely divided into blocks of homogeneous subjects.
Within each block subjects are randomly assigned to treatments.
At the end of the experiment, the measurements from each treatment group are compared to those of the other treatment groups in the same block.
We block on variables that we believe will affect the response variable so that, within each block, the only differences between the treatment groups are the treatments.
Blocking reduces variability in the results (much like stratification in sampling)
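A minimal sketch of the random assignment step in a randomized block design. The blocking variable (experience level), the subject labels, and the function name are assumptions made for illustration. Subjects are first separated into blocks, and the shuffle-and-deal assignment happens separately inside each block.

```python
import random
from collections import defaultdict

def randomized_block_assignment(subjects, block_of, treatments, seed=None):
    """Split subjects into blocks, then randomly assign treatments within each."""
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)
    assignment = {}
    for name in sorted(blocks):  # fixed order keeps the run reproducible
        members = list(blocks[name])
        rng.shuffle(members)  # the randomization is done inside the block
        for i, subject in enumerate(members):
            assignment[subject] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects stored as (name, experience level).
subjects = [(f"N{i}", "novice") for i in range(1, 5)] + \
           [(f"E{i}", "expert") for i in range(1, 5)]
assignment = randomized_block_assignment(
    subjects, block_of=lambda s: s[1], treatments=["A", "B"], seed=7
)
for subject, treatment in sorted(assignment.items()):
    print(subject, treatment)
```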
Statistics Haiku Rule 1
Control what you can - make the circumstances as similar as possible for all subjects, except for the explanatory variable – reduce confounding
Statistics Haiku Rule 2
Block on what you can’t control - if there is a variable that is suspected to affect the response variable, block on that variable to reduce confounding
Statistics Haiku Rule 3
Randomize the rest - Random assignment creates similar treatment groups. The effects of any unknown confounding variable should be minimized if it is equally represented in all groups
Matched Pair Design
The pool of subjects is purposely divided into blocks of size two (pairs).
Either two subjects who are as alike as possible are paired and the two treatments are randomly assigned within the pair,
or one subject receives both treatments and provides two measurements, with the order of treatments randomly determined.
At the end of the experiment, the differences between the 2 measurements are analyzed.
Matched pair studies are very powerful because they control for many variables.
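A minimal sketch of a matched pair design with two similar subjects per pair. The subject labels, treatment names, and measurements are made up, and both function names are illustrative. A chance process decides which member of each pair gets which treatment, and the analysis works on the within-pair differences.

```python
import random
import statistics

def assign_within_pairs(pairs, treatments=("A", "B"), seed=None):
    """For each pair, let chance decide which member gets which treatment."""
    rng = random.Random(seed)
    plan = []
    for first, second in pairs:
        if rng.random() < 0.5:
            plan.append({first: treatments[0], second: treatments[1]})
        else:
            plan.append({first: treatments[1], second: treatments[0]})
    return plan

def paired_differences(measurements):
    """Return (treatment A result - treatment B result) for each pair."""
    return [a - b for a, b in measurements]

pairs = [("P1", "P2"), ("P3", "P4"), ("P5", "P6")]  # hypothetical matched pairs
plan = assign_within_pairs(pairs, seed=4)

# Made-up results, one (A result, B result) tuple per pair.
measurements = [(12, 10), (15, 14), (9, 11)]
diffs = paired_differences(measurements)
print(plan)
print(diffs, statistics.mean(diffs))
```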