Unit 4 Flashcards
Population
The entire group of individuals we want information about
Census
Collects data from every individual in the population
Sample
A subset of individuals in the population from which we actually collect data
Convenience sample
Choosing individuals from the population who are easy to reach
-Often produces unrepresentative data
Bias
The design of a statistical study shows bias if it would consistently underestimate or consistently overestimate the value you want to know
-Ex: convenience and voluntary response sampling
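A small simulation can make "consistently overestimate" concrete. This is only a sketch: the population of study hours is made up, and treating the 200 highest values as the "easy to reach" students is an assumption chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: weekly study hours for 1,000 students.
population = [rng.uniform(0, 20) for _ in range(1000)]
true_mean = statistics.mean(population)

# Assumption: the students who are easy to reach (say, those in the library)
# happen to be the 200 who study the most.
easy_to_reach = sorted(population)[-200:]

def average_estimate(pool, sample_size=20, repetitions=200):
    """Average of the sample means over many repeated samples from pool."""
    means = [statistics.mean(rng.sample(pool, sample_size))
             for _ in range(repetitions)]
    return statistics.mean(means)

srs_average = average_estimate(population)            # random samples
convenience_average = average_estimate(easy_to_reach) # convenience samples
print(round(true_mean, 2), round(srs_average, 2), round(convenience_average, 2))
```

Under these assumptions the random samples center near the true mean, while the convenience samples land above it every time. That one-directional miss is what bias means.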
Voluntary response sample
Consists of people who choose themselves by responding to a general invitation
- Usually not representative of the population of interest
- Attracts people who feel strongly about an issue and often share the same opinion
Random sampling
Involves using a chance process to determine which members of a population are included in the sample
Simple Random sample
An SRS of size n is chosen in such a way that every group of n individuals (combination) in the population has an equal chance of being selected as the sample
-Gives each member and each combo of members an equal chance of being included/selected
Choosing an SRS with tech
Step 1: Label. Give each individual in the population a distinct numerical label from 1 to N (population size)
Step 2: Randomize. Use a random number generator to obtain n (sample size) different integers from 1 to N
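The two steps above can be sketched in Python. The function name choose_srs and the population and sample sizes are made up for illustration; random.sample draws without replacement, so the integers come out distinct.

```python
import random

def choose_srs(population_size, sample_size, seed=None):
    """Return sample_size different labels chosen from 1..population_size."""
    rng = random.Random(seed)
    # Step 1: Label. Individuals are labeled 1 to N.
    labels = range(1, population_size + 1)
    # Step 2: Randomize. sample() picks without replacement, so no repeats.
    return rng.sample(labels, sample_size)

srs = choose_srs(population_size=500, sample_size=10, seed=1)
print(sorted(srs))
```

The seed argument is there only so the same sample can be reproduced; leave it out for a fresh sample each time.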
Choosing an SRS with Table D
Step 1: Label. Give each member of the population a numerical label with the same number of digits. Try to use as few digits as possible
Step 2: Randomize. Read consecutive groups of digits of the appropriate length from left to right across a line in Table D. Ignore any group that was not used as a label and any repeated label. Stop when you have chosen n individuals
-All labels of the same length have the same chance to be chosen
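The table procedure can be mimicked in code. This is a sketch, not the textbook's method: the digit string below is an arbitrary made-up line (not copied from Table D), and srs_from_digit_line is a hypothetical helper name.

```python
def srs_from_digit_line(digits, population_size, sample_size):
    """Read fixed-width labels left to right from a string of random digits."""
    width = len(str(population_size))   # fewest digits that can label everyone
    digits = digits.replace(" ", "")    # drop the spacing between digit groups
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = int(digits[start:start + width])
        # Skip labels outside 1..N and labels that were already chosen.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

line = "62913 04518 27730 91162 08459 33071 15864 70226"
print(srs_from_digit_line(line, population_size=30, sample_size=4))
# prints [30, 18, 27, 9]
```

With labels 01 to 30, the two-digit groups read 62, 91, 30, 45, 18, 27, 73, 09. The groups 62, 91, 45, and 73 are skipped because no individual has those labels.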
Stratified random sample
- To get a stratified random sample, start by classifying the population into groups of similar individuals, called strata. Then choose a separate SRS in each stratum and combine these SRSs to form the sample
- Works best when the individuals in each stratum are similar with respect to what is being measured and when there are large differences between strata
- Similar within and different between. Gives a more precise estimate than a simple random sample of the same size
- less variability/deviation in the stratified graph
- Want each stratum to contain similar individuals and for there to be a large difference between strata
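A minimal sketch of stratified random sampling, assuming a made-up school where grade level is the stratum. The names stratified_sample and per_stratum are illustrative only.

```python
import random

def stratified_sample(strata, per_stratum, seed=None):
    """strata maps a stratum name to the list of individuals in that stratum."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        # Choose a separate SRS within each stratum, then combine them.
        sample.extend(rng.sample(members, per_stratum))
    return sample

# Hypothetical population: 100 students in each of three grades.
strata = {
    "grade 9": [f"9-{i}" for i in range(1, 101)],
    "grade 10": [f"10-{i}" for i in range(1, 101)],
    "grade 11": [f"11-{i}" for i in range(1, 101)],
}
stratified = stratified_sample(strata, per_stratum=5, seed=3)
print(stratified)
```

Taking the same number from each stratum is a simplification made for this sketch; the sizes could also be set in proportion to each stratum's share of the population.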
Cluster sample
- To get one, start by classifying the population into groups of individuals that are located near each other, called clusters. Then choose an SRS of the clusters. All individuals in the chosen clusters are included in the sample
- Sometimes used to save money and time
- Sometimes people take an SRS within each chosen cluster rather than surveying everyone in the cluster
- Doesn’t offer the statistical advantage of better information about the population that stratified samples do
- Want each cluster to look like the population, but on a smaller scale
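For contrast with stratified sampling, here is a sketch of a cluster sample. The homerooms and student labels are made up, and cluster_sample is a hypothetical helper name.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """clusters maps a cluster name to the list of individuals in it."""
    rng = random.Random(seed)
    # Choose an SRS of the clusters themselves, not of individuals.
    chosen_names = rng.sample(sorted(clusters), num_clusters)
    # Everyone in a chosen cluster is included in the sample.
    sample = [person for name in chosen_names for person in clusters[name]]
    return chosen_names, sample

# Hypothetical population: six homerooms of four students each.
clusters = {f"room {r}": [f"room {r} student {s}" for s in range(1, 5)]
            for r in range(1, 7)}
chosen_rooms, clustered = cluster_sample(clusters, num_clusters=2, seed=5)
print(chosen_rooms, len(clustered))
```

The chance step happens once, at the cluster level. In the stratified approach, a separate SRS is drawn inside every group.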
Multistage sampling
Combines two or more sampling methods (ex: stratified and cluster sampling)
Inference
- Infer about the population from what we know about the sample
- Inference from convenience or voluntary response samples would be misleading because the method of sampling is biased
Random sampling
- Rely on it to avoid bias in choosing a sample
- Unlikely that the sample results will exactly match those of the entire population
- Properly designed samples avoid systematic bias, but their results are rarely exactly correct and we expect them to vary from sample to sample
Why do we use Random sampling
- We can say how likely it is that sample results are close to the results of the population
- The laws of probability allow trustworthy inferences about the population
- Results come with “margin of error”
Do larger or smaller random samples give better info about the population
Larger
-By having a very large random sample, you can be confident that the sample result is very close to the truth about the population
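A quick simulation can show why larger random samples are more trustworthy. It is a sketch under stated assumptions: the true proportion of 0.6 is made up, and each individual is simulated as an independent yes/no draw (in effect, a very large population).

```python
import random
import statistics

def spread_of_sample_proportions(true_p, sample_size, num_samples=1000, seed=0):
    """Standard deviation of the sample proportion across many random samples."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        # Each simulated individual says "yes" with probability true_p.
        yes_count = sum(rng.random() < true_p for _ in range(sample_size))
        proportions.append(yes_count / sample_size)
    return statistics.stdev(proportions)

small_spread = spread_of_sample_proportions(0.6, sample_size=25)
large_spread = spread_of_sample_proportions(0.6, sample_size=400)
print(round(small_spread, 3), round(large_spread, 3))
```

The results from samples of 400 vary much less from sample to sample than the results from samples of 25, so any one large sample is likely to be close to the truth.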
Sampling frame
The list of individuals from which a sample will be drawn
Undercoverage
Occurs when some members of the population cannot be chosen in a sample
-Ex: a sample survey that misses homeless people
Nonresponse
Occurs when an individual chosen for the sample can’t be contacted or refuses to participate
- often exceeds 50%
- If the people who respond differ from those who don’t, the results are biased
- Don’t confuse with “voluntary response”
Nonresponse vs voluntary response
-Nonresponse can occur only after a sample has been selected. In a voluntary response sample, every individual has chosen to take part, so nonresponse cannot occur
Response bias
- Lying or answering incorrectly
- Gender, race, or ethnicity can affect responses and create a systematic pattern of inaccuracy
- Wording and clarity of questions
Observational study
Observes individuals and measures variables of interest but does not attempt to influence the responses
-Goal is to describe some group or situation, to compare groups or examine relationships between variables
Experiment
Deliberately imposes some treatment on individuals to measure their responses
- determine whether a treatment causes a change in response
What type of study (experiment or observational) should you use to understand cause and effect
Experiment is the only source of fully convincing data
Confounding variable
Variable that results in confounding
Confounding
Occurs when two variables are associated in such a way that their effects on a response cannot be distinguished from each other
Treatment
A specific condition applied to the individuals in an experiment. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables
Experimental units
The smallest collection of individuals to which treatments are applied. When the units are human beings, they are called subjects
Why choose experiments over observational studies
Experiments can give good evidence of causation while observational studies cannot
Factors
Aka explanatory variables
Level
A specific value of a factor
Random assignment
Means that experimental units are assigned to treatments using a chance process
-Helps prevent bias and confounding by creating roughly equivalent groups, so the effects of other variables are spread about evenly among the groups
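Random assignment can be sketched as shuffling the units and dealing them out to the treatments. The subject labels, treatment names, and the helper randomly_assign are all made up for illustration, and dealing the shuffled list in turn is just one of several valid chance processes.

```python
import random

def randomly_assign(units, treatments, seed=None):
    """Assign every unit to exactly one treatment using a chance process."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)   # chance alone decides the order
    groups = {treatment: [] for treatment in treatments}
    # Deal the shuffled units to the treatments in turn, like dealing cards.
    for position, unit in enumerate(shuffled):
        groups[treatments[position % len(treatments)]].append(unit)
    return groups

subjects = [f"subject {i}" for i in range(1, 13)]
groups = randomly_assign(subjects, ["new drug", "placebo"], seed=7)
for treatment, members in groups.items():
    print(treatment, members)
```

Because the shuffle ignores everything about the subjects, traits such as age or health should end up spread roughly evenly between the groups.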
Principles for designing experiments
- Comparison: use a design that compares 2+ treatments
- Random assignment: chance assigns experimental units to treatments
- Control: keep other variables that might affect the response the same
- Replication: use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups
How do the dotplots of experiments with controlled variables compare to the dotplots of experiments without controlled variables
- Controlling variables helps reduce the variability of the data
- Dotplots from experiments with controlled variables have smaller ranges and are more tightly clustered
Completely randomized design
The units are assigned to the treatments completely by chance
- Does not require each treatment to be assigned an equal number of subjects
- Equal group sizes are allowed, as long as chance alone decides which units get which treatment
Control group
Used to provide a baseline for comparing the effects of other treatments
- many control groups receive no treatment
- some receive an active treatment (ex: comparing an existing drug and a new drug)
- some experiments have no control group -> they compare the effects of several treatments to see which one works better
Placebo effect
Response to dummy treatment