Exam 2 Flashcards
Practices that lead to misleading graphs
- truncated graphs
- improper scaling
What is a truncated graph? What precaution should be taken with it?
A graph whose vertical axis does not start at 0, which makes the bars look out of proportion. The illustrator should include a special symbol (such as a break mark on the axis) to signal the truncation.
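A small numeric sketch of why truncation distorts bars. The helper `apparent_ratio` and the values 50 and 55 are hypothetical; the idea is that a bar's drawn height is its value minus the axis baseline, so raising the baseline exaggerates differences.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar heights for values a and b when the vertical axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)


# Hypothetical values: 55 is only 10% larger than 50.
print(apparent_ratio(50, 55))      # axis starts at 0: ratio 1.1
print(apparent_ratio(50, 55, 45))  # axis truncated at 45: ratio 2.0, second bar looks twice as tall
```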
Where does improper scaling occur most often?
pictograms
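A rough sketch of how pictograms mislead, assuming the icon is enlarged proportionally in every direction (the helper name is hypothetical): if a quantity doubles and the icon's height is doubled, its width doubles too, so the area the eye compares grows by the square of the ratio (the cube for 3-D icons).

```python
def pictogram_size_ratio(value_ratio, dimensions=2):
    """How much larger an icon looks when every dimension is scaled by value_ratio.

    dimensions=2 compares areas (flat icons); dimensions=3 compares volumes (3-D icons).
    """
    return value_ratio ** dimensions


# Hypothetical example: the quantity doubles, so the icon's height is doubled.
print(pictogram_size_ratio(2))     # 4: a flat icon looks four times as large
print(pictogram_size_ratio(2, 3))  # 8: a 3-D icon looks eight times as large
```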
Guidelines for constructing effective graphs
- Include a title and axis labels
- Start the vertical axis at 0 if possible
- Use caution with figures and pictograms
- If the values being compared differ greatly, consider a different graph or plot relative sizes
- Keep the graph simple and clear
Parts of a graph analysis
- Purpose of the graph
- Are the results observational or experimentally obtained?
- What variable is measured, and is it quantitative or categorical?
- What type of data display is used?
- If the data are numerical, can SOCS (shape, outliers, center, spread) be used to describe them?
- Are the data displayed correctly, or is the graph misleading?
explanatory variable
variable that is manipulated/experimented with
response variable
variable that measures the outcome of interest
lurking variable
unobserved variable that influences the association between explanatory and response variables and is associated with both of those variables
Designed experiment
An experiment in which researchers impose treatments and controls. Designed experiments can help establish causation.
Observational study
A study in which researchers observe characteristics and take measurements without imposing treatments. Observational studies can only reveal association or correlation.
Advantages of experiments
- Reduces the chance of lurking variables affecting results
- The effect of an explanatory variable on a response variable is more accurately determined; it is easier to adjust for lurking variables
- Best method for determining causality
sampling frame
a list of all members of a population
sampling design
method used to obtain a sample
random sampling
uses a random device to select a sample; each member of the population has an equal chance of being selected for the sample
Simple random sample
(SRS) Every possible sample of a given size has the same chance of being selected; can be done with or without replacement.
What is the difference when SRS is performed with replacement vs. without replacement?
With replacement: a member of a population can be chosen more than once
Without replacement: a member of the population can only be selected once
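A minimal sketch of both kinds of SRS using Python's standard `random` module. The function names and the 10-student frame are hypothetical; `random.sample` draws without replacement and `random.choices` draws with replacement.

```python
import random


def srs_without_replacement(frame, n, seed=None):
    """Each member can appear at most once; every subset of size n is equally likely."""
    return random.Random(seed).sample(frame, n)


def srs_with_replacement(frame, n, seed=None):
    """Each draw is independent, so the same member can be chosen more than once."""
    return random.Random(seed).choices(frame, k=n)


# Hypothetical sampling frame of 10 student IDs.
frame = [f"student{i}" for i in range(1, 11)]
print(srs_without_replacement(frame, 4, seed=1))
print(srs_with_replacement(frame, 4, seed=1))
```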
Margin of error
Used with the point estimate to give a range of plausible values for the population parameter and to judge how precise the results are. Denoted E, it represents precision at a given confidence level and equals half the width of a confidence interval.
How to find the range of plausible values using a margin of error
Add the margin of error to, and subtract it from, the point estimate (the middle value)
Approximate margin of error formula
E ≈ 1/√n, where n is the sample size (a quick approximation for a sample proportion at about 95% confidence)
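A short sketch of the two cards above: approximate E as 1/√n, then subtract and add it around the point estimate. The helper names and the survey numbers (n = 1000, 52% "yes") are hypothetical.

```python
import math


def approx_margin_of_error(n):
    """Quick approximation E ≈ 1/sqrt(n) for a sample proportion (about 95% confidence)."""
    return 1 / math.sqrt(n)


def plausible_range(point_estimate, margin):
    """Subtract and add the margin of error around the middle value."""
    return point_estimate - margin, point_estimate + margin


# Hypothetical survey: n = 1000 respondents, 52% answered "yes".
E = approx_margin_of_error(1000)
low, high = plausible_range(0.52, E)
print(round(E, 3), round(low, 3), round(high, 3))  # 0.032 0.488 0.552
```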
Potential sources of bias in surveys (list of the types only, not definitions)
- sampling bias
- nonresponse bias
- response bias
Sampling bias
Bias that occurs in surveying when the sampling method tends to produce non-representative samples; includes undercoverage and overcoverage
Undercoverage
occurs when the sampling frame does not represent parts of the population; some portion(s) of the population are not sampled or get smaller representation in the sample than they have in the population
Overcoverage
Occurs when members that are not in the population of interest are included in the sample
Nonresponse bias
Bias that occurs in surveying when sampled subjects can’t be reached or refuse to participate; also includes respondents who skip certain questions, resulting in missing data.
Response bias
Bias that occurs in surveying when the wording of a question is confusing, the question is asked in a misleading way, or subjects lie because they think their response is socially unacceptable
List of poor ways to sample
- Convenience sample
- Volunteer sample
- Large, non-representative sample
Convenience sampling
a poor method of sampling; includes individuals who are easy to sample and therefore may not represent the whole population
Volunteer sample
a poor method of sampling and the most common type of convenience sample; the sampling frame is difficult to define, and the sample may not represent the population because people who volunteer tend to have stronger opinions about the issue
Large non-representative sample
a poor method of sampling; a large sample size doesn’t help if the sample is not representative of the population
Questions to be asked when assessing the validity of surveys
- How was the sample selected?
- Sample size?
- Nonresponse rates?
- How are the questions worded (how many, confusing, misleading, controversial)?
- Who sponsored the study?
treatment group
group that receives the treatment or experimental condition
placebo
a “fake” treatment that looks just like the treatment being tested; it ensures that treatments appear the same to the subjects so that control subjects don’t know they are in the control group
placebo effect
subjects given a placebo (no active treatment) sometimes improve
single blind
subjects don’t know which group they’re in
double blind
subjects and data collectors don’t know which group the subjects are in
Perks of randomization
eliminates bias in assigning treatments; balances the groups on variables that may affect the response, both known and unknown to researchers
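A minimal sketch of random assignment for a completely randomized design. The function name, the 12 hypothetical subjects, and the round-robin dealing (used here only to keep group sizes equal) are assumptions, not a procedure from these notes.

```python
import random


def randomly_assign(units, groups, seed=None):
    """Completely randomized design: shuffle the units, then deal them out to groups in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Dealing round-robin keeps group sizes within one of each other.
    return {group: shuffled[i::len(groups)] for i, group in enumerate(groups)}


# Hypothetical experiment: 12 subjects, a treatment group and a placebo (control) group.
subjects = [f"S{i}" for i in range(1, 13)]
print(randomly_assign(subjects, ["treatment", "placebo"], seed=7))
```

Because chance alone decides who lands in each group, known and unknown lurking variables tend to be spread evenly across the groups.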
statistically significant
when observed differences between groups in an experiment are larger than the differences that would be expected from randomization alone
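One way to make "larger than randomization alone" concrete is a permutation (randomization) test. This is a swapped-in illustration, not a method from these notes: the hypothetical helper below re-splits the same responses in every possible way, recomputes the difference in means, and reports the fraction of splits at least as extreme as the observed one. A small fraction suggests the difference is statistically significant. Exhaustive enumeration is only practical for small groups.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """One-sided exact permutation test for mean(A) - mean(B).

    Returns the fraction of all possible re-randomizations whose difference
    in means is at least as large as the observed difference.
    """
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    count = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mean(a) - mean(b) >= observed:
            count += 1
    return count / total


# Hypothetical responses: treatment vs. control.
print(permutation_p_value([10, 11, 12, 13], [1, 2, 3, 4]))  # 1/70, about 0.014
```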
Four principles of good experimental design
- control
- randomization
- replication
- Blocking (optional)
experimental units
the subjects or objects to which treatments are applied (called subjects when they are people)
Things that can go wrong in an experiment
- making generalizations out of convenience
- sample isn’t representative
- no volunteers
- carefully evaluate displays
Systematic sampling characteristics
- Less expensive
- The order of the list cannot be associated in any way with the responses sought
- beware of confounding variables
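A minimal sketch of systematic sampling, assuming the common form: pick a random start among the first k members, then take every k-th member, where k = frame size ÷ sample size. The function name and the 100-customer frame are hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k members, then take every k-th member (k = len(frame) // n)."""
    k = len(frame) // n
    if k < 1:
        raise ValueError("sample size n cannot exceed the size of the frame")
    start = random.Random(seed).randrange(k)
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame: 100 numbered customers, sample of 10 (every 10th customer).
customers = list(range(1, 101))
print(systematic_sample(customers, 10, seed=3))
```

The list order matters here: if it cycled in step with k (for example, every 10th customer being a manager), the sample would be biased.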
When is cluster random sampling preferred?
when a reliable sampling frame is not available or when the cost of an SRS is too high
Cluster random sampling
- Split the population into representative, heterogeneous groups called clusters
- Use random sampling to select several clusters
- Perform a census of each selected cluster
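A minimal sketch of the three steps above. The function name and the homeroom clusters are hypothetical; clusters are chosen by SRS and then every member of each chosen cluster is included (a census).

```python
import random


def cluster_sample(clusters, num_clusters, seed=None):
    """clusters maps a cluster name to its list of members.

    Randomly choose num_clusters clusters, then include every member of each (a census).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical clusters: homerooms in a school, each a mix of students.
homerooms = {
    "Room 101": ["Ava", "Ben", "Cal"],
    "Room 102": ["Dee", "Eli", "Fay"],
    "Room 103": ["Gus", "Hal", "Ivy"],
    "Room 104": ["Jo", "Kim", "Lou"],
}
print(cluster_sample(homerooms, 2, seed=5))
```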
Stratified random sampling
- Stratify the population into homogeneous groups called strata
- Use SRS to choose members from each stratum
- Combine the members chosen from each stratum to form your sample
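A minimal sketch of stratified random sampling. The function name, the class-year strata, and the per-stratum sample sizes are hypothetical; an SRS without replacement is taken inside each stratum and the results are combined.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """strata maps a stratum name to its members; sizes maps the same name to how many to draw.

    An SRS (without replacement) is taken within each stratum, and the results are combined.
    """
    rng = random.Random(seed)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], sizes[name]))
    return sample


# Hypothetical strata: students grouped by class year.
students = {
    "freshman": [f"F{i}" for i in range(40)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(20)],
}
print(stratified_sample(students, {"freshman": 4, "sophomore": 3, "junior": 2}, seed=11))
```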
Multistage sampling
sampling schemes combining several methods
Types of observational studies
Retrospective observational study: looks into the past
Case-control study: a type of retrospective study, often used in medical research. Subjects who have the response outcome of interest are referred to as cases, and subjects who have the other response outcome are referred to as controls
Prospective observational study: looks into the future; also called a cohort study
Cross-sectional: sample survey of a cross section of a population in current time
Experimental design diagrams
enables a quick comparison of results, can use only number of groups for the explanatory variable
Purpose of matching and blocking
Both are ways researchers can balance the effects of potential lurking variables
Matching
used in observational studies; attempts to achieve the balance that randomization achieves; subjects are paired based on similar characteristics that are not being studied; includes case-control studies
Matched-pairs
used in experiments; subjects are paired with themselves, so each treatment is observed for each subject; examples include pretest/posttest and crossover designs
Blocking
used in experiments; groups similar experimental units together into blocks to reduce potential bias; treatments are usually randomly assigned within each block
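A minimal sketch of a randomized block design: treatments are randomly assigned separately within each block. The function name, the soil-type blocks, and the two fertilizers are hypothetical, and dealing treatments in turn after shuffling is one simple way to keep each block balanced.

```python
import random


def randomized_block_design(blocks, treatments, seed=None):
    """blocks maps a block name to a list of similar experimental units.

    Within each block, units are shuffled and treatments are dealt out in turn,
    so every block receives each treatment as evenly as its size allows.
    """
    rng = random.Random(seed)
    assignment = {}
    for name in sorted(blocks):
        units = list(blocks[name])
        rng.shuffle(units)
        for i, unit in enumerate(units):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical experiment: plots blocked by soil type, two fertilizers compared.
plots = {"clay": ["c1", "c2", "c3", "c4"], "sand": ["s1", "s2", "s3", "s4"]}
print(randomized_block_design(plots, ["fertilizer A", "fertilizer B"], seed=2))
```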
What does a statistic describe?
a sample
Statistical inference
uses sample data to draw conclusions about a population, involves probability calculations on a sampling distribution of a statistic, requires random sampling or randomization
point estimate
a single number representing our best guess for the parameter; for any particular parameter, there are several possible point estimates, depending on the sample selected
Interval estimate
a range of plausible values for the parameter, consists of a point estimate and margin of error
Properties of point estimates
- unbiased
- small standard deviation
- likely precision
- high confidence level
standard error
abbreviated SE; uses a statistic computed from the sample to estimate the standard deviation of the sampling distribution; the formula differs for means and proportions
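A short sketch of the two standard SE formulas: √(p̂(1 − p̂)/n) for a sample proportion and s/√n for a sample mean. The function names and data are hypothetical; `statistics.stdev` gives the sample standard deviation s.

```python
import math
from statistics import stdev


def se_proportion(p_hat, n):
    """SE of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def se_mean(data):
    """SE of a sample mean: sample standard deviation s divided by sqrt(n)."""
    return stdev(data) / math.sqrt(len(data))


# Hypothetical data.
print(round(se_proportion(0.5, 100), 3))           # 0.05
print(round(se_mean([2, 4, 4, 4, 5, 5, 7, 9]), 3))  # 0.756
```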
z-scores for 90%, 95%, and 99% confidence levels
- 90%: 1.645
- 95%: 1.96
- 99%: 2.576
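The z-scores above can be recovered from the standard normal distribution: for confidence level C, take the point with (1 + C)/2 of the area to its left. A sketch using Python's `statistics.NormalDist` (the helper name is hypothetical):

```python
from statistics import NormalDist


def critical_z(confidence):
    """z-score that leaves (1 - confidence) / 2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {critical_z(level):.3f}")
```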
Steps for constructing a confidence interval for one population proportion
- Check assumptions
- Calculate confidence interval
- Interpret confidence interval
Confidence interval assumptions
- Data are obtained by randomization
- Large enough sample size: at least 15 successes and at least 15 failures
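A minimal sketch of the three steps for one population proportion, using the usual interval p̂ ± z·√(p̂(1 − p̂)/n). The function name and the survey counts (520 of 1000) are hypothetical, and the randomization assumption cannot be checked in code, so it is simply assumed.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Confidence interval for one population proportion: p_hat ± z * sqrt(p_hat(1 - p_hat) / n).

    Step 1 (check assumptions): randomization cannot be verified in code, so it is assumed;
    the sample-size condition (at least 15 successes and 15 failures) is checked here.
    """
    failures = n - successes
    if successes < 15 or failures < 15:
        raise ValueError("need at least 15 successes and at least 15 failures")
    # Step 2 (calculate): point estimate plus and minus the margin of error.
    p_hat = successes / n
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical survey: 520 "yes" answers out of 1000 randomly sampled adults.
low, high = proportion_ci(520, 1000)
# Step 3 (interpret).
print(f"We are 95% confident the population proportion is between {low:.3f} and {high:.3f}.")
```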
What determines length of a confidence interval?
the precision of the estimate (wider = less precise)