Final Exam Flashcards
sampling plan
specifies in advance how participants are to be selected and how many to include
used to obtain an accessible sample from which to make inferences about the target population
Target population
who you are trying to make an inference about
accessible population
aggregate of cases that conform to designated criteria and that are accessible for the study
sampling
process of selecting cases to represent an entire population, to permit inferences about the population
elements
basic units about which data are collected, usually humans
two goals of sampling plans
1. representativeness of the population
2. adequate size
if the population is not representative, what type of validity is threatened
external and construct
strata/stratum
subpopulations, mutually exclusive segment of a population defined by one or more characteristics
Ex: with a high school degree vs. without one
multistage sampling
samples are selected in multiple phases
First Phase - large units are selected (i.e. hospital)
Next stage, smaller units are sampled (patients)
bias
systematic overrepresentation or underrepresentation of a population subgroup on a characteristic relevant to the research question
what drives sampling plan strategy
Feasibility, ethics, desired rigor, convenience, and costs
probability sampling
- Samples are randomly selected
- Everyone in population has an equal chance of being selected
- Used to control sampling bias
- Useful when focus is on population diversity
- Used when researcher needs to ensure accuracy
- Finding correct target population is not simple
(ex: public health initiatives)
non-probability sampling
- Samples are selected based on the researcher’s judgment
- Not everyone has an equal chance of being selected
- Sampling bias is not a primary concern
- Useful when the group of interest shares similar traits
- Does not ensure accurate representation of the entire population
- Finding the target population is very simple
when do you use probability sampling (3)
- reduce sampling bias
- when population is diverse
- to create an accurate sample
relationship between selecting a representative sample and sample size
Probability of selecting an unrepresentative sample decreases as size of sample increases
most basic type of probability sampling
simple random
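Simple random sampling can be sketched with Python's stdlib; the population size (500) and sample size (50) below are hypothetical, chosen only for illustration.

```python
import random

random.seed(42)  # fixed seed so the draw is reproducible

# Hypothetical sampling frame: every element gets an ID (e.g. IDs 1..500).
population = list(range(1, 501))

# random.sample draws without replacement; every element has an equal
# chance of selection, the defining feature of simple random sampling.
sample = random.sample(population, k=50)

print(len(sample))       # 50
print(len(set(sample)))  # 50 -- no duplicates (drawn without replacement)
```

In practice the sampling frame would be a real list of the accessible population, not a range of integers.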
stratified random sampling
population is divided into two or more homogeneous strata from which elements are selected at random
proportionate stratified random sampling
participants are selected in proportion to the size of population stratum (i.e. race)
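The proportional allocation step can be sketched as below; the strata (education levels) and their sizes are hypothetical, and each stratum's share of the sample simply mirrors its share of the population.

```python
# Hypothetical strata sizes (e.g. education level in an accessible population).
strata = {"no_hs_degree": 200, "hs_degree": 500, "college": 300}
N = sum(strata.values())   # population size: 1000
n = 100                    # desired total sample size

# Each stratum contributes in proportion to its share of the population.
allocation = {name: round(n * size / N) for name, size in strata.items()}
print(allocation)  # {'no_hs_degree': 20, 'hs_degree': 50, 'college': 30}
```

Participants would then be drawn at random within each stratum (e.g. with `random.sample`).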
cluster sampling
involves selecting groups rather than selecting individuals as the first stage of a multistage approach
systematic sampling
involves selecting every kth person from a list
k = Divide N (population size) by n (sample size)
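The k = N / n rule can be sketched as follows; N and n are hypothetical, and the random starting point within the first interval is a common refinement.

```python
import random

random.seed(7)

N = 1000      # population size (hypothetical)
n = 100       # desired sample size (hypothetical)
k = N // n    # sampling interval: every kth person (k = 10 here)

start = random.randrange(k)          # random start within the first interval
selected = list(range(start, N, k))  # positions of every kth person on the list

print(k, len(selected))  # 10 100
```

Each selected index is exactly k positions after the previous one, which is what makes the sample systematic rather than simple random.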
non probability convenience sampling
using the most conveniently available people as participants
Weakest form of sampling but most commonly used
snowball sampling
variant of convenience sampling → early sample members (seeds) are asked to refer other people who meet the eligibility criteria
quota sampling
non probability sampling
one in which the researcher identifies population strata and determines how many participants are needed from each stratum
consecutive sampling
recruiting all the people from an accessible population who meet eligibility criteria over a specific time interval or for a specified sample size
judgmental/provisional sampling
uses researcher’s knowledge about the population to make decisions
5 threats to statistical conclusion validity
- Low statistical power
- Effect size (small and moderate effects need a larger sample size)
- Heterogeneity of the population
- Cooperation
- Attrition
5 steps in sampling
Identification of the population
Specification of the eligibility criteria
Specify the sampling plan: decide the method of drawing the sample and how large it will be (e.g., with a power analysis)
Recruit the sample: screening instruments
Generalizing from samples
which studies do not need a power analysis
Descriptive and exploratory studies, and non-randomized trials
4 things required for sample size calculations
Desired significance level (alpha)
Desired power of the test (1 - beta)
Desired sample size (n)
Desired effect size (d), e.g., Cohen’s d
magnitude b/w variables: small effect
.20
magnitude b/w variables: moderate
0.50
magnitude b/w variables: large effect
.8
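Tying the pieces above together, a per-group sample size for a two-group comparison of means can be sketched with the standard normal-approximation formula n = 2((z_alpha + z_beta) / d)^2; this is a simplified approximation, not a full power analysis, and the defaults (alpha = .05 two-tailed, power = .80) match the conventions on the cards above.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group for a two-tailed two-sample comparison of
    means, using the normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{1-beta}) / d) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 for power = .80
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# Cohen's conventional effect sizes from the cards above:
print(n_per_group(0.2))  # small effect    -> 393 per group
print(n_per_group(0.5))  # moderate effect -> 63 per group
print(n_per_group(0.8))  # large effect    -> 25 per group
```

Note how sharply the required n grows as the effect size shrinks, which is why small effects demand large samples.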
Type I error
rejection of null hypothesis H0 when it is true
Concluding a relationship exists when in fact it does not
False positive
type II error
accepting the H0 when it is false
Concluding no relationship exists when in fact it does
False negative
how do you avoid type I error
by setting alpha at a level they are comfortable with, usually .05 or .01
how to control type II
by setting power (1 - beta) at 80%, which accepts a 20% risk of committing a type II error
counterfactual
Expresses what has not happened but could, would, or might happen under different conditions
Ex: a researcher testing an intervention needs to consider what could, would, or might happen if the intervention weren’t done (e.g., the natural course of the condition over time, or another intervention/condition that would influence outcomes)
3 criteria for causality
Temporal: the IV precedes the DV (cause before effect)
An empirical relationship b/w the IV and DV
No confounders (the relationship cannot be explained by a third variable)
what type of study design enhances causality
experimental; strongest because it is strongly controlled and minimizes bias
true experiments need what 3 things
intervention (manipulation)
control condition
randomization
blinding (masking)
concealing whether a participant is in the intervention or control group; concealed from participants, providers, data collectors, and/or data analysts
single blinded
one group (usually participants) does not know which condition they were randomized to, intervention or control
double blinded
those receiving intervention and those delivering intervention don’t know which group participants are in
placebo effects
changes in the outcome attributable to the placebo condition because of participants’ expectations of benefit or harm
complete randomization
no restrictions, allocate each person as they enroll into a study on a random basis - should only be used for studies of 200 or more
simple randomization
starting with a known sample size and then pre specifying proportion of subjects who will be randomly assigned to different tx conditions
gold standard for randomization
have someone unconnected w/ enrollment perform the treatment allocation
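Simple randomization with a known sample size and pre-specified proportions can be sketched by shuffling a pre-built allocation list; the sample size (40) and the 50/50 split below are hypothetical.

```python
import random

random.seed(1)

n = 40  # known sample size; half to each arm (pre-specified proportion)
allocations = ["intervention"] * (n // 2) + ["control"] * (n // 2)
random.shuffle(allocations)  # random order, but group sizes stay exactly equal

# The kth enrolled participant receives allocations[k]. Per the gold standard
# above, the list would be generated by someone unconnected with enrollment
# and concealed until assignment.
print(allocations.count("intervention"), allocations.count("control"))  # 20 20
```

Unlike complete randomization (a coin flip per enrollee), this guarantees balanced group sizes even in small studies.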
steps in RCT (6)
1. Screen for eligibility
2. Obtain informed consent
3. Collect baseline data
4. Randomly assign to condition
5. Administer the intervention or control
6. Collect outcome data
basic experimental design
two groups: one intervention group and one control group, with an outcome measure
Pre test post test design
you measure outcome twice: before and after intervention
post test only design
data on outcome are only collected once - after randomization and completion of the intervention
factorial design
manipulating 2 or more things
I.e. weight gain of infants - touch therapy, music therapy and control group
Can look at interventions separately and together
Look at interaction effect
crossover design
involves exposing the same people to more than one condition
Must randomly assign participants to different orderings of treatment
concerns with cross over design and how to mitigate
Must be wary of carry-over effects: when people are exposed to two different conditions, they may be influenced in the second condition by their experience in the first
Can mitigate with a washout period (no treatment between conditions)
hawthorne effect
changes in behavior caused by participants’ expectations or knowledge of being in a study
quasi-experiments
controlled trials without randomization, a control group, or both
Non equivalent control group design:
involves 2 groups of participants for whom outcomes are measured before and after intervention
Weaker because it cannot be assumed that the experimental and comparison groups are initially equivalent
time series design
data are collected over an extended period during which an intervention is introduced
Extended time period strengthens ability to attribute change to intervention
Ex: rapid response teams were implemented in acute care units; administrators want to examine the effect on patient outcomes by comparing the mortality rate before implementation and 3 months after
major strength of quasi-experimental studies
practical, mimics real world
limitation of quasi experimental
Could be other explanations for what happened (i.e. population is different) - rival hypotheses
descriptive correlation studies
to describe relationship among variables rather than to support inferences of causality
univariate descriptive studies
a study involves multiple variables, but the primary purpose is to describe the status of each, not to study correlations
Prevalence studies
done to estimate prevalence rate of some condition at a particular point in time
Use cross-sectional designs
incidence studies
estimate frequency of new cases
Need longitudinal designs to estimate incidence
retrospective design
ones in which a phenomenon existing in the present is linked to phenomena that occurred in the past
Begin with DV and then examines whether it is correlated with one or more previously occurring IV
prospective non experimental design
cohort design - researcher start w/ a presumed cause and then go forward in time to the presumed effect
nominal
mutually exclusive categories/groups but no hierarchy
involves assigning numbers to classify characteristics into categories
ordinal
ranked, sorted groups (highest to lowest), involves sorting people based on their relative ranking on an attribute
Doesn’t tell us about how greater one level is from another
I.e. education level
interval
occurs when researchers can assume equivalent distance between rank ordering on an attribute
Ex: temperature scale
ratio
interval-level data with a true zero (the absence of the attribute); because there is a rational, meaningful zero, both the intervals between values and the absolute magnitude of the attribute are meaningful
I.e. speed - 0 = not moving, person’s weight
+ skewed
longer tail points to the right
- skewed
tail points to the left
Unimodal distribution
has only one peak - a value with high frequency
Multimodal distribution
two or more peaks
mode
most frequently occurring score value in a distribution
median
point in a distribution above and below which 50% of cases fall; the midpoint
Usually reported if the data is skewed
mean
sum of all scores divided by the number of scores = average
Affected by every score
which is most stable - median, mode or mean
mean b/c it accounts for every data point
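The three measures of central tendency can be compared on a small hypothetical, positively skewed data set (one high outlier), which also shows why the median is usually reported for skewed data.

```python
from statistics import mean, median, mode

# Hypothetical, positively skewed scores: one high outlier pulls the mean up.
scores = [2, 3, 3, 4, 5, 6, 30]

print(mode(scores))    # 3 -- most frequently occurring value
print(median(scores))  # 4 -- midpoint; unaffected by the outlier
print(mean(scores))    # ~7.57 -- affected by every score, pulled up by 30
```

The mean (≈7.57) sits well above the median (4) here, the signature of a positive skew.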
range
highest - lowest
Subtract lowest data point from the highest
variance
spread/dispersal of the data
Heterogenous or homogenous
standard deviation
the average amount scores deviate from the mean; based on every score
More stable because it’s based on every score
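Variance and standard deviation can be computed with the stdlib `statistics` module; the scores below are hypothetical, and the population versions (`pvariance`, `pstdev`) are used for simplicity.

```python
from statistics import pstdev, pvariance

scores = [4, 6, 8, 10, 12]  # hypothetical scores; mean is 8

# Population variance: the average squared deviation from the mean.
print(pvariance(scores))  # 8
# Standard deviation: square root of the variance, in the original score units.
print(pstdev(scores))     # ~2.83
```

Sample studies would typically use `variance`/`stdev` instead, which divide by n - 1 rather than n.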
inferential stats
Allow researchers to draw conclusions about a population given data from a sample; permit inferences about whether sample results are likely to hold in the population
sampling error
tendency for statistics to fluctuate from one sample to another; the challenge is deciding whether sample estimates are good estimates of population parameters