Exam 2 Flashcards
for a study to be an experiment, it has to have…
- at least one manipulated variable
- at least one measured variable
control variable
a variable that an experimenter holds constant on purpose (besides the independent variables)
- not really a variable because it does not vary
why experiments support causal claims through the 3 criteria
- establish covariance: changes in the independent variables are related to the changes in the dependent variable
- establish temporal precedence: The causal variable should come before the outcome variable
- establish internal validity: There are no other likely explanations for the relationship observed
placebo (control) groups
a group that is exposed to an inert treatment; a comparison group does not have to be a no-treatment control group
confounds
an unmeasured variable that influences both the supposed cause and the supposed effect
design confounds
an accidental second variable that varies systematically along with the intended independent variable
selection confounds (selection effects)
when the kinds of participants in one level of the independent variable are systematically different from those in the other
- avoid selection effects with random assignment
- avoid selection effects with matched groups
types of experimental design
- independent-groups designs (between-subjects)
- within-subjects designs
- posttest-only designs
- pretest/posttest designs
independent-groups designs (between-subjects)
Separate groups of participants are placed into different levels of the independent variable
- Ex. an experiment exploring how different amounts of sleep affect people’s reaction times → level 1: 3 hours of sleep, level 2: 8 hours of sleep
within-subjects designs
- Each person is presented with all levels of the independent variable
- One set of participants is tested more than once, and their scores are compared
- Ex. an experiment exploring how different amounts of sleep affect people’s reaction times → all participants complete day 1 with 3 hours of sleep and day 2 with 8 hours of sleep
posttest-only designs
Participants are randomly assigned to independent variable groups and are tested on the dependent variable once
pretest/posttest designs
Participants are randomly assigned to at least two different groups and are tested on the key dependent variable twice: once before and once after exposure to the independent variable
problems using pretest/posttest design
threat to internal validity
- Taking the pretest affects how participants do the posttest → testing threat
- Participants may get tired from a long study with a pretest and posttest
repeated-measures design
participants are measured on a dependent variable more than once, after exposure to each level of the independent variable
concurrent-measures design
participants are exposed to all the levels of an independent variable at roughly the same time, and a single attitudinal or behavioral preference is the dependent variable
advantages of within-groups design
- Participants in your groups are equivalent because they are the same participants and serve as their own controls
- Gives researchers more power to notice differences between conditions because there is less extraneous error in the measurement
- Requires fewer participants
disadvantages of within-groups design
- potential for order effects
- carryover effects
- might not be practical or possible
- experiencing all levels of the IV changes the way participants act → demand characteristics
demand characteristics
subtle cues or aspects of an experiment that might unintentionally signal to participants what the study is about, leading them to change their behavior to fit that perceived expectation
order effects
When the sequence in which stimuli are presented to participants influences their responses; order of conditions can affect the results
when being exposed to one condition affects how participants respond to other conditions
carryover effects
- Practice effect → participants perform better during later treatment conditions because they’ve had time to practice and improve
- Fatigue effect → participants perform worse during later treatment conditions because they’re tired or fatigued
avoiding order effects
- full counterbalancing
- partial counterbalancing
solution to order effects
counterbalancing
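Not from the flashcards, but a minimal sketch of how counterbalanced condition orders could be generated, assuming three hypothetical conditions named A, B, and C:

```python
# Hypothetical example: generating condition orders for counterbalancing.
from itertools import permutations

conditions = ["A", "B", "C"]

# Full counterbalancing: every possible order is used, and participants are
# divided among the orders (3! = 6 orders here).
full_orders = list(permutations(conditions))
print(full_orders)

# Partial counterbalancing: only a subset of orders is used. One common choice
# is a Latin square, where each condition appears once in each ordinal
# position. Below is a simple rotation-based square.
n = len(conditions)
latin_square = [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]
print(latin_square)  # [['A','B','C'], ['B','C','A'], ['C','A','B']]
```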
interrogating causal claims with construct validity
How well were the variables measured and manipulated?
- dependent variables: check face validity, interrater reliability, and convergent validity
- independent variables: How well were they manipulated?
- manipulation check
interrogating causal claims with external validity
To whom or what can the causal claim generalize?
- generalizing to other people
- generalizing to other situations
interrogating causal claims with statistical validity
How well do the data support the causal claim?
- accuracy of the conclusions drawn from a study’s statistical analysis
- Is the difference statistically significant?: P-value < .05 usually considered statistically significant
- How large is the effect?: correlation coefficient (r), Cohen’s d
- confidence interval
interrogating causal claims with internal validity
Are there alternative explanations for the outcome?
- Were there any design confounds?
- If an independent-groups design was used, did they control for selection effects using random assignment or matching?
- If a within-groups design was used, did they control for order effects by counterbalancing?
manipulation check
extra dependent variable that researchers can include to convince themselves that their experimental manipulation worked
ex. A study comparing the effect of a serious lecture vs. a funny lecture on the memory of lecture information; Manipulation check - how funny was the lecture?
correlation coefficient (r)
indicates the strength of a linear association between two variables (association claim)
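A minimal sketch (made-up sleep and reaction-time numbers, not from the flashcards) of computing Pearson's r with NumPy:

```python
import numpy as np

# Hypothetical data: more sleep paired with faster (lower) reaction times.
hours_slept = np.array([3, 5, 6, 7, 8, 9])
reaction_time_ms = np.array([420, 400, 380, 360, 355, 350])

r = np.corrcoef(hours_slept, reaction_time_ms)[0, 1]
print(round(r, 2))  # about -0.98: a strong negative linear association
```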
Cohen’s d
standardized effect size for measuring the difference between two group means
(group A mean - group B mean) / pooled standard deviation
- small (d = 0.2)
- medium (d = 0.5)
- large (d = 0.8)
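A worked example of the formula above, using hypothetical group means, standard deviations, and sample sizes:

```python
import math

# Hypothetical groups (made-up numbers).
mean_a, sd_a, n_a = 82.0, 10.0, 30
mean_b, sd_b, n_b = 77.0, 12.0, 30

# Pooled standard deviation for two independent groups.
pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2))

d = (mean_a - mean_b) / pooled_sd
print(round(d, 2))  # about 0.45: between the small and medium benchmarks above
```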
confidence interval
a range of values, bounded above and below a sample statistic (e.g., the sample mean), that is likely to contain the unknown population parameter
- when a study has a small sample and more variability, CI will be relatively wide (less precise)
- when a study has a larger sample and less variability, CI will be narrower (more precise)
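A minimal sketch (hypothetical numbers, normal-approximation formula mean ± 1.96 × SE) showing why a larger sample gives a narrower, more precise interval:

```python
import math

def ci_95(mean, sd, n):
    """Approximate 95% CI for a mean using the normal approximation."""
    se = sd / math.sqrt(n)  # standard error of the mean
    return (mean - 1.96 * se, mean + 1.96 * se)

print(ci_95(mean=50, sd=10, n=25))   # small n  -> wider CI: (46.08, 53.92)
print(ci_95(mean=50, sd=10, n=400))  # larger n -> narrower CI: (49.02, 50.98)
```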
six potential internal validity threats in one-group, pretest/posttest designs
- maturation
- history
- regression
- attrition
- testing
- instrumentation
maturation threat to internal validity
A change in behavior that emerges more or less spontaneously over time
- Spontaneous remission is a specific type of maturation
- E.g., people adapt to changed environments; children get better at walking, talking, reading, etc.; plants grow taller
preventing maturation threats
Include an appropriate comparison group
history threats to internal validity
- Something specific has happened between the pretest and posttest (not just time has passed)
- “Historical” or external factor that systematically affects most members of the treatment group at the same time as the treatment itself
- E.g., the rowdy boys started a swimming course and the exercise tired most of them out
preventing history threats
include a comparison group
regression threats to internal validity
- regression to the mean: the tendency of results that are extreme by chance on first measurement to move closer to the average when measured a second time
- Occurs only when a group is measured twice, and
- Only when the group has an extreme score at pretest
- E.g., the 40 depressed women might have had exceptionally high scores on the depression pretest due to random factors, such as recent illness or family or relationship problems
preventing regression threats
include a comparison group
attrition threats to internal validity
- A reduction in participant numbers that occurs when people drop out before the end of the study
- Problem for internal validity when attrition is systematic— only a certain kind of participant drops out
preventing attrition threats
Remove the dropped-out participants’ scores from the pretest average too
testing threats to internal validity
A change in the participants as a result of taking a test (dependent measure) more than once
preventing testing threats
- No pretest
- Two different forms— one for pretest and one for posttest
- Include a comparison group
instrumentation threats to internal validity
- Occurs when a measuring instrument changes over time
- OR when a researcher uses different forms for the pretest and posttest, but the two forms are not sufficiently equivalent
- E.g., people judging the rowdy campers’ behavior became more tolerant of loud voices and rough-and-tumble play
preventing instrumentation threats
- Use a posttest-only design
- Ensure that the pretest and posttest measures are equivalent
- Counterbalance the versions of the test
three potential internal validity threats in any study
- observer bias
- demand characteristics
- placebo effects
observer bias
when researchers’ expectations influence their interpretation of the results
placebo effects
people’s behaviors or symptoms respond not just to the treatment, but also to their belief in what the treatment can do to alter their situation
controlling for observer bias and demand characteristics
- double-blind study
- masked design (or blind design)
double-blind study
Neither the participants nor the researchers who evaluate them know who is in the treatment group and who is in the comparison group
masked design (or blind design)
the investigator does not know the identity of the treatment assignment
preventing placebo effects
Use a double-blind placebo control study:
- One group receives real treatment, and another group receives the placebo treatment
- Neither the participants nor the investigators know who is in the experimental group or in the placebo group
interrogating null effects
1) the independent variable truly has no effect on the dependent variable
2) not enough difference between groups (e.g., weak manipulations, insensitive measures, ceiling and floor effects)
3) too much variability within groups
not enough between-groups difference in interrogating null effects
- Weak manipulations
- Insensitive measures
- Ceiling and floor effects
- Reverse design confounds
ceiling and floor effects
Both control and experimental groups scored very high or very low
reverse design confounds
a design confound that acts in the opposite direction of the independent variable’s effect, masking a true difference between groups
weak manipulations
ask: How did the researchers operationalize the independent variable?
E.g., Does money make people happy?
- A researcher gives one group of participants $1 and another group $2
- The manipulation is not strong → try $1 vs. $50
insensitive measures
- The dependent measure was not sensitive enough to detect the difference
- Solution: use a measure with detailed, quantitative increments
solution to ceiling and floor effects on independent and dependent variables
use a manipulation check
within-groups variability in interrogating null effects
Too much unsystematic variability within each group → noise (error variance or unsystematic variance)
measurement error in interrogating null effects
- Error in the measurement
- All dependent variables involve a certain amount of measurement error
- Solution 1: use reliable, precise tools; Have excellent reliability (internal, interrater, and test-retest)
- Solution 2: measure more instances
solution to individual differences in interrogating null effects
- Solution 1: change the design; Use a within-groups design instead of independent-groups design
- Solution 2: add more participants
solution for situation noise in interrogating null effects
carefully control the surroundings of an experiment
power
the likelihood that a study will return an accurate result when the independent variable really has an effect
what increases power
- Within-groups design
- A strong manipulation
- A larger number of participants
- Less situation noise
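A minimal simulation sketch (hypothetical effect size and sample sizes, not from the flashcards) illustrating power as the proportion of studies that detect a true effect at p < .05, and how it rises with more participants:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d = 0.5  # hypothetical true effect (medium, in Cohen's d terms)

def estimated_power(n_per_group, n_sims=2000):
    """Fraction of simulated two-group studies that reach p < .05."""
    hits = 0
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(true_d, 1.0, n_per_group)
        _, p = stats.ttest_ind(treatment, control)
        hits += p < 0.05
    return hits / n_sims

print(estimated_power(20))   # low power: only roughly a third of studies detect the effect
print(estimated_power(100))  # high power: the large majority of studies detect it
```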
experiments with two independent variables can show interactions
To test whether the effect of the original independent variable depends on the level of another independent variable
- Interaction = a difference in differences = the effect of one independent variable depends on the level of the other independent variable
factorial design
when there are two or more independent variables (or factors)
- can test limits
- can test theories
factorial design and testing limits
- a form of external validity (testing the generalizability)
- interactions show moderators
moderator
a variable that changes the relationship between two other variables
interpreting factorial results
- main effects
- interactions
- If you have two independent variables, there will be two main effects and one possible interaction between them
- With three independent variables, there will be three main effects and several possible interactions (including two-way interactions and a three-way interaction)
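A minimal sketch with made-up cell means for a 2 x 2 design (hypothetical phone-use x age reaction times), showing a main effect as a difference between marginal means and the interaction as a difference in differences:

```python
# Hypothetical reaction times (ms) for a 2 x 2 design.
means = {
    ("on phone", "young"): 500, ("on phone", "old"): 650,
    ("not on phone", "young"): 450, ("not on phone", "old"): 550,
}

# Main effect of phone use: average of "on phone" cells vs. "not on phone" cells.
on = (means[("on phone", "young")] + means[("on phone", "old")]) / 2          # 575
off = (means[("not on phone", "young")] + means[("not on phone", "old")]) / 2  # 500
print("main effect of phone use:", on - off)  # 75 ms slower while on the phone

# Interaction: does the phone effect depend on age? Compare the two differences.
phone_effect_young = means[("on phone", "young")] - means[("not on phone", "young")]  # 50
phone_effect_old = means[("on phone", "old")] - means[("not on phone", "old")]        # 100
print("difference in differences:", phone_effect_old - phone_effect_young)  # 50 -> interaction
```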
factorial variations
- Independent-groups factorial designs
- Within-groups factorial designs
- Mixed factorial designs
- Increasing the number of levels of an independent variable
- Increasing the number of independent variables
independent-groups factorial design
2 x 2 factorial design = 4 conditions
- different people in each group
within-groups factorial design
2 x 2 factorial design = 4 conditions
- One group of participants participate in all four combinations
- Has more statistical power than an independent-groups factorial design
mixed factorial design
One independent variable is manipulated as independent-groups (between-groups) and the other is manipulated as within-groups
E.g., cell phone use (on the phone vs. not on the phone) x age (old vs. young)
- Old participants take part in both “on the phone” and “not on the phone” conditions
- Same for the young participants
increasing the number of levels of an independent variable for factorial variation
- Simplest factorial design 2 x 2: Two independent variables with two levels in each independent variable
- Can add one or more levels to one or both independent variables: E.g., two independent variables with three levels in one of the independent variables
increasing the number of independent variables for factorial variation
E.g., 2 x 2 x 2 factorial, or a three-way design
- Three main effects
- Three possible two-way interactions
- One three-way interaction
quasi-experiments
- Do not have full experimental control
- First select an independent variable and a dependent variable
- Random assignment might not be possible
nonequivalent control group pretest/posttest design
ex. study investigating the psychological effects of cosmetic surgery
- group of people who got the surgery vs. group of people who registered at the same clinic but did not receive any procedure
nonequivalent control group posttest-only design
ex. opt-in vs. opt-out default options of organ donation across countries
- no control over which countries had which defaults
- no random assignments of people
interrupted time-series design
ex. investigating popular shows and suicide
- the variable (of the suicide rates in the US) was measured repeatedly— before, during, and after the “interruption” caused by some event (the introduction of the show 13 Reasons Why)
nonequivalent control group interrupted time-series design
ex. investigating the effect of legislation on opioid abuse
- Florida passed laws that medical clinics could not dispense opioids, North Carolina did not
- nonequivalent control group → not randomly assigned to having the pill mill laws or not
- Interrupted time-series → researchers did not have experimental control over the year the laws were passed
threats to internal validity in quasi-experiments
- selection effects
- design confounds
- maturation threats
- history threats
- regression to the mean
- attrition threats
- testing threats
- instrumentation threats
- observer bias, demand characteristics, and placebo effects
selection effects in quasi-experiments
- Relevant only for independent-groups designs, not for repeated-measures designs
- Applies when the kinds of participants at one level of the independent variable are systematically different from those at the other level
Example 1: nudging people toward organ donation
- No selection effect
Example 2: the psychological effects of cosmetic surgery
- Maybe selection effect, but not likely
design confounds in quasi-experiments
- Some outside variable accidentally and systematically varies with the levels of the targeted independent variable
Example 1: nudging people toward organ donation
- Maybe some other government policy co-occurred with the presumed-consent policy
- Not likely, since all seven countries with presumed-consent policies would have to share the same or a similar co-occurring policy
maturation threats in quasi-experiments
In an experiment or quasi-experimental design with a pretest and posttest, the observed change could have emerged more or less spontaneously over time
Example 2: the psychological effects of cosmetic surgery
- The comparison group did not improve over time → no maturation threat
Example 4: investigating the effect of legislation on opioid abuse
- Overdose deaths in the comparison group (North Carolina) did not decline → no maturation threat
history threats in quasi-experiments
Occurs when an external, historical event happens for everyone in a study at the same time as the treatment
Example 3: popular shows and suicide
- Suicide rates might have increased because of a suicide of a celebrity
Example 4: investigating the effect of legislation on opioid abuse
- The results might have been caused by some change in employment or living conditions
regression to the mean in quasi-experiments
Only for pretest/posttest designs
Example 2: the psychological effects of cosmetic surgery
attrition threats in quasi-experiments
- In a pretest/posttest design, attrition occurs when people drop out of a study over time
- Becomes a threat to internal validity when systematic kinds of people drop out of a study
testing threats in quasi-experiments
A kind of order effect in which participants change as a result of having been tested before
Repeated testing:
- Might cause people to improve, regardless of the treatment they received
- Might also cause performance to decline because of fatigue or boredom
Example 2: the psychological effects of cosmetic surgery
The surgery group improved over time while the comparison group declined → no testing threat
instrumentation threats in quasi-experiments
- A measurement could change over repeated uses, and this can be a threat to internal validity
- Having a comparison group helps detect instrumentation threats if there are any
why choose a quasi-experiment
- Real-world opportunities
- External validity: Quasi-experiments capitalize on real-world situations, even as they give up some control over internal validity
- Ethics: A researcher might choose a quasi-experimental design when the questions they have would be unethical to study in a true experiment
- Construct validity: Quasi-experiments show excellent construct validity for the quasi-independent variable
quasi-experiments and correlational studies
In quasi-experiments, researchers tend to select their samples more intentionally than in correlational studies
quasi-independent variables compared with participant variables
Quasi-independent variables focus less on individual differences and more on potential interventions such as laws, media exposure, or education
participant variable
a categorical variable, e.g., age, gender, ethnicity
small-n designs
Obtain a lot of information from just a few cases instead of a little information from a larger sample
disadvantages of small-n studies
Issues with internal validity
Issues with external validity
- Participants in small-n studies may not represent the general population very well
- Solution: compare a case study results to research using other methods
behavior-change studies in applied settings: three small-n designs
- stable-baseline design
- multiple-baseline design
- reversal design
stable-baseline design
a researcher observes behavior for an extended baseline period before beginning a treatment or other intervention
reversal design
a treatment is introduced after a baseline period and then withdrawn (reversed) to see whether behavior returns to baseline levels; behavior is expected to be high during baseline sessions and lower during treatment sessions
quality of science
replication, transparency, applicability to real-world context
replicable (or reproducible)
- part of interrogating statistical validity: size of the estimate (effect size), precision of estimate (95% CI)
- gives a study credibility
types of replication
- direct replication
- conceptual replication
- replication-plus-extension
direct replication
repeat an original study as closely as possible
concerns with direct replication
- threats to internal validity or flaws in construct validity in the original study would be repeated
- when successful, it confirms what we already learned
- does not test the theory in a new context
conceptual replication
same research question as the original study, but use different procedures
replication-plus-extension
replicate the original experiment and add variables to test additional questions
Why might a study not be replicable?
- issues with the replication study itself (differences in sample, materials, or geography)
- issues with the original studies
meta-analysis
mathematically averaging the results of all the studies (both published and unpublished) that have tested the same variables
strengths of meta-analysis
- can sort the studies into categories → can identify new patterns in the literature → could lead to new questions to investigate
- solves the File Drawer Problem by including both published and unpublished data
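A minimal sketch (hypothetical effect sizes and variances, not from the flashcards) of the core arithmetic of a fixed-effect meta-analysis: a weighted average in which more precise studies count more:

```python
# Hypothetical Cohen's d values and sampling variances from four studies.
effects = [0.42, 0.25, 0.60, 0.10]
variances = [0.02, 0.05, 0.08, 0.01]

# Inverse-variance weights: lower-variance (more precise) studies get more weight.
weights = [1 / v for v in variances]
pooled_d = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
print(round(pooled_d, 2))  # about 0.24: the meta-analytic average effect size
```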
File Drawer Problem
the problem that studies with null or weak results often remain unpublished; these unpublished studies could have important insights and could change the overall effect size
questionable research practices
- underreporting null findings
- HARKing
- p-hacking
underreporting null findings
reporting only strong effects, not the weak ones
HARKing
Hypothesizing After the Results are Known
- misleads readers about the strength of the evidence
p-hacking
running different types of statistical analyses until one yields p < .05
transparent research practices
- open data
- open materials
- preregistration
open data
others can analyze the data
open materials
others can replicate the study
preregistration
scientists publish their study’s method, hypotheses, or statistical analyses before collecting data
ecological validity
extent to which a study’s task and manipulations are similar to the kinds of situations participants might encounter in their everyday lives
- one aspect of external validity