Final Flashcards
three components of ethics
- Human participants
- Non-human subjects
- Academic integrity
nuremberg code
Ten core ethical principles that drive research laws in many countries
belmont report
Three main principles for ethical decision making:
1. Respect for persons
2. Beneficence
3. Justice
Why are the ethical principles broad rather than specific?
Meant to represent core values that serve as the basis for developing specific rules that can be applied & refined at different times & for different situations
what shapes how ethical guidelines evolve?
-Changes in ethical codes by groups/nations/orgs
-Changes in federal laws
-Public input in response to proposed changes
-Public demand for changes
who are interest holders?
Those who are potentially affected by the decision
beneficence
-“Doing good”
-Researchers must strive to protect participants from harm (Study designs should minimize risk; Procedures should assess risk & harm)
-Researchers must consider how the community will be helped or harmed (Consider the costs of NOT doing the research as well)
justice
-Research should involve a fair distribution of harms & benefits across different types of people
-Fair balance between people who participate & people who benefit
-Are the participants representative of those who stand to benefit?
-Ensure that participant recruitment is inclusive
respect for persons
People should be free to decide whether to participate
informed consent
Potential participants have the right to learn about the research & its potential costs/benefits before deciding whether to participate
special protections for vulnerable pops:
-Children
-People with intellectual & developmental disabilities
-Prisoners
special protections for children
Children must provide assent (affirmative agreement to participate, since they cannot legally consent) & are given the opportunity to leave studies at any time
special protections for prisoners
-Power dynamics involving race (institutional/systemic racism)
-Coercion & restriction of freedom
-Drug dependence/addiction & mental illness are disproportionately high (Some studies could exacerbate these)
research with animal subjects (the three Rs)
-Replacement
-Refinement
-Reduction
replacement (animals)
Find alternatives to using animals when possible
refinement (animals)
How could research procedures & aspects of animal care be altered to reduce animal distress?
reduction
Use the fewest animals possible
types of deception
-Omission (Withholding details of a study from participants)
-Commission (Lying to participants)
when can deception be used?
When it is justified & there is no alternative
when must debriefing be done?
-When deception is used
-When scientists feel a responsibility to explain their study to people (even without deception)
what must debriefing include?
-Must include description of & rationale for deception
-Must “correct” any false information given
institutional review boards (IRBs)
Committees that review proposed research & decide whether it can be conducted ethically
IRBs are composed of:
-At least five members of varying backgrounds
-At least one scientist
-At least one non-scientist
-At least one community representative
how are IRB members selected?
-Often volunteer
-Community members might respond to solicitation notices
types of IRB review
-Exempt
-Expedited
-Full review
IRB exempt review
Very little risk
Includes research conducted in education settings for educational purposes, archival studies, or where there are no risks to people
IRB expedited review
Minimal risk
No potentially risky manipulations or invasive procedures
Little to no emotional impact
IRB full review
Manipulations, special populations, invasive procedures, high risk studies, deception
three requirements of causation:
- Covariance (Are the variables systematically related?)
- Temporal precedence (Does a change in one variable always come before a change in the other variable?)
- Internal validity (Are alternative explanations sufficiently ruled out?)
maturation
Experimental group changes over time, but only because of natural development or spontaneous improvement/decline
history (internal validity threat)
Experimental group changes over time because of an external factor or event that affects most members of the group
how to rule out maturation & history threats?
Using the right comparison groups
types of comparison groups
-Control
-Placebo
-Treatment
-Wait list control
control group
Level of an IV representing a neutral condition
placebo group
Control group that is exposed to a fake or inactive treatment
wait list control group
A control group that receives the same treatment/intervention as the treatment group, but not until after the treatment group
-Allows for isolation of the IV & comparison of groups
-Allows for individuals in control groups to obtain treatment
control variables
A variable that an experimenter holds constant on purpose
confound
An alternative explanation
design confound
A second variable that happens to vary systematically along with the intended IV
two main types of experimental designs
-Between-subjects design
-Within-subjects design
between-subjects design
Different groups of participants are placed into different levels of the IV (only experience one level/condition of the IV)
posttest only design (between)
Random assignment to a group => Treatment/IV applied => DV measured
pretest/posttest design (between)
Random assignment to a group => DV measured => Treatment/IV applied => DV measured again
when may a pretest/posttest design be used?
If you want to see how large the improvement/decline is (can track change over time)
selection effects
The kinds of participants in one level of the IV are systematically different from those in the other
way to avoid selection effects:
Matched groups
matched groups
Participants in different conditions are matched on an extraneous variable
extraneous variable
A variable that is not the IV, but has the potential to affect the DV
within-subjects design
Each participant is presented with all levels of the IV
concurrent measures design (within)
Participants exposed to all levels of the IV & DV measured, all around the same time
repeated measures design (within)
Participants are measured on a DV more than once, after exposure to each level of the IV
order effects
Being exposed to one condition changes how participants react to another condition (“carryover effects”)
testing effects
Type of order effect in which an experimental group changes over time because repeated testing affects participants (fatigue or practice)
fatigue effects
Type of testing effect in which participants get tired/bored over time
practice effects
Type of testing effect in which participants get better at the task over time
how to avoid order effects:
counterbalancing
counterbalancing
Levels of the IV are presented to participants in different sequences
full counterbalancing
All possible condition orders are tested
partial counterbalancing
Only some possible condition orders are tested
Used when there are many IVs/levels, a limited sample size, or when the nature of the experiment doesn’t allow certain orders to occur
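A minimal sketch of the difference, using hypothetical condition labels: full counterbalancing enumerates every possible order (k! sequences for k conditions), while partial counterbalancing tests only a subset, such as a random sample of orders.

```python
from itertools import permutations
import random

conditions = ["A", "B", "C"]  # hypothetical labels for three IV levels

# Full counterbalancing: every possible order (k! sequences for k conditions)
full = list(permutations(conditions))  # 3! = 6 orders

# Partial counterbalancing: only a subset of orders,
# e.g., a random sample when k! is too large for the sample size
random.seed(0)
partial = random.sample(full, 3)
```

With 4 conditions the full set already requires 24 orders, which is why partial schemes become attractive as designs grow.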
demand characteristics
Participants guess what the study’s purpose is & change their behavior in the expected direction (most likely in within-subjects designs)
within-subjects pros
-Participants in groups are equivalent & serve as their own controls
-Require fewer participants than between-groups design
within-subjects cons
-Potential for order effects
-Greater potential for demand characteristics
-Might not be practical or possible
regression to the mean
When a group avg. is unusually extreme at pre-test, it is likely to be less extreme when measured again
how to identify a regression to the mean issue:
Treatment group is more extreme at pretest, but ends up at the same score as the comparison group
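The phenomenon can be simulated with stdlib tools (all numbers hypothetical): when a group is selected because its pretest scores were extreme, the random error that inflated those scores does not repeat, so the retest mean drifts back toward the population mean.

```python
import random

random.seed(1)
POP_MEAN = 100

# Each person has a stable "true score"; every test adds independent error
true_scores = [random.gauss(POP_MEAN, 10) for _ in range(10_000)]
pretest = [t + random.gauss(0, 10) for t in true_scores]

# Select only the people who scored extremely high at pretest
selected = [(t, p) for t, p in zip(true_scores, pretest) if p > 120]

# Retest the same people: fresh, independent error
posttest = [t + random.gauss(0, 10) for t, _ in selected]

pre_mean = sum(p for _, p in selected) / len(selected)
post_mean = sum(posttest) / len(posttest)
# The group's posttest mean lands between POP_MEAN and its extreme pretest mean
```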
attrition threat
A certain type of participant systematically drops out of the study before it ends
how to avoid attrition threats:
-If a participant drops out, remove all of their scores
-Some statistical techniques can help prepare for this if the sample size is small
instrumentation
An experimental group changes over time, but only because the measurement instrument has changed
observer bias
Researchers’ expectations influence their interpretation of results
placebo effect
Participants in an experimental group improve, but only because they believe in the efficacy of the therapy/drug they receive
how to rule out a placebo effect:
Double-blind placebo control study => Neither the experimenters nor the participants know which group participants have been assigned to
what could a null effect mean?
true negative OR false negative
positive results bias
Authors are more likely to submit and/or editors are more likely to accept positive results (rather than negative or inconclusive)
low between-groups variation
-Weak manipulation
-Insensitive measure
-Ceiling effect
-Floor effect
-Design confound acting in reverse
weak manipulation
Manipulation not strong enough to show differences between groups (may cause a ceiling effect)
insensitive measure
DV isn’t sensitive enough to detect differences (may cause a floor effect)
ceiling effect
Participants in an experimental group score almost the same on a DV, with all scores on the high end of the dist. (usually caused by a weak manipulation)
floor effect
Participants in an experimental group score almost the same on a DV, with all scores on the low end of the dist. (usually caused by an insensitive measure)
design confound acting in reverse (example)
Maybe Ps who had 7 beers all took a cup of coffee in the waiting room, which could be why they did equally well on the memory test
empirical approaches to testing construct validity
-Pilot studies
-Manipulation checks
pilot study
A simple study, conducted before (or sometimes after) the main study, to test the effectiveness of a manipulation
questions that pilot studies consider:
-Do participants understand the instructions?
-Do participants become bored or frustrated?
-Can participants guess the research question or hypothesis?
-Are there demand characteristics?
-How long does the procedure take?
-Are computer programs or other automated procedures working properly?
-Is data being recorded correctly?
manipulation check
An extra DV researchers include in order to determine how well a manipulation worked
-Can show if there isn’t enough variability btwn levels
-Can show an ineffective IV manipulation
-Usually done at the END of a procedure
high within-groups variation
Aka noise, error variance, unsystematic variance
More overlap between groups => Smaller effect sizes => Less likely the groups are significantly different
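The noise-to-effect-size link can be made concrete with Cohen's d, the standardized mean difference (the group means and SDs below are hypothetical): the same raw difference between groups yields a much smaller effect size when within-groups variability is high.

```python
def cohens_d(mean1, mean2, sd_pooled):
    # Standardized mean difference between two groups
    return (mean1 - mean2) / sd_pooled

# Same 5-point group difference, different within-groups variability
low_noise = cohens_d(55, 50, 5)    # little overlap between the groups
high_noise = cohens_d(55, 50, 20)  # heavy overlap between the groups
```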
measurement error
The degree to which the recorded DV for a participant differs from the true value of the DV
how to minimize measurement errors:
-Use reliable measures
-Use measures with high construct validity
-Increase sample size
examples of high within-groups variation:
-Measurement error
-Individual differences
-Situation noise
how to minimize individual differences:
-Within-groups design => Controls for irrelevant individual differences
-Increase sample size
situation noise
Unrelated events or distractions in the external environment that create unsystematic variability within groups
how to minimize situation noise:
Carefully control environmental factors that could influence DV
minimizing within-groups variability increases:
a study’s STATISTICAL power
external validity of causal claims
-Internal validity often prioritized over external validity
-Increasing the variability of participants may increase external validity, but decrease statistical power
-The downside of controlling for every extraneous variable, decreasing all situational noise, etc. => Less generalizability
factorial designs
have 2 or more IVs
why use 2 IVs?
-Can show differences between conditions
-Can show whether the effect of one IV depends on another IV
-Test for a “difference in differences”
-Tests for main effects of IVs & interactions between IVs
-Testing for moderating variables
main effect
The overall effect of one IV on a DV, averaging over the levels of the other IV
interaction effect
An effect in which the difference in levels of one IV changes depending on the level of the other IV
moderator
A variable that, depending on its level, changes the relationship between two other variables
why use ANOVA?
-Tests for main effects & interactions
-Conducting multiple t-tests increases the chances of Type I error
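The Type I error inflation is simple arithmetic: across m independent tests at alpha = .05, the chance of at least one false positive is 1 − (1 − alpha)^m. A 2x2 design already has 6 possible pairwise comparisons among its 4 cell means.

```python
# Chance of at least one Type I error across m independent tests at alpha = .05
alpha = 0.05
familywise = {m: 1 - (1 - alpha) ** m for m in (1, 3, 6)}
# One test keeps the .05 rate; six tests push it above 25%
```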
effects associated with a 2x2 factorial design:
- Main effect for IV 1
- Main effect for IV 2
- Interaction effect between IV 1 & IV 2
describing factorial designs (_____ x _____)
-Each # is the number of levels of one IV
-The number of #s is the number of IVs
-The product of the #s is the total number of conditions
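The naming rules above reduce to counting and multiplying, sketched here with a hypothetical helper:

```python
from math import prod

def describe_design(levels):
    # levels: the number of levels of each IV, e.g., (2, 3) for a "2 x 3" design
    return {"IVs": len(levels), "conditions": prod(levels)}

# A 2 x 2 design: 2 IVs, 4 conditions
# A 2 x 3 x 2 design: 3 IVs, 12 conditions
```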
between-subjects factorial
-Both IVs are manipulated between-subjects
-Each P/S only experiences one condition
repeated-measures factorial (within)
-Both IVs are manipulated within-subjects
-Each P/S experiences every condition
potential issues with a repeated-measures/within-subjects factorial:
-Order effects
-Demand characteristics
=> Counterbalancing can solve
mixed factorial
One IV is manipulated between-subjects, and the other IV is manipulated within-subjects
advantages & disadvantages of mixed factorials:
Are we concerned about some of the issues (related to between-subjects or within-subjects manipulations) for one IV, but not the other?
marginal means
The averages for each level of an IV, averaging over the levels of the other IV
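For a 2x2 design, marginal means are just row and column averages of the cell means (the numbers below are hypothetical):

```python
# Hypothetical cell means for a 2x2 design: rows = IV1 levels, cols = IV2 levels
cells = [[10, 20],
         [30, 40]]

# Marginal means for IV1: average each row across the levels of IV2
iv1_marginals = [sum(row) / len(row) for row in cells]
# Marginal means for IV2: average each column across the levels of IV1
iv2_marginals = [sum(col) / len(col) for col in zip(*cells)]
```

Comparing the two IV1 marginal means tests the main effect of IV1; likewise for IV2.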
test used to understand interaction effects:
post-hoc tests (test pairwise comparisons)
statistical tests
ANOVA results tell us if the main effects & interactions are significant
“Post-hoc pairwise comparisons” are needed to understand the nature of the interaction
cross-over interaction
One IV has an opposite effect at one level of the second IV than at the other level of the second IV
Appearance:
-“X” shape on a line graph (lines cross)
-Mirrored bars on a bar graph
spreading interaction
There is an effect of one IV at one level of the other IV; Weak or no effect of the IV at the other level of the other IV
Appearance:
-“<” shape on a line graph (lines diverge)
-Difference vs. no/small difference on a bar graph
looking for interactions in bar graphs
-Look for a difference in differences between bars
-Imagine drawing a line to connect the tops of bars in the same condition
looking for interactions in line graphs
-If lines are not parallel, there may be a significant interaction (But would need to formally test using statistics)
-Lines do not have to cross for an interaction to be significant
hypothetico-deductive method
-Do we care about the results of any single study?
-Can a single study really tell us what is going on in the real world?
-How do we interpret results of conflicting studies?
single study
-Statistical significance?
-Finding likely not due to chance
-High probability to repeat
types of replication
-Direct
-Conceptual
-Plus extension
direct replication
Original study is repeated as closely as possible; Uses the same operationalization of the conceptual variables (to reproduce the original study & determine whether the effect is repeated)
conceptual replication
Relationship between conceptual variables in the original study is tested using different procedures for operationalizing those variables
replication plus extension
Relationship between variables in original study is tested & additional variables (or conditions) are added to test additional questions
why might a replication attempt fail?
-Problems with the original study
—Found by chance
—Small sample size
—Questionable research practices in original study
-Problem with the replication
—Only one replication attempt per study (the failed result may itself be due to chance)
-Differences between original study & “replication”
—Differences in context
—Different operationalization of conceptual variable shows a different pattern
types of research misconduct:
-Data fabrication
-Data falsification
-Plagiarism
-p-hacking
-HARKing
data fabrication
Researchers invent data that fit their hypotheses
data falsification
Researchers selectively delete observations or influence participants to act in a particular way
plagiarism
Representing the ideas or words of others as one’s own
p-hacking
A family of questionable data analysis techniques which can lead to non-replicable results
Ex) In order to obtain a p value of just under 0.05, researchers add participants after the results are initially analyzed, look for outliers to exclude, or try new analyses
HARKing
Hypothesizing after the results are known
open science
The practice of sharing data & materials freely so others can collaborate, use, & verify results
open materials
Providing all of a study’s measures & manipulations
open data
Providing the full dataset
preregistration
The practice of posting a study’s method, hypotheses, or statistical analyses publicly, in advance of data collection
Journals may peer-review proposals & commit to publishing results regardless of outcomes
importance of meta-analyses
-Average all effect sizes to calculate an overall effect size
-Can categorize studies into groups to detect patterns in the literature
-Can’t solve the replication crisis (Publication bias against negative results => Meta-analysis may not show the whole picture)
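A minimal sketch of the averaging step, using hypothetical (effect size, variance) pairs and the common fixed-effect approach of weighting each study by the inverse of its variance so that more precise studies count more:

```python
# Hypothetical (effect size, variance) pairs from three studies
studies = [(0.40, 0.04), (0.20, 0.01), (0.60, 0.09)]

# Weight each effect size by the inverse of its variance
weights = [1 / v for _, v in studies]
overall = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
# The overall estimate is pulled toward the most precise study's effect size
```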
open science collaboration (OSC)
-Selected 100 studies from three major psych journals
-Recruited researchers to conduct direct replications
-Used several metrics to judge whether replication was successful
=> 39% of studies were replicated (“replicable”)
Many Labs Project (MLP)
Conducted up to 36 replications of each study
=> 85% of studies were replicated
generalizability (external validity)
Replication with extension & conceptual replication are critical because they address generalizability
To assess => Ask how participants were obtained
Experiments don’t automatically have low external validity
theory-testing mode
Testing claims to investigate support for a theory
-Is there an association or causal relationship between variables?
-Internal validity > External validity
generalization mode
Investigating whether claims generalize to other populations/settings
(Survey research to support frequency claims is always done in generalization mode)
misconceptions about generalizability
-Studies with larger sample sizes automatically have greater generalizability
-Experiments always have poor generalizability
-If a sample includes certain types of individuals, findings generalize to that population of individuals (Still matters how the sample was collected)
experimental realism
The extent to which a laboratory experiment is designed so that participants experience authentic emotions, motivations, & behaviors
ecological validity (mundane realism)
The extent to which the tasks & manipulations of a study are similar to real-world contexts