W5 Peer Review, Replication, Empathy Flashcards
What are the steps of peer review?
Step 1 = read by an editor: does it suit the journal, does it have an impact on an area of study, is it exciting/influential?
Step 2 = sent to 2-3 expert anonymous reviewers (most journals let reviewers see who the authors are), who make one of the following decisions
REJECT it for the journal
INVITE A REVISION, with changes requested by the reviewers, sometimes to major parts of the experiment
ACCEPT it for publication (rare at this stage)
Step 3 = authors might revise the paper in line with the reviewers' requests, or argue why the changes are not necessary
Step 4 = if changes are made, another round of peer review is done; reviewers can again reject, revise or accept the paper, and this cycle sometimes repeats
What are the challenges of going through peer review?
- Often multiple iterations of peer review
- Each round of peer review can take several months, eg. 2-3 months
- Often given deadlines by which the revised manuscript must be resubmitted to be accepted
Why is peer review important? (3)
what is the counter-argument to this?
- Papers of poor quality are excluded
- Made up of multiple experts and editors to be as objective as possible
- Helpful and constructive to improve the manuscript
Other alternatives to peer review
Some argue EVERYTHING in raw form should be published, no gatekeeping or quality control, it will ‘self-correct’
What are the cons of only being able to access things as students/researchers vs. open access?
(Scientific journals typically offer a choice between subscription access, where readers pay, and open access, where authors pay)
- The general public gets excluded from the scientific literature, which excludes people from learning about psych/medical knowledge, and creates pressure to only publish in mainstream journals rather than start your own
OR - authors have to pay large fees, which may limit publications from less well-funded institutions and teams, biasing the scientific literature towards rich/Western institutions
What’s a solution for researcher vs. open access?
- Hybrid approach where some papers are paywalled and some are openly accessible OR
- Better science communication, eg. findings translated into a blog
What were the findings of the Reproducibility project?
Known as the replication crisis in psychology
- studied how well effects replicate when replicated DIRECTLY
While 97% of the original studies were statistically significant, ONLY 36% of the replications were statistically significant
The MEAN EFFECT SIZE in the replications was about HALF that of the originals
What is direct vs. conceptual replication?
- Direct replication = method is repeated as closely as possible
- Conceptual replication = different methods are used to test the same hypothesis
What are the 4 reasons why a replication study might fail to obtain the same results as the original?
- Unidentified contextual effects
- Unidentified individual differences = eg. participants in the first study might have all had higher levels of something that the replication sample did not, but these individual differences weren't controlled for initially
- Original could be TYPE 1 ERROR, “false positive”, 5% of the time, 1 / 20 tests
- Poor research practices, eg. no Power calculations
Type 1 vs Type 2 error, which is more serious?
- Type 1 Error = the null hypothesis is true but we reject it, a "false positive"/false alarm: we say there's an effect even though there's no real effect. Usually considered the more serious error
- Type 2 Error = the null hypothesis is not true BUT we retain it, a "false negative"/miss: there's an effect but we say there's no effect
How to avoid the ‘fishing expedition’ in research?
Decide a-priori about what you want to look for, otherwise you go on “fishing expedition” for statistically significant results
Eg. with uncorrected multiple comparisons, you can attribute scanner noise to 'neural activity' in a dead salmon
If we changed alpha level from .05 to .01 what errors would increase/decrease?
Type 1 error would decrease from 5% chance to 1% chance
Type 2 error would increase (saying there is no effect when there is an effect)
What would be the consequences of using a lower alpha level (eg. .01)?
If the alpha level is decreased, findings that might be valid would be rejected and not published, and more participants would be needed to produce the same results - you need to increase SENSITIVITY (Power) to really determine whether an effect is there or not
What is power?
What are the 2 things Power is affected by?
- the likelihood of correctly rejecting the null hypothesis, ie. of statistically detecting a significant effect in a sample when the effect really is there in the population
Power is affected by number of PEOPLE and number of TRIALS
How does low Power increase Type 1 and 2 errors?
With a small sample there is a greater chance that random error variance from any one person strongly affects the sample mean
Small sample sizes lead to both type 1 and type 2 statistical errors - commonly driven by outliers
In the last 5-10 years it has become the norm to conduct power analyses, and many older studies are harder to replicate because they were 'underpowered', eg. had low sample sizes
What do you need to consider in high sample sizes?
- Larger sample sizes minimise the impact of outliers and increase generalisability, BUT you NEED to consider effect size, since with very HIGH Power a minute difference between groups can produce a significant result even though the magnitude of the effect is trivial
Is it justified to use the same amount of participants as previous studies?
No - better to do a Power analysis to figure out the minimum sample size needed to get decent power (see the sketch below)
This can be limited by logistics and by the availability of previous/replication studies to estimate the effect size
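A minimal sketch of what such an a-priori power analysis might look like in Python using statsmodels; the effect size, alpha, and target power below are illustrative assumptions, not values from the lecture.

```python
# Hedged sketch: estimate the minimum sample size per group for a two-sample t-test.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,          # assumed medium effect size (Cohen's d)
    alpha=0.05,               # conventional significance level
    power=0.80,               # conventional minimum power
    alternative="two-sided",
)
print(f"Minimum participants per group: {n_per_group:.1f}")  # roughly 64 per group
```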
After Power analysis, what do we do?
Have a firm ‘stopping rule’ for recruiting participants regardless of statistical significance
If you check for significance after every participant or small subset and stop once you've reached it, this can inflate the type 1 error rate ("false positives") - see the simulation sketch below
Also check whether the task/performance was appropriately difficult, eg. avoiding ceiling and floor effects
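An illustrative simulation sketch (not from the course materials) of why such "peeking" inflates the Type 1 error rate: even with no true effect, checking for significance every few participants and stopping at the first p < .05 yields far more than 5% false positives.

```python
# Hedged sketch: simulate optional stopping under a true null hypothesis.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_simulations, max_n, alpha = 2000, 100, 0.05
false_positives = 0

for _ in range(n_simulations):
    data = rng.normal(0, 1, max_n)       # no true effect in the population
    for n in range(10, max_n + 1, 5):    # check significance every 5 participants
        if stats.ttest_1samp(data[:n], 0).pvalue < alpha:
            false_positives += 1         # stopped early and claimed an effect
            break

# Typically well above the nominal 5%, illustrating the inflated Type 1 error rate.
print(f"False positive rate with optional stopping: {false_positives / n_simulations:.2f}")
```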
How do you determine a formal stopping rule? and can we always test this?
- run a formal power analysis (can inform how many people you should study)
- No, it's unfeasible to do a power analysis for some complex designs; otherwise you need to justify why you didn't do one
Why are multiple experiments in a study good?
Try to replicate differences that you have already observed, because there is a 5% chance that any single significant difference is JUST DUE TO CHANCE
Multiple experiments within a manuscript, eg. 2 experiments that find the same result = 5% * 5% chance of false positive = 0.25% chance of a false positive
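A quick arithmetic check of the 0.25% figure; note this assumes the two experiments are independent tests.

```python
# Hedged sketch: chance that BOTH independent experiments are false positives.
p_single = 0.05                  # Type 1 error rate of one experiment at alpha = .05
p_both = p_single * p_single     # 0.0025
print(f"{p_both:.2%}")           # 0.25%
```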
What 3 things are needed for transparency and what happens if we don’t include it?
- Describe all the decisions made
- Include both non-significant and significant variables/results
- Share de-identified raw research data ‘open science framework’ for others to replicate or check your data
If we don't:
- Distorted sense of the scientific literature: published papers are more likely to contain Type 1 errors, and there might be many findings showing no effect while only the one showing an effect gets published
- Valuable knowledge is lost
What is Pre-Registration? / helps reproducibility
- Formally committing to your predictions, your rule for stopping participant recruitment, and how you will treat the data
Submitted to public repositories PRIOR TO THE STUDY
Pros / Cons of pre-registration?
PRO - Promotes transparency and reduces researcher degrees of freedom
PRO - More useful for voluminous and complex variables like fmri voxel activations
PRO - better for non controversial studies, exploratory studies where the effect is not predicted
CON - can make researcher decisions appear immune to critique; the plan is made in advance but not peer reviewed, which gives a bit of immunity to decisions that might not be well justified
CON - Limited utility for simple designs, eg. don’t need to screen for accuracy
Why is Reaction time screening used and what are the subtypes?
- identifying potential outlier reaction times, eg. very long reaction times driven by error variance
Absolute = screening against an absolute value, eg. exclude any RTs longer than 2000 ms
Relative = relative to each participant's mean reaction time, eg. screen out anything more than 3 s.d. above their mean reaction time
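A minimal sketch of the two screening rules on hypothetical reaction-time data; the 2000 ms and 3 SD cut-offs come from the card above, the data are made up.

```python
# Hedged sketch: absolute vs. relative reaction-time (RT) screening.
import numpy as np

rng = np.random.default_rng(1)
rts = np.append(rng.normal(600, 80, 40), [2500, 3200])  # hypothetical RTs in ms

# Absolute screening: drop any trial slower than a fixed cut-off (2000 ms here).
absolute_kept = rts[rts < 2000]

# Relative screening: drop trials more than 3 SDs above this participant's mean RT.
cutoff = rts.mean() + 3 * rts.std()
relative_kept = rts[rts < cutoff]

print(len(rts), len(absolute_kept), len(relative_kept))
```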
Pre-reg alternatives
What is the registered report format and what is it used for?
- it's a peer-reviewed plan for a study, which might be modified during review; ONCE IT'S ACCEPTED the journal guarantees to publish the paper, REGARDLESS OF THE SIGNIFICANCE OF THE RESULTS
- Used for hotly debated topics - gives protection for you as a researcher AND helps build a more established literature
Pre-reg alternatives
What is replicating Experiment 1 with Experiment 2, and its pros and cons?
- uses an existing framework and decreases the type 1 error rate from 5% to 0.25%, but any methodological flaws may simply be repeated
Pre-reg alternatives
What is the Multiverse approach?
run the analysis under all of the reasonable data decisions and report the results so readers can see how each decision affects the outcome; reveals potential confounding effects and explores ALL researcher degrees of freedom
What is the counterargument to Pre-reg alternatives?
If psychological science had a stronger theoretical basis, we wouldn't need pre-registration!
Why can you get a null result? and what is the significance of a null result?
- There isn’t an effect/relationship
- There is an effect but you've failed to detect it (Type 2 Error)
Null effects are still meaningful, and they can be published - you still get a green tick for correctly concluding there is no relationship when there really isn't one
If you get a null result, what 5 steps should you take next?
- need to figure out whether there really isn’t an effect or just type 2 error
- Do multiple experiments showing the same null result
- Show alternative divergent evidence, eg. A and B show no interaction but A and C show an interaction under the same conditions as A and B
- Show the 2 main effects to rule out the alternative explanation that the variables weren't manipulated properly; this supports a genuine absence of an interaction rather than alternative explanations
- Have high Power and reliability
Reliability =
Test retest =
Internal consistency =
Inter-rater reliability =
- consistency of a measurement
- reliability over time
- reliability across items
- reliability across researchers
What is Validity?
- Whether a particular operationalization truly captures the intended psychological process
Relationship between reliability and validity in sufficient/necessary conditions?
Reliability is often said to be a necessary but NOT sufficient condition for validity; it is necessary to the extent that the underlying psych process is stable over time
For dynamic processes, it is harder to determine reliability
A = consistent dots but not near the target?
B = dots are all over the place
C = dispersion of points away from target
D = dots consistently in the target circle
A = reliable, not valid
B = broadly valid dots where average would be close to psychological target construct, not reliable
C = not reliable but now average does not align with psych target construct, not valid
D = valid and reliable
Example of something reliable but not valid for attentional control?
IV Measurement: height
Is IV measurement: difference in RT between Stroop trials a RELIABLE MEASURE?
How does this affect validity?
Not reliable, since the Stroop effect is so similar between participants that their rank order naturally shuffles around from test to test
Only partially valid measure, as reliability is an important prerequisite for a valid measure
Is IV measurement: BOLD response in frontoparietal region for attention task a RELIABLE MEASURE?
Reliable measure that stays the same over time
Valid measure, as the brain region is related to attentional control
A highly concentrated graph with most scores the same is?
Rank ordering spread across x-axis is?
- showing an effect at the group level
- showing rank order reliability; more spread across individuals, but a weaker group-level effect
What is rank order reliability?
Pros / cons
- how well a measure is able to rank-order individuals within the sample, eg. person A scores lowest at T1 ALSO scores the lowest at T2
pros = Shows the robustness and replicability of an effect
pros = Reliable measurement at the INDIVIDUAL level
cons = does not guarantee a robust or replicable effect at the GROUP level
cons = Individual and group level reliability can conflict
Why is there Tension between group level effects and reliable individual differences/consistent rank orders?
- Because group level effects assume most individuals experience the effect to a similar (and large) extent
Eg. the Stroop effect: a strong group-level effect assumes minimal between-participant variation, ie. barely any participant variability in the Stroop effect
Problematic = many tasks that produce strong group-level effects are not reliable at the individual level, precisely because there is so little variation between individuals
Need to care about reliability in individual differences regardless of the research type
How to get good rank order individual reliability?
People need to experience the effect at distinctly different levels, consistently over time
What are the FOUR MEASURES to quantify reliability?
- Cronbach’s alpha = internal consistency between items, items need to measure the same construct, are responses to items consistent in measuring a construct?
- Split half correlation = eg. 100 trials split into 2 x 50 trials; calculate the correlation between the halves, which should be consistent/correlated if the measure is reliable. Error variability occurs depending on which trials end up in each half, so do multiple split-half correlations (see the sketch below)
- Test re-test correlation = how consistent scores are on the same test across 2 time points
- Intraclass correlation coefficient = used to assess consistency of measurement in clustered data
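A minimal sketch of a split-half reliability check with a Spearman-Brown correction; the simulated trial data and the odd/even split are illustrative assumptions.

```python
# Hedged sketch: split-half reliability of per-participant mean RTs.
import numpy as np

rng = np.random.default_rng(2)
# 30 hypothetical participants x 100 trials; each row gets a stable person-level offset.
trials = rng.normal(600, 80, size=(30, 100)) + rng.normal(0, 40, size=(30, 1))

half_a = trials[:, 0::2].mean(axis=1)   # mean of odd-numbered trials per person
half_b = trials[:, 1::2].mean(axis=1)   # mean of even-numbered trials per person

r_half = np.corrcoef(half_a, half_b)[0, 1]
r_full = (2 * r_half) / (1 + r_half)    # Spearman-Brown correction to full test length
print(f"Split-half r = {r_half:.2f}, corrected = {r_full:.2f}")
```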
Why is reliability important for measurement?
What is it most used for?
- Correlation between A and B = the maximum correlation between A and B that you can detect is fundamentally constrained by the measurement reliability of variable A and variable B (see the sketch below)
low reliability = might show insignificant result when there really is an effect (Type 2 error / false negative)
- Reliability coefficients are most commonly reported for questionnaire-based measurement
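A minimal sketch of this attenuation idea; the true correlation and reliability values below are illustrative assumptions.

```python
# Hedged sketch: observed correlations are capped by the reliabilities of both measures.
import math

true_r = 0.60          # hypothetical true correlation between constructs A and B
reliability_a = 0.70   # reliability of measure A
reliability_b = 0.50   # reliability of measure B

observed_r_max = true_r * math.sqrt(reliability_a * reliability_b)
print(f"Largest correlation you could expect to observe: {observed_r_max:.2f}")  # ~0.35
```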
How much power needed to adequately detect an effect?
What is issue for participants?
minimum of 80% power
Reliability of both measures has a HUGE impact on the number of participants NEEDED to sufficiently detect an effect
Most studies have not had hundreds of participants and thus probably don't have much power; they are 'underpowered' and will struggle to find an effect
What are the 2 consequences of having tension between group and individual effects?
- If one study has high reliability, but replication has low reliability, it can lead to a failure to replicate the findings
- Opposite: if the original study had poor reliability and found no effect, this makes it hard for follow-up studies to replicate that result
What are the 3 ways to improve measurement reliability?
- More measurement / trials = greater measurement reliability
- Consider different dependent variables
- Reduce the intensity of the manipulations, since there is an inherent tension between group and individual effects: strong group-level effects tend to have lower reliability at the individual level / in rank ordering
Reducing the manipulation might help reveal individual differences in the data, eg. in emotion-induced blindness, dialling down the emotional stimuli can reveal more distinct differences in emotional response between individuals
Recap: affective vs. cognitive empathy?
- Affective empathy - feeling what someone else is feeling, aware of source of emotion, feel distressed when someone is upset
- Cognitive empathy - being able to understand someone's thoughts, feelings and beliefs, especially when they differ from your own; also called mentalizing / theory of mind
Both interact for social-emotional functioning
What are the findings for Neural correlates in affective vs. cog empathy?
Patients with damage to the inferior frontal gyrus have impaired AFFECTIVE empathy but intact COGNITIVE empathy
Patients with damage to the ventromedial prefrontal cortex have impaired COGNITIVE empathy but intact AFFECTIVE empathy
= a double dissociation
What is an alternative explanation if they only found that the ventromedial prefrontal cortex has impaired COGNITIVE empathy but intact AFFECTIVE empathy?
CE is more effortful for most people, so the single dissociation could be because CE is easier to damage as it is a more complex and cognitively demanding process
What 2 contrasting things activates the Temporoparietal junction (TPJ) and why?
cognitive empathy/TOM tasks AND invalid trials in Posner cueing paradigm
The TPJ is important in the VENTRAL attentional network, acting as a "circuit breaker" between what we were doing and engaging with something else (BOTTOM-UP ATTENTION)
The link between CE/TOM ability and doing well on invalid trials might reflect a higher ability to disengage from a salient stimulus (the self / an invalid cue) and shift mentality/attention towards something different
RECAP: 2 main pathways for attention dorsal vs. ventral pathways
- Dorsal (top): frontal eye fields, intraparietal sulcus
Involved in voluntary / top-down / goal-directed attention, for tasks and concentration
- Ventral (bottom): ventral frontal cortex and temporoparietal junction
Involved in bottom-up attention to salient and unexpected stimuli, acting as a circuit breaker to notice exogenous things in the environment