Scientific Methods Flashcards
Pre-Scientific Methods?
Astrology
- Personality assessment based on birth date
Physiognomy
- Personality assessment based on shape of body, particularly the face
Phrenology
- Personality assessment based on morphology (shape) of skull
4 major types of descriptive methods?
LOTS of data!
Life history data
Observer-reports
Test data
Self-reports (surveys)
Self-Report?
Asking people questions about their beliefs and behaviours
- Provided by the participant
- Responses to questionnaires
Ten-Item Personality Inventory (TIPI)?
I see myself as: (rate on a Likert scale)
1. _____ Extraverted, enthusiastic.
2. _____ Critical, quarrelsome.
3. _____ Dependable, self-disciplined.
4. _____ Anxious, easily upset.
5. _____ Open to new experiences, complex.
6. _____ Reserved, quiet.
7. _____ Sympathetic, warm.
8. _____ Disorganized, careless.
9. _____ Calm, emotionally stable.
10. _____ Conventional, uncreative.
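A minimal scoring sketch for the ten items above. It assumes the 7-point response format and the published TIPI key (items 2, 4, 6, 8, and 10 reverse-scored; each trait is the mean of its two items). Neither the scale length nor the key is stated in these notes, and the name `score_tipi` and the example ratings are made up for illustration.

```python
# Assumed TIPI key: trait -> (regular item, reverse-scored item).
TIPI_KEY = {
    "Extraversion": (1, 6),
    "Agreeableness": (7, 2),
    "Conscientiousness": (3, 8),
    "Emotional Stability": (9, 4),
    "Openness": (5, 10),
}
SCALE_MAX = 7  # assumption: ratings run from 1 to 7


def score_tipi(responses):
    """responses: dict mapping item number (1-10) to a rating from 1 to 7."""
    scores = {}
    for trait, (item, reversed_item) in TIPI_KEY.items():
        # Reverse-scoring flips the rating: 1 becomes 7, 2 becomes 6, and so on.
        flipped = (SCALE_MAX + 1) - responses[reversed_item]
        scores[trait] = (responses[item] + flipped) / 2
    return scores


# Hypothetical respondent, for illustration only.
example = {1: 7, 2: 1, 3: 6, 4: 2, 5: 5, 6: 1, 7: 7, 8: 2, 9: 6, 10: 3}
print(score_tipi(example))
```

Reverse-scoring matters because half the items describe the low end of a trait (e.g., "Reserved, quiet" for Extraversion).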
Self-Report Data, advantages and disadvantages?
Advantages
– Allows study of difficult-to-observe behaviors,
thoughts and feelings
* Who knows better?
– Easy to distribute to large groups
Disadvantages
– Respondents may not be representative
(convenience sampling is tempting)
– Responses may be biased or untruthful
Observer report?
Observing behaviour of others
Observer Reports:
Who are the Observers?
- Parents, friends, teachers
– Usually collected by questionnaire or rating form
- Trained observers
– Systematic observations of behavior
- Untrained, participant-observers
– Class ratings of Trudeau
Observer report, advantages and disadvantages?
Advantages
– Capture spontaneous behaviors
– Avoid bias of self-reports
Disadvantages
– Researcher interference
* How naturalistic (vs. artificial) is the observation?
– Rarity of some behaviors
* Research on criminality
– Observer bias & selective attention
– Time consuming
Test Data?
Assessing an individual’s
abilities, cognitions, motivations,
or behaviors, by observing their
performance in a test situation
Tests may be written,
physical (e.g.,
cardiogram),
experimental, or
physiological
Examples of Kinds of Test Data?
Questionnaire tests
– E.g., IQ
Experimental tests
– Megargee (1969) study of dominance
* Does trait dominance (high vs. low) or gender
predict leadership?
* Paired high and low dominant men and women in
“box repair” task
* 4 kinds of groups:
(1) high dom ♀, high dom ♂
(2) high dom ♀, low dom ♂
(3) low dom ♀, low dom ♂
(4) low dom ♀, high dom ♂
Test data, advantages and disadvantages?
Advantages
* Allows measurement of characteristics
not easily observable, or known to the
participant
Disadvantages
* Must infer that the test measures what
you think it measures
– Validity issue
Life history (Case studies)
Intensive examination
of a single person or
group
Case Study Methods?
- Obtained from life history (interviews, autobiography)
- Other life records (Life Outcome Data)
– School grades
– Criminal records
– Work record
– Facebook page, tweets, Instagram, etc.
Case study, advantages and disadvantages?
- Advantages
– Rich source of hypotheses
– Allows for studies of rare behaviors
- Disadvantages
– Observer bias
– Difficult to generalize (N = 1)
– Difficult to reconstruct causes from complexity
of past events
Reliability?
Extent to which scores on the measure
are stable and replicable, vs. amount of
error or randomness in the measure
Validity?
- Degree to which measure assesses what it is
supposed to assess
Bull’s-eye analogy?
- Reliability = are you hitting the same spot each time?
- Validity = are you actually hitting the bull’s-eye?
Measuring Validity (4)?
- Face validity
– Does it look like it measures what you think it measures?
* E.g., shyness questionnaire
- Predictive validity
– Does it predict an external criterion?
* Does shuffling predict self-reported shyness?
- Convergent validity
– Relation to other measures of same variable
* Self-report and observer report should be related
- Construct validity
– All of the above
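Convergent validity is usually checked with a correlation between two measures of the same trait. The sketch below computes a Pearson correlation by hand; the shyness scores are invented for illustration and the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Made-up shyness scores for six people (not real data).
self_report = [2, 4, 5, 3, 6, 1]
observer_report = [3, 4, 6, 2, 5, 2]

r = pearson_r(self_report, observer_report)
print(f"self-report vs. observer report: r = {r:.2f}")
```

A clearly positive correlation between the self-report and the observer report would count as evidence of convergent validity; a correlation near zero would suggest the two are not measuring the same thing.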
Inter-rater Reliability and Validity
An Example: Measuring Height (without a ruler)
- Reliability of ratings of height
– Average correlation between two judges = .76
– Reliability of 5 judges is about .90
- Validity can only be high if reliability is high
– If measures are more reliable, they can provide a more valid
assessment!
– By combining the judgments of multiple people (or using
multiple items on a personality test) we can get fairly reliable and valid measures of personality (reliabilities about .80-.90)
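One standard way to project how reliability grows when judges (or test items) are averaged is the Spearman-Brown formula. The notes do not say how their 5-judge figure was obtained, so this is a sketch under that assumption, treating the .76 two-judge correlation as the reliability of a single judge. Under that reading the formula gives a value slightly above the rounded figure in the notes.

```python
def spearman_brown(r_single, k):
    """Projected reliability of the average of k raters (or items),
    given the reliability r_single of one rater (or item)."""
    return k * r_single / (1 + (k - 1) * r_single)


# Reliability of averaged height ratings as judges are added.
for k in (1, 2, 5, 10):
    print(k, round(spearman_brown(0.76, k), 2))
```

The same formula explains why multi-item personality tests are more reliable than single items: each extra item averages out some random error.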
The Problem:
Incentives Structure?
- Published research is important for
getting a job, getting tenure, getting
grants, and being viewed favorably in
our field
- Result: scientists try to publish as
much as they can
- Balancing act: need to stay truthful to
psychological science, but also
publish
- This results in researchers taking
shortcuts and sometimes worse…
Questionable Research Practices
(QRPs)?
Decisions in design, analysis, and
reporting that increase the likelihood of
achieving a positive result
– And a positive response from editors and
reviewers
What should researchers do to avoid QRPs?
- Increase disclosure in methods,
results, and hypothesis presentation
- Pre-register hypotheses and studies
– Data collection rules, analytic strategies
- Share data
- Be a responsible scientist regardless of
outcome
Center for Open Science?
Founded to increase the openness,
integrity, and reproducibility of scientific research
– Brian Nosek and Jeff Spies
* Open source software platform for
pre-registering hypotheses,
archiving study materials,
depositing data and syntax
Good Research?
- Good research is open research
– Materials and data are shared publicly
- Good research features experimental
methods that are strong and isolate a
question of interest
- Good research is adequately
“powered” research
Power?
- Most psychological effects are small, so
you need a lot of participants
– Some say “around 200”; others say it depends
on what you’re studying and your design
– If you’re studying an effect that’s likely to be
small, you need a big sample
* E.g., Are UBC or SFU students more liberal?
– If you’re studying an effect that’s likely to be
big, a smaller sample is ok
* E.g., Are UBC students or Texan oil tycoons more
liberal?
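To see how strongly the required sample depends on effect size, here is a rough sketch for comparing two group means. It assumes a two-sided test at alpha = .05, 80% power, and a normal approximation; the effect sizes (Cohen's d of 0.2, 0.5, 0.8) and the name `n_per_group` are illustrative choices, not values from the notes.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants needed per group to detect a standardized
    mean difference d between two groups (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for a two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

Under these assumptions a small effect needs several hundred participants per group, while a large effect needs only a few dozen.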
What does it mean that Power is generally set at 80%?
– This means there’s an 80% chance of
detecting an effect if it really exists
– However, studies are often run with much
lower power
* Researchers underestimate how much data
are needed
* Effects are smaller than they think
* It’s hard and expensive to collect large
samples
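The gap between planned and actual power can be illustrated by turning the calculation around: given a sample size, what is the chance of detecting the effect? This sketch assumes a two-group comparison, a two-sided test at alpha = .05, and a normal approximation; the effect size (d = 0.2), the sample sizes, and the name `approx_power` are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power to detect a standardized mean difference d
    with n_per_group participants in each of two groups."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Ignores the tiny chance of a significant result in the wrong direction.
    return z.cdf(d * sqrt(n_per_group / 2) - z_crit)


for n in (20, 50, 100, 393):
    print(f"n = {n} per group: power is about {approx_power(0.2, n):.2f}")
```

With a small true effect, samples of a few dozen per group leave power far below 80%, so most such studies would miss an effect that is really there.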