Q2 reliability & validity Flashcards
what is true of a good measure?
it assesses behavioral variability accurately
observed score =
true score + measurement error
true score
systematic variance
measurement error
error variance
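The decomposition above can be illustrated with a quick simulation (hypothetical numbers: true scores drawn from N(100, 15), measurement error from N(0, 5)):

```python
import random
import statistics

random.seed(1)

# observed score = true score + measurement error
true_scores = [random.gauss(100, 15) for _ in range(1000)]  # systematic variance
errors = [random.gauss(0, 5) for _ in range(1000)]          # error variance
observed = [t + e for t, e in zip(true_scores, errors)]

# Measurement error inflates variability beyond the true-score variance
print(statistics.variance(true_scores))  # ~225
print(statistics.variance(observed))     # ~250
```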
where does measurement error come from? (2)
participant
not the participant
2 ways participants may introduce measurement error

transient factors
stable attributes
transient factors
anxiety, tiredness (not stable)
stable attributes
differ across participants but are stable within a given individual; IQ, personality traits, general motivation
3 ways (aside from the participant) that measurement error can be introduced
situation factors
measure factors
mistakes
situation factors
where are participants taking a questionnaire? lighting, temperature, etc.
measure factors
items are ambiguous or the questionnaire itself is poorly constructed
what kind of mistakes can contribute to error variance?
participant is given the wrong test, experimenter says something wrong, computer error, etc.
reliability
consistency and dependability in scores across time; undermined by measurement error
how do we estimate reliability?
via correlations between measures of the same attribute
satisfactory reliability coefficient
0.7 or higher
3 kinds of reliability
test-retest
inter-item
inter-rater
test-retest reliability
will you get the same score if you measure the same behavior at two different times? only appropriate for stable traits, not transient states
inter-item reliability
ensures consistency among scale items aiming to measure the same construct (internal consistency)
make sure there are high enough correlations between similar questions
number used to assess inter-item reliability and satisfactory threshold
Cronbach's alpha ≥ 0.7
what does Cronbach's alpha measure?
inter-item reliability
systematic variance
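Cronbach's alpha can be computed directly from item scores: alpha = (k/(k−1)) × (1 − sum of item variances / variance of total scores). A minimal sketch with a hypothetical 3-item scale answered by five participants:

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per scale item (same participant order)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # per-person total score
    item_var = sum(variance(item) for item in items)   # sum of item variances
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical 3-item Likert scale (1-5), five participants
item1 = [4, 5, 3, 5, 4]
item2 = [4, 4, 3, 5, 3]
item3 = [5, 5, 2, 5, 4]
alpha = cronbach_alpha([item1, item2, item3])  # ~0.90, above the 0.7 threshold
```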
how do we assess inter-rater reliability in nominal data?
Cohen's kappa coefficient
0.6+ is reliable
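Cohen's kappa corrects raw agreement for agreement expected by chance: kappa = (p_o − p_e) / (1 − p_e). A sketch with hypothetical nominal codings from two raters:

```python
from collections import Counter

def cohens_kappa(rater1, rater2):
    """Chance-corrected agreement between two raters on nominal codes."""
    n = len(rater1)
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n   # observed agreement
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[cat] * c2[cat] for cat in set(rater1) | set(rater2)) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codings of eight behaviors into categories A/B
coder1 = ["A", "A", "B", "B", "A", "B", "A", "A"]
coder2 = ["A", "A", "B", "B", "A", "B", "A", "B"]
kappa = cohens_kappa(coder1, coder2)  # 0.75, above the 0.6 threshold
```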
how do we assess inter-rater reliability in ordinal data?
Spearman's rho (0.7+)
Kendall's tau (0.45+)
how do we assess inter-rater reliability in interval/ratio data?
Pearson's r (0.7+)
how do we assess inter-rater reliability in 3+ raters’ data?
intraclass correlation coefficient (0.75+)
validity
are you actually measuring what you want to measure? accuracy
why is validity a concern in psychological research?
psychological constructs are not directly observable
construct validity
the degree to which a measure actually captures the intended construct; assessed through convergent and discriminant validity
convergent validity
finding measures that correlate with each other that should correlate with each other (happiness and positive affect)
discriminant validity
no correlation with unrelated measures (happiness and negative affect)
criterion-related validity
correlation between a measure and relevant behavior; the measure should predict behavior (e.g., SAT -> graduation rates, GPA)
concurrent criterion-related validity
correlation between measure and behavior at the current time
predictive criterion-related validity
correlation between measure and behavior at a future time
2 kinds of criterion-related validity
concurrent
predictive
how can we maximize reliability and validity?
provide specific operational definitions (precise, appropriate)
no ambiguity on how you measure your variable (backed up by the literature)
standardize procedures (scripts, etc.)
inclusion and assessment of subject variables (sex, age, etc.)
random and/or standardized sampling to wash out potential systematic individual differences between groups