Reliability and Validity Flashcards
Random error
Unpredictable influences that vary from measurement to measurement
Influence usually goes in both directions
Threat to reliability
Systematic error
Biases that influence scores in a similar way across multiple measurements
Influence usually goes in one direction
Threat to validity
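A small simulation can make the contrast between the two error types concrete. This is a sketch under assumed values: the true score, the bias, and the noise level are invented for illustration. Random error scatters readings in both directions and averages out over many measurements, while systematic error shifts every reading the same way and does not average out.

```python
import random
import statistics

TRUE_SCORE = 100.0  # hypothetical true value being measured
BIAS = 5.0          # hypothetical constant bias (systematic error)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Random error only: noise pushes each reading up or down.
random_only = [TRUE_SCORE + rng.gauss(0, 3) for _ in range(5000)]

# Systematic error added: every reading is shifted in the same direction.
biased = [x + BIAS for x in random_only]

mean_random = statistics.mean(random_only)
mean_biased = statistics.mean(biased)
spread = statistics.stdev(random_only)

print(f"random error only: mean={mean_random:.2f}, sd={spread:.2f}")
print(f"with bias added:   mean={mean_biased:.2f}")
```

The mean of the noisy readings stays close to the true score even though individual readings vary (a reliability problem), whereas the biased mean sits about 5 points off no matter how many readings are taken (a validity problem).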
Types of reliability? (What do we mean by consistent?)
Temporal Stability: are the results of the test stable over time?
Test-retest reliability
Inter-Rater Stability: are the results of the test stable across raters?
Inter-rater reliability
Internal Consistency: Are there correlations between test items that are supposedly measuring the same construct?
Split-half reliability
Cronbach’s alpha
Why is it important that tests are reliable and valid?
Major implications for people’s lives
- Career assessment day
- Carrie Buck case
- Labelled “feeble-minded”
- Her child was taken away and she was forcibly sterilised
- Psychological profiling
Drawbacks (difficulties) to assessing temporal stability (test-retest reliability)?
- Practice effects
- Fatigue
- Cost
Inter-rater reliability?
Often used with subjectively scored measures
Test is scored by 2 or more raters, sets of scores are then correlated with each other
Internal consistency includes
- Split half reliability
- Cronbach’s alpha
What is Cronbach’s alpha?
Equivalent to calculating the average of all possible split-half reliability coefficients
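In practice alpha is usually computed from item variances rather than by enumerating every split. The sketch below uses the standard variance-based formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name and the data are illustrative assumptions.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])                                   # number of items
    item_var_sum = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(row) for row in rows])   # variance of totals
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical data: five respondents answering four items.
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```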
How to improve reliability?
- Control test administration
  - Standardized settings and instructions
  - e.g. same number of hours after last cigarette
- Increase the number of items
- Discriminability analysis
  - Identify and remove items that don’t fit the results pattern (negatively impact your scale’s reliability)
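One common way to spot items that hurt reliability is to recompute Cronbach's alpha with each item left out in turn ("alpha if item deleted"). The sketch below assumes that approach, which is only one of several item-analysis methods; the function names and data are hypothetical, and item 4 has been deliberately constructed to run against the other items.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_var_sum = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def alpha_if_item_deleted(rows):
    """Alpha recomputed with each item removed in turn."""
    k = len(rows[0])
    return [
        cronbach_alpha([row[:i] + row[i + 1:] for row in rows])
        for i in range(k)
    ]


# Hypothetical data: the fourth item runs opposite to the other three.
responses = [
    [4, 5, 4, 2],
    [2, 1, 2, 4],
    [3, 3, 4, 3],
    [5, 4, 5, 1],
    [1, 2, 1, 5],
]

full_alpha = cronbach_alpha(responses)
deleted = alpha_if_item_deleted(responses)

print(f"alpha with all items: {full_alpha:.2f}")
for i, a in enumerate(deleted, start=1):
    print(f"alpha without item {i}: {a:.2f}")
```

An item whose removal raises alpha well above the full-scale value is a candidate for removal or rewording (or, if it was meant to be reverse-scored, for recoding).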
Validity
Does the test measure what it purports to measure?
Are the conclusions drawn from measurement well-founded?
How free from systematic error is a measure?
Types of validity
- Content
- Construct
  - Convergent
  - Discriminant
- Criterion
  - Concurrent
  - Predictive
What is content validity?
Degree to which the items or tasks adequately sample the target domain
- i.e. how well does a measure/task represent all the facets of a construct?
What is construct validity?
How well do your operationalized variables (independent and/or dependent) represent the hypothetical or abstract variables of interest?
Two types: convergent and discriminant
What is criterion validity?
To what extent can a procedure be used to infer or predict some criterion (outcome)?
Two types: concurrent and predictive
What is convergent validity?
Scale correlates highly with other scales that purport to measure the same thing