Reliability and Validity Flashcards
The two components of measurement
measurement = true score + error
(we want as close to true score as possible)
Ways to Reduce Error
Many participants
- individual differences introduce error, but when scores are averaged across a large group of people this error becomes manageable
Many measurements
- to manage measurement error, use more than one measurement of the same thing (measurements that should produce similar results)
High frequency or many occasions
- measuring often, or on many occasions, lets us be more confident that what we have measured is close to the true score
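A small simulation of why more measurements help, assuming measurement = true score + random error as above. The true score of 100, the error size of 15, and the function names are hypothetical values chosen only for this sketch.

```python
import random
import statistics

TRUE_SCORE = 100.0  # hypothetical true score
ERROR_SD = 15.0     # hypothetical size of the random error

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def measure():
    """One observed measurement: the true score plus random error."""
    return TRUE_SCORE + rng.gauss(0, ERROR_SD)

def typical_miss(n, repeats=2000):
    """Average distance between the mean of n measurements and the true score."""
    misses = []
    for _ in range(repeats):
        mean_of_n = statistics.fmean(measure() for _ in range(n))
        misses.append(abs(mean_of_n - TRUE_SCORE))
    return statistics.fmean(misses)

miss_1 = typical_miss(1)
miss_25 = typical_miss(25)
print(f"1 measurement:   off by about {miss_1:.1f} on average")
print(f"25 measurements: off by about {miss_25:.1f} on average")
```

The averaged value sits much closer to the true score because random errors in opposite directions tend to cancel out. The same logic applies to averaging over many participants or many occasions.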
Reliability
refers to the consistency/repeatability of the results of a measurement
- how reliable a measure is, is relative and depends on the situation
Types of Reliability
Observers: Inter-Observer reliability
Observations: Internal (Split-half) reliability
Occasions: Test-retest reliability
Inter-Observer Reliability
→ the degree to which observers agree upon an observation or judgement
- It can be based on frequency or categorical judgements
- measured with correlations
- raters make their ratings independently, not together
Example of Poor Inter-Observer Reliability
Rating attractiveness
- Different raters lack a common standard, so inter-rater reliability is not strong
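A minimal sketch of "measured with correlations", assuming two or more raters score the same sessions independently. The ratings and the helper name pearson_r are made up for illustration and do not come from the notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical frequency counts for the same six sessions,
# scored separately by three raters.
rater_a = [3, 5, 2, 8, 6, 4]
rater_b = [4, 5, 2, 7, 6, 3]  # largely agrees with rater A
rater_c = [6, 2, 7, 3, 4, 8]  # judges the sessions very differently

r_ab = pearson_r(rater_a, rater_b)
r_ac = pearson_r(rater_a, rater_c)
print(f"A vs B: r = {r_ab:.2f}")
print(f"A vs C: r = {r_ac:.2f}")
```

A high positive correlation (A vs B) suggests good inter-observer reliability; a low or negative one (A vs C) is what we would expect for something like attractiveness ratings.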
Internal/Split-Half Reliability
→ the degree to which all of the specific items or observations in a multiple item measure behave the same way
- High internal reliability shows the entire measure is consistently measuring what it should be
How:
- divide the test into two halves, then compare the first half to the second half
- determine consistency
Example of Measuring Split-Half Reliability
Intelligence (split into three domains)
- Verbal intelligence, perceptual reasoning, working memory
- If we split this in half (the first 50 questions and the last 50 questions), the two halves have to be comparable, with equal numbers of questions from each domain (compare ‘like with like’)
- Then we look at the scores from each half; a high correlation between them indicates good internal reliability
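The steps above can be sketched as follows, assuming each participant's answers are stored as a list of 0/1 item scores and that the two halves are already balanced across domains. The data and the names pearson_r and half_totals are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def half_totals(item_scores):
    """Split one participant's item scores in half and total each half."""
    middle = len(item_scores) // 2
    return sum(item_scores[:middle]), sum(item_scores[middle:])

# Hypothetical data: 6 participants, 8 items each (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 1, 1, 0, 1, 1],
]

first_half = [half_totals(person)[0] for person in responses]
second_half = [half_totals(person)[1] for person in responses]

# Correlate first-half totals with second-half totals across participants.
split_half_r = pearson_r(first_half, second_half)
print(f"split-half correlation: r = {split_half_r:.2f}")
```

With real data the items would need to be ordered or selected so that each half draws equally on every domain before the split is made.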
Test-Retest Reliability
→ the ability of a measure to produce the same results at different points in time or on different occasions
- important to show that the test or measure consistently measures the construct we are interested in, provided no other variables have changed
Visual Search Task Example (Test-Retest Reliability)
We need the measurement to remain constant over time
However, practice effects undermine test-retest reliability
- to counteract this we should counterbalance the order of presentation, such as by randomly assigning people to different orders
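One possible way to randomly assign people to different orders, as a sketch only. The function name counterbalance, the participant labels, and the two task orders are assumptions made for illustration.

```python
import random

def counterbalance(participants, orders, seed=None):
    """Randomly assign participants to presentation orders in equal numbers.

    Participants are shuffled, then the orders are dealt out in turn, so
    each order is used about equally often.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {
        person: orders[position % len(orders)]
        for position, person in enumerate(shuffled)
    }

# Hypothetical study: 8 participants, two versions of the search task (A and B).
participants = [f"P{i}" for i in range(1, 9)]
orders = [("A", "B"), ("B", "A")]

assignment = counterbalance(participants, orders, seed=1)
for person in participants:
    print(person, "->", " then ".join(assignment[person]))
```

Because half the group meets each version first, any practice effect is spread evenly across the versions instead of always favouring the one presented second.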
Brain Training Example (Test-Retest Reliability)
Tasks claimed to improve brain function and slow down cognitive decline
The question is whether they work
- Adrian Owen (2010) found some improvement on the trained tasks, but no evidence of transfer effects to untrained tasks (if training improved brain performance it should also help on other tasks, so the improvement is an effect of practice)
practice effects
improvement in scores on one task does not mean greater improvement on all tasks
- an indicator of poor test-retest reliability: scores from the first and second time are not comparable
replication
reliability of results across experiments
- when variables and conditions stay the same
- the more times a result is replicated, the more likely it is that the findings are accurate and not due to error
Validity
→ refers to how well a measure or construct actually measures or represents what it claims to
→ relates to accuracy
Types of Validity
- measurement validity (construct, content or criterion validity)
- internal validity
- external validity (population or ecological validity)
Measurement Validity
how well an operationalised variable corresponds to what it is supposed to measure
- includes construct, content and criterion validity
Construct Validity
→ how well do your operationalised variables (independent and/or dependent) represent the abstract variables of interest
- are you measuring what you think/what you say you are measuring
Example: hunger in rats
- must consider measures such as the weight of food consumed, the speed of running towards food, etc.
Content Validity
→ the degree to which the items or tasks on a multi-faceted measure accurately measure what it is supposed to measure (the target domain)
- Many constructs are multi-faceted and sometimes multiple measures must be used to achieve adequate content validity
Example: extroversion measure with 40 different items
- 30 are similar (based on social behaviour, excitement measure, feelings in group settings)
- 10 are unrelated (favourite shops)
- The items need to cover all domains of the construct of interest to measure it accurately
- Critical in psych as many constructs require multi-domain measurements