Lecture 4.2 Reliability Flashcards
Reliability
• The consistency with which a test measures what it purports to measure in any given set of circumstances
True
True or False
A reliable test will result in the same score every time it is used to measure the same thing under the same conditions
Reliability coefficient
An index of reliability that indicates the ratio between the true score variance on a test and the total variance (SD²)
> .90
Reliability coefficient of _______ is excellent for research purposes, appropriate for individual assessment purposes
> .80
Reliability coefficient of _______ is good for research purposes, marginal for individual assessment
Reliability coefficient
- higher scores = higher reliability
- > .60 is marginal for research purposes
- > .70 is adequate for research purposes
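The cutoffs on these cards can be collected into one lookup. This is only a sketch: the function name describe_reliability is made up for illustration, and treating each cutoff as a strict "greater than" is an assumption taken from how the cards are written.

```python
def describe_reliability(r):
    """Map a reliability coefficient to the labels used on the cards.

    Assumption: each cutoff is strict (">"), as written on the cards.
    """
    if r > 0.90:
        return "excellent for research, appropriate for individual assessment"
    if r > 0.80:
        return "good for research, marginal for individual assessment"
    if r > 0.70:
        return "adequate for research"
    if r > 0.60:
        return "marginal for research"
    return "below the cutoffs listed on the cards"


# Hypothetical coefficients, one per band
for r in (0.93, 0.85, 0.74, 0.65, 0.50):
    print(r, "->", describe_reliability(r))
```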
Classical Test Theory
Assumes that each person has an innate true score. It can be summed up with an equation:
X = T + E
Observed score (X) is true score (T) plus error (E)
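A small worked example may help tie the equation to the reliability coefficient (true score variance divided by total variance). The numbers are invented, and the errors were picked by hand so that they average zero and are uncorrelated with the true scores, which is what the theory assumes. In practice true scores are never observed directly, so this is an illustration rather than a procedure.

```python
from statistics import pvariance

# Hypothetical true scores (T) and errors (E), chosen so that E averages
# zero and is uncorrelated with T, as classical test theory assumes.
true_scores = [10, 20, 30, 40]
errors = [4, -4, -4, 4]

# X = T + E for each person
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = pvariance(true_scores)   # 125
var_error = pvariance(errors)       # 16
var_observed = pvariance(observed)  # 141 = 125 + 16

# Reliability coefficient: proportion of total variance that is true variance
reliability = var_true / var_observed
print(observed)
print(round(reliability, 3))
```

Making the errors larger raises the error variance and pulls the ratio down, which is the "higher proportion of error variance = less reliable" idea.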
more reliable
higher proportion of true variance =
less reliable
higher proportion of error variance =
increase or decrease
Error variance may _______ or _______ a test score by varying amounts, leading to lower reliability
Systematic error and unsystematic error
Two types of testing error
Systematic error
Testing error that doesn't affect reliability. Consistent and predictable (once you are aware of it), e.g. a leaking tyre
Unsystematic error
Testing error that affects reliability. Inconsistent and unpredictable, e.g. an electrical problem
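The difference between the two error types can be seen with a few invented scores. In this sketch, consistency is measured by correlating two administrations; the data and the helper name pearson are assumptions for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores from a first administration
first = [10, 12, 14, 16, 18]

# Systematic error: every score on the second occasion is 3 points too high
second_systematic = [s + 3 for s in first]

# Unsystematic error: a different, unpredictable amount for each person
second_unsystematic = [s + e for s, e in zip(first, [2, -3, 1, -2, 2])]

r_systematic = pearson(first, second_systematic)
r_unsystematic = pearson(first, second_unsystematic)
print(round(r_systematic, 3), round(r_unsystematic, 3))
```

The constant 3-point shift leaves everyone in the same relative position, so the correlation stays perfect. The person-by-person errors reshuffle the scores and the correlation drops.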
Test construction
Sources of Error Variance T_______ C_______
The content covered by test items, the way questions are asked, and the response format all add to the error variance of a test
Test administration
Sources of Error Variance T_______ A_______
• Test environment (including test materials), test-taker variables (e.g., alertness, wellbeing, mistakes) & administrator-related variables (e.g., presence or absence, demeanour, departure from procedure, unconscious cues, etc.)
Test scoring & interpretation
Sources of Error Variance T_______ s_______ & i_______
Human error - data entry, transcription, coding, calculation, timing, etc.
Level of objectivity/subjectivity
Human fallibility
Sources of Error Variance h_______ f_______
• Forgetting or misremembering
• Failing to notice or not being aware
• Not understanding or following instructions
• Under- and over-reporting
• Differences of opinion
• Lying or misleading
Time and practice effects
Sources of Error Variance ti_______ and pr_______ eff_______
Domain Sampling Model
This model assumes that the items selected for any one test are just a sample of items from an infinite domain of potential items. The error it describes arises in the development of a test.
Domain Sampling Model
• Seeks to determine how precisely the test score assesses the domain from which the test draws a sample
True score
The score you would get if you answered every conceivable item in the domain.
Standard Error of Measurement (SEM)
• Measures the precision of an observed score & provides an estimate of the amount of error inherent in an observed score or measurement
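The cards do not give a formula for the SEM. The usual classical test theory expression is SEM = SD × √(1 − reliability coefficient), and the sketch below uses it. The SD of 15, the reliability of .91, the observed score of 100 and the 1.96 multiplier for a roughly 95% band are all illustrative assumptions.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * sqrt(1 - reliability)


# Hypothetical test: SD of 15 and reliability coefficient of .91
sem = standard_error_of_measurement(15, 0.91)

# Approximate 95% band around a hypothetical observed score of 100
observed = 100
lower = observed - 1.96 * sem
upper = observed + 1.96 * sem
print(round(sem, 2), round(lower, 2), round(upper, 2))
```

With these numbers the SEM comes out at about 4.5 points, so the band runs from roughly 91 to 109. A less reliable test with the same SD would give a wider band.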
Standard Error of Difference (SED)
Can be used to compare:
• an individual’s scores on two different tests
• two different people’s scores on the same test
• two different people’s scores on two different tests
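The cards do not state how the SED is calculated. A common form combines the two standard errors of measurement: SED = √(SEM₁² + SEM₂²), with both scores on the same scale. The sketch below assumes that form. The reliabilities, the two scores and the 1.96 cutoff are made-up values for illustration.

```python
from math import sqrt


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def standard_error_of_difference(sem_1, sem_2):
    """SED = sqrt(SEM_1^2 + SEM_2^2), both scores on the same scale."""
    return sqrt(sem_1 ** 2 + sem_2 ** 2)


# Hypothetical: two tests on the same scale (SD 15), reliabilities .91 and .84
sd = 15
sed = standard_error_of_difference(sem(sd, 0.91), sem(sd, 0.84))

# Is a 12-point gap between two scores larger than about 1.96 SEDs?
difference = 112 - 100
exceeds_cutoff = abs(difference) > 1.96 * sed
print(round(sed, 2), exceeds_cutoff)
```

Here the SED is about 7.5, so a 12-point gap falls short of the 14.7 points needed under this cutoff and could plausibly be measurement error.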
Test-Retest Reliability
- Calculated by correlating scores from the same people on two different administrations of the same test
- Used for measuring characteristics that are thought to be stable (e.g. personality traits or intelligence)
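As a rough illustration of the calculation, the sketch below correlates two administrations of the same test for five people. The scores are invented and the helper pearson is written out by hand so the example has no dependencies.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five people, tested twice
time_1 = [98, 105, 110, 120, 87]
time_2 = [101, 103, 112, 118, 90]

test_retest_r = pearson(time_1, time_2)
print(round(test_retest_r, 3))
```

The same calculation applies to parallel forms reliability, with the second list holding scores on the alternate version instead of a second sitting.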
The amount of time between administrations
Any intervention, treatment or trauma taking place between test administrations
Test-retest reliability will be affected by _______
Parallel & Alternate Forms Reliability
Different versions of a test, matched for content and difficulty
Split-Half Reliability
Scores from one half of a test are correlated with the other half of the test, using equivalent halves
• Ways to split: randomly, odd vs. even items, or matched for content & difficulty
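An odd/even split can be sketched as follows. The item responses are invented (1 = correct, 0 = incorrect). The last step applies the Spearman-Brown correction, which is not on these cards; it is included as an assumption because the half-test correlation on its own describes a test only half as long.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical responses: one row per person, one column per item
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
]

# Items 1, 3, 5 form one half; items 2, 4, 6 form the other
odd_totals = [sum(row[0::2]) for row in responses]
even_totals = [sum(row[1::2]) for row in responses]

r_half = pearson(odd_totals, even_totals)

# Spearman-Brown estimate for the full-length test (assumption, see lead-in)
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))
```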
Inter-Rater Reliability
The degree of agreement between two or more scorers. Improved by appropriate training of scorers.
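The cards do not say how agreement is quantified. Two common options are simple percent agreement and Cohen's kappa, which adjusts for agreement expected by chance. The sketch below shows both for two raters. The ratings and function names are invented for illustration.

```python
from collections import Counter


def percent_agreement(a, b):
    """Proportion of cases where the two raters gave the same rating."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of the raters' marginal proportions per category
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical pass/fail ratings of the same eight responses by two scorers
rater_a = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(agreement, round(kappa, 3))
```

The raters agree on 6 of 8 responses, but because both pass most responses, a good share of that agreement would be expected by chance, so kappa is noticeably lower than the raw percentage.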
Test-retest
correlate scores from 2 administrations of the same test
Parallel forms
correlate scores from 2 versions of the same test
Split-half
correlate scores from 2 equivalent halves of the same test