Lecture 3 Flashcards
What are the four factors affecting reliability?
- People taking the test
- Item characteristics
- Test characteristics
- Method used to estimate reliability
Name the two item characteristics that affect internal-consistency reliability.
- Correlation between items
- Number of items
What item characteristics must a test have to be reliable?
♣ A lot of items that show a small correlation; OR
♣ A few items that are strongly related
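The trade-off between number of items and inter-item correlation can be illustrated with the standardized coefficient alpha formula. This is a minimal sketch; the function name and the example values (40 items at r = .10, 5 items at r = .50) are assumptions chosen for illustration, not figures from the lecture.

```python
def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardized coefficient alpha from the number of items and
    the average inter-item correlation."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)

# Many weakly related items vs. a few strongly related items (hypothetical values)
many_weak = standardized_alpha(40, 0.10)   # about 0.82
few_strong = standardized_alpha(5, 0.50)   # about 0.83
print(round(many_weak, 2), round(few_strong, 2))
```

Both routes reach a similar internal-consistency estimate, which is the point of the two bullets above.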
Why is validity dependent on reliability?
- CONCEPTUALLY: You must be measuring SOMETHING consistently to ensure that you are accurately measuring WHAT YOU WANT TO MEASURE
- MATHEMATICALLY: The maximum correlation between two variables is limited by their reliabilities (it cannot exceed the square root of the product of the two reliabilities)
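The mathematical point can be sketched with the classical attenuation bound: an observed correlation cannot exceed the square root of the product of the two reliabilities. The function name and the example reliabilities (.81 and .64) are assumptions for illustration.

```python
import math

def max_observed_correlation(rel_x: float, rel_y: float) -> float:
    """Upper bound on the observed correlation between two measures,
    given their reliabilities (classical test theory)."""
    return math.sqrt(rel_x * rel_y)

# Hypothetical test (reliability .81) and criterion (reliability .64)
ceiling = max_observed_correlation(0.81, 0.64)
print(round(ceiling, 2))  # 0.72
```

So even a test that measures exactly the right thing cannot show a validity coefficient above this ceiling if either measure is unreliable.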
How can test items increase the test’s reliability?
Reliability increases as the number of items increases (assuming all additional items are of the same quality as existing items)
Even though adding more items increases a test’s reliability, why can’t test length be increased indefinitely?
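The effect of test length is usually quantified with the Spearman-Brown prophecy formula, which assumes the added items are parallel to the existing ones. The sketch below uses assumed function names and example values (a starting reliability of .70 and a target of .90).

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when the test is lengthened by length_factor
    (2.0 means twice as many items of the same quality)."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

def length_factor_needed(reliability: float, target: float) -> float:
    """How many times longer the test must be to reach the target reliability."""
    return target * (1 - reliability) / (reliability * (1 - target))

doubled = spearman_brown(0.70, 2.0)          # about 0.82
factor = length_factor_needed(0.70, 0.90)    # about 3.86 times as long
print(round(doubled, 2), round(factor, 2))
```

Note the diminishing returns: doubling the test lifts .70 to about .82, but reaching .90 needs a test nearly four times as long.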
o Participant boredom
o Participant exhaustion
o Loss of motivation as test continues
o Rules used in practice: make the test as short as possible within acceptable reliability heuristics
♣ ~.90 to .95 for high-stakes individual applications
♣ ~.70 for more research-oriented purposes
♣ ~.60 if you have multiple indicators and can model statistical error
What are the practical problems with short tests?
o Item exposure (for high-stakes tests)
o For high-stakes applications (selection), you don’t want people to have seen the tests before
♣ Tests will lose their reliability if people have seen them before
o Ability to adequately sample the domain of interest
♣ Bandwidth (content coverage) vs. fidelity (reliability)
♣ Breadth of the construct measured
What is Adaptive Testing?
o The test is adapted to the person’s level of ability (qualitatively analysed)
o A test-taker’s previous responses determine which items they see next
o Used in major test batteries
o Not computerised
What is Computerised Adaptive Testing (CAT)?
o A computerised algorithm automatically selects further items according to a decision rule
o Can use either blocks of items or single items
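A decision rule of this kind can be sketched as follows. This is a deliberately simple, single-item illustration and not how operational CAT systems estimate ability (those typically rely on item response theory). The function name, the step-halving update, and the item pool are all assumptions.

```python
from typing import Callable, List, Tuple

def run_adaptive_test(
    difficulties: List[float],
    answers_correctly: Callable[[float], bool],
    n_items: int = 5,
    start: float = 0.0,
    step: float = 1.0,
) -> Tuple[float, List[float]]:
    """Administer items one at a time, always choosing the unused item
    whose difficulty is closest to the current ability estimate."""
    remaining = sorted(difficulties)
    estimate = start
    administered: List[float] = []
    for _ in range(min(n_items, len(remaining))):
        # Decision rule: pick the best-matching item not yet used
        item = min(remaining, key=lambda d: abs(d - estimate))
        remaining.remove(item)
        administered.append(item)
        # Move the estimate up after a correct answer, down after an error
        estimate += step if answers_correctly(item) else -step
        step /= 2  # smaller adjustments as evidence accumulates
    return estimate, administered

# Hypothetical pool of item difficulties from -3.0 to 3.0 in steps of 0.5
pool = [x * 0.5 for x in range(-6, 7)]
# Hypothetical test-taker who passes every item with difficulty up to 1.0
estimate, administered = run_adaptive_test(pool, lambda d: d <= 1.0)
print(estimate, administered)
```

Only five of the thirteen items are administered, and they cluster around the test-taker's level, which is why adaptive tests can be short without losing much precision.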
What are the advantages of Computerised Adaptive Testing (CAT)?
o Tests are a lot shorter but just as reliable
♣ Good match between person-level and item-level
• Increased variability in responses maximises reliability
♣ Fewer problems with fatigue, motivation, boredom
♣ Economic bottom line
• Fewer testing sites booked for fewer hours, fewer test proctors, etc.
o Easier to maintain test security
♣ Different set of items given to different people
o Motivation factors:
♣ Very able people are not given a large number of items that are too easy
♣ Low-ability people are not failing item after item
What are the disadvantages of Computerised Adaptive Testing (CAT)?
o Substantial preparation and outlay needed
♣ Development of VERY LARGE ITEM POOL
♣ Analyses of VERY LARGE ITEM POOL to determine the difficulty of each item
♣ Automated programming of algorithm/decision rule
o Requires computerised administration
When is Computerised Adaptive Testing (CAT) most appropriate to use?
Most appropriate for large-scale testing where test security is an issue
What is the difference between “typical” performance and “maximum” performance?
o Typical performance = what you generally would do
♣ Usually measured by rating scales (self-reports)
o Maximum performance = the best you can do
♣ Usually measured by ability scales
Describe what “maximum performance” consists of
♣ Aware of performance appraisal
♣ Willing to perform at his/her best
♣ Explicit standards
♣ Effort has to be exerted
♣ Role of motivation
What is a “vignette”?
Vignettes describe hypothetical people/scenarios and get ratings of them
o The average vignette rating is subtracted from the person’s own self-rating
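The adjustment amounts to a simple difference score. The sketch below assumes a 1 to 5 rating scale and made-up ratings; the function name is hypothetical.

```python
from statistics import mean
from typing import Sequence

def vignette_adjusted_rating(self_rating: float, vignette_ratings: Sequence[float]) -> float:
    """Self-rating minus the person's average rating of the vignettes."""
    return self_rating - mean(vignette_ratings)

# Hypothetical respondent: rates self 4, rates four vignettes 3, 5, 4, 2
adjusted = vignette_adjusted_rating(4, [3, 5, 4, 2])
print(adjusted)  # 0.5
```

A positive score means the person rates themselves above how they rate the hypothetical people, which helps correct for individual differences in how the rating scale is used.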