Week 11 Flashcards
What has influenced test development up to now?
Content developments
Theoretical developments
-Intelligence
-Personality
Technical and methodological developments
Statistics
E.g., Factor analysis
Computers & the internet
Contextual needs
- Political (e.g., impact of World Wars)
- Funding/policy (e.g., educational testing)
Future of Testing
The same influences will likely continue to impact on testing into the future:
Content developments
Technical and methodological developments
Contextual changes
Content Development
Construct development
A construct is a hypothetical entity with theoretical links to other hypothesised variables, proposed to relate to a consistent set of observable behaviours, thoughts or feelings that is the target of a psychological test.
Theoretical advances, such as new constructs emerging in the literature, might give an idea of the future tests and procedures likely to be developed.
Emerging Constructs
3:11-7:13 https://www.youtube.com/watch?v=9xTz3QjcloI
Expansion of constructs of intelligence
Gardner’s theory of multiple intelligences
Drive development of broader measures
Content Development
Big Five shaped development of a number of assessment measures
New concepts/increased attention driving new measure development, e.g.,
–Emotional intelligence
–Refers to a person’s capacity to monitor/manage emotions, understand the emotions of others, and use these insights to function better interpersonally
—Controversial: where to locate this in existing theory? Amalgamation of existing personality traits?
Integrity: dependability, theft proneness, counterproductive work behaviour.
—A specific type of personality test, or a direct measure to test a job applicant’s honesty, trustworthiness or integrity
Content Development
Neuroscience & brain function
- Potential psychological interpretations for imaging?
- Line between physiological and psychological assessment?
Technical and Methodological Developments
Increasing access to computers and internet over time
-Computer-assisted psychological assessment (CAPA)
Smart testing
- Computerised and multidimensional adaptive testing
- Item-generation technology
- Time-parameterised testing
- Latent factor-centred design
- Internet testing
Serious gaming
Potential for virtual reality, artificial intelligence in assessment
Computer Applications
1950s: computers first available for testing and assessment
Computerised adaptive testing (CAT) conceived
New developments in test theory including item response theory
Costs/skills prohibitive for mainstream use
Computer Applications
1980s: widespread proliferation of affordable home computers
Test developer access to affordable computing power
Development of computerised testing began
1990s: widespread growth of the internet
Possibility of internet testing
Testing as big business
Rapid proliferation of tests/testing
Are computer and pen and paper forms equivalent though?
Does computer presentation fundamentally change the construct being measured?
Generally the answer is no
Cross-mode correlations of 0.97 (e.g., Mead & Drasgow, 1993 meta-analysis)
Not much difference between ticking a box on a questionnaire with a pencil or mouse
Psychological decision-making processes remain the same
But…
- speeded tests
- psychomotor effects
Speeded tests are an exception (e.g., Greaud & Green, 1986)
Characterised by very simple tasks performed repetitively, as quickly as possible, within a short time limit (e.g., coding on WISC/WAIS)
Due to psychomotor effects on speeded tests, variations in response modality (i.e., paper & pencil vs. computer) do affect results
- Cross-mode correlation of 0.72 (e.g., Mead & Drasgow, 1993 meta-analysis)
- Using a pencil is easier than using a mouse, thus mode of response greatly affects measurement
Computer-assisted testing: WISC-V as an example
https://www.youtube.com/watch?v=tp5B86ajbmw
Multidimensional Adaptive Testing (MAT)
MAT as an extension of Computerised adaptive testing (CAT) covered in educational testing lecture
-Multivariate generalisation
Revision: CAT is where a computer continuously monitors test-taker’s performance and selects next item to administer to get the most information
- Item correct: harder item next
- Item incorrect: easier item next
- Adapts to your location on the underlying trait: to around where you would get half right and half wrong
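The adaptive loop described above can be sketched in a few lines of Python. This is a minimal illustration of the harder-after-correct / easier-after-incorrect logic, not a real IRT engine; the item-bank format, the fixed step size, and the shrinking step are all assumptions made for the sketch.

```python
# Minimal sketch of CAT item selection (illustrative only, not production IRT).
# Assumes a bank of items tagged with a difficulty on the same scale as ability.

def select_next_item(bank, ability_estimate):
    """Pick the unused item whose difficulty is closest to the current ability estimate."""
    return min(bank, key=lambda item: abs(item["difficulty"] - ability_estimate))

def run_cat(bank, respond, n_items=5, ability=0.0, step=0.5):
    """Administer n_items adaptively: harder item after a correct answer, easier after incorrect."""
    remaining = list(bank)
    for _ in range(n_items):
        item = select_next_item(remaining, ability)
        remaining.remove(item)
        if respond(item):        # item correct -> raise the estimate (harder items follow)
            ability += step
        else:                    # item incorrect -> lower the estimate (easier items follow)
            ability -= step
        step *= 0.8              # shrink the step so the estimate settles near the 50/50 point
    return ability
```

Running this against a simulated examinee who answers items below a given difficulty correctly shows the estimate drifting toward that threshold.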
Multidimensional Adaptive Testing (MAT)
MAT takes adaptive testing to the next level by applying this same idea to a battery of tests rather than a single test
–Capitalises on idea that many constructs measured by a test are correlated
Performance on each item then informs items used for every subtest in a battery
Adapts simultaneously across subtests
Key advantage: reduces test time without sacrificing accuracy of measurement across a whole battery
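The core MAT idea can be sketched with a made-up correlation between two subtests (real MAT systems use multivariate IRT, which is far more sophisticated): performance on one subtest shifts the starting estimate, and hence the item selection, for a correlated subtest.

```python
# Illustrative sketch of the MAT idea; the subtest names and the correlation value
# are assumptions for the example, not a real battery.

SUBTEST_CORRELATION = 0.6  # assumed correlation between, e.g., verbal and numerical ability

def update_correlated_estimate(observed_ability, other_estimate, r=SUBTEST_CORRELATION):
    """Shift the other subtest's estimate toward the observed ability, weighted by
    their correlation (a crude stand-in for a multivariate IRT update)."""
    return (1 - r) * other_estimate + r * observed_ability

# After a strong verbal performance, the numerical subtest starts with harder items:
numerical_start = update_correlated_estimate(observed_ability=1.2, other_estimate=0.0)
```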
Limitations of MAT
Like CAT, a large amount of effort is needed to develop a sufficiently large item bank to draw from
- Requires 100s of items with item parameters estimated
- Requires data from large samples of examinees, with extensive testing during development, even more so than for CAT
Potential for “chopping and changing” between item types as the system selects items from any subtest in the battery
- May be confusing for test-takers
- Need to remember instructions across subtests
  - Memory requirements may be unrealistic
Item-Generative Testing
Possible solution for need for large item banks (MAT and CAT)
New items generated automatically by a computer based on an underlying rule or algorithm
–By capturing the main source of difficulty for a subtest in a rule/template, the computer can generate an effectively infinite number of actual items of the desired difficulty, e.g., by randomly initialising key variables and applying the rule
-Potential for future assessments based on cognitive models of test performance to drive item-generative testing
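A toy illustration of rule-based item generation, under the assumption that a template captures the main source of difficulty (here, operand size in mental arithmetic) and random values instantiate each item:

```python
import random

# Sketch of item-generative testing: a template fixes the operation (the main
# source of difficulty) and randomly initialised values fill it in. The task and
# difficulty rule are assumptions for the example.

def generate_addition_item(difficulty, rng=random.Random(42)):
    """Generate a mental-arithmetic item; larger operands make the item harder."""
    upper = 10 ** difficulty          # difficulty 1 -> operands < 10, 2 -> < 100, ...
    a, b = rng.randrange(upper), rng.randrange(upper)
    return {"question": f"{a} + {b} = ?", "answer": a + b, "difficulty": difficulty}

item = generate_addition_item(difficulty=2)
```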
Time Parameterisation
Speed vs. accuracy?
- May sacrifice one for the other, creates challenges in scoring/interpretation
- Can’t tell from the final score which strategy was taken
BUT, Computer administered tests allow capture of response time
Challenge is how to use this?
- Analyse separately?
- Combine to investigate accuracy/time trade-off (efficiency)?
- Treat time as a difficulty dimension?
- Set a time limit or deadline for each item?
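One way the accuracy/time trade-off option above could be operationalised (the reference time and the scoring rule are assumptions for illustration, not an established method):

```python
# Hedged sketch: combine accuracy and captured response time into a per-item
# "efficiency" score. A correct answer in the reference time scores 1.0;
# faster scores more, slower scores less; incorrect answers score 0.

def efficiency_score(correct, response_time_s, reference_time_s=10.0):
    """Score an item as accuracy weighted by speed relative to a reference time."""
    if not correct:
        return 0.0
    return reference_time_s / max(response_time_s, 0.1)  # floor avoids division blow-up

responses = [(True, 5.0), (True, 20.0), (False, 3.0)]    # (correct?, seconds)
total = sum(efficiency_score(c, t) for c, t in responses)
```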
Internet Testing
Revolutionised testing
Larger impact on the distribution than on the development of tests
Questions can be quickly circulated to psychologists and other users
Internet versions of tests easily kept up to date and disseminated upon development
Can modify scoring and way questions presented easily
Information easily returned to test developer
–Potential for dynamic norming
–Potential in future for a multidimensional, adaptive, item-generative, time-parameterised, latent factor-centred, dynamically normed test!
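Dynamic norming could work roughly as follows (a sketch under assumed mechanics: raw scores stream back to the developer’s server and percentile norms are recomputed from the growing sample):

```python
import bisect

# Sketch of dynamic norming: as completed tests return to the test developer,
# the norm sample grows and percentile ranks update automatically.

class DynamicNorms:
    def __init__(self):
        self.scores = []                      # sorted raw scores from all test-takers so far

    def add(self, raw_score):
        bisect.insort(self.scores, raw_score)

    def percentile(self, raw_score):
        """Percent of the current norm sample scoring at or below raw_score."""
        rank = bisect.bisect_right(self.scores, raw_score)
        return 100.0 * rank / len(self.scores)

norms = DynamicNorms()
for s in [10, 12, 15, 18, 20]:                # scores arriving over the internet
    norms.add(s)
```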
Internet Testing: Risks & Limitations
“Digital divide” (Bartram, 2000)
Some people have better access to the internet than others; the best access tends to belong to the most privileged
Strong tradition in testing of trying to avoid discrimination
Narrowing gap in recent years as computers and internet are becoming cheaper and more widespread
Potential to bridge service gaps in rural/remote areas
–More limited access to professionals/getting to a test centre
Risks and Limitations
Security of Information
- Security of potentially highly sensitive information for the test-taker
- Security of the test itself
  - Access to tests is restricted to maintain the integrity of the test
  - Potential for rapid dissemination via the internet
    - Can disable printing and screen capture
    - But can’t stop someone photographing the screen with a digital camera!
Bandwidth limitations
- Can impact on timing due to lag
  - Serious challenges for CAT or MAT: to work, answers need to go back to the server for scoring/adaptation
  - If bandwidth challenges seriously slow down the test, they impact on the test experience
  - OR, download the whole test locally: involves large downloads and may exacerbate test security challenges
Risks and Limitations
Proliferation of non-evidence-based assessments on the internet
Pop-psych and para-psychological
Major problem for field
Testing vs. Assessment
Internet suited to testing, but not assessment
Risk of being used (inappropriately) as a replacement for psychological assessment
Very open to misuse and misinterpretation
Industrial and Organisational Testing Online
Rise of online recruiters and job markets
Potential for automatic head hunting with “web bots” trawling web for CVs (e.g., LinkedIn)
Temptation for delivery of psychological tests/assessment direct to public without a psychologist
- “Unsupervised mode”
- Raises questions about psychology’s assumptions regarding the requirements for assessment
Supervised Testing in the Digital Age
Functions of supervision of assessment
- Authenticating the test-taker
- Establishing rapport
- Ensuring test is administered according to manual
- Preventing cheating
- Ensuring security of the test itself
Levels of Supervision
Open (“unsupervised mode”)
E.g., tests published online, in magazines, or books
Many personal development measures
Tests that incurred significant development costs are unlikely to be open
Only suitable for low-stakes testing
Controlled (e.g., password to access)
Suitable for first step in recruitment process
Recommended to follow-up with verified testing
Supervised mode (e.g., presence of a proctor in a non-secure environment)
E.g., NAPLAN in 2018
Managed mode (formal examination conditions with test kept secure)
May include local supervision or remote supervision (e.g., using webcam technology, keystroke monitoring, and timing)
Raises additional complexities!