Week 11 Flashcards
What has influenced test development up to now?
Content developments
Theoretical developments
-Intelligence
-Personality
Technical and methodological
Statistics
E.g., Factor analysis
Computers & the internet
Contextual needs
- Political (e.g., impact of World Wars)
- Funding/policy (e.g., educational testing)
Future of Testing
The same influences will likely continue to shape testing into the future:
Content developments
Technical and methodological developments
Contextual changes
Content Development
Construct development
A construct is a hypothetical entity with theoretical links to other hypothesised variables, proposed to relate to a consistent set of observable behaviours, thoughts or feelings that is the target of a psychological test.
Theoretical advances, such as new constructs emerging in the literature, may indicate which future tests and procedures are likely to be developed.
Emerging Constructs
3:11-7:13 https://www.youtube.com/watch?v=9xTz3QjcloI
Expansion of constructs of intelligence
Gardner’s theory of multiple intelligences
Drive development of broader measures
Content Development
Big Five shaped development of a number of assessment measures
New concepts/increased attention driving new measure development, e.g.,
–Emotional intelligence
–Refers to a person’s capacity to monitor/manage emotions, understand the emotions of others, and use these insights to function better interpersonally
—Controversial: where to locate this in existing theory? Amalgamation of existing personality traits?
Integrity: dependability, theft proneness, counterproductive work behaviour.
—A specific type of personality test, or a direct measure of a job applicant’s honesty, trustworthiness, or integrity
Content Development
Neuroscience & brain function
- Potential psychological interpretations for imaging?
- Line between physiological and psychological assessment?
Technical and Methodological Developments
Increasing access to computers and internet over time
-Computer-assisted psychological assessment (CAPA)
Smart testing
- Computerised and multidimensional adaptive testing
- Item-generation technology
- Time-parameterised testing
- Latent factor-centred design
- Internet testing
Serious gaming
Potential for virtual reality, artificial intelligence in assessment
Computer Applications
1950s: computers first available for testing and assessment
CAT conceived
New developments in test theory including item response theory
Costs/skills prohibitive for mainstream use
Computer Applications
1980s: widespread proliferation of affordable home computers
Test developer access to affordable computing power
Development of computerised testing began
1990s: widespread growth of the internet
Possibility of internet testing
Testing as big business
Rapid proliferation of tests/testing
Are computer and pen and paper forms equivalent though?
Does computer presentation fundamentally change the construct being measured?
Generally the answer is no
Cross-mode correlations of 0.97 (e.g., Mead & Drasgow, 1993 meta-analysis)
Not much difference between ticking a box on a questionnaire with a pencil or mouse
Psychological decision-making processes remain the same
But….
speeded tests
psychomotor effects
Speeded tests are an exception (e.g., Greaud & Green, 1986)
Characterised by very simple tasks performed repetitively, as quickly as possible, within a short time limit (e.g., coding on WISC/WAIS)
Due to psychomotor effects on speeded tests, variations in response modality (i.e., pen and paper vs. computer) do affect results
- Cross-mode correlation of 0.72 (e.g., Mead & Drasgow, 1993 meta-analysis)
- Using a pencil is easier than using a mouse, so mode of response greatly affects measurement
Computer-assisted testing: WISC-V as an example
https://www.youtube.com/watch?v=tp5B86ajbmw
Multidimensional Adaptive Testing (MAT)
MAT as an extension of Computerised adaptive testing (CAT) covered in educational testing lecture
-Multivariate generalisation
Revision: CAT is where a computer continuously monitors the test-taker’s performance and selects the next item to administer so as to gain the most information
- Item correct: next item is harder
- Item incorrect: next item is easier
- Adapts to your location on the underlying trait, converging around where you would get half the items right and half wrong
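The adaptive loop above can be sketched in a few lines of Python. This is an illustrative step-rule version with a made-up item bank (all names and difficulty values are hypothetical); operational CATs instead use IRT-based ability estimation and item-information criteria:

```python
# Hypothetical item bank: each item has a difficulty on the latent-trait scale.
item_bank = {f"item_{i}": d / 2.0 for i, d in enumerate(range(-6, 7))}

def select_next_item(ability, administered):
    """Pick the unadministered item whose difficulty is closest to the
    current ability estimate (where success is roughly 50/50)."""
    candidates = {k: v for k, v in item_bank.items() if k not in administered}
    return min(candidates, key=lambda k: abs(candidates[k] - ability))

def run_cat(answer_item, n_items=5, step=0.5):
    """Crude step-rule CAT: move the ability estimate up after a correct
    answer and down after an incorrect one."""
    ability, administered = 0.0, []
    for _ in range(n_items):
        item = select_next_item(ability, administered)
        administered.append(item)
        correct = answer_item(item)
        ability += step if correct else -step
    return ability, administered

# Simulated test-taker with true ability 1.0: answers correctly whenever
# the item's difficulty is below their ability.
est, items = run_cat(lambda item: item_bank[item] < 1.0)
```

After a few items the estimate oscillates around the point where the test-taker gets about half the items right, which is exactly the "half right, half wrong" behaviour described above.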
Multidimensional Adaptive Testing (MAT)
MAT takes adaptive testing to the next level by applying this same idea to a battery of tests rather than a single test
–Capitalises on idea that many constructs measured by a test are correlated
Performance on each item then informs items used for every subtest in a battery
Adapts simultaneously across subtests
Key advantage: reduces test time without sacrificing accuracy of measurement across a whole battery
Limitations of MAT
Like CAT, substantial effort is needed to develop a sufficiently large item bank to draw from
Requires 100s of items with item parameters estimated
Requires data from large samples of examinees with extensive testing during development, even more so than in CAT
Potential for “chopping and changing” between item types as the system selects from any subtest in the battery
- May be confusing for test-takers
- Need to remember instructions across subtests
-- Memory requirements may be unrealistic
Item-Generative Testing
Possible solution for need for large item banks (MAT and CAT)
New items generated automatically by a computer based on an underlying rule or algorithm
–By building the main source of difficulty for a subtest into a rule or template, a computer can generate an effectively unlimited number of items of the desired difficulty, e.g., by randomly initialising key variables and applying the rule
-Potential future assessments based on cognitive models of test performance to drive item-generative testing
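The rule/template idea can be illustrated with a toy arithmetic generator. The assumption that digit count drives difficulty, and the function name itself, are hypothetical choices for the sketch:

```python
import random

def generate_addition_item(difficulty, rng=None):
    """Generate a mental-arithmetic item from a template: 'difficulty' sets
    the number of digits in each operand, which is assumed here to be the
    main source of item difficulty (an illustrative rule, not a real test)."""
    rng = rng or random.Random()
    lo, hi = 10 ** (difficulty - 1), 10 ** difficulty - 1
    a, b = rng.randint(lo, hi), rng.randint(lo, hi)
    return {"question": f"{a} + {b} = ?", "answer": a + b}

# Every call produces a fresh item at the requested difficulty level,
# so the effective item bank is unlimited.
easy = generate_addition_item(difficulty=1)  # single-digit operands
hard = generate_addition_item(difficulty=3)  # three-digit operands
```

Randomly initialising the key variables (the operands) while holding the rule fixed is what keeps difficulty roughly constant across generated items.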
Time Parameterisation
Speed vs. accuracy?
- May sacrifice one for the other, creates challenges in scoring/interpretation
- Can’t tell from the final score which strategy was taken
BUT, computer-administered tests allow capture of response time
Challenge is how to use this?
- Analyse separately?
- Combine to investigate accuracy/time trade-off (efficiency)?
- Treat time as a difficulty dimension?
- Set a time limit or deadline for each item?
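As a toy illustration of the "combine into an efficiency index" option, accuracy and a simple correct-per-minute score can be computed from (correct, response-time) pairs. The data and the index are hypothetical, not a standardised scoring rule:

```python
# Hypothetical per-item records: (answered correctly?, response time in seconds)
responses = [(True, 4.0), (True, 12.0), (False, 3.0), (True, 6.0)]

def accuracy(resps):
    """Proportion of items answered correctly, ignoring time."""
    return sum(c for c, _ in resps) / len(resps)

def efficiency(resps):
    """One crude trade-off index: correct responses per minute of testing.
    Two test-takers with equal accuracy but different speeds differ here."""
    minutes = sum(t for _, t in resps) / 60.0
    return sum(c for c, _ in resps) / minutes
```

A fast-but-careless and a slow-but-careful test-taker can end up with the same accuracy yet very different efficiency scores, which is exactly the information lost when only the final score is reported.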
Internet Testing
Revolutionised testing
Larger impact on the distribution than on the development of tests
Questions can be quickly circulated to psychologists and other users
Internet versions of tests easily kept up to date and disseminated upon development
Can modify scoring and way questions presented easily
Information easily returned to test developer
- Potential for dynamic norming
- Potential in future for a multidimensional, adaptive, item-generative, time-parameterised, latent factor-centred, dynamically normed test!
Internet Testing: Risks & Limitations
“Digital divide” (Bartram, 2000)
Some people have better access to the internet than others; the best access tends to go to the most privileged
Strong tradition in testing of trying to avoid discrimination
Narrowing gap in recent years as computers and internet are becoming cheaper and more widespread
Potential to bridge service gaps in rural/remote areas
–More limited access to professionals/getting to a test centre
Risks and Limitations
Security of Information
- Security of potentially highly sensitive information about the test-taker
- Security of the test itself
-- Access to tests is restricted to maintain test integrity
-- Potential for rapid dissemination via the internet
--- Printing and screen capture can be disabled
--- But you can’t stop someone photographing the screen with a digital camera!
Bandwidth limitations
- Can impact timing due to lag
-- Serious challenges for CAT or MAT: to work, answers need to go back to the server for scoring/adaptation
-- Bandwidth problems can seriously slow down the test and degrade the test experience
- OR, download the whole test locally: requires large downloads and may exacerbate test security challenges
Risks and Limitations
Proliferation of non-evidence-based assessments on the internet
Pop-psych and para-psychological
Major problem for field
Testing vs. Assessment
Internet suited to testing, but not assessment
Risk of being used (inappropriately) as a replacement for psychological assessment
Very open to misuse and misinterpretation
Industrial and Organisational Testing Online
Rise of online recruiters and job markets
Potential for automatic head hunting with “web bots” trawling web for CVs (e.g., LinkedIn)
Temptation for delivery of psychological tests/assessment direct to public without a psychologist
- “Unsupervised mode”
- Raises questions about assumptions in psychology of requirements for assessment
Supervised Testing in the Digital Age
Functions of supervision of assessment
- Authenticating the test-taker
- Establishing rapport
- Ensuring test is administered according to manual
- Preventing cheating
- Ensuring security of the test itself
Levels of Supervision
Open (“unsupervised mode”)
E.g., tests published online, in magazines, or books
Many personal development measures
Tests that incurred significant development costs are unlikely to be open
Only suitable for low-stakes testing
Controlled (e.g., password to access)
Suitable for first step in recruitment process
Recommended to follow-up with verified testing
Supervised mode (e.g., presence of a proctor in a non-secure environment)
E.g., NAPLAN in 2018
Managed mode (formal examination conditions with test kept secure)
May include locally supervised or remote (e.g., using webcam technology, keystroke monitoring, and timing)
Raises additional complexities!
Considerations for Supervised Testing in the Digital Age
When does supervision matter?
Tests of typical performance (e.g., personality, interest inventories) tend not to be adversely affected by the absence of formal supervision
Tests of maximal performance (e.g., aptitude, achievement tests): answers are affected by the presence/absence of a supervisor
- Potential to look up answers, phone a friend, etc.
- Tends to inflate test scores
Technology
What technology could be used in future assessment?
Serious games
Eye-tracking
Mobile phones/smartphones
Wearable devices
Some authors have suggested potential for virtual reality, artificial intelligence, & holograms!
“Serious Games”
Game developed other than for primary purpose of entertainment (Charsky, 2010)
May provide an economical and accessible alternative where game play is a form of assessment
Benefits include the ability to design personalised games, promote health-related behavioural change, and educate participants
E.g., “Whack-a-mole” for cognitive assessment of older adults
Mobile Phones
Mobile phones include:
- Microphones that can record
- Videos/photographs
- Bluetooth
- GPS
- Accelerometers
- Applications (apps), e.g., GCC
Mobile Phones
Smartphones re-purposed as tools of assessment contain safeguards to protect the privacy of the subject of the assessment.
May be used for local, remote, & ecological momentary assessment (real time)
Well accepted by clients (e.g., psychiatric patients) and user-friendly
Wearable Devices
Eye tracking glasses
Recording devices (steps, vocalisations, heart-rate etc)
E.g., Language Environment Analysis (LENA) tracks and records child/adult vocalisations
May be used to assess language impairments, monitor treatment progress, and evaluate the effectiveness of interventions
Potential of technology significantly changing what we see as a “test” in future
Contextual Changes
Broader social environment shapes what assessments are developed
Push for simpler/shorter measures that can be developed quickly, in contrast to technological wizardry!
Continually rising demands of general public
Increasing demands for accountability and transparency
Meet demands through ever more vigilance in terms of ethics and professionalism, and increasing scientific research into validity of tests.
Contextual Changes
Managed care in clinical domain
Reluctance to use psychological assessment!
Subject to funding: not funded/limited funding (e.g., Medicare, NDIS)
E.g., NDIS funding based on functional impairment but no single measure appropriate across all ages and disabilities
Cost cutting/funding concerns- need to advocate for value/need
Important role of ethics in face of pressures in future assessment!