assessing ID Flashcards
classical test theory
Observed score = True score + Error (so True score = Observed score - Error)
aim is to understand and improve the reliability of tests
Error can be positive or negative, i.e. the observed score can be higher or lower than the true score.
Error has a tendency to cancel itself out in the long run. So if you test on several occasions or test lots of people, the positive and negative errors should cancel each other out.
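A minimal Python sketch of this idea (all numbers are invented for illustration): observed scores are simulated as a fixed true score plus random error, and their mean drifts back towards the true score as the errors cancel.

```python
import random

# Classical test theory model: observed = true + error.
# The true score (50), error SD (5) and 1000 occasions are illustrative choices.
random.seed(1)
true_score = 50
observed = [true_score + random.gauss(0, 5) for _ in range(1000)]

mean_observed = sum(observed) / len(observed)
print(f"mean observed score over 1000 occasions: {mean_observed:.2f}")
# The mean lands close to 50: positive and negative errors
# largely cancel each other out in the long run.
```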
criticisms of classical test theory
doesn’t consider which test items were answered correctly, e.g. getting four easy items right or four difficult items right results in the same score.
Item response theory was developed to deal with this problem. Uses P-levels: the proportion of individuals who get a particular question right.
Measure of ability: the most difficult item you can still get right, rather than simply the sum of correct answers.
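A small sketch of p-levels and the “hardest item solved” idea, using a made-up response matrix (1 = correct, 0 = incorrect):

```python
# Rows = candidates, columns = items; the data are invented for illustration.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

n_candidates = len(responses)
n_items = len(responses[0])

# p-level of an item = proportion of individuals who get it right (lower = harder).
p_levels = [sum(row[i] for row in responses) / n_candidates for i in range(n_items)]
print("p-levels:", p_levels)

# Ability indicator: the most difficult item a candidate still got right,
# rather than the simple sum of correct answers.
for row in responses:
    solved = [p for p, correct in zip(p_levels, row) if correct]
    print("hardest item solved (p-level):", min(solved) if solved else "none")
```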
types of tests
[1] Measures of typical performance
(disposition, no right/wrong answers; self-report; AQ)
Personality inventories
Interest inventories (career guidance)
Measures of Drive, Motivation & Need (traits).
[2] Measures of maximum performance
(right/wrong answers; WAIS)
Tests of Attainment & Achievement [what have you learned?]
Tests of Ability and Aptitude. [what is your potential to learn? Speed tests]
sources of error
types:
[1] Systematic Error (bias), e.g. a test biased in favour of those fluent in English.
[2] Random Error (unpredictable); cannot be controlled for, e.g. developing test-taking techniques.
sources (can be either):
1. Candidate-related error
Mood, tiredness, motivation.
2. Test-related error
Too few questions? Poor or one-sided questions?
3. Procedural error
The way the test is used; poor instructions.
statistics
variance - reflects the variation in the different measurements of a variable (the average squared deviation from the mean)
SD - the square root of the variance; expresses that variation in the same units as the variable
correlation/covariance - the relationship between two variables. We can use one to predict the other
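A quick illustration of these statistics using Python's standard statistics module (toy numbers; covariance and correlation need Python 3.10+):

```python
import statistics

x = [12, 15, 11, 18, 14, 16]   # e.g. scores on test A (invented)
y = [22, 27, 20, 33, 25, 30]   # e.g. scores on test B (invented)

print("variance of x:", statistics.variance(x))       # spread of the measurements
print("SD of x:", statistics.stdev(x))                # square root of the variance
print("covariance:", statistics.covariance(x, y))     # how x and y vary together
print("correlation:", statistics.correlation(x, y))   # standardised, between -1 and +1
```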
estimating reliability
Internal consistency. e.g. Cronbach’s alpha
Test-retest. People’s scores at one time of testing are correlated with their scores on the same test at another time of testing.
[disadvantage: people get familiar with the test]
Alternative forms. develop an alternative form and compute the correlation between form A and form B.
Split-half. Calculate people’s scores on two halves of the test and compute the correlation between the halves.
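A hedged sketch of two of these estimates on an invented 4-item test (rows = people, columns = item scores): Cronbach's alpha from the standard formula, and a split-half correlation between the odd and even items (correlation needs Python 3.10+).

```python
import statistics

scores = [
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 5],
    [1, 2, 1, 2],
    [4, 3, 4, 4],
]

k = len(scores[0])                                   # number of items
item_columns = list(zip(*scores))
item_variances = [statistics.variance(col) for col in item_columns]
totals = [sum(row) for row in scores]                # each person's total score

# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
alpha = (k / (k - 1)) * (1 - sum(item_variances) / statistics.variance(totals))
print("Cronbach's alpha:", round(alpha, 2))

# Split-half: correlate scores on the odd-numbered items with the even-numbered items.
odd_half = [row[0] + row[2] for row in scores]
even_half = [row[1] + row[3] for row in scores]
print("split-half r:", round(statistics.correlation(odd_half, even_half), 2))
```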
estimating validity
Face validity. The test appears to measure what it is supposed to measure (at face value).
Content validity. Do the test items cover the full range of the construct being measured?
Construct validity. What does the theory predict the participant should score on tests measuring other constructs? [convergent and divergent validity]
Criterion-related validity. Does the test correlate with an external criterion (concurrent or predictive)?
norm and self-referencing
norm - comparing an individual’s score with norms; to compute these we need a norm group (a representative sample).
Involves percentiles, Z scores, and T scores (see the sketch after this list).
self - comparing an individual’s performance with their own performance on another linked scale. ipsative tests
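A brief norm-referencing sketch (all numbers invented): compute a Z score, the conventional T score (mean 50, SD 10), and an approximate percentile for one candidate against a small norm group.

```python
import statistics

norm_group = [38, 42, 45, 47, 50, 52, 55, 58, 60, 63]   # representative sample (invented)
candidate = 57

mean = statistics.mean(norm_group)
sd = statistics.stdev(norm_group)

z = (candidate - mean) / sd       # Z score: distance from the mean in SD units
t = 50 + 10 * z                   # T score: rescaled to mean 50, SD 10
percentile = 100 * sum(s < candidate for s in norm_group) / len(norm_group)

print(f"Z = {z:.2f}, T = {t:.1f}, percentile (approx.) = {percentile:.0f}")
```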
applications of testing
educational - Assign grades; Identify students with Special Educational Needs; Adapt instruction to individual needs; Evaluate and improve teaching; Formulate educational goals.
occupational - gives insight into how well you work with others, how well you handle stress, and whether you can cope with the intellectual demands of a job. Measures of both typical performance and maximum performance are used.
careers - assesses logic, verbal, non-verbal and numerical reasoning, spatial awareness, flexibility, commitment, leadership, and initiative. Results act as a career guide, e.g. good verbal ability is associated with many administrative, managerial, and people-oriented occupations.
clinical - uses testing, observations, and interviews. Requires professional judgement about the problems at hand and the associated tests.
ethics
informed consent; right to knowledge of results; confidentiality; record keeping; test construction/publication; automated scoring/interpretation; qualified administrators