CHAPTER 7: UTILITY Flashcards
Factors that affect a test’s utility:
- Psychometric Soundness
-validity sets a ceiling on utility
-a test must be valid to be useful
- Cost
-refers to disadvantages, losses or expenses in both economic and noneconomic terms
-costs that are economic, financial or budget-related in nature must certainly be taken into account
- Benefits
-refers to profits, gains or advantages
family of techniques that entail a cost-benefit analysis designed to yield information
relevant to a decision about the usefulness and/or practical value of a tool of assessment
Utility Analysis
General Approaches in Utility Analysis:
- Expectancy Data
- Brogden-Cronbach-Gleser Formula
- Decision Theory
Expectancy Data
provide an indication of likelihood that a testtaker
will score within some interval of scores on a criterion measure – an
interval may be categorized as “passing”, “acceptable” or “failing”
EXPECTANCY TABLE/CHART
EXPECTANCY DATA
estimate of the percentage of employees hired by a
particular test who will be successful at their jobs
TAYLOR-RUSSELL TABLES
EXPECTANCY DATA
used for obtaining the difference between the means
of the selected and unselected groups to derive an index of what the test is
adding to already established procedures
NAYLOR-SHINE TABLES
formula used to calculate the dollar amount of a utility gain resulting from the
use of a particular selection instrument under specified conditions
BROGDEN-CRONBACH-GLESER FORMULA
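A minimal numeric sketch of how this formula is commonly presented (the parameter names and sample figures below are illustrative assumptions, not values from the text): the productivity gain attributable to those hired, minus the total cost of testing.

# Sketch of the Brogden-Cronbach-Gleser utility estimate; parameter names
# and example figures are assumptions for illustration only.

def utility_gain(n_hired, tenure_years, validity, sd_dollars,
                 mean_z_of_hired, n_tested, cost_per_applicant):
    """Estimated dollar gain from using a selection test under one common
    presentation: gain = N_hired * tenure * r * SDy * mean_z - N_tested * cost."""
    benefit = n_hired * tenure_years * validity * sd_dollars * mean_z_of_hired
    testing_cost = n_tested * cost_per_applicant
    return benefit - testing_cost

# Hypothetical example: 10 hires expected to stay 2 years, test validity .40,
# SDy of $10,000, mean standardized test score of those hired 1.0,
# 100 applicants tested at $25 each.
print(utility_gain(10, 2, 0.40, 10_000, 1.0, 100, 25))  # 77500.0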
a body of methods used to quantitatively evaluate selection procedures,
diagnostic classifications, therapeutic interventions or other assessment or
intervention-related procedures in terms of how optimal they are (most typically
from a cost-benefit perspective)
DECISION THEORY
A correct classification
EX. a qualified driver is hired; an unqualified driver is not hired
HIT
An incorrect classification; a mistake
EX. a qualified driver is not hired; an unqualified driver is hired
MISS
the proportion of people that an assessment tool accurately identified
as possessing a particular variable
EX. the proportion of qualified drivers with a passing score who actually
gain permanent employee status; the proportion of unqualified drivers with a
failing score who did not gain permanent status
HIT RATE
proportion of people that an assessment tool inaccurately identified
as possessing or not possessing a particular variable
EX. the proportion of drivers who were inaccurately predicted to be qualified;
the proportion of drivers who were inaccurately predicted to be unqualified
MISS RATE
falsely indicates that the testtaker possesses a particular variable
EX. a driver who is hired is not qualified
FALSE POSITIVE
falsely indicates that the testtaker does not possess a particular variable
EX. the assessment tool says not to hire, but the driver would have been rated as
qualified
FALSE NEGATIVE
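The four outcomes above can be tallied into simple rates. The counts below are made-up figures used only to show the arithmetic.

# Sketch: hit and miss rates from hypothetical hiring outcomes (made-up counts).
true_positives  = 40   # qualified drivers who passed the test and were hired
true_negatives  = 35   # unqualified drivers who failed and were not hired
false_positives = 10   # unqualified drivers who passed and were hired
false_negatives = 15   # qualified drivers who failed and were not hired

total = true_positives + true_negatives + false_positives + false_negatives

hit_rate  = (true_positives + true_negatives) / total    # correct classifications
miss_rate = (false_positives + false_negatives) / total  # incorrect classifications

print(hit_rate, miss_rate)  # 0.75 0.25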
Some Practical Considerations:
-the issue of how many people would actually accept the employment position
offered to them even if they were found to be qualified candidates
The Pool of Job Applicants
-many of the top performers on the test are people who are also being offered
positions by one or more other potential employers
The Pool of Job Applicants
the more complex the job, the more people differ in how well or poorly they do
that job (Hunter et al.)
The Complexity of the Job
-Cut Score/Cutoff Score
-a (usually numerical) reference point derived as a result of a judgment and used to
divide a set of data into two or more classifications, with some action to be taken
or some inference to be made on the basis of these classifications
The Cut Score in Use
a reference point – in a distribution of test scores used to divide
a set of data into two or more classifications – that is set based on norm-related
considerations rather than on the relationship of test scores to a criterion
-aka norm-referenced cut score
-normative
-Relative Cut Score
a reference point in a distribution of test scores used to divide a
set of data into two or more classifications – that is typically set with reference to a
judgment concerning a minimum level of proficiency required to be included in a
particular classification
-aka absolute cut score
-criterion
-fixed cut score
the use of two or more cut scores with reference to one predictor for the purpose
of categorizing testtakers
Multiple Cut Scores
-the achievement of a particular cut score on one test is necessary in order to
advance to the next stage of evaluation in the selection process
Multiple-Stage or Multiple-Hurdle (Selection Process)
-a model of applicant selection based on the assumption that high scores on one
attribute can balance out low scores on another attribute
-Compensatory Model of Selection
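A small sketch of the compensatory idea, where predictors are combined into a weighted composite so a strong score can offset a weak one; the predictor names, weights and passing composite below are assumed for illustration.

# Sketch of a compensatory selection composite (weights, predictors and the
# passing composite score are illustrative assumptions).

def composite_score(z_scores, weights):
    # Weighted sum of standardized predictor scores.
    return sum(weights[name] * z for name, z in z_scores.items())

weights = {"cognitive_ability": 0.6, "conscientiousness": 0.4}
applicant = {"cognitive_ability": 1.5, "conscientiousness": -0.5}  # strong on one, weak on the other

score = composite_score(applicant, weights)  # 0.6*1.5 + 0.4*(-0.5) = 0.7
print(score >= 0.5)                          # True: the high score compensates for the low one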
Methods of Setting Cut Scores:
-devised by William Angoff
-a way to set fixed cut scores that entails averaging the judgments of experts
-must have high inter-rater reliability
Angoff Method
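One common way the Angoff judgments are combined is sketched below: each expert estimates, item by item, the probability that a minimally competent testtaker would answer correctly; the estimates are averaged across experts and summed over items to give the cut score. The ratings are invented for illustration.

# Sketch of an Angoff-style cut score from made-up expert ratings.
expert_ratings = [
    [0.8, 0.6, 0.9, 0.5],  # expert 1, items 1-4
    [0.7, 0.7, 0.8, 0.6],  # expert 2
    [0.9, 0.5, 0.8, 0.4],  # expert 3
]

n_experts = len(expert_ratings)
n_items = len(expert_ratings[0])

# Average each item's ratings across experts, then sum over items.
cut_score = sum(
    sum(expert[item] for expert in expert_ratings) / n_experts
    for item in range(n_items)
)
print(round(cut_score, 2))  # about 2.73 out of 4 items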
-a system of collecting data on a predictor of interest from groups known to
possess (and not to possess) a trait, attribute or ability of interest
-a cut score is set on the test that best discriminates the high performers from the low
performers
Known Groups Method / Method of Contrasting Groups
-in order to “pass” the test, the testtaker must answer a number of items that are at or above some minimum level of difficulty, which is determined by experts and serves as the cut score
IRT-Based Methods
-a technique for identifying cut scores based on the number of positions to be
filled
Method of Predictive Yield
-a family of statistical techniques used to shed light on the relationship between
certain variables and two or more naturally occurring groups
Discriminant Analysis
an estimate of the benefit (monetary or otherwise) of using a particular
test or selection method
UTILITY GAIN
CHAPTER 8
TEST DEVELOPMENT
Step 1: Test Conceptualization
-Conception of the idea by the test developer
-norm-referenced: conceptualization of items is based on testtakers’ norms
-criterion-referenced: conceptualization is based on the construct that needs to be mastered
Step 1: Test Conceptualization
-necessary for research reasons, but not required for teacher-made tests
-done to evaluate the items and determine which ones really need to be put in the actual test
-the best way to measure the construct is determined in the pilot testing
-note: there are instances when a test that already seems good in its construction might need
further pilot research
Pilot Work - prototype of the test
Step 2: Test Construction
-the process of setting rules for assigning numbers in measurement
Scaling
scaling methods:
-a type of summative rating scale
-five alternative responses (sometimes seven)
-ordinal in nature
a. Likert Scales
SCALING METHOD:
-scaling method whereby one of a pair of stimuli (such as photos) is selected according to
a rule (such as “select the one that is more appealing”)
b. Paired Comparison
-named for its developer; a scale wherein items range sequentially from weaker to
stronger expressions of the attitude or belief being measured
c. Guttman Scale/Scalogram Analysis
-presumed to be interval in nature
d. Thurstone’s Equal Appearing Intervals Method
a. comparative scaling (best to worst)
b. categorical scaling (section 1, section 2, section 3)
-scaling systems:
-When devising a standardized test using a multiple-choice format, it is usually advisable
that the first draft contain approximately twice the number of items that the final version
of the test will contain.
Writing Items
the reservoir or well from which items will or will not be drawn for the final
version of the test; the collection of items to be further evaluated for possible selection
for use in an item bank
ITEM POOL
-the form, plan, structure, arrangement and layout of individual test items
ITEM FORMAT
-a form of test item requiring testtakers to select a response
- Selected-Response Format
-has 3 elements: stem, correct alternative/option, distractors/foils
-criteria of good multiple-choice:
-has one correct alternative
-has grammatically parallel alternatives
-has alternatives of similar length
-has alternatives that fit grammatically with the stem
-includes as much of the item as possible in the stem to avoid unnecessary
repetition
-avoids ridiculous distractors
-not excessively long
A. Multiple-Choice Format
-a testtaker is presented with two columns: premises and responses, and must
determine which response is best associated with which premise
-a testtaker could get a perfect score even if he did not actually know all the
answers
-to minimize the possibility, provide more options or state in the directions that
each response may be a correct answer once, more than once or not at all
B. Matching-item
-a multiple-choice item that contains only two possible responses
-criteria of a good binary-choice:
-contains a single idea
-not excessively long
-not subject to debate
-the correct response is definitely one of the choices
C. Binary-Choice Items / True or False
-a form of test item requiring the testtaker to construct or create a response
- Constructed-Response Items
-requires the examinee to provide a word or phrase that completes a sentence
A. Completion or Short Answer (Fill in the Blanks)
-is useful when the test developer wants the examinee to demonstrate a depth of knowledge about a single topic
-allows for the creative integration and expression of the material in the
testtaker’s own words
-the main problem with essays is subjectivity in scoring
B. Essay
Writing Items for Computer Administration:
a collection of questions to be used in the construction of tests for computer test
administration
Item Bank
-in computerized adaptive testing, the individualized presentation of test items
drawn from an item bank based on the testtaker’s previous responses
Item Branching
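A minimal sketch of the branching idea: the next item is drawn from the bank based on whether the previous response was correct. The item bank and rule below are assumptions for demonstration; real computerized adaptive tests use more sophisticated, typically IRT-based, item selection.

# Sketch of simple item branching: a correct answer routes the testtaker to a
# harder item, an incorrect answer to an easier one (illustrative item bank).
item_bank = {                 # difficulty level -> item
    1: "very easy item",
    2: "easy item",
    3: "average item",
    4: "difficult item",
    5: "very difficult item",
}

def next_item(current_level, answered_correctly):
    """Move one difficulty level up after a correct response, one level down
    after an incorrect response, staying within the bank's range."""
    step = 1 if answered_correctly else -1
    new_level = min(max(current_level + step, 1), len(item_bank))
    return new_level, item_bank[new_level]

level = 3                              # start at average difficulty
level, item = next_item(level, True)   # correct -> level 4
level, item = next_item(level, False)  # incorrect -> back to level 3
print(level, item)                     # 3 average item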
Computer Adaptive Testing reduces the:
a phenomenon arising from the diminished utility of a tool of
assessment in distinguishing testtakers at the low end of the ability, trait or other
attribute being measured (very low scores due to very hard questions)
-Floor Effect
the diminished utility of an assessment tool for distinguishing
testtakers at the high end of the ability, trait, or other attribute being measured
(very high scores due to very easy questions)
CEILING EFFECT
(Scoring Items)
a method of scoring whereby points or scores accumulated on
individual items or subtests are tallied and then, the higher the total sum, the
higher the individual is presumed to be on the ability, trait, or other characteristic
being measured
(Example: High IQ Score > more intelligent)
CUMULATIVE MODEL
(SCORING METHOD)
a method of evaluation in which test responses earn
credit toward placement in a particular class or category with other testtakers.
Sometimes testtakers must meet a set number of responses corresponding to a particular criterion in order to be placed in a specific category or class
(Examples: GPA of 1.50 and above will be placed in the Star Section; GPA of 2 and
below will be placed in the Lower Section)
Class or Category Scoring
(SCORING METHOD)
an approach to test scoring and interpretation wherein the testtaker’s responses and the presumed strength of a measured trait are interpreted relative to the measured strength of other traits for that testtaker (often used with forced-choice items)
(Example: High Score in Extraversion; Low in Agreeableness)
Ipsative scoring
-The test should be tried out on people who are similar in critical respects to the people
for whom the test was designed
-The test tryout should be executed under conditions as identical as possible to the
conditions under which the standardized test will be administered
Step 3: Test Tryout
What is a good item?
-reliable and valid
-helps discriminate testtakers
-if:
high scorers – incorrect = bad item
low scorers – correct = bad item
high scorers – correct = good item
low scorers – incorrect = good item
-Statistical procedures used to analyze items
Step 4: Item Analysis
-In achievement or ability testing and other contexts in which responses are keyed
correct, a statistic indicating how many testtakers responded correctly to an item
-In contexts where the nature of the test is such that responses are not keyed correct, this
same statistic may be referred to as an item-endorsement index
Item Difficulty Index
Formula:
# of testtakers who answered correctly
_______________________________
Total # of testtakers
0.0 = no one got the correct answer
1.0 = everyone is correct
Level of Difficulty:
0.0 to 0.20 – very difficult
0.21 to 0.40 – difficult
0.41 to 0.60 – average
0.61 to 0.80 – easy
0.81 to 1.00 – very easy
Standards:
0.50 – optimal average item difficulty (whole test)
0.30 to 0.80 – average item difficulty on individual items
0.75 – true or false
0.625 – multiple choice (4 choices)
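The arithmetic behind the index and the 0.625 figure above can be sketched briefly; the response matrix below is made-up data used only to show the calculation.

# Sketch of the item-difficulty index: proportion of testtakers answering each
# item correctly (illustrative response matrix; 1 = correct, 0 = incorrect).
responses = [          # rows = testtakers, columns = items
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]

n_takers = len(responses)
n_items = len(responses[0])

difficulty = [
    sum(row[i] for row in responses) / n_takers
    for i in range(n_items)
]
print(difficulty)  # [0.75, 0.75, 0.25, 1.0]

# Optimal difficulty for a selected-response item is often taken as the
# midpoint between chance success and 1.0, e.g. (0.25 + 1.0) / 2 = 0.625
# for a four-option multiple-choice item.
chance = 1 / 4
print((chance + 1.0) / 2)  # 0.625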
-provides an indication of internal consistency of a test
Item RELIABILITY Index
-provides an indication of the degree to which a test is measuring what it purports to
measure
-the higher the value, the greater the test’s criterion-related validity
Item VALIDITY Index
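One common formulation multiplies the item-score standard deviation by the item’s correlation with the total test score (reliability index) or with an external criterion (validity index). The sketch below assumes that formulation and uses invented data; it requires Python 3.10+ for statistics.correlation.

# Sketch of the item-reliability and item-validity indexes under the common
# formulation index = item SD * correlation (made-up data).
import statistics

item_scores      = [1, 0, 1, 1, 0, 1, 0, 1]          # 1 = correct, 0 = incorrect
total_scores     = [9, 4, 8, 7, 5, 9, 3, 6]          # total test score per testtaker
criterion_scores = [85, 60, 80, 75, 65, 90, 55, 70]  # e.g., supervisor ratings

sd_item = statistics.pstdev(item_scores)

item_reliability_index = sd_item * statistics.correlation(item_scores, total_scores)
item_validity_index    = sd_item * statistics.correlation(item_scores, criterion_scores)

print(round(item_reliability_index, 2), round(item_validity_index, 2))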
-indicate how adequately an item separates or discriminates between high scorers and low
scorers on an entire test
-(+) value = high scorers answer the item correctly more often than low scorers
-(-) value = low scorers answer the item correctly more often than high scorers
Item Discrimination Index
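A common extreme-groups version of this index is d = (U − L) / n, where U and L are the numbers of testtakers in the upper and lower scoring groups who answered the item correctly and n is the size of each group; the figures below are invented.

# Sketch of the item-discrimination index d = (U - L) / n (made-up counts,
# equal group sizes assumed).
upper_group_correct = 9   # high scorers on the whole test who got this item right
lower_group_correct = 3   # low scorers who got this item right
group_size = 10           # testtakers in each extreme group

d = (upper_group_correct - lower_group_correct) / group_size
print(d)  # 0.6 -> the item separates high from low scorers reasonably well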
-graphic representation of item difficulty and item discrimination
-the steeper the slope, the greater the item discrimination
-easy items lean to the left
-difficult items lean to the right
Item-Characteristic Curve
-relies primarily on verbal rather than statistical procedures
-non-statistical procedures designed to explore how individual test items work
Qualitative Item Analysis
method of qualitative item analysis requiring
examinees to verbalize their thoughts as they take a test; useful in understanding how
individual items function in a test and how testtakers interpret or misinterpret the
meaning of individual items
-Think Aloud Test Administration
a study of test items, usually during test development, in which items
are examined for fairness to all prospective testtakers and for the presence of offensive
language, stereotypes or situations
SENSITIVITY REVIEW
-(as a stage in new test development): polishing and finishing touches
-(in the life cycle of an existing test): no hard-and-fast rules exist about when to revise a test, but
it should be revised when significant changes in the domain represented, or new
conditions of test use and interpretation, make the test inappropriate for its intended use
Step 5: Test Revision
a revalidation on a sample of testtakers other than the testtakers on whom test performance was originally found to be a valid predictor of some criterion
Cross Validation
-the test validation process conducted on two or more tests using the same
sample of testtakers; when used in conjunction with the creation of norms or the revision
of existing norms; this process may also be referred to as co-norming
Co-Validation
CHAPTER 9
INTELLIGENCE AND ITS MEASUREMENT
-a multifaceted capacity that manifests itself in different ways across the life span
Intelligence defined: Views of the Lay Public
INTELLIGENCE
-In general, the researchers found a surprising degree of similarity between the experts’
and laypeople’s conception of intelligence
-However, in terms of academic intelligence, experts put more emphasis on motivation,
while laypeople stressed the importance of social aspects
Research by STERNBERG (1981)
-There are different conceptions of intelligence as a function of developmental stage
Research by SIEGLER and RICHARDS (1980)
-Suggested that children also have notions about intelligence as early as first grade
Research by YUSSEN and KANE (1980)
Intelligence defined: Views of Scholars and Test Professionals
-first person to publish on the heritability of intelligence, thus framing the contemporary nature-nurture debate
- FRANCIS GALTON
-he believed that the most intelligent persons were those equipped with the best sensory
abilities
- FRANCIS GALTON
INTERACTIONISM
(Heredity + Environment = Intelligence)
FRANCIS GALTON
-components of intelligence: reasoning, judgment, memory and abstraction
-more complex measure of intelligence
ALFRED BINET
-intelligence as an “aggregate” or “global” capacity
-considered other factors (traits and personality) in assessing intelligence
-at first, he proposed two qualitatively different abilities: Verbal and Performance
-then, he added other factors: Verbal Comprehension, Working Memory, Perceptual
Organization, Processing Speed
DAVID WECHSLER
-intelligence is an evolving biological adaptation to the outside world
-focused on the development of cognition in children
JEAN PIAGET
-an organized action or mental structure that when applied to the
world, leads to knowing and understanding
-schema (or schemata)
-the basic mental operations:
- actively organizing new information so that it fits in with what is already perceived and thought
-changing what is already perceived or thought so that it fits
with the new information
-causes the individual to discover new information, perceptions and
communication skills
-Assimilation
-Accommodation
-Disequilibrium
(FACTOR ANALYSIS THEORIES)
-Theory of General Intelligence / Two-Factor Theory of Intelligence
-(g) - general intellectual ability
-(s) - specific components
-(e) - error components
-The greater the magnitude of g in a test of intelligence, the better the test was thought to predict overall intelligence
-g factor is based on some type of general electrochemical mental energy available to the
brain for problem solving
-Abstract-reasoning tasks were thought to be the best measures of g in formal tests
CHARLES SPEARMAN
an intermediate class of factors common to a group of activities but not
to all
Ex: Linguistic, Mechanical, Arithmetical
GROUP FACTORS
-Intelligence is a systematic collection of abilities or functions for the processing of
information of different kinds in various ways
-de-emphasized (g)
-conducted research on the US Army Air Corps during the war and identified 25
important mental ability factors
-Structure of Intellect Model (SI Model)
JOY PAUL GUILFORD
-intelligence is considered a mental trait; it is the capacity for abstraction, which is an
inhibitory process
-seven primary abilities
-word fluency
-verbal comprehension
-spatial visualization
-number facility
-associative memory
-reasoning
-perceptual speed
LOUIS LEON THURSTONE
-intelligence is the ability to solve problems or to create products that are valued within
one or more cultural settings
-theory of multiple intelligence:
-logical-mathematical
-bodily-kinesthetic
-linguistic
-musical
-spatial
-interpersonal
-intrapersonal
HOWARD GARDNER
-two major types of cognitive abilities:
-Crystallized Intelligence (Gc)
-acquired skills and knowledge that are dependent on exposure to a particular
culture as well as on formal and informal education
(Example: Vocabulary)
-Fluid Intelligence (Gf)
-nonverbal, relatively culture-free and independent of specific instruction
(Example: Encoding of Short Term Memory)
RAYMOND CATTELL
-Added several factors to the work of his mentor, Raymond Cattell
-Gv - Visual Processing
-Ga - Auditory Processing
-Gq - Quantitative Knowledge
-Gs - Processing Speed
-Grw - Reading and Writing
-Gsm - Short Term Memory
-Glr - Long Term Storage and Retrieval
JOHN HORN
-Three Stratum Model of Human Cognitive Abilities
-Stratum III -the general level/general intellectual ability
-Stratum II -the broad level; 8 factors
-Stratum I -the specific level; more specific factors
JOHN CARROLL
-Cattell-Horn-Carroll Models (CHC)
-10 broad-stratum abilities
-over 70 narrow-stratum abilities
MCGREW AND FLANAGAN
(INFORMATION-PROCESSING VIEW)
-Information-Processing Approach - focuses on the mechanisms by which information is
processed – how information is processed rather than what is processed
-two basic types:
-simultaneous (parallel)
-information is integrated all at one time
-successive (sequential)
-each bit of information is individually processed in sequence
-the Kaufman Assessment Battery for Children 2nd Edition relies heavily on this concept
- ALEKSANDR LURIA
-Triarchic Theory of Intelligence
-Metacomponents -planning, monitoring, evaluating
-Performance Components -performing the instructions of metacomponents
-Knowledge-Acquisition Components -learning something new
ROBERT STERNBERG
-Planning -strategy development for problem solving
-Attention/Arousal -receptivity to information
-Simultaneous and Successive -the type of information processing employed
-PASS Model
(Theory in Intelligence Test Development and Interpretation)
-wrote Hereditary Genius, with a chapter entitled “Classification of Men According to
Their Natural Gifts”
-discussed sensory and other differences between people, which he believed were
inherited
FRANCIS GALTON
“universal unity of the intellective function,” with g as its centerpiece
CHARLES SPEARMAN
wrote extensively on what intelligence is, and he usually emphasized that it is multifaceted and consists not only of cognitive abilities but also of factors related to personality.
DAVID WECHSLER
– primary factors of mental ability
Louis Leon Thurstone
-intelligence can be conceived in terms of three clusters of ability: social
intelligence (dealing with people), concrete intelligence (dealing with objects), and
abstract intelligence (dealing with verbal and mathematical symbols)
-also incorporated a general mental ability factor (g) into the theory, defining it as
the total number of modifiable neural connections or “bonds” available in the
brain
Edward Lee Thorndike
Intelligence: Some Issues
“NATURE VS. NURTURE”
-all living organisms are preformed at birth
-all of the organism’s structures, including intelligence, are preformed at birth and
therefore cannot be improved
-it is like a cocoon turning into a butterfly
Preformationism
-one’s abilities are pre-determined by genetic inheritance and that no amount of
learning or other intervention can enhance what has been genetically encoded to
unfold in time
-Arnold Gesell
-“training does not transcend maturation”
-mental development as a progressive morphogenesis of patterns of behavior
-behavior patterns are predetermined by “innate processes of growth”
PREDETERMINISM
-believed that genius was hereditary
FRANCIS GALTON
-argued that degeneracy (being immoral) was also inherited
-Richard Dugdale
-role of heredity in feeblemindedness
-feeblemindedness is the product of a recessive gene
Henry Goddard
-the father of the American version of Binet’s test
-based on his testing, he concluded that Mexicans and Native Americans were inferior
-Lewis Terman