Chapters 7-8 Flashcards

1
Q

usefulness or practical value of testing to improve efficiency

A

Utility

2
Q

used to refer to the usefulness or practical value of a training program or
intervention

A

Utility

3
Q

Factors that affect a test’s utility

A
  1. Psychometric Soundness
  2. Cost
  3. Benefits
4
Q

Gives us the practical value of test scores in terms of their reliability and validity

A

Psychometric Soundness

5
Q

It tells us whether decisions based on test scores are cost-effective

A

Psychometric Soundness

6
Q

A test must be valid to be useful, but a valid test is not always a useful test, especially if testtakers do not follow test directions

A

True

7
Q

It refers to disadvantages, losses or expenses in both economic and noneconomic terms

A

Cost

8
Q

It refers to profits, gains or advantages

A

Benefit

9
Q

It is a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment

A

Utility Analysis

10
Q

provides an indication of the likelihood that a testtaker will score within some interval of scores on a criterion measure; an interval may be categorized as "passing," "acceptable," or "failing"

A

Expectancy Table/Chart

11
Q

estimate of the percentage of employees hired on the basis of a particular test who will be successful at their jobs

A

Taylor-Russell Tables

12
Q

used for obtaining the difference between the means of the selected and unselected groups to derive an index of what the test is adding to already established procedures

A

Naylor-Shine Tables

13
Q

A formula used to calculate the dollar amount of a utility gain resulting from the
use of a particular selection instrument under specified conditions

A

Brogden-Cronbach-Gleser Formula
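
The formula behind this card is commonly presented as: utility gain = (N)(T)(r_xy)(SD_y)(Z-bar_m) minus the cost of testing, where N is the number of people hired, T their average tenure, r_xy the test's validity coefficient, SD_y the standard deviation of job performance in dollars, and Z-bar_m the mean standardized test score of those selected. Below is a minimal Python sketch; the variable names are my own, all figures in the example are hypothetical, and presentations differ on whether cost is charged per applicant tested or per person hired, so both counts are parameters.

```python
# A minimal sketch of a Brogden-Cronbach-Gleser utility estimate.
def bcg_utility_gain(n_selected: int, tenure_years: float, validity: float,
                     sd_perf_dollars: float, mean_z_selected: float,
                     n_tested: int, cost_per_test: float) -> float:
    benefit = (n_selected * tenure_years * validity
               * sd_perf_dollars * mean_z_selected)
    cost = n_tested * cost_per_test  # total cost of administering the test
    return benefit - cost

# Hypothetical figures: 10 hires staying 2 years, validity .40, SD of job
# performance $10,000, selected applicants averaging z = 1.0 on the test,
# 100 applicants tested at $50 each:
print(bcg_utility_gain(10, 2, 0.40, 10_000, 1.0, 100, 50))  # 75000.0
```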

14
Q

an estimate of the benefit (monetary/otherwise) of using a particular
test or selection method

A

Utility gain

15
Q

a body of methods used to quantitatively evaluate selection procedures,
diagnostic classifications, therapeutic interventions or other assessment or
intervention-related procedures in terms of how optimal they are (most typically
from a cost-benefit perspective)

A

Decision Theory

16
Q

a correct classification

A

hit

17
Q

a qualified driver is hired; an unqualified driver is not hired

A

It is a hit

18
Q

an incorrect classification; a mistake

A

miss

19
Q

a qualified driver is not hired; an unqualified driver is hired

A

It is a miss

20
Q

the proportion of people that an assessment tool accurately identified
as possessing a particular variable

A

hit rate

21
Q

the proportion of qualified drivers with a passing score who actually gained permanent employee status; the proportion of unqualified drivers with a failing score who did not gain permanent status

A

This is a hit rate

22
Q

the proportion of people that an assessment tool inaccurately identified
as possessing a particular variable

A

miss rate

23
Q

the proportion of drivers who were inaccurately predicted to be qualified; the proportion of drivers who were inaccurately predicted to be unqualified

A

this is a miss rate

24
Q

falsely indicates that the testtaker possesses a particular variable; example: a driver who is hired is not qualified

A

false positive

25
falsely indicates that the testtaker does not possess a particular variable; example: the assessment tool says not to hire, but the driver would have been rated as qualified
false negative
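The four outcomes on cards 16-25 are just the cells of a 2x2 classification table. A minimal sketch of the bookkeeping, using hypothetical hiring decisions:

```python
# Each entry pairs the test's prediction with the actual outcome.
decisions = [  # (predicted_qualified, actually_qualified)
    (True, True),    # hit: a qualified driver is hired
    (False, False),  # hit: an unqualified driver is not hired
    (True, False),   # false positive: an unqualified driver is hired
    (False, True),   # false negative: a qualified driver is not hired
]

hits = sum(pred == actual for pred, actual in decisions)
misses = len(decisions) - hits
hit_rate = hits / len(decisions)
miss_rate = misses / len(decisions)  # hit_rate + miss_rate == 1
false_positives = sum(pred and not actual for pred, actual in decisions)
false_negatives = sum((not pred) and actual for pred, actual in decisions)
```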
26
Some practical considerations
1. The Pool of Job Applicants
2. The Complexity of the Job
3. The Cut Score in Use
27
a (usually numerical) reference point derived as a result of a judgment and used to divide a set of data into two or more classifications, with some action to be taken or some inference to be made on the basis of these classifications
Cut Score/Cutoff Score
28
dictates what sort of information will be required as well as the specific methods to be used
objective of utility analysis
29
Used to measure costs vs. benefits
Expectancy Data
30
- Based on norm-related considerations rather than on the relationship of test scores to a criterion
- Also called norm-referenced cut score
- Ex.) top 10% of test scores get A's
- Normative
Relative cut score
31
- Set with reference to a judgment concerning a minimum level of proficiency required to be included in a particular classification
- Also called absolute cut score
- Criterion
Fixed cut score
32
using two or more cut scores with reference to one predictor for the purpose of categorizing testtakers
Multiple cut scores
33
Ex.) having cut scores that mark an A, B, C, etc., all measuring the same predictor
Multiple cut scores
34
the achievement of a particular cut score on one test is necessary in order to advance to the next stage of evaluation in the selection process
Multiple-stage or Multi Hurdle
35
written application -> group interview -> personal interview
Multiple-stage or Multi Hurdle
36
assumption is made that high scores on one attribute can compensate for low scores on another attribute
Compensatory model of selection
37
Who devised the Angoff method?
William Angoff
39
a way to set fixed cut scores that entails averaging the judgments of experts; the judgments must have high inter-rater reliability
Angoff Method
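A minimal sketch of the averaging step, assuming each expert estimates, for every item, the probability that a minimally competent testtaker would answer it correctly (a common variant of the procedure); the judgments below are hypothetical:

```python
expert_judgments = [
    [0.6, 0.8, 0.5, 0.9],  # expert 1's probability estimate per item
    [0.7, 0.7, 0.4, 0.8],  # expert 2
    [0.5, 0.9, 0.6, 0.9],  # expert 3
]

# Average across experts for each item, then sum across items:
n_items = len(expert_judgments[0])
item_means = [sum(e[i] for e in expert_judgments) / len(expert_judgments)
              for i in range(n_items)]
cut_score = sum(item_means)  # expected score of a minimally competent testtaker
print(round(cut_score, 2))   # ~2.77 out of 4 items
```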
40
a system of collecting data on a predictor of interest from groups known to possess (and not to possess) a trait, attribute or ability of interest
Known Groups Method/Method of Contrasting Groups
42
a cut score is set on the test that best discriminates the high performers from the low performers
Known Groups Method/Method of Contrasting Groups
43
In order to "pass" the test, the testtaker must correctly answer items considered to have some minimum level of difficulty; that minimum level is determined by experts and serves as the cut score
Item Response Theory (IRT)-Based Methods
44
- Based on the testtaker's performance across all items on a test
- Some specified portion of the test items must be answered correctly
IRT-Based Methods
45
a technique for identifying cut scores based on the number of positions to be filled
Method of Predictive Yield
46
a family of statistical techniques used to shed light on the relationship between certain variables and two or more naturally occurring groups
Discriminant Analysis
47
determining the difficulty level reflected by a cut score
Item mapping method
48
test items are listed, one per page, in ascending level of difficulty. An expert places a bookmark at the divide that separates testtakers who have acquired the minimal knowledge, skills, or abilities from those who have not. Problems include the training of experts, possible floor and ceiling effects, and the optimal length of item booklets
Bookmark-method
49
Steps in Test Development
1. TEST CONCEPTUALIZATION
2. TEST CONSTRUCTION
3. TEST TRYOUT
4. ITEM ANALYSIS
5. TEST REVISION
50
Conception of the idea by the test developer
Test Conceptualization
51
An emerging social phenomenon or pattern of behavior might serve as the stimulus for the development of a new test.
Test Conceptualization
52
An item that high scorers on the test answer correctly and that low scorers answer incorrectly
Norm-referenced conceptualization
53
The conceptualization focuses on the content or construct that the testtaker needs to master
Criterion-referenced conceptualization
54
testtakers who have mastered the relevant material get a particular item right, whereas testtakers who have not get that same item wrong.
Criterion-referenced conceptualization
55
prototype of the test; necessary for research purposes but not required for a teacher-made test
Pilot work
56
To know whether some items should be included in the final form of the instrument
Pilot work
57
the test developer typically attempts to determine how best to measure a targeted construct
Pilot work
58
process of setting rules for assigning numbers in measurement.
Scaling
59
credited with being at the forefront of efforts to develop methodologically sound scaling methods
L. L. Thurstone
60
Raw scores are converted to a scale ranging from 1 to 9
Stanine scale
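A minimal sketch of one common way to assign stanines, assuming the linear approximation with mean 5 and standard deviation 2 (exact assignment is often done instead by fixed percentage bands):

```python
def z_to_stanine(z: float) -> int:
    # Rescale a standard (z) score to mean 5, SD 2, clipped to the 1-9 range.
    return max(1, min(9, round(z * 2 + 5)))

print(z_to_stanine(0.0))  # 5 (average performance)
print(z_to_stanine(2.5))  # 9 (clipped at the top of the scale)
```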
61
measuring one construct
Unidimensional Scale
62
measuring more than one construct
Multidimensional Scale
63
entails judgments of a stimulus in comparison with every other stimulus on the scale (best to worst)
Comparative Scaling
64
stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum (section 1, section 2, section 3)
Categorical Scaling
65
Can be defined as a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker
Rating Scale
66
When the final score is obtained by summing the ratings across all the items
Summative Scale
67
a type of summative rating scale wherein each item presents the testtaker with five alternative responses, usually on an agree-disagree or approve-disapprove continuum; it is ordinal in nature
Likert Scale
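A minimal sketch of summative scoring for a Likert-type scale. The reverse-keyed items are a common practice I am assuming for illustration, and the ratings are hypothetical:

```python
responses = [5, 4, 2, 5, 1]        # one testtaker's 1-5 ratings on five items
reverse_keyed = [False, False, True, False, True]

total = sum((6 - r) if rev else r  # reverse-keyed items flip: 1<->5, 2<->4
            for r, rev in zip(responses, reverse_keyed))
print(total)  # the summated rating; here 5 + 4 + 4 + 5 + 5 = 23
```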
68
scaling method whereby one of a pair of stimuli (such as photos) is selected according to a rule (such as "select the one that is more appealing")
Paired Comparison
69
the testtaker is presented with two stimuli and asked to compare them
Paired comparison
70
judging of a stimulus in comparison with every other stimulus on the scale
Comparative Scaling
71
testtaker places stimuli into a category; those categories differ quantitatively on a spectrum
Categorical Scaling
72
items range from sequentially weaker to stronger expressions of attitude, belief, or feeling. A testtaker who agrees with the stronger statement is assumed to also agree with the milder statements
Guttman Scale/Scalogram Analysis
73
a scale wherein items range sequentially from weaker to stronger expressions of the attitude or belief being measured
Guttman Scale/Scalogram Analysis
74
Developer of Guttman Scale/Scalogram Analysis
Louis Guttman
75
a direct estimation method, because there is no need to transform the testtaker's responses to another scale; the resulting scale is presumed to be interval in nature
Thurstone’s Equal Appearing Intervals Method
76
When devising a standardized test using a multiple-choice format, it is usually advisable that the first draft contains approximately ______ the number of items that the final version of the test will contain
twice
77
What to consider in writing items
- The range of content that the items should cover
- Which item format should be employed
- How many items should be written in total and for each content area covered
78
reservoir from which items will or will not be drawn for the final version of the test
Item pool
79
The item pool should contain about _____ the number of items that the final version of the test will have
double
80
variables such as the form, plan, structure, arrangement and layout of individual test items
Item format
81
the collection of items to be further evaluated for possible selection for use in an item bank
Item pool
82
testtaker selects a response from a set of alternative responses
Selected-Response Format
83
What type of item format includes multiple-choice, true-false, and matching items?
Selected-Response Format
84
testtaker supplies or creates the correct answer
Constructed-Response Format
85
Item format that includes completion items, short answer, and essay
constructed-response format
86
a relatively large and easily accessible collection of test questions
item bank
87
interactive, computer-administered testtaking process wherein the items presented to the testtaker are based in part on the testtaker's performance on previous items
Computerized Adaptive Testing (CAT)
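A minimal sketch of the adaptive idea, assuming a toy item bank and a deliberately simplified ability update; operational CAT systems rely on IRT-based ability estimation rather than this halving rule:

```python
item_bank = {"q1": -1.0, "q2": 0.0, "q3": 1.0, "q4": 2.0}  # difficulty per item

def administer(name: str, difficulty: float) -> bool:
    # Hypothetical stand-in for presenting the item: pretend this testtaker
    # answers correctly whenever the item's difficulty is below 1.0.
    return difficulty < 1.0

ability, step = 0.0, 1.0
remaining = dict(item_bank)
while remaining:
    # Branch: pick the unused item closest in difficulty to the current estimate.
    name = min(remaining, key=lambda k: abs(remaining[k] - ability))
    correct = administer(name, remaining.pop(name))
    ability += step if correct else -step  # move the estimate up or down
    step /= 2                              # shrink the adjustment each step
print(round(ability, 2))  # rough ability estimate after all items
```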
88
the diminished utility of an assessment tool for distinguishing testtakers at the low end of the ability, trait, or other attribute being measured
floor effect
89
diminished utility of an assessment tool for distinguishing testtakers at the high end of the ability, trait, attribute being measured
ceiling effect
90
ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items
item branching
91
testtakers earn cumulative credit with regard to a particular construct
cumulative scoring
92
testtaker responses earn credit toward placement in a particular class or category with other testtakers whose pattern of responses is presumably similar in some way
class/category scoring
93
comparing a testtaker's score on one scale within a test to another scale within that same test
ipsative scoring
94
John’s need for achievement is higher than his need for affiliation
ipsative scoring
95
offers two alternatives for each item
dichotomous format
96
resembles the dichotomous format except that each item has more than two alternatives
polytomous format
97
incorrect choices in multiple choice
distractors
98
describes the chances that a low-ability test taker will obtain each score
guessing threshold
99
uses more choices than a Likert scale; e.g., a 10-point rating scale
category format
100
respondent is given a 100-millimeter line and asked to place a mark between two well-defined endpoints. It is often used to measure self-rated health
Visual analogue scale
101
subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself
adjective checklist
102
Obtained by calculating the proportion of the total number of testtakers who answered the item correctly; denoted by p
Item-Difficulty Index
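A minimal sketch of computing p from hypothetical 0/1 item scores:

```python
item_scores = [1, 1, 0, 1, 0, 1, 1, 0]   # 1 = correct, 0 = incorrect
p = sum(item_scores) / len(item_scores)  # proportion answering correctly
print(p)  # 0.625 -- a higher p means an easier item
```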
103
Higher p indicates
easier items
104
Difficulty can be replaced with _____ in non-achievement tests
endorsement
105
- An indication of the internal consistency of a test
- Equal to the product of the item-score standard deviation (s) and the correlation (r) between the item score and the total test score
- Related to factor analysis and inter-item consistency
item Reliability Index
106
Statistic designed to provide an indication of the degree to which a test is measuring what it purports to measure. It requires: the item-score standard deviation and the correlation between the item score and the criterion score
Item-Validity Index
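Cards 105 and 106 share the same shape: the item-score standard deviation times a correlation. A minimal sketch with hypothetical scores (statistics.correlation requires Python 3.10+):

```python
import statistics as st

item = [1, 0, 1, 1, 0, 1]                    # dichotomous item scores
total = [52, 40, 48, 55, 43, 50]             # total test scores
criterion = [3.1, 2.0, 2.8, 3.5, 2.2, 3.0]   # e.g., supervisor ratings

s = st.pstdev(item)                    # item-score standard deviation
r_total = st.correlation(item, total)  # item vs. total test score
r_crit = st.correlation(item, criterion)

item_reliability_index = s * r_total   # internal-consistency indication
item_validity_index = s * r_crit       # criterion-related indication
```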
107
means a greater number of high scorers than low scorers answered the item correctly
higher d
108
means low-scoring examinees are more likely to answer the item correctly than high-scoring examinees
negative d
109
compares performance on a particular item with performance in the upper and lower regions of a distribution of continuous test scores
Item-Discrimination Index
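This index is typically computed as d = (U - L) / n, where U and L are the numbers of testtakers in the upper and lower scoring groups (often the top and bottom 27%) who answered the item correctly, and n is the size of one group. A minimal worked sketch with hypothetical counts:

```python
U, L, n = 24, 9, 27  # 27 testtakers in each extreme scoring group
d = (U - L) / n
print(round(d, 2))   # 0.56; d ranges from -1 to +1, and negative d flags a bad item
```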
110
Graphic representation of item difficulty and discrimination
Item-Characteristic Curves
111
techniques of data generation and analysis that rely primarily on verbal rather than mathematical or statistical procedures
Qualitative method
112
various nonstatistical procedures designed to explore how individual test items work
Qualitative item analysis
113
- Approach to cognitive assessment that entails respondents vocalizing their thoughts as they occur
- Used to shed light on the testtaker's thought processes during the administration of a test
"Think aloud” test administration
114
study of test items in which they are examined for fairness to all prospective testtakers as well as for the presence of offensive language, stereotypes, or situations
Sensitivity Review
115
Find the correlation between performance on the item and performance on the total test
The Point Biserial Method
116
Correlation between a dichotomous variable and a continuous variable
point biserial correlation
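A minimal sketch using the standard formula r_pb = ((M1 - M0) / s_y) * sqrt(p * q), with hypothetical item and total scores:

```python
import math
import statistics as st

item = [1, 0, 1, 1, 0, 1, 0, 1]            # dichotomous variable (0/1)
total = [50, 38, 47, 52, 40, 49, 36, 51]   # continuous variable

m1 = st.mean(t for i, t in zip(item, total) if i == 1)  # mean total, item right
m0 = st.mean(t for i, t in zip(item, total) if i == 0)  # mean total, item wrong
s_y = st.pstdev(total)      # population SD of the continuous variable
p = sum(item) / len(item)   # proportion scoring 1
q = 1 - p

r_pb = (m1 - m0) / s_y * math.sqrt(p * q)
print(round(r_pb, 2))  # ~0.96 here; high because the toy data align closely
```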
117
revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion
Cross-validation
118
decrease in item validities that inevitably occurs after cross-validation of findings
Validity Shrinkage
119
test validation process conducted on two or more tests using the same sample of testtakers
Co-validation
120
when co-validation is used in conjunction with the creation of norms or the revision of existing norms
Co-norming
121
test protocol scored by a highly authoritative scorer that is designed as a model for scoring and a mechanism for resolving scoring discrepancies
anchor protocol
122
a discrepancy between scoring in an anchor protocol and the scoring of another protocol
scoring drift
123
phenomenon, wherein an item functions differently in one group of testtakers as compared to another group of testtakers known to have the same level of the underlying trait
Differential item functioning (DIF)
124
(level of difficulty) optimal average item difficulty (whole test)
0.5
125
(level of difficulty) average item difficulty on individual items
0.3 to 0.8
126
(level of difficulty) true or false
0.75
127
(level of difficulty) multiple choice (4 choices)
0.625
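The values on cards 124-127 follow one rule worth making explicit: a common convention sets the optimal item difficulty midway between the chance success rate g and a perfect 1.00, which reproduces the true/false and four-option figures above:

```latex
p_{\text{optimal}} = \frac{g + 1.00}{2}, \qquad
\text{true/false: } \frac{.50 + 1.00}{2} = .75, \qquad
\text{4 options: } \frac{.25 + 1.00}{2} = .625
```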