Utility Flashcards
Usefulness or practical value of testing to improve efficiency
Usefulness or practical value of a training program or intervention
Ex: Using an employment test to screen applicants so that hiring decisions are made more efficiently than they would be without the test
Test utility
Utility
Reliability and validity of a test.
Psychometric soundness
The higher the criterion-related validity of test scores for making a particular decision, the higher the utility of the test is likely to be. However, there are exceptions to this general rule because many factors may enter into an estimate of a test’s utility. There are also great variations in the ways in which the utility of a test is determined.
Ex: A test might be a valid predictor of future job performance yet have no utility if every applicant is going to be hired regardless of test results.
Refers to disadvantages, losses, or expenses in both economic and noneconomic terms.
Ex: the expenses of developing, administering, and scoring the test, as well as the cost of lost productivity or missed opportunities if the test is not used effectively
Costs
If testing is to be conducted, then it may be necessary to allocate funds to purchase:
(1) a particular test
(2) a supply of blank test protocols
(3) computerized test processing, scoring, and interpretation from the test publisher or some independent service.
Refers to profits, gains, or advantages.
Ex: higher worker productivity, fewer accidents, lower turnover, and increased company profits.
Benefits
In industrial settings, a partial list of such noneconomic benefits (many carrying with them economic benefits as well) would include:
■ an increase in the quality of workers’ performance;
■ an increase in the quantity of workers’ performance;
■ a decrease in the time needed to train workers;
■ a reduction in the number of accidents;
■ a reduction in worker turnover.
Family of techniques that entail a cost–benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment.
Utility Analysis
If undertaken to evaluate a test, the utility analysis will help make decisions regarding whether:
■ one test is preferable to another test for use for a specific purpose;
■ one tool of assessment (such as a test) is preferable to another tool of assessment (such as behavioral observation) for a specific purpose;
■ the addition of one or more tests (or other tools of assessment) to one or more tests (or other tools of assessment) that are already in use is preferable for a specific purpose;
■ no testing or assessment is preferable to any testing or assessment.
If undertaken for the purpose of evaluating a training program or intervention, the utility analysis will help make decisions regarding whether:
■ one training program is preferable to another training program;
■ one method of intervention is preferable to another method of intervention;
■ the addition or subtraction of elements to an existing training program improves the overall training program by making it more effective and efficient;
■ the addition or subtraction of elements to an existing method of intervention improves the overall intervention by making it more effective and efficient;
■ no training program is preferable to a given training program;
■ no intervention is preferable to a given intervention.
Indication of the likelihood that a testtaker will score within some interval of scores on a criterion measure—an interval that may be categorized as “passing,” “acceptable,” or “failing.”
Ex: With regard to the utility of a new and experimental personnel test in a corporate setting, an expectancy table can provide vital information to decision-makers. An expectancy table might indicate, for example, that the higher a worker’s score is on this new test, the greater the probability that the worker will be judged successful.
Expectancy table
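One way to picture an expectancy table is as a lookup from test-score bands to the observed proportion of testtakers later judged successful. The sketch below is only an illustration: the function name, the band width, and the data are all hypothetical and not taken from any real test.

```python
from collections import defaultdict

BAND_WIDTH = 10  # hypothetical width of each score band

def expectancy_table(records, band_width=BAND_WIDTH):
    """Map each test-score band to the proportion of testtakers judged successful."""
    counts = defaultdict(lambda: [0, 0])  # band start -> [successes, total]
    for score, successful in records:
        band = (score // band_width) * band_width
        counts[band][0] += int(successful)
        counts[band][1] += 1
    return {band: s / n for band, (s, n) in sorted(counts.items())}

# Invented (test score, judged successful?) pairs, for illustration only.
records = [(42, False), (47, False), (48, True), (55, True), (58, False),
           (63, True), (66, True), (71, True), (74, True), (79, True)]

table = expectancy_table(records)
for band, proportion in table.items():
    print(f"{band}-{band + BAND_WIDTH - 1}: {proportion:.2f}")
```

With these invented data the proportion judged successful rises with the score band, which is the pattern a decision-maker would look for.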
Provide an estimate of the extent to which inclusion of a particular test in the selection system will improve selection.
Ex: tables provide an estimate of the percentage of employees hired by the use of a particular test who will be successful at their jobs, given different combinations of three variables: the test’s validity, the selection ratio used, and the base rate.
Taylor-Russell tables
Taylor-Russell tables 3 variables
Test’s validity
Selection ratio used
Base rate
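The way the three variables combine can be sketched numerically. The code below does not reproduce the published tables; it approximates the proportion of selected applicants expected to succeed, assuming test and criterion scores follow a bivariate normal distribution (an assumption introduced here). The function name and the integration settings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def expected_success_rate(validity, selection_ratio, base_rate, steps=4000):
    """Approximate P(successful | selected) under a bivariate normal model.

    Assumes -1 < validity < 1 and ratios strictly between 0 and 1.
    """
    std = NormalDist()
    x_cut = std.inv_cdf(1 - selection_ratio)  # test cut score in z units
    y_cut = std.inv_cdf(1 - base_rate)        # criterion cut score in z units
    resid_sd = sqrt(1 - validity ** 2)
    upper = 8.0                               # stands in for +infinity on the z scale
    h = (upper - x_cut) / steps
    total = 0.0
    for i in range(steps):                    # midpoint rule over the selected region
        x = x_cut + (i + 0.5) * h
        p_success = 1 - std.cdf((y_cut - validity * x) / resid_sd)
        total += std.pdf(x) * p_success * h
    return total / selection_ratio

# Validity .50, selection ratio .50, base rate .50
print(round(expected_success_rate(0.50, 0.50, 0.50), 2))  # about 0.67
```

With a validity of zero the result falls back to the base rate, which matches the idea that a test with no validity adds nothing to the existing selection system.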
Numerical value that reflects the relationship between the number of people to be hired and the number of people available to be hired.
Ex: If there are 50 positions and 100 applicants, then the selection ratio is 50/100, or .50.
Selection ratio
Refers to the percentage of people hired under the existing system for a particular position who are considered successful.
Ex: If a firm employs 25 computer programmers and 20 are considered successful, the base rate would be .80.
Base rate
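The arithmetic behind the selection ratio and base rate examples can be written out in a few lines. The function names are hypothetical; the numbers are the ones used in the cards.

```python
def selection_ratio(positions, applicants):
    """Number of people to be hired divided by number available to be hired."""
    return positions / applicants

def base_rate(successful, employed):
    """Proportion of current hires in the position who are considered successful."""
    return successful / employed

print(selection_ratio(50, 100))  # 0.5
print(base_rate(20, 25))         # 0.8
```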
Entails obtaining the difference between the means of the selected and unselected groups to derive an index of what the test (or some other tool of assessment) is adding to already established procedures.
Naylor-Shine tables
Used to calculate the dollar amount of a utility gain resulting from the use of a particular selection instrument under specified conditions.
A modification of the BCG formula exists for researchers who prefer their findings in terms of productivity gains rather than financial ones.
Brogden-Cronbach-Gleser formula
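In its commonly cited textbook form, the formula is utility gain = (N)(T)(r_xy)(SD_y)(Z_m) - (N)(C): a benefit term minus a cost term. The sketch below assumes that form. The parameter names are chosen here for readability, and the input values are invented.

```python
def bcg_utility_gain(n_selected, tenure_years, validity, sd_dollars,
                     mean_z_selected, n_applicants, cost_per_applicant):
    """Dollar utility gain: benefits of selecting with the test minus testing costs."""
    # Benefit: number selected x average tenure x validity coefficient
    # x SD of job performance in dollars x mean standardized test score of those selected.
    benefit = n_selected * tenure_years * validity * sd_dollars * mean_z_selected
    # Cost: number of applicants tested x cost of testing each applicant.
    cost = n_applicants * cost_per_applicant
    return benefit - cost

# Invented values, for illustration only.
gain = bcg_utility_gain(n_selected=10, tenure_years=2, validity=0.40,
                        sd_dollars=10_000, mean_z_selected=1.0,
                        n_applicants=50, cost_per_applicant=200)
print(f"${gain:,.0f}")
```

Because the benefit term is a product, a validity of zero or a mean standardized score of zero leaves only the cost term, so the gain becomes negative.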
Refers to an estimate of the benefit (monetary or otherwise) of using a particular test or selection method.
Ex: Improved Selection Decisions: A test that accurately predicts job performance leads to better hiring decisions, resulting in higher-performing employees and reduced turnover costs.
Utility gain
Refers to an estimated increase in work output.
Productivity gain
(1) a classification of decision problems
(2) various selection strategies ranging from single-stage processes to sequential analyses
(3) a quantitative analysis of the relationship between test utility, the selection ratio, cost of the testing program, and expected value of the outcome
(4) a recommendation that in some instances job requirements be tailored to the applicant’s ability instead of the other way around
Adaptive Treatment
Cross-disciplinary field that examines how individuals, groups, and systems make choices under conditions of uncertainty or complexity.
Ex: A patient facing a difficult treatment choice (e.g., surgery vs. medication) might be assessed using decision theory to understand their risk tolerance, understanding of probabilities, and how they weigh potential benefits and drawbacks.
Decision Theory
Some practical considerations in utility analysis (3)
The pool of job applicants
The complexity of the job
The cut score in use
For example, some utility estimates are based on the assumption that there will be a ready supply of viable applicants from which to choose and fill positions. Perhaps for some types of jobs and in some economic climates that is, indeed, the case. There are certain jobs, however, that require such unique skills or demand such great sacrifice that there are relatively few people who would even apply, let alone be selected.
Also, the pool of possible job applicants for a particular type of position may vary with the economic climate. It may be that in periods of high unemployment there are significantly more people in the pool of possible job applicants than in periods of high employment.
Closely related to issues concerning the available pool of job applicants is the issue of how many people would actually accept the employment position offered to them even if they were found to be a qualified candidate.
Pool of applicants example
The more complex the job, the more people differ on how well or poorly they do that job.
The complexity of the job
Reference point derived as a result of a judgment and used to divide a set of data into two or more classifications, with some action to be taken or some inference to be made on the basis of these classifications.
Cut score
Defined as a reference point—in a distribution of test scores used to divide a set of data into two or more classifications—that is set based on norm-related considerations rather than on the relationship of test scores to a criterion.
Also called a norm-referenced cut score
Ex: Envision your instructor announcing on the first day of class that, for each of the four examinations to come, the top 10% of all scores on each test would receive the grade of A. In other words, the cut score in use would depend on the performance of the class as a whole.
Relative cut score
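The "top 10% receive an A" example can be sketched as follows. The function name and the two sets of class scores are hypothetical; the point is that the same rule yields different cut scores for different groups.

```python
import math

def relative_cut_score(scores, top_fraction=0.10):
    """Lowest score that still falls within the top `top_fraction` of the group."""
    ranked = sorted(scores, reverse=True)
    n_top = max(1, math.floor(len(ranked) * top_fraction))
    return ranked[n_top - 1]

# Two invented classes of 20 students each.
class_a = [55, 61, 64, 68, 70, 72, 73, 75, 77, 78,
           80, 81, 83, 85, 86, 88, 90, 92, 95, 98]
class_b = [40, 44, 47, 50, 52, 55, 57, 58, 60, 61,
           63, 64, 66, 68, 70, 71, 73, 75, 79, 84]

print(relative_cut_score(class_a))  # 95
print(relative_cut_score(class_b))  # 79
```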
Reference point—in a distribution of test scores used to divide a set of data into two or more classifications—that is typically set with reference to a judgment concerning a minimum level of proficiency required to be included in a particular classification.
Also called an absolute cut score
Ex: An example of a fixed cut score might be the score achieved on the road test for a driver’s license. Here the performance of other would-be drivers has no bearing upon whether an individual testtaker is classified as “licensed” or “not licensed.” All that really matters here is the examiner’s answer to this question: “Is this driver able to meet (or exceed) the fixed and absolute score on the road test necessary to be licensed?”
Fixed cut score
Use of two or more cut scores with reference to one predictor for the purpose of categorizing testtakers.
Ex: Your instructor may have multiple cut scores in place every time an examination is administered, and each class member will be assigned to one category (e.g., A, B, C, D, or F) on the basis of scores on that examination. That is, meeting or exceeding one cut score will result in an A for the examination, meeting or exceeding another cut score will result in a B for the examination, and so forth.
Multiple cut scores
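Assigning a letter grade from several cut scores on one predictor might look like the sketch below. The cut values are hypothetical, and "meeting or exceeding" a cut score is handled by `bisect_right`.

```python
from bisect import bisect_right

CUTS = [60, 70, 80, 90]             # hypothetical cut scores, in ascending order
GRADES = ["F", "D", "C", "B", "A"]  # one more category than there are cut scores

def grade(score):
    """Return the category for a score; meeting a cut score counts as reaching it."""
    return GRADES[bisect_right(CUTS, score)]

for s in (59, 60, 89, 90):
    print(s, grade(s))
```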
One collective element of a multistage decision-making process in which the achievement of a particular cut score on one test is necessary in order to advance to the next stage of evaluation in the selection process.
Ex: In applying to colleges or professional schools, applicants may have to successfully meet some standard in order to move to the next stage in a series of stages. The process might begin with the written application stage, in which individuals who turn in incomplete applications are eliminated from further consideration. This stage is followed by what might be termed an additional materials stage, in which individuals with low test scores, GPAs, or poor letters of recommendation are eliminated. The final stage in the process might be a personal interview stage.
Multiple hurdles
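A multistage process of this kind can be sketched as a sequence of checks, each of which must be passed before the next is attempted. The stage names follow the example above; the attribute names and cut values are hypothetical.

```python
def multiple_hurdles(applicant, hurdles):
    """Return the name of the first hurdle failed, or None if all are cleared."""
    for name, attribute, cut in hurdles:
        if applicant[attribute] < cut:
            return name  # eliminated here; later stages are never evaluated
    return None

# (stage name, attribute checked, cut score); all values are invented.
hurdles = [("written application", "application_complete", 1),
           ("additional materials", "test_score", 600),
           ("personal interview", "interview_rating", 3)]

applicant = {"application_complete": 1, "test_score": 550, "interview_rating": 5}
print(multiple_hurdles(applicant, hurdles))  # additional materials
```

Note that a strong interview rating cannot rescue this applicant, because the interview stage is never reached.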
Assumption is made that high scores on one attribute can, in fact, “balance out” or compensate for low scores on another attribute.
Ex: A safe driving history may be weighted higher in the selection formula than is customer service. This weighting might be based on a company-wide “safety first” ethic. It may also be based on a company belief that skill in driving safely is less amenable to education and training than skill in customer service. The total score on all of the predictors will be used to make the decision to select or reject.
Compensatory model of selection
Statistical tool that is ideally suited for making such selection decisions within the framework of a compensatory model
Multiple Regression
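In a compensatory model, a weighted composite of the predictors drives the decision. The sketch below assumes the weights have already been estimated, for example by a multiple regression; the weights, predictor names, and cutoff shown are invented for illustration.

```python
# Hypothetical weights; in practice these would come from a fitted regression.
WEIGHTS = {"safe_driving": 0.6, "customer_service": 0.3, "attendance": 0.1}
CUTOFF = 7.0  # hypothetical minimum composite score for selection

def composite(scores):
    """Weighted sum of predictor scores."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

def select(scores):
    return composite(scores) >= CUTOFF

# A low customer-service score is offset by a strong safe-driving score.
driver_a = {"safe_driving": 9, "customer_service": 4, "attendance": 7}
driver_b = {"safe_driving": 5, "customer_service": 9, "attendance": 7}

print(select(driver_a), select(driver_b))
```

Because safe driving carries the largest weight, strength there compensates for weakness elsewhere more readily than the reverse.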
4 types of methods for setting cut scores
The angoff method
The known groups method
IRT-based methods
Other methods
Method for setting fixed cut scores that can be applied to personnel selection tasks as well as to questions regarding the presence or absence of a particular trait, attribute, or ability. Experts' judgments are averaged to yield the cut score.
Ex: In a test for a specific skill, SMEs might estimate the percentage of minimally competent examinees who would correctly identify a specific type of psychological disorder from a list of symptoms.
Angoff method
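A common textbook variant of the Angoff procedure averages the judges' estimates for each item and sums those averages to obtain the cut score. The sketch below assumes that variant; the function name and the ratings are hypothetical.

```python
from statistics import mean

def angoff_cut_score(ratings):
    """ratings[j][i] is judge j's estimated probability that a minimally
    competent testtaker answers item i correctly."""
    item_means = [mean(item) for item in zip(*ratings)]  # average across judges
    return sum(item_means)                               # expected raw score

# Three invented judges rating a four-item test.
ratings = [[0.8, 0.6, 0.5, 0.9],
           [0.7, 0.5, 0.6, 0.8],
           [0.9, 0.7, 0.4, 1.0]]

print(round(angoff_cut_score(ratings), 2))
```

If judges disagree widely on an item, the averaged value hides that disagreement, which is a known practical concern with the approach.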
Also referred to as the method of contrasting groups
Entails collection of data on the predictor of interest from groups known to possess, and not to possess, a trait, attribute, or ability of interest. Based on an analysis of these data, a cut score is set on the test that best discriminates the two groups’ test performance.
Ex: An intelligence test should show higher scores in a group of people with higher IQs compared to a group with lower IQs
The known groups method
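One simple way to operationalize "best discriminates" is to pick the observed score that classifies the most people correctly. That criterion is an assumption made for this sketch, as is the convention that higher scores indicate the trait; the data are invented.

```python
def contrasting_groups_cut(with_trait, without_trait):
    """Return the observed score that, used as a cut, classifies the most cases correctly."""
    candidates = sorted(set(with_trait) | set(without_trait))

    def correct(cut):
        hits = sum(s >= cut for s in with_trait)          # trait group at or above the cut
        rejections = sum(s < cut for s in without_trait)  # non-trait group below the cut
        return hits + rejections

    return max(candidates, key=correct)

# Invented scores for two groups of six people each.
with_trait = [12, 16, 17, 19, 21, 22]
without_trait = [6, 8, 9, 11, 12, 13]

print(contrasting_groups_cut(with_trait, without_trait))  # 16
```

Here any cut between 14 and 16 would classify equally well; the function returns 16 because it only considers scores that were actually observed.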
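In classical test theory, cut scores are typically set based on testtakers’ performance across all the items on the test; some portion of the total number of items must be scored “correct” (or in a way that indicates the testtaker possesses the target trait or attribute) in order for the testtaker to “pass.” In an IRT framework, by contrast, each item is associated with a particular level of difficulty, and to pass, the testtaker must answer items deemed to be above some minimum level of difficulty, which is determined by experts and serves as the cut score.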
Ex: Measuring Depression: IRT can be used to measure the probability of a patient responding “yes” or “no” to a question based on their level of depression
IRT-based methods
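The depression example can be illustrated with a logistic item response function. The card does not name a specific model, so the two-parameter logistic form used here is an assumption, as are the function and parameter names.

```python
from math import exp

def p_endorse(theta, difficulty, discrimination=1.0):
    """Probability of endorsing an item, given trait level `theta` (2PL model)."""
    return 1 / (1 + exp(-discrimination * (theta - difficulty)))

# Hypothetical item located at difficulty 1.0 on the latent depression scale.
for theta in (-1.0, 0.0, 1.0, 2.0):
    print(theta, round(p_endorse(theta, difficulty=1.0), 2))
```

When the trait level equals the item's difficulty, the probability of endorsement is .50, and it rises as the trait level increases.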
It entails the arrangement of items in a histogram, with each column in the histogram containing items deemed to be of equivalent value.
Ex: Item mapping can be used to set standards for certification examinations, where experts determine the difficulty of items and the minimum passing score
Item-mapping method
Use of this method begins with the training of experts with regard to the minimal knowledge, skills, and/or abilities that testtakers should possess in order to “pass.” Subsequent to this training, the experts are given a book of items, with one item printed per page, such that items are arranged in an ascending order of difficulty
Ex: Imagine a test with four competency levels. SMEs would place bookmarks between items to indicate where a minimally competent candidate for the first level would transition to the second level, the second to the third, and so on.
Bookmark method
Technique for setting cut scores that takes into account the number of positions to be filled, projections regarding the likelihood of offer acceptance, and the distribution of applicant scores.
Ex: If 10 positions must be filled and about half of all offers are expected to be accepted, offers would go to roughly the top 20 scorers, and the cut score would be set at the lowest score in that group.
Method of predictive yield
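The three inputs named in the definition can be combined in a simplified sketch: estimate how many offers are needed, then read the cut score off the ranked applicant scores. This is one simplified reading of the idea rather than the original procedure; the function name and data are hypothetical.

```python
import math

def predictive_yield_cut(scores, positions, acceptance_rate):
    """Return (cut score, number of offers) needed to fill the positions."""
    offers_needed = math.ceil(positions / acceptance_rate)
    ranked = sorted(scores, reverse=True)
    offers_needed = min(offers_needed, len(ranked))  # cannot offer to more than applied
    return ranked[offers_needed - 1], offers_needed

# Twelve invented applicant scores; 3 positions; half of offers expected to be accepted.
scores = [91, 88, 85, 84, 80, 78, 75, 73, 70, 66, 61, 58]
cut, offers = predictive_yield_cut(scores, positions=3, acceptance_rate=0.5)
print(cut, offers)  # 78 6
```

A lower expected acceptance rate pushes the cut score down, because more offers must be extended to fill the same number of positions.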
These techniques are typically used to shed light on the relationship between identified variables (such as scores on a battery of tests) and two (and in some cases more) naturally occurring groups (such as persons judged to be successful at a job and persons judged to be unsuccessful at a job).
Ex: It can help identify specific cognitive strengths and weaknesses that characterize intellectually gifted individuals
Discriminant analysis