Chapter 6: EVALUATING SELECTION TECHNIQUES AND DECISIONS Flashcards
The extent to which a score from a test or from an evaluation is consistent and free from error.
Reliability
A method in which each of several people takes the same test twice.
test-retest reliability
The scores from the first administration of the test are correlated with scores from the second to determine whether they are similar
test-retest reliability
The extent to which repeated administration of the same test will achieve similar results.
test-retest reliability
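A minimal sketch of how the correlation between two administrations might be computed; the scores and variable names below are hypothetical illustrations, not data from the chapter:

import numpy as np

# Hypothetical scores for the same five people on two administrations of a test
first_administration = np.array([82, 74, 91, 65, 78])
second_administration = np.array([80, 76, 89, 63, 81])

# Test-retest reliability is the correlation between the two sets of scores
r_tt = np.corrcoef(first_administration, second_administration)[0, 1]
print(f"Test-retest reliability: {r_tt:.2f}")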
The test scores are stable across time and not highly susceptible to such random daily conditions as illness, fatigue, stress, or uncomfortable testing conditions
temporal stability
The consistency of test scores across time.
temporal stability
two forms of the same test are constructed
alternate-forms reliability
designed to eliminate any effects that taking one form of the test first may have on scores on the second form.
counterbalancing
The extent to which two forms of the same test are similar
alternate-forms reliability
A method of controlling for order effects by giving half of a sample Test A first, followed by Test B, and giving the other half of the sample Test B first, followed by Test A.
counterbalancing
The extent to which the scores on two forms of a test are similar.
Form stability
consistency with which an applicant responds to items measuring a similar dimension or construct
Internal Reliability
The extent to which similar items are answered in similar ways is referred to as internal consistency and measures ______
item stability
The extent to which similar items are answered in similar ways is referred to as _____ and measures item stability
internal consistency
The extent to which test items measure the same construct.
Item homogeneity
Three statistics used to determine the internal reliability of a test:
- Kuder-Richardson 20
- Spearman-Brown Prophecy Formula
- Coefficient Alpha (Cronbach’s Alpha)
- A form of internal reliability in which the consistency of item responses is determined by comparing scores on half of the items with scores on the other half of the items.
- The easiest method to use, as items on a test are simply split into two groups.
Split-half method
are more popular and accurate methods of determining internal reliability, although they are more complicated to compute
Cronbach’s coefficient alpha and the K-R 20
Used to correct reliability coefficients resulting from the split-half method.
Spearman-Brown prophecy formula
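A common form of the correction, assuming the two halves are of equal length and variance (a standard psychometric formula, not quoted from the chapter):

r_{whole} = \frac{2\, r_{half}}{1 + r_{half}}

where r_{half} is the correlation between scores on the two halves of the test.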
A statistic used to determine internal reliability of tests that use interval or ratio scales.
Coefficient alpha
A statistic used to determine internal reliability of tests that use items with dichotomous answers (yes/no, true/false).
Kuder-Richardson Formula 20 (K-R 20)
used for tests containing dichotomous items (e.g., yes-no, true-false)
K-R 20
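The standard form of the K-R 20 (a general psychometric formula, not quoted from the chapter):

KR\text{-}20 = \frac{k}{k - 1}\left(1 - \frac{\sum p_i q_i}{\sigma_x^2}\right)

where k is the number of items, p_i is the proportion of test takers answering item i correctly, q_i = 1 - p_i, and \sigma_x^2 is the variance of total test scores.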
can be used not only for dichotomous items but also for tests containing interval and ratio (nondichotomous) items such as five-point rating scales
coefficient alpha
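Coefficient alpha is conventionally written as (again a standard formula, not quoted from the chapter):

\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_x^2}\right)

where \sigma_i^2 is the variance of item i; with dichotomous items this reduces to the K-R 20.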
The extent to which two people scoring a test agree on the test score, or the extent to which a test is scored correctly.
Scorer reliability
An issue in projective or subjective tests in which there is no one correct answer; even tests scored with the use of keys can suffer from scorer mistakes.
Scorer reliability
When human judgment of performance is involved, scorer reliability is discussed in terms of
interrater reliability
When evaluating the reliability of a test, two factors must be considered:
- the magnitude of the reliability coefficient
- the people who will be taking the test.
The degree to which inferences from test scores are justified by the evidence.
Validity
The extent to which tests or test items sample the content that they are supposed to measure.
Content validity
In industry, the appropriate content for a test or test battery is determined by the _______
job analysis
The extent to which a test score is related to some measure of job performance.
Criterion validity
- a test is given to a group of employees who are already on the job.
- A form of criterion validity that correlates test scores with measures of job performance for employees currently working for an organization
concurrent validity
- In this design, the test is administered to a group of job applicants who are going to be hired.
- A form of criterion validity in which test scores of applicants are compared at a later date with a measure of job performance.
predictive validity
Difference between concurrent and predictive validity
Concurrent - already on the job.
Predictive - applicants who are going to be hired
A narrow spread of performance scores that makes obtaining a significant validity coefficient more difficult.
restricted range
the extent to which a test found valid for a job in one location is valid for the same job in a different location
validity generalization (VG)
The most theoretical of the validity types.
Construct validity
The extent to which a test actually measures the construct that it purports to measure.
Construct validity
is concerned with inferences about test scores
Construct validity
is concerned with inferences about test construction.
content validity
is usually determined by correlating scores on a test with scores from other tests
Construct validity
A form of validity in which test scores from two contrasting groups “known” to differ on a construct are compared.
Known-group validity
is the extent to which a test appears to be job related.
Face validity
True or False
face-valid tests resulted in high levels of test-taking motivation, which in turn resulted in higher levels of test performance
true
statements that are so general that they can be true of almost anyone.
Barnum Statements
- A book containing information about the reliability and validity of various psychological tests.
- Contains information on over 2,700 psychological tests as well as reviews by test experts.
Mental Measurements Yearbook (MMY)
What edition of the Mental Measurements Yearbook (MMY) is used?
19th edition
A type of test taken on a computer in which the computer adapts the difficulty level of questions asked to the test taker’s success in answering previous questions
Computer-adaptive testing (CAT)
designed to estimate the percentage of future employees who will be successful on the job if an organization uses a particular test
Taylor-Russell tables
Three pieces of information needed to use the Taylor-Russell tables (see the ratios below):
- Criterion validity coefficient
- Selection ratio (the percentage of applicants an organization hires: number hired divided by number of applicants)
- Base rate (Percentage of current employees who are considered successful.)
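Expressed as simple ratios (restating the parenthetical definitions above):

\text{Selection ratio} = \frac{\text{number hired}}{\text{number of applicants}} \qquad \text{Base rate} = \frac{\text{number of successful employees}}{\text{total number of employees}}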
- A utility method that compares the percentage of times a selection decision was accurate with the percentage of successful employees.
- easier to do but less accurate than the Taylor-Russell tables
Proportion of correct decisions (HIT RATE)
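One common way to compute the proportion of correct decisions (a standard presentation, not quoted from the chapter):

\text{Hit rate} = \frac{\text{correctly predicted successes} + \text{correctly predicted failures}}{\text{total number of employees}}

The resulting proportion is then compared with the base rate to judge whether using the test improves decision accuracy.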
Five items of information must be known to use the Brogden-Cronbach-Gleser utility formula (combined as sketched below):
- Number of employees hired per year
- Average tenure
- Test validity
- Standard deviation of performance in dollars
- Mean standardized predictor score of selected applicants
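Putting those five pieces of information together, the utility estimate is usually written along these lines (symbols are illustrative; some versions also subtract the cost of testing):

\text{Savings} = (n)(t)(r)(SD_y)(\bar{Z}_x)

where n = number of employees hired per year, t = average tenure, r = test validity, SD_y = standard deviation of job performance in dollars, and \bar{Z}_x = mean standardized predictor score of the selected applicants.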
- One form of predictive bias
- The test significantly predicts performance for one group and not others.
single-group validity
applicants are rank-ordered on the basis of their test scores
top-down selection
- The names of the top three scorers are given to the person making the hiring decision.
- Often used in the public sector.
rule of three
- A means of reducing adverse impact and increasing flexibility.
- The minimum test score that an applicant must achieve to be considered for hire.
Passing scores
A selection strategy in which applicants must meet or exceed the passing score on more than one selection test
Multiple-cutoff approach
Selection practice of administering one test at a time so that applicants must pass that test before being allowed to take the next test.
Multiple-hurdle approach
- As a compromise between top-down hiring and passing scores, _______ attempts to hire the top test scorers while still allowing some flexibility for affirmative action.
- A statistical technique based on the standard error of measurement that allows similar test scores to be grouped
banding
The number of points that a test score could be off due to test unreliability.
Standard error of measurement (SEM)
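The SEM is typically computed as (standard formula, not quoted from the chapter):

SEM = SD_x \sqrt{1 - r_{xx}}

where SD_x is the standard deviation of test scores and r_{xx} is the test's reliability. In banding, a band is often built from the standard error of the difference between two scores, \sqrt{2} \times SEM, multiplied by 1.96 for a 95% confidence band.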