DONE: Selection Flashcards
Aiden & Hanges (2017)
Data from selection assessments can be judgmental (e.g., interviews), mechanical (e.g., written tests), or a combination of the two.
Cascio & Aguinis (2019)
Important to consider the value of selection while also weighing its costs (e.g., work time lost to taking part in selection, the risk of using a poor assessment such as informal interviews).
Martin et al. (2019)
Avoiding race bias in cognitive-ability tests by using working memory (WM) measures.
The authors describe the development of Global Adaptive Memory
Evaluation (G.A.M.E.) – a working memory assessment – along with three studies focused on refining and
validating G.A.M.E., including examining test-taker reactions, reliability, subgroup differences, construct and
criterion-related validity, and measurement equivalence across computer and mobile devices.
Findings – Evidence suggests that G.A.M.E. is a reliable and valid tool for employee selection. G.A.M.E.
exhibited convergent validity with other cognitive assessments, predicted job performance, yielded smaller
subgroup differences than traditional cognitive ability tests, was engaging for test-takers, and upheld
equivalent measurement across computers and mobile devices.
First, many cognitive ability tests predated or were developed
with little regard to the well-accepted Cattell–Horn–Carroll (CHC) theory of cognitive
abilities, focusing instead on learned knowledge (Gc), which does not necessarily generalize
and is sensitive to socioeconomic status.
Second, many cognitive ability tests cannot be administered flexibly across modalities
(e.g. computer, mobile phone). Test-takers tend to score lower on traditional cognitive ability
assessments when using mobile devices (Impelman, 2013) due to greater distractions (Morelli et al., 2014).
We next examined mean score differences and score distribution properties between
majority and minority groups. Observed subgroup differences for gender (Cohen's d = −0.36, favoring males) and race/ethnicity (|d|s ranging from 0.09–0.38) were small, and
were moderate for age (r = −0.25 and d = −0.63, favoring test-takers younger than
40 years). Importantly, White-Black (d = −0.38, favoring Whites) and White-Hispanic
differences (d = 0.09, favoring Hispanics) were substantially smaller than the effect sizes
of 1.00 and 0.72, respectively, typically found for cognitive ability tests (Roth et al., 2001).
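(Side note on the stats: a minimal sketch in Python of how a Cohen's d like those reported above is computed; the two score samples are made up, not from the G.A.M.E. data.)

import statistics

def cohens_d(group_a, group_b):
    # standardized mean difference using the pooled standard deviation
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# e.g., two made-up score samples
print(cohens_d([52, 55, 49, 60, 58, 47], [50, 51, 45, 57, 53, 44]))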
G.A.M.E. demonstrated incremental validity in
predicting performance ratings over a custom composite of ADEPT-15 personality
dimensions that was tailored to each role.
Only about half had validity coefficients over .2, and about half fell between .1 and .2; both notably lower than the ~.3 validity coefficient for cognitive ability (CA).
Pyburn et al. (2008)
“diversity-validity tradeoff dilemma”
Bosco et al. (2015)
We tested whether executive attention (EA) and GMA predict simulation performance and supervisory ratings of performance, and how much EA and GMA are associated with subgroup differences.
Results indicate
that, like GMA, EA positively predicts managerial simulation and supervisory ratings of performance. In addition, although reaching statistical
significance in only 1 of our 4 studies, EA was generally associated with
smaller subgroup differences than GMA, and meta-analysis across our
samples supports this reduced subgroup difference.
A key attribute of EA is that, unlike GMA, measures of EA are relatively uninfluenced by learned knowledge (Kyllonen, 2002).
Employers
have tried to reduce differences in hiring rates through a variety of strategies—adjusting test scores
to minimize between-group differences in scores, assigning more weight to predictors associated
with less adverse impact, and using more non-cognitive selection methods—but none of these
strategies has proved especially effective.
Put differently, attention refers
to a state in which cognitive representations (e.g., goals, chunks of information) are held active and ready for processing - an underlying ability
to manage cognitive representations (i.e., information, goals) in temporary storage.
we observed comparable
validity coefficients for the EA–performance and GMA–performance
relations, which then provided nearly identical validity coefficients, corrected and
uncorrected, for EA and GMA.
(Though note: score adjustments to reduce group differences are illegal!)
Subgroup diffs: GMA usually produced the standard ~1.0 SD difference, but EA tended to give a Cohen's d of ~.85.
*In 3 of 4 studies, though, this reduction (EA vs. GMA) was not statistically significant.
And comparable validities: for predicting supervisor-rated performance, each was about .2; for predicting simulation performance, each was about .4.
Schmidt et al. (2016)
This paper is an update of Schmidt and Hunter (1998), which summarized 85 years of
research findings on the validity of job selection methods up to 1998.
this paper presents the validity of 31
procedures for predicting job performance and the validity of paired combinations of general
mental ability (GMA) and the 29 other selection procedures. Similar analyses are presented for
16 predictors of performance in job training programs. Overall, the two combinations with the
highest multivariate validity and utility for predicting job performance were GMA plus an
integrity test (mean validity of .78) and GMA plus a structured interview (mean validity of .76)
Similar results were obtained for these two combinations in the prediction of performance in job
training programs. A further advantage of these two combinations is that they can be used for
both entry level hiring and selection of experienced job applicants.
During this time, a new and
more accurate procedure for correcting for the downward bias caused by range restriction has
become available (Hunter, Schmidt, & Le, 2006). This more accurate procedure has revealed that
the older, less accurate procedure had substantially underestimated the validity of general mental
ability (GMA) and specific cognitive aptitudes (e.g., verbal ability, quantitative ability, etc.;
Schmidt, Oh, & Le, 2006)
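(Context note: the "older, less accurate procedure" is the classic correction for direct range restriction (Thorndike's Case II); the Hunter, Schmidt, & Le (2006) procedure additionally handles indirect range restriction and is more involved. A minimal sketch of the classic direct correction, with made-up numbers:)

def correct_direct_range_restriction(r_restricted, sd_unrestricted, sd_restricted):
    # Thorndike Case II: estimate the applicant-pool correlation from the
    # correlation observed in the range-restricted (hired) sample
    u = sd_unrestricted / sd_restricted   # ratio of applicant SD to incumbent SD on the predictor
    r = r_restricted
    return (r * u) / ((u ** 2 - 1) * r ** 2 + 1) ** 0.5

# made-up numbers: an observed validity of .30 among hires, with the applicant pool
# twice as variable on the predictor, corrects to roughly .53
print(correct_direct_range_restriction(0.30, sd_unrestricted=2.0, sd_restricted=1.0))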
For example, an expanded meta-analysis shows that job sample or work
sample tests are somewhat less valid than had been indicated by the older data. Also, meta-analytic results are now available for some newer predictors not included in the 1998 article.
These include Situational Judgment Tests (SJTs), college and graduate school grade point
average (GPA), phone-based structured employment interviews, measures of “emotional
intelligence”, person-job fit measures, person-organization fit measures, and self-report measures
of the Big Five personality traits.
Results show that many procedures that are valid predictors of job performance
nevertheless have little or no incremental validity over that of GMA. The rank order for zero
order validity is different from the rank order for incremental validity.
Also, the incremental
validity of most procedures is smaller than reported in Schmidt and Hunter (1998). This
reduction in apparent incremental validity results from the increase in the estimated validity of
GMA resulting from use of the more accurate correction for range restriction.
The validity of a hiring method is a direct determinant of its practical value, but it is not
the only determinant. Another direct determinant is the variability of job performance. At one
extreme, if variability were zero, then all applicants would have exactly the same level of later
job performance if hired.
At the other extreme, if performance variability is very large, it then becomes
important to hire the best performing applicants and the practical utility of valid selection
methods is very large. As it happens, this “extreme” case appears to be the reality for most jobs.
This variability (among all job applicants, not just the incumbents who end up being hired) is called the applicant pool variability, and in hiring this is the variability that operates to determine practical value.
Another determinant of the practical value of selection methods is the selection ratio—the
proportion of applicants who are hired. At one extreme, if an organization must hire all who
apply for the job, no hiring procedure has any practical value. At the other extreme, if the
organization has the luxury of hiring only the top scoring 1%, the practical value of gains from
selection per person hired will be extremely large. But few organizations can afford to reject
99% of all job applicants.
Actual selection ratios are typically in the .30 to .70 range, a range that
still produces substantial practical utility.
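(A minimal sketch of how these three determinants — validity, performance variability, and selection ratio — combine in the standard Brogden–Cronbach–Gleser utility model; assumes scipy is available, top-down hiring, and a normal applicant pool. The numbers are illustrative, not from the paper.)

from scipy.stats import norm

def utility_gain_per_hire(validity, sd_y, selection_ratio):
    # Brogden–Cronbach–Gleser: expected gain in output value per hire per year,
    # relative to random selection
    z_cut = norm.ppf(1 - selection_ratio)                # predictor cutoff in SD units
    mean_z_selected = norm.pdf(z_cut) / selection_ratio  # average standardized score of those hired
    return validity * sd_y * mean_z_selected

# illustrative: validity .50, SD of output worth $16,000/year, hiring the top 30%
print(utility_gain_per_hire(validity=0.50, sd_y=16_000, selection_ratio=0.30))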
Morelli et al. (2014)
Test-takers tend to score lower on traditional cognitive ability
assessments when using mobile devices due to greater distractions (Morelli et al., 2014)
Hunter et al. (2006)
After applying the new range correction procedure, found that GMA validity
ranged from .74 for professional and managerial jobs down to .39 for unskilled jobs. The mean
validity for medium complexity jobs (62% of all jobs in the U.S.) was .66.
The medium complexity category includes skilled blue collar jobs and mid-level white collar jobs, such as upper level clerical and mid to lower level administrative and
managerial jobs.
The meanings of validity
The predictive validity coefficient is directly proportional to the practical economic
value (utility) of the assessment method (Schmidt et al., 1979).
For differential validity per se, the general finding has been that validities (the focus of this
study) do not differ appreciably for different subgroups.
That is, given similar scores on selection procedures, later job
performance is similar regardless of group membership and regardless of how job performance is
measured (objectively or via supervisor ratings).
On other selection
procedures (in particular, personality and integrity measures), subgroup differences are rare or
nonexistent.
Hunter et al. (1990)
Use of hiring methods with increased predictive validity leads to substantial
increases in employee performance as measured in percentage increases in output, increased
monetary value of output, and increased learning of job-related skills.
Research has shown that the variability of performance and output among (incumbent) workers
is very large and that it would be even larger if all job applicants were hired or if job applicants were selected randomly from among those that apply.
Employee output can also be measured as a percentage of mean output; that is, each
employee’s output is divided by the output of workers at the 50th percentile and then multiplied
by 100. Research shows that the standard deviation of output as a percentage of average output
varies by job level. For unskilled and semi-skilled jobs, the average figure is
19%. For skilled work, it is 32%, and for managerial and professional jobs, it is 48%.
Schmidt et al. (2016) - measuring the variability of employee job performance:
The variability of employee job performance can be measured in a number of ways, but
two scales have typically been used: dollar value of output and output as a percentage of mean
output. The standard deviation across individuals of the dollar value of output has been found to be at minimum 40% of the mean salary of the job (Schmidt & Hunter, 1983;
Schmidt et al., 1979; Schmidt, Mack, & Hunter, 1984). The 40% figure is a lower bound value;
actual values are typically considerably higher. Thus, if the average salary for a job is $40,000,
then 1SD is at least $16,000. If performance has a normal distribution, then workers at the 84th
percentile produce output worth $16,000 more per year than average workers (i.e., those at the
50th percentile). And the difference between workers at the 16th percentile (“below average”
workers) and those at the 84th percentile (“superior” workers) is twice that: $32,000 per year.
Employee output can also be measured as a percentage of mean output; that is, each
employee’s output is divided by the output of workers at the 50th percentile and then multiplied
by 100.
a superior
worker in a lower level job produces 19% more output than an average worker, a superior skilled
worker produces 32% more output than the average skilled worker, and a superior manager or
professional produces output 48% above the average for those jobs.
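(A small sketch reproducing the arithmetic above on both scales, assuming normally distributed performance so that a "superior" worker sits one SD above the mean, i.e., at the 84th percentile.)

# dollar-value scale: SD of output is at least 40% of mean salary
mean_salary = 40_000
sd_dollars = 0.40 * mean_salary       # $16,000 (a lower-bound figure)
print(sd_dollars)                     # 84th- vs 50th-percentile output gap per year
print(2 * sd_dollars)                 # 84th- vs 16th-percentile output gap per year

# percentage-of-mean scale: SD of output as a % of average output, by job level
sd_percent = {"unskilled/semi-skilled": 19, "skilled": 32, "managerial/professional": 48}
for level, sd in sd_percent.items():
    print(f"{level}: a superior (84th percentile) worker produces {100 + sd}% of mean output")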
Schmidt et al. (2016) - History of “the theory of situational specificity of validity”
However, as early as the 1920s it became apparent that different
studies conducted on the same assessment procedure did not appear to agree in their results.
Validity estimates for the same method and same job were quite different for different studies.
During the 1930s and 1940s the belief developed that this state of affairs resulted from subtle
differences between jobs that were difficult or impossible for job analysts and job analysis
methodology to detect. That is, researchers concluded that the validity of a given procedure
really was different in different settings for what appeared to be basically the same job, and that
the conflicting findings in validity studies were just reflecting this fact of reality.
This belief, called the theory of situational specificity of validity, remained dominant in
personnel psychology until the late 1970s when it was discovered that most of the differences
across studies were due to statistical and measurement artifacts and not to real differences in the
jobs (Schmidt & Hunter, 1977).
The largest of these
artifacts was simple sampling error variation, caused by the use of small samples in the studies.
Studies based on meta-analysis provided more accurate estimates of the average
operational validity and showed that the level of real variability of validities was usually quite
small and might in fact be zero (Schmidt, 1992).
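(A minimal sketch of the bare-bones Hunter–Schmidt logic: compare the observed between-study variance of validities with the variance expected from sampling error alone; what remains is the estimate of real variability. The validities and sample sizes below are made up.)

def residual_variance(observed_rs, sample_sizes):
    # bare-bones psychometric meta-analysis: observed variance minus expected sampling-error variance
    n_total = sum(sample_sizes)
    r_bar = sum(r * n for r, n in zip(observed_rs, sample_sizes)) / n_total   # N-weighted mean r
    var_obs = sum(n * (r - r_bar) ** 2 for r, n in zip(observed_rs, sample_sizes)) / n_total
    n_bar = n_total / len(sample_sizes)
    var_error = (1 - r_bar ** 2) ** 2 / (n_bar - 1)      # expected sampling-error variance
    return max(var_obs - var_error, 0.0)                 # estimated real variance of validities

# made-up "conflicting" validities from small-sample studies
print(residual_variance([0.10, 0.35, 0.22, 0.41, 0.05], [60, 45, 80, 50, 70]))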
In addition, the findings
indicated that the variability of validity was not only small or zero across settings for the same
type of job, but was also small across different kinds of jobs of similar complexity (Hunter,
1980).
Neter & Ben-Shakhar (1989) - on handwriting (graphology) validity!
When the writers/participants are required to
copy the same material from a book to create their handwriting sample, the evidence indicates
that neither graphologists nor non-graphologists can infer any valid information about personality traits or job performance from the handwriting samples.
So any apparent validity comes not from the handwriting itself so much as from the content of what is written: style of writing, range of vocabulary, expression of emotions, verbal fluency, grammatical skills, and general knowledge.
Schmidt et al. (2016)
cost of application, best predictor of job learning.
When some selection measures can’t be used.
Age and performance
First, it (GMA) has the
highest validity and lowest application cost.
Like work sample measures, job
knowledge tests cannot be used to evaluate and hire inexperienced workers.
Adverse impact
Executives who prefer to be judged based on accomplishments.
Table 1 shows that age of job applicants has no validity for predicting job performance. Age is about as unrelated to job performance as any measure can be.
Schmidt et al. (2016)
Juicy points
- Surprising, and in contrast to the logic of Bartram's (2005) "Great Eight" competencies:
validity of GMA is higher than that of specific aptitudes—even when specific aptitudes are
chosen to match the most important aspects of job performance (e.g., spatial perception for
mechanical jobs; cf. Schmidt, 2011).
from a purely economic standpoint, research shows that the value of the
increases in job performance from good selection practices overshadows any potential costs
stemming from defending against such suits. Thus, there is little legal risk stemming from the use
of GMA assessments.
In court, when defending selection procedures, it's now more common to use general research findings (e.g., Schmidt & Hunter, 1998), i.e., summaries of 100 years of research and meta-analyses; such validity demonstrations are increasingly based on these kinds of summarized research findings.