Selection Flashcards
Definition of Selection:
A definition of selection is “a systematic process of deciding who to hire, promote or move to other jobs”. This definition indicates that selection is an important part of staffing, which is a broader field of HR, and that selection processes and decisions aren’t always about hiring new employees into the firm.
What is the ultimate purpose of any selection tool or method? What do we hope it will do?
One thing, and one thing really, really well: predict future job performance.
Personality Testing
As an example of a selection method, we can consider personality testing. Personality tests are interesting, in part, because many laypeople are skeptical of them and think of them as not very useful. Keep in mind how we should measure “usefulness”; we will discuss this shortly. Some common arguments against using personality tests are that people can fake their answers; that the tests are inaccurate; and that personality changes from situation to situation, so it can’t be used to select personnel. Are these claims true or false? If tests were inaccurate, they wouldn’t be very useful for personnel selection because they wouldn’t predict anything, such as future job performance. In fact, none of these arguments is true. We will consider this further when we talk about personality in personnel selection, but I wanted to introduce it early in this unit to demonstrate that your preconceived notions about different selection methods may or may not be consistent with what the current science has to say.
Faking answers does not change validity of a personality test.
5 Standards for Evaluating Selection Methods
- Reliability
- Validity
- Generalizability
- Utility
- Legality
Before we talk about specific selection methods, it is important to discuss how we evaluate the extent to which a particular measure is a good measure. All personnel selection methods and, more broadly, processes should meet the criterion of legality, which is to say they should conform to prevailing laws and legislation. Although we won’t focus on this as part of this particular unit, if you are going to work in HR you should familiarize yourself with the various laws related to personnel selection, some of which we touched on in the section on Legal Issues and EEO. Selection experts also evaluate the reliability, validity, generalizability and utility of various selection methods.
Reliability
Powerpoint:
Reliability is a property of a measurement approach or selection method, which represents the extent to which that measure is free from random error.
Selection methods first need to be evaluated in terms of their reliability. If a particular method isn’t reliable, it won’t be valid and therefore won’t be very useful. We will consider why a method has to be reliable for it to be valid in a moment. But first, let’s talk about what reliability is. Almost any measure that we use to represent some actual or true score, such as measuring height with a ruler, will be imperfect. That is to say, different measurements will produce slightly different results. Measures almost never perfectly represent their true scores, even when those true scores are stable, as with height. One’s height doesn’t change from month to month, at least not for adults, yet measures of height will differ slightly from measurement to measurement.
If a particular selection method has a high degree of random error, which is to say it is unreliable, it won’t be very valid, which is to say it won’t be very good at predicting anything. For instance, let’s say a measure of academic aptitude – the GRE test – isn’t very reliable, which would mean, for instance, that your score on the test changes a lot from Time 1 to Time 2, perhaps from your junior year to your senior year of college. If this were the case, the measure would represent different things at different times. If the measure represents different things at different times – so X at Time 1 and then X plus or minus something at Time 2 – it can’t be used to predict anything, because the measure is unstable. Since personnel selection is all about using different methods to measure different things that are meant to predict the extent to which individuals will be successful on the job, reliability is a necessary but insufficient criterion for a method to be worthwhile in a selection process. How do we measure reliability?
Test-retest reliability
Internal consistency reliability
Test-Retest Reliability
The degree to which a measure correlates with itself at two different times, which is demonstrated in the preceding GRE example. In this case, test-retest reliability is a correlation coefficient. It is important to understand what a correlation coefficient is and the different levels of practical significance or strength of a correlation. As a brief review, correlations below .10 are trivial; .10 to .29 are small; .30 to .49 are moderate and .50 and above are considered large.
Internal Consistency Reliability
When a particular measure is based on a survey format and has multiple survey items, such as with personality or IQ tests, reliability can also be measured as internal consistency reliability, which is essentially the extent to which different items in the same test are consistent with one another. Although this type of reliability is not represented by a correlation coefficient, it is still a representation of the extent to which a measure is free from random error. Reliability is always about consistency – whether it be consistency of a measure across time, or consistency of different items on a test with one another. How reliable should a selection method be? There aren’t hard and fast rules for cutoff scores for the reliability of a measure. That said, reliability should, in general, be strong (.50 or above).
What is internal consistency reliability? When questions are repeated but in different words. Asking the same question in different wordings helps establish the personality test’s reliability.
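The most common index of internal consistency is Cronbach’s alpha (the notes don’t name a specific index, so this is one reasonable choice). A minimal sketch with invented 1–5 ratings from five respondents on three re-worded items tapping the same trait:

```python
# Cronbach's alpha: a standard index of internal consistency reliability.
# All data below are hypothetical.
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: list of items, each a list of respondents' answers."""
    k = len(item_scores)
    respondents = list(zip(*item_scores))      # rows = respondents
    totals = [sum(r) for r in respondents]     # each respondent's total score
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Three re-worded items measuring the same trait (1-5 ratings):
items = [
    [5, 4, 2, 3, 4],
    [5, 5, 1, 3, 4],
    [4, 4, 2, 2, 5],
]
print(f"alpha = {cronbach_alpha(items):.2f}")  # high: items answered consistently
```

Respondents who rate one item high tend to rate the others high, so alpha is large; if the items measured unrelated things, alpha would drop toward zero.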
For a selection tool like an IQ test, personality test, or work sample test to do its job, it has to be reliable.
Explain reliability coming before validity:
If a measure is unreliable, it can’t predict anything consistently, and therefore it can’t be valid.
Validity:
Reliability is a necessary but insufficient condition for a measure to be valid—but what is validity? Validity comes in different forms, but in general it is the extent to which a measure assesses relevant – and only relevant – aspects of a particular criterion, such as job performance. In general, when we talk about validity, we are talking about the extent to which a measure measures what it is supposed to. When we talk about criterion-related validity, we are talking about the extent to which a measure measures job performance. Job performance is sort of an ultimate criterion in personnel selection. There are two broad types of validity that we should consider: content validity and criterion-related validity. Criterion-related validity comes in two forms: concurrent and predictive, each of which we will discuss.
Content Validation
A test-validation strategy performed by demonstrating that the items, questions, or problems posed by a test are a representative sample of the kinds of situations or problems that occur on the job. In other words, a test must be deemed to measure the content it is supposed to measure and only that content. Measures can be deficient or contaminated, and either would indicate a lack of content validity. If a measure is deficient, it is not measuring all of the features of the variable it is supposed to measure. So, for instance, if aptitude as measured by a GRE test is meant to measure quantitative, verbal and analytical aptitudes, but a particular test only measures quantitative and verbal aptitude, it is deficient. If a measure is contaminated, it is measuring something that is not meant to be part of the variable it is intended to measure. So, for instance, if a particular GRE test measures social skills in addition to quantitative, verbal and analytical aptitudes, then the measure would be contaminated. Of course, GRE tests do not measure social skills, even though social skills are indeed important for academic performance in graduate school, which is what the measure is meant to predict. The point is that a particular test should measure only what it is intended to measure. How is content validity determined? It is determined through expert judgment. That is, a panel of subject matter experts carefully reviews the properties of a given test – the questions on a personality test, for instance – to determine the extent to which they measure what they are supposed to measure.
Criterion-Related Validity - Predictive vs Concurrent
Criterion-related validity is, as it sounds, meant to demonstrate the extent to which a given measure (X) is predictive of some criterion (Y). In general, we think of job performance as the ultimate criterion in personnel selection. After all, once we have a qualified applicant pool, what we want to do then is to select the most qualified among the pool, which is to say the person or persons who will perform best on the job. There are two types of criterion-related validity: predictive and concurrent, both of which are represented by a correlation coefficient. Let’s start with concurrent because it is generally simpler than predictive. With concurrent validity, job incumbents are exposed to a particular selection method, let’s say some sort of test, and their scores are then correlated with job performance. Assuming there is some variability in both the test and the criterion, the variability in the test can be used to explain variability in the criterion. Concurrent validation is easier than predictive validation in the sense that it is less time consuming and resource intensive, but it also has some limitations. Because only current job incumbents are exposed to the test, scores on the test may be more similar across incumbents than if job applicants were tested. This is because persons who are actually selected into an organization will be more similar to one another than a group of applicants—due to factors such as socialization and training. Thus, restriction in the range of test scores can affect the correlations that are observed. Predictive validity is more ideal than concurrent validation, but it is also more involved, so to speak. It involves measuring all applicants (at least, all applicants at a certain stage of the selection process) on a particular test, then selecting individuals who are thought to be the most qualified and the best future performers, and then correlating scores on the test pre-hiring with job performance scores post-hiring.
In this sense, scores on the test can actually be used to predict performance. This is ideal because we want to be able to say that scores on our test are not only correlated with, but in fact predict, job performance. That said, since scores from all applicants need to be obtained, this can involve much more time and effort, and it is not often done in business organizations; it is less common than concurrent validation studies. In general, tests need to demonstrate validity for them to be used in personnel selection. If a test is not valid, it has no purpose being used in a selection process. Moreover, if a particular test results in some negative consequence, such as adverse impact, and it isn’t valid, then its use cannot be defended, which we talked about briefly in the context of Legal Issues and EEO. Validities – i.e., correlations – vary across different personnel selection methods, as we will see, and there is no hard and fast rule here about the strength of a correlation needed. That said, even small correlations (.10 and above) can be useful in selection.
Lecture:
Explain the concept of construct validity. It relies on subject matter experts: if SMEs look at a test and say the questions aren’t really gauging personality, they question its construct validity. Construct validity is established through content validation.
What is criterion-related validity? It is difficult to measure in a predictive sense. Firms defend tests by showing they have criterion-related validity.
Generalizability
The degree to which the validity of a selection method established in one context extends to other contexts.
3 contexts:
- different situations (jobs, organizations)
- different samples of people
- different time periods
Most of the time, we want the validity of a measure to be generalizable across different situations, different samples or groups of people, and across time. So, for instance, if we use a personality test to predict job performance, ideally that test will predict job performance in different jobs (engineers, marketing coordinators, lawyers, and so on), across different subgroups of people (men and women, minority and majority members, persons of different ages, and so on), as well as across time. In general, the validity of most of the measures we use in personnel selection is generalizable. Validities for personality tests – correlations between test scores and job performance – tend to be pretty stable over time, and they are consistent across jobs, organizations and, for the most part, across persons from different backgrounds. That said, as we will see shortly, some important personnel selection methods have different validities for different groups of people, which is not only a problem for predicting performance, but can also present legal challenges.
What is generalizability? Generalizability is the degree to which a test is useful across contexts (situations, candidates, jobs)
Utility
Utility is the degree to which the information provided by selection methods enhances the effectiveness of selecting personnel in organizations. In other words, it is the usefulness of a particular selection method. Although this is a separate and somewhat independent dimension from the other means by which we evaluate selection tools, such as reliability, the utility of a particular method is impacted by reliability, validity and generalizability. That is to say, the more reliable, valid or generalizable a particular method, the higher its utility. That said, other factors are indicative of utility. For instance, when a selection ratio is lower, the utility of a test is higher, all else equal. We want to have a large proportion of applicants relative to the number of persons we are selecting. Other factors such as the cost of a test, the extent to which the test is related to turnover and so on are all considered in relation to the utility of a particular test or measure.
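The selection-ratio point can be made concrete. A minimal sketch (the function name and numbers are illustrative, not from the notes): the selection ratio is the number hired divided by the number of applicants, and a lower ratio leaves more room for a valid test to improve who gets hired.

```python
# Selection ratio = number hired / number of applicants.
# All else equal, a LOWER ratio means HIGHER utility for a valid test,
# because the organization can be choosier about whom it selects.
def selection_ratio(hired: int, applicants: int) -> float:
    return hired / applicants

# Hypothetical scenarios:
print(selection_ratio(5, 50))   # 0.1 -> selective; a valid test adds a lot
print(selection_ratio(45, 50))  # 0.9 -> nearly everyone is hired anyway;
                                #        even a valid test adds little
```

In the second case the firm must hire 45 of 50 applicants regardless of test scores, so even a highly valid test can barely change who ends up on the job.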
Lecture:
What is utility? Usefulness. Utility depends not only on reliability but also on validity and generalizability. IQ is highly predictive across jobs, but its validity is not uniform across groups of people: IQ tests cause disparate impact, with people of color scoring lower on average. One cannot cheat on an IQ test.
Types of Selection Methods
Types of selection methods used to assess a person for employment include:
- interviews
- honesty tests and drug tests
- work samples
- personality inventories
- cognitive ability tests
We don’t have time to consider every possible personnel selection method, but let’s consider some of the more widely used methods in a bit more detail. In general, there are selection methods that tend to occur in initial stages of selection, and those that tend to occur in later stages of selection, as well as those that occur during a final screening. Evaluation of resumes to determine candidates’ fit with a position occurs early in the selection process. Related to this, the use of biographical data and different types of tests tend to occur relatively early in the selection process – they can be used to screen out candidates, and they are relatively faster and cheaper than other selection methods. Other methods, such as interviews, occur later in the selection process once candidates have been carefully screened. Reference checks and drug testing tend to occur in final screening of applicants.
Resume Screening
Perhaps the first step in a selection process is for a qualified HR professional, such as an HR coordinator, to screen resumes so as to sort candidates into two “piles” so to speak– those that meet the basic qualifications for a job, or the job req’s, and those that do not. Those that do will be moved on to other stages of the selection process. Some tips from the book “Hiring Great People” are offered above for persons to consider when screening resumes, such as being “open” and avoiding discriminatory information. Existing research, including research that I have conducted myself, shows that decision makers, such as hiring managers, will make hiring decisions based on candidates’ demographic information, such as gender, sexual orientation and so on – so it is important that persons screening resumes be trained professionals.
Biographical Data (Bio-data)
The basic premise behind biodata is that past behavior is a good (maybe the best) predictor of future behavior, a principle of behavioral consistency. In general, questions developed by subject matter experts about situations that are likely to have occurred in one’s past – and how one behaved in these situations – are used to predict job performance. Biodata measures tend to have 10 to 30 items; a single biodata question will not be a good predictor of future performance.
Biographical data from applicants (hobbies, experiences in school, preferred supervisor) are learned through a questionnaire or from a resume. The information is compared to information from the firm’s successful employees. For instance, potential leadership ability could be predicted by previous leadership experience. Firms must ensure questions are job related and attempt to verify the information provided. The biggest concern with the use of biographical data is that applicants who supply the information may be motivated to misrepresent themselves. Thus, it is important to control distortion by, for instance, warning applicants that the measure includes a lie-detection scale and/or asking applicants to elaborate on their answers. In general, validities tend to be practically significant, in the moderate range (around .30). Biodata need to always be job related (meaning they are related to job content and/or they predict job performance) and should be designed so as not to unfairly discriminate against protected classes, i.e., cause disparate impact.
Biodata items require extensive time and effort in terms of development, but they are relatively inexpensive to administer and score (often automated scoring procedures are used).