Test Bias & Test Utility Flashcards
If we discover that one group of people score higher on a test than another group of people, what are the possible underlying reasons for this?
The test may be biased or the two groups may actually be different
If our test is biased, what are our options for dealing with this?
Provide an accommodation within the existing test (e.g. extra time, point correction); redevelop the test; develop an alternative test specifically for the group; avoid testing altogether (although the alternatives to testing may be no less biased)
What are some issues with alternative tests such as the BITCH (Black Intelligence Test of Cultural Homogeneity) and the Chitling Test?
They may lack psychometric validity; they tend to yield much higher scores for the groups they're aimed at, but haven't demonstrated predictive validity (e.g. they don't predict real-world outcomes such as job performance or academic achievement)
Describe how we can use regression lines to model test bias
We can look at regression lines for criterion validity scatterplots separately for different groups; e.g. look at the relationship between how well people are predicted to do at a job (test score) and how well they actually do (job performance)
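A minimal Python sketch of the per-group comparison, assuming made-up (test score, job performance) data for two hypothetical groups A and B; `fit_line` is an illustrative ordinary least-squares helper, not part of any library.

```python
# Hypothetical (test score, job performance) data for two groups.
# All numbers are illustrative only, not from any study.

def fit_line(xs, ys):
    """Ordinary least-squares regression of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

groups = {
    "A": ([50, 60, 70, 80, 90], [3.0, 3.5, 4.0, 4.5, 5.0]),
    "B": ([40, 50, 60, 70, 80], [3.0, 3.2, 3.4, 3.6, 3.8]),
}

# Fit a separate regression line for each group and compare them.
for name, (scores, performance) in groups.items():
    slope, intercept = fit_line(scores, performance)
    print(f"group {name}: slope={slope:.3f}, intercept={intercept:.3f}")
```

Here the fitted slopes differ (0.05 vs 0.02), the pattern described under slope bias; equal slopes with different intercepts would instead point to intercept bias.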
If all of group A are scoring higher on both test score and job performance than group B, what does this mean?
Either the test isn’t biased, or both the job performance measure and the test score are equivalently biased (discriminating against group B)
When does Intercept Bias occur, and what does it mean?
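When the regression lines for the groups have the same slope but intercept the vertical axis at different points; the test is biased against one group (a single common regression line would systematically under-predict one group's job performance and over-predict the other's), but it is still equally predictive of job performance for both groups (although raw scores can't be compared directly across groups)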
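A small Python sketch of why intercept bias matters, assuming two hypothetical groups with identical test-score distributions, the same slope (0.05) and intercepts 1.0 and 0.5. Fitting one common line to everyone and averaging each group's residuals (actual minus predicted) shows the systematic over- and under-prediction. The equal score distributions are a simplifying assumption so the pooled slope stays at 0.05; `fit_line` and `mean_residual` are illustrative helpers.

```python
def fit_line(xs, ys):
    """Ordinary least-squares regression of ys on xs; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def mean_residual(xs, ys, slope, intercept):
    """Average of (actual - predicted): > 0 means the line under-predicts the group."""
    return sum(y - (slope * x + intercept) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: same slope, different intercepts (intercept bias).
scores = [50, 60, 70, 80, 90]
perf_a = [0.05 * s + 1.0 for s in scores]
perf_b = [0.05 * s + 0.5 for s in scores]

# One common regression line fitted to both groups pooled together.
slope, intercept = fit_line(scores + scores, perf_a + perf_b)
resid_a = mean_residual(scores, perf_a, slope, intercept)
resid_b = mean_residual(scores, perf_b, slope, intercept)

print(f"common line: slope={slope:.3f}, intercept={intercept:.3f}")
print(f"group A mean residual: {resid_a:+.3f}")  # +0.250 (under-predicted)
print(f"group B mean residual: {resid_b:+.3f}")  # -0.250 (over-predicted)
```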
When does Slope Bias occur, and what does it mean?
When the regression lines for the groups have different slopes; the test is biased against one group and is also less predictive of job performance for that group, i.e. the test is differentially valid for the two groups
What does “differentially valid” mean?
When the scatter of points is greater for one of the groups, so the correlation between test score and job performance (the validity coefficient) is lower for that group
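A minimal Python sketch of differential validity, assuming made-up job performance scores for two hypothetical groups who received the same test scores: one group's points fall exactly on a line, the other's are more scattered. `pearson_r` is an illustrative helper computing the Pearson correlation (the validity coefficient) for each group.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between test scores (xs) and criterion scores (ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

scores = [50, 60, 70, 80, 90]
group_a = [3.0, 3.5, 4.0, 4.5, 5.0]  # points fall exactly on a line
group_b = [3.6, 3.2, 4.4, 3.9, 4.9]  # same test scores, more scatter

r_a = pearson_r(scores, group_a)
r_b = pearson_r(scores, group_b)
print(f"group A r = {r_a:.2f}")  # 1.00
print(f"group B r = {r_b:.2f}")  # 0.78
```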
Some people have argued that the construct of race is primarily social and has no biological meaning (especially in societies like the US). What argument has been made for the idea that race differences in intelligence test scores are due to genetics?
Black children raised in white families with white education still tend to do worse at school and score lower on IQ tests (taken as evidence of innate race differences)
What arguments have been made against the idea that race differences in intelligence scores are due to genetics?
School achievement and even IQ are partly a function of teacher expectation (e.g. the Pygmalion effect); a minority group member may still face many disadvantages in such a situation, especially where group membership is superficially obvious (e.g. race)
Describe the Pygmalion Effect found in Rosenthal and Jacobson’s experiment
School children were given a non-verbal IQ test; teachers were then given a list of children (about 20% of the class) who had supposedly been identified by the test as "bloomers" but had actually been chosen at random; in the earliest grades, the "bloomers" showed significantly greater IQ gains by the end of the year
Steele and Aronson had students complete the Graduate Record Examination and divided them into 2 groups. How did they demonstrate the effect of self-stereotyping?
Group 1 were told the test measured intellectual ability and group 2 were told it was about problem-solving; African-Americans did worse than white Americans when in group 1 but didn’t differ in group 2
Describe the experiment that Shih et al. carried out to demonstrate the effect of self-stereotyping with Asian American women
They gave the women a maths test; those who had first completed a questionnaire drawing attention to their ethnic (Asian) identity did better on the test than controls, whereas those who had first completed a questionnaire drawing attention to their gender did worse
List the Commonwealth of Australia Acts that impact the use of psychological tests for employment purposes
Racial Discrimination Act; Age Discrimination Act; Human Rights and Equal Opportunity Commission Act; Sex Discrimination Act; Disability Discrimination Act; Fair Work Act
According to the Disability Discrimination Act, to overcome a claim of discrimination, what must the deficit be directly tied to?
An inherent requirement of the job
What disabilities does the Disability Discrimination Act include?
Physical, intellectual, psychiatric, sensory, neurological, learning disabilities, physical disfigurement, and the presence of disease-causing organisms in the body
Describe the requirements for psychological testing in relation to employment under Australian law
All tests must measure the person against the inherent requirements of the job, not the person in the abstract (content and criterion validity are needed); aptitude and personality tests must relate directly to the genuine requirements of the job; tests should only be used to assess an applicant's suitability for the position against the selection criteria; other information (e.g. about personality or private life) shouldn't be used when deciding on their suitability
The only Australian case to discuss personality tests in the context of employment law was Australian Industrial Relations Commission vs. Coms21 (1999). What occurred?
Employees were unfairly dismissed on the basis of a personality test and an inaccurate report from a consulting firm, without regard to objective, reasonable and justifiable selection criteria (no associated skill or competency tests were used)
Discuss the issues that have arisen when the use of psychological testing has been taken to court on the grounds of bias
Children with Spanish surnames were overrepresented in classes for the mentally retarded; IQ tests used to place children in educable mentally retarded (EMR) classes had a disproportionate effect on Black children; when the WISC-R and Stanford-Binet were argued to be racially biased, the judge ruled the evidence unconvincing; inconsistencies between court decisions are commonplace; the validity and reliability of many psychological tests are currently being decided by the courts
What does Test Utility refer to, and why is it important?
The practical usefulness of a test; it's not enough for a test to be good (i.e. reliable and valid), it has to yield a worthwhile benefit that outweighs its costs (e.g. money, time, inconvenience)
Is a test with poor reliability and validity likely to have good utility?
Not if its utility depends on interpreting the test score, but there may be situations where the score itself matters less, such as when the test is being used for some other purpose (e.g. lie detector tests can be useful even if they don't actually detect lies)
What is utility analysis?
A family of different techniques that can be used to decide the usefulness of a test; can also be applied to interventions (e.g. to decide most preferable training/therapy program)
What is an expectancy table, and how do you construct one?
It's a utility analysis strategy generated from a criterion validity scatterplot: the scatterplot is divided into categories based on test performance and job performance criteria, a cut-off (pass mark) is set, and the numbers of correct hits, false positives, false negatives and correct negatives are counted
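A minimal Python sketch of the counting step, assuming hypothetical (test score, performance rating) pairs, a test cut-off of 60, and an assumed 1 to 5 performance rating where 4 or above counts as "good"; the function name `classify` and all data are illustrative.

```python
def classify(applicants, test_cutoff, performance_cutoff):
    """Sort applicants into the four cells of an expectancy table."""
    counts = {"hit": 0, "false_positive": 0, "false_negative": 0, "correct_negative": 0}
    for score, performance in applicants:
        passed = score >= test_cutoff              # test says "hire"
        good = performance >= performance_cutoff   # criterion says "good"
        if passed and good:
            counts["hit"] += 1
        elif passed:
            counts["false_positive"] += 1
        elif good:
            counts["false_negative"] += 1
        else:
            counts["correct_negative"] += 1
    return counts

# Hypothetical (test score, performance rating) pairs.
applicants = [(45, 2), (52, 4), (58, 3), (61, 2), (67, 4), (72, 5), (78, 3), (85, 5)]
table = classify(applicants, test_cutoff=60, performance_cutoff=4)
print(table)
# {'hit': 3, 'false_positive': 2, 'false_negative': 1, 'correct_negative': 2}
```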
What is the selection ratio in the context of expectancy tables?
The ratio of available job positions to the number of applicants (e.g. 2 positions : 4 applicants = selection ratio of .5)
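The worked example above as a short Python sketch. It also assumes, for illustration only, that the employer simply hires the top-scoring applicants, in which case the cut-off is the score of the lowest-ranked person hired; `cutoff_for_positions` is a hypothetical helper, and tied scores at the cut-off would let more people pass than there are positions.

```python
def selection_ratio(positions, applicants):
    """Number of available positions divided by number of applicants."""
    return positions / applicants

def cutoff_for_positions(scores, positions):
    """Hypothetical top-down rule: cut-off = lowest score among the top `positions` applicants."""
    ranked = sorted(scores, reverse=True)
    return ranked[positions - 1]

scores = [55, 71, 64, 80]                  # 4 hypothetical applicants
print(selection_ratio(2, len(scores)))     # 0.5
print(cutoff_for_positions(scores, 2))     # 71
```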
What are false positives and false negatives in the context of expectancy tables?
False positives: test incorrectly identifies person as being good when they’re not;
False negatives: test incorrectly identifies person as being no good when they’re good
What is a cut off in the context of expectancy tables?
The minimum test score needed in the test to be hired (pass mark)
Imagine you are an HR manager. Describe how you could use expectancy table techniques to deal with (1) a high selection ratio and (2) a low selection ratio
- High selection ratio: if we had lots of job positions to fill, we could lower the pass mark of the test, so fewer good applicants missed out
- Low selection ratio: if we only had a few job positions to fill, we could raise the pass mark of the test, so fewer poor performers were hired
What risks would you be running with either a high or a low selection ratio?
High selection ratio: lowering the pass mark could lead to an increase in false positives (more poor performers hired);
Low selection ratio: raising the pass mark could lead to an increase in false negatives (good people not hired)
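A minimal Python sketch of this trade-off, assuming the same kind of hypothetical (test score, performance rating) data and an assumed "good" threshold of 4 on a 1 to 5 scale; `error_counts` is an illustrative helper. Moving the pass mark down trades false negatives for false positives.

```python
# Hypothetical applicants: (test score, performance rating on an assumed 1-5 scale).
APPLICANTS = [(45, 2), (52, 4), (58, 3), (61, 2), (67, 4), (72, 5), (78, 3), (85, 5)]

def error_counts(applicants, test_cutoff, performance_cutoff=4):
    """Return (false positives, false negatives) for a given pass mark."""
    false_pos = sum(1 for s, p in applicants if s >= test_cutoff and p < performance_cutoff)
    false_neg = sum(1 for s, p in applicants if s < test_cutoff and p >= performance_cutoff)
    return false_pos, false_neg

results = {}
for cutoff in (50, 60, 70):
    results[cutoff] = error_counts(APPLICANTS, cutoff)
    fp, fn = results[cutoff]
    print(f"cut-off {cutoff}: false positives={fp}, false negatives={fn}")
# cut-off 50: false positives=3, false negatives=0
# cut-off 60: false positives=2, false negatives=1
# cut-off 70: false positives=1, false negatives=2
```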
How many categories can an expectancy table have?
Multiple categories rather than just two (good/bad or pass/fail)
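One common multi-category layout splits test scores into bands and reports, for each band, the percentage of people rated good on the criterion. A minimal Python sketch, assuming hypothetical data, score bands, and a "good" threshold of 4 on an assumed 1 to 5 rating; `expectancy_table` is an illustrative helper.

```python
def expectancy_table(applicants, bands, performance_cutoff=4):
    """For each (low, high) score band, return (low, high, n, % rated good)."""
    rows = []
    for low, high in bands:
        ratings = [p for s, p in applicants if low <= s <= high]
        successes = sum(1 for p in ratings if p >= performance_cutoff)
        pct = 100 * successes / len(ratings) if ratings else None
        rows.append((low, high, len(ratings), pct))
    return rows

# Hypothetical (test score, performance rating) pairs and score bands.
APPLICANTS = [(45, 2), (49, 1), (52, 4), (58, 3), (61, 2), (67, 4),
              (72, 5), (78, 3), (85, 5), (88, 4)]
BANDS = [(40, 59), (60, 79), (80, 100)]

rows = expectancy_table(APPLICANTS, BANDS)
for low, high, n, pct in rows:
    shown = "n/a" if pct is None else f"{pct:.0f}%"
    print(f"score {low}-{high}: n={n}, rated good={shown}")
# score 40-59: n=4, rated good=25%
# score 60-79: n=4, rated good=50%
# score 80-100: n=2, rated good=100%
```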
What are the disadvantages of using the expectancy table technique?
It assumes a linear relationship between test score and job performance; it doesn't take into account other factors (e.g. minority status, physical health of the applicant)
What influence does the criterion validity of a particular selection test have on its usefulness in recruiting people?
It tells us whether we're choosing the right people; a test with near-zero criterion validity is little better than selecting applicants at random (lots of scatter), whereas a test with high criterion validity is very good at choosing the right people (strong positive correlation)
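A standard way to quantify this, assuming test and criterion scores are bivariate normal with a linear relationship and that the top-scoring fraction of applicants is hired, is: average criterion score of those hired (in criterion SD units above the applicant mean) = validity × φ(z_cut) / selection ratio, where φ is the standard normal density at the test cut-off. A minimal Python sketch using the standard library's `statistics.NormalDist`; `expected_criterion_gain` is an illustrative name.

```python
from statistics import NormalDist

def expected_criterion_gain(validity, selection_ratio):
    """Mean standardized criterion score of those hired, assuming bivariate
    normality and top-down selection of the best `selection_ratio` fraction."""
    std_normal = NormalDist()
    z_cut = std_normal.inv_cdf(1 - selection_ratio)  # test cut-off in z units
    return validity * std_normal.pdf(z_cut) / selection_ratio

gains = {}
for r in (0.0, 0.3, 0.6):
    gains[r] = expected_criterion_gain(r, selection_ratio=0.5)
    print(f"validity {r:.1f}: hires average {gains[r]:.2f} SD above the applicant mean")
```

With validity 0 the expected gain is 0, which is the "no better than random" case; the gain rises in direct proportion to validity.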
How did Schmidt et al. use utility analysis to save employers millions of dollars?
They analysed the use of the Programmer Aptitude Test (PAT) to select computer programmers; compared with previous selection procedures, it had a much higher validity coefficient, with an estimated saving to the employer of $6 million per year
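Dollar estimates of this kind are commonly produced with a Brogden-Cronbach-Gleser style utility model: gain = number hired × average tenure × (new validity − old validity) × SD of performance in dollars × φ(z_cut) / selection ratio, minus testing costs. The sketch below assumes that form, bivariate normality and top-down selection; every input value and the helper name `utility_gain` are hypothetical and are not the figures Schmidt et al. used.

```python
from statistics import NormalDist

def utility_gain(n_selected, tenure_years, validity_new, validity_old,
                 sd_y_dollars, selection_ratio, extra_cost_per_applicant):
    """Estimated dollar gain from replacing an old selection procedure with a
    more valid test (Brogden-Cronbach-Gleser style; all inputs hypothetical)."""
    std_normal = NormalDist()
    z_cut = std_normal.inv_cdf(1 - selection_ratio)
    mean_z_selected = std_normal.pdf(z_cut) / selection_ratio
    n_applicants = n_selected / selection_ratio
    benefit = (n_selected * tenure_years * (validity_new - validity_old)
               * sd_y_dollars * mean_z_selected)
    cost = n_applicants * extra_cost_per_applicant
    return benefit - cost

# Hypothetical example: 100 hires staying 5 years, validity rising from .2 to .6.
gain = utility_gain(n_selected=100, tenure_years=5, validity_new=0.6, validity_old=0.2,
                    sd_y_dollars=10_000, selection_ratio=0.5,
                    extra_cost_per_applicant=20)
print(f"estimated gain: ${gain:,.0f}")
```

Because the gain scales with the validity difference, replacing a weak procedure with a clearly more valid test can produce large estimated savings even after testing costs.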