Wk 10 - Bias Flashcards
Schmidt et al. (1979) used utility analysis to evaluate the efficacy of a test (Programmer Aptitude Test) for selecting programmers over traditional non-test methods such as interview. The test led to an estimated saving of $6 million per year. What was the key factor behind this saving? (x2)
The test had much better psychometric validity than the non-test methods
(i.e. it was much better at selecting the best people for the job than the previous options).
One group of people scores higher on a test designed to predict job performance than another group of people. Overall, the test is found to be valid. On a scatterplot (job performance versus test score), the two groups are best modelled using two separate but parallel regression lines. What does this mean? (x1)
The test is biased but can still predict job performance within each group separately.
True or false, and why? (x2)
In the case “Australian Industrial Relations Commission vs. Coms21” (1999), the court held that Coms21 had terminated the employment of five individuals unfairly because, IN ADDITION TO the usual competency tests, the terminations were also based on personality profiles.
False
Coms21 used ONLY personality tests
One of the criticisms levelled at them by the court was that they failed to use competency tests
True or false?
The current national Australian body dealing with unfair workplace practices is the Fair Work Commission.
True
Imagine I used a psychological test to decide which of 30 applicants to hire for 15 potential job positions. The test has high criterion-referenced validity with respect to job performance. If I set a very high cut-off for the test to minimise false positives then what would be the most likely outcome? (x1 plus explain x2)
Unlikely to be able to select enough people to fill all the job positions
A high cut-off on a high-validity test will result in selecting the best people
But I risk not ending up with enough of them (because only a tiny number of people would pass the test and hence be eligible)
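A rough sketch of why the high cut-off leaves positions unfilled. It assumes (not stated in the card) that test scores are normally distributed and that the cut-off is expressed in SD units above the mean; the function name and the three cut-off values are made up for illustration.

```python
from statistics import NormalDist

def expected_passers(n_applicants, cutoff_z):
    """Expected number of applicants scoring at or above the cut-off (in SD units)."""
    proportion_passing = 1 - NormalDist().cdf(cutoff_z)
    return n_applicants * proportion_passing

n_applicants, n_positions = 30, 15

# Hypothetical cut-offs: at the mean, 1 SD above it, 2 SD above it.
for cutoff_z in (0.0, 1.0, 2.0):
    passers = expected_passers(n_applicants, cutoff_z)
    print(f"cut-off at z = {cutoff_z:.1f}: about {passers:.1f} pass "
          f"for {n_positions} positions")
```

Under these assumptions, a cut-off at the mean passes about 15 of the 30, but a cut-off 2 SD above the mean passes fewer than one applicant on average.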
Research on the “Pygmalion effect” (Rosenthal & Jacobson, 1966) found that if teachers were told that certain students were likely to do well academically (when in reality these students were identified at random) then the selected students showed greater IQ score gains: (x1)
Only in the younger classes.
True or false, and why? (x2)
In US court cases, intelligence tests, such as the WISC and Stanford-Binet, have been CONSISTENTLY judged to be racially biased when used in educational settings.
False
Because US courts have been inconsistent in rulings regarding use of IQ testing in education
In the example in the lecture, two courts in different parts of the US reached opposite verdicts at virtually the same time
True or false?
In Australian employment law, it is possible to overcome a claim of discrimination on the basis of a disability, if you can demonstrate that the disability is directly relevant to some inherent requirement of the job.
True
Statistical decision analysis is: (x1 plus explain x3)
A method that can be used to evaluate the usefulness of a test
E.g. in selecting employees
Note that decision analysis and utility analysis are the same thing:
“A number of researchers have demonstrated that the use of utility analysis (statistical decision analysis) can save employers substantial amounts of money”
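A minimal sketch of how a utility estimate can be computed. The cards do not say which formula was used, so this assumes one common formulation (the Brogden-Cronbach-Gleser model), simplified to a single year. Every figure below is hypothetical and chosen only to show that validity drives the result.

```python
def utility_gain(n_hired, validity, sd_performance_dollars,
                 mean_z_of_hired, cost_per_applicant, n_applicants):
    """Estimated yearly dollar gain from selecting with a given method.

    Benefit = hires x validity x SD of job performance in dollars
              x mean standardised predictor score of those hired.
    The total cost of assessing all applicants is then subtracted.
    """
    benefit = n_hired * validity * sd_performance_dollars * mean_z_of_hired
    assessment_cost = n_applicants * cost_per_applicant
    return benefit - assessment_cost

# Hypothetical inputs shared by both selection methods.
common = dict(n_hired=15, sd_performance_dollars=10_000,
              mean_z_of_hired=0.8, cost_per_applicant=20, n_applicants=30)

test_gain = utility_gain(validity=0.5, **common)       # higher-validity test
interview_gain = utility_gain(validity=0.1, **common)  # lower-validity method

print(f"higher-validity test:  ${test_gain:,.0f}")
print(f"lower-validity method: ${interview_gain:,.0f}")
```

With everything else held equal, the estimated gain scales with the validity coefficient, which is the point of the Schmidt et al. card.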
True or false, and why? (x2)
When using aptitude and personality tests for employee recruitment in Australia, the tests need to have content validity with regard to the requirements of the job
True
…and not the person in the abstract
(i.e. content validity as well as criterion validity needed)
True or false, and why? (x2)
In principle, it is possible for a test to have good utility even if it does not have decent reliability and validity
True
Might happen when a test is being used for a purpose other than generating a meaningful test score
Eg lie detector tests might be useful for putting pressure on individuals regardless of whether the tests actually work or not
One group of people scores higher on a test designed to predict job performance than another group of people. Overall the test was found to be valid. On a scatterplot (job performance versus test score), the two groups are best modelled with a single regression line. What does this mean? (x1)
Either the test is not biased or the test score and job performance measure are both biased
Imagine I used a psychological test to decide which of 30 applicants to hire for 15 potential job positions. If the test had a trivially small (but positive) criterion validity coefficient with respect to job performance, what would be the most likely outcome? (x1 plus explain x2)
I could obtain my 15 applicants but it would be unlikely that I had selected the best people for the job
If the validity coefficient is close to zero, then I would effectively be selecting applicants at chance. Hence it would be unlikely that I would end up selecting the best people for the job
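A small simulation sketch of the "selecting at chance" point. It assumes normally distributed job performance and builds test scores that correlate with performance at roughly the chosen validity; the function name, trial count and the two validity values are illustrative only.

```python
import random

def mean_overlap(validity, n_applicants=30, n_positions=15, trials=2000, seed=1):
    """Average number of the truly best applicants among those the test selects."""
    rng = random.Random(seed)
    noise_weight = (1 - validity ** 2) ** 0.5
    total = 0
    for _ in range(trials):
        performance = [rng.gauss(0, 1) for _ in range(n_applicants)]
        # Test score = validity * performance + independent noise, so the
        # population correlation between score and performance is `validity`.
        scores = [validity * p + noise_weight * rng.gauss(0, 1) for p in performance]
        ids = range(n_applicants)
        chosen = set(sorted(ids, key=lambda i: scores[i], reverse=True)[:n_positions])
        best = set(sorted(ids, key=lambda i: performance[i], reverse=True)[:n_positions])
        total += len(chosen & best)
    return total / trials

low = mean_overlap(0.02)   # trivially small validity
high = mean_overlap(0.9)   # high validity, for comparison
print(f"validity 0.02: {low:.1f} of the 15 best selected on average")
print(f"validity 0.90: {high:.1f} of the 15 best selected on average")
```

Picking 15 of 30 purely at random would capture 7.5 of the 15 best on average, so a near-zero validity should land close to that figure while still filling every position.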
If we discover that one group of people score higher on a test than another group of people, what are the possible underlying reasons for this? (x2)
The test is biased, or
There are actual systematic differences between certain groups
If our test is biased then what are our options for dealing with this? (x4 plus eg/issue for each)
Have an accommodation within existing tests (e.g. extra time, points correction) - difficulties in measuring fairness
Redevelop the test - but you might be throwing out useful items…
Develop an alternative special test for the group - eg that’s language-appropriate
Avoid testing altogether (but are the alternatives likely to be less biased?) - eg teacher judgement over objective test
Describe how we can use regression lines to model test bias (x4)
What psychometric property are we actually evaluating here? (x1)
Create scatterplot:
Test scores on the x-axis
Criterion scores on the y-axis (eg RL job performance)
Find correlation/line of best fit
Criterion validity
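The steps above can be sketched as follows, fitting a least-squares line for each group and computing the correlation. The data are invented so that the two groups come out with the same slope but different intercepts, i.e. the "separate but parallel lines" pattern; the helper names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pearson_r(x, y):
    """Pearson correlation between test scores and criterion scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: test score (x-axis) and job performance rating (y-axis).
group_a_x, group_a_y = [50, 60, 70, 80], [30, 35, 40, 45]
group_b_x, group_b_y = [40, 50, 60, 70], [35, 40, 45, 50]

slope_a, intercept_a = fit_line(group_a_x, group_a_y)
slope_b, intercept_b = fit_line(group_b_x, group_b_y)
pooled_r = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"group A: slope {slope_a:.2f}, intercept {intercept_a:.2f}")
print(f"group B: slope {slope_b:.2f}, intercept {intercept_b:.2f}")
print(f"pooled correlation: {pooled_r:.2f}")
```

Equal slopes with different intercepts mean the same test score predicts different job performance depending on group. If both groups gave the same slope and intercept, a single line would fit everyone.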
What scenario would be described by a criterion validity scatterplot with a single straight regression line, where one group (B) are all at the bottom left, and another (A) at the top right, of the line? (x3)
What are the implications? (x2)
The test is distinguishing between two groups:
A doing well on both test scores, and performance
B doing worse on both
Either the test isn’t biased or
The job performance measure and the test score are both equivalently biased.
Some people have argued that there are race differences in intelligence test scores and that these are due to genetics – what are the arguments for this idea? (x2)
The Bell Curve - no differences due to interventions, so must be purely genes
Lynn, 1994 - because black children raised in white families with white education tend to still do worse at school and score lower on IQ tests, this is evidence of innate race differences.
What is the Pygmalion Effect? (x1)
Describe the experiment that Rosenthal and Jacobson carried out to demonstrate it (x5)
What does this demonstrate? (x2)
When teacher expectations affect pupils’ IQ gains
Kids given a non-verbal IQ test.
Teachers told the test was a predictor of “blooming”/“spurting”.
Then given a list of children who had supposedly performed in the top 20% (actually just chosen at random)
For earliest grades, “bloomers” scored significantly higher at the end of the year
(no effect from third years onward)
Purely environmental, not genetic, effect on IQ
But, failures to replicate reported
Describe the experiment that Steele and Aronson carried out to demonstrate the effect of self-stereotyping (x7)
Students completed the Graduate Record Examination
Group 1 told test measured intellectual ability
Group 2 told test was about problem-solving
Random assignment to either group
African-Americans did worse than White Americans when in Group 1 but not in Group 2 (where they did not differ).
So discrimination entirely at the level of the individual, not the test or experimenter
Describe the experiment that Shih et al. carried out to demonstrate the effect of self-stereotyping (x8)
Gave Asian American women a maths test (“quantitative knowledge”)
Before giving them the test, randomly assigned to groups
One primed them with ideas of ethnic identity
Second with questions on gender
Third group was control
Those given the ethnic identity prime conformed to the Californian stereotype that Asians are better at maths
Gender priming worsened scores due to the local stereotype that women are worse at maths
Canadian replication, different stereotypes, failed to replicate effects