Wk 10 - Bias Flashcards

Question 1

Q

Schmidt et al. (1979) used utility analysis to evaluate the efficacy of a test (Programmer Aptitude Test) for selecting programmers over traditional non-test methods such as interviews. The test led to an estimated saving of $6 million per year. What was the key factor behind this saving? (x2)

Answer

A

Test had much better psychometric validity than the non-test methods -
(i.e. it was much better at selecting out the best people for the job than previous options).

Question 2

Q

One group of people scores higher on a test designed to predict job performance than another group of people. Overall, the test is found to be valid. On a scatterplot (job performance versus test score), the two groups are best modelled using two separate but parallel regression lines. What does this mean? (x1)

Answer

A

The test is biased but can still predict job performance within each group separately.

Question 3

Q

True or false, and why? (x2)
In the case “Australian Industrial Relations Commission vs. Coms21” (1999), the court held that Coms21 had terminated the employment of five individuals unfairly because, IN ADDITION TO the usual competency tests, the terminations were also based on personality profiles.

Answer

A

False
Coms21 ONLY used personality tests and
One of the criticisms levelled at them by the court was that they failed to use competency tests

Question 4

Q

True or false?

The current national Australian body dealing with unfair workplace practices is the Fair Work Commission.

Question 5

Q

Imagine I used a psychological test to decide which of 30 applicants to hire for 15 potential job positions. The test has high criterion-related validity with respect to job performance. If I set a very high cut-off for the test to minimise false positives, then what would be the most likely outcome? (x1 plus explain x2)

Answer

A

Unlikely to be able to select enough people to fill all the job positions
High cut-off for a high validity test will result in selecting the best people -
But I risk not ending up with enough of them (because only a tiny number of people would pass the test and hence be eligible)

Question 6

Q

Research on the “Pygmalion effect” (Rosenthal & Jacobson, 1966) found that if teachers were told that certain students were likely to do well academically (when in reality these students were identified at random) then the selected students showed greater IQ score gains: (x1)

Answer

A

Only in the younger classes.

Question 7

Q

True or false, and why? (x2)
In US court cases, intelligence tests, such as the WISC and Stanford-Binet, have been CONSISTENTLY judged to be racially biased when used in educational settings.

Answer

A

False
Because US courts have been inconsistent in rulings regarding use of IQ testing in education
In the example in the lecture, two courts in different parts of the US reached opposite verdicts at virtually the same time

Question 8

Q

True or false?
In Australian employment law, it is possible to overcome a claim of discrimination on the basis of a disability, if you can demonstrate that the disability is directly relevant to some inherent requirement of the job.

Question 9

Q

Statistical decision analysis is: (x1 plus explain x3)

Answer

A

A method that can be used to evaluate the usefulness of a test
Eg in selecting employees
Note that decision analysis and utility analysis are the same thing:
“A number of researchers have demonstrated that the use of utility analysis (statistical decision analysis) can save employers substantial amounts of money”

Question 10

Q

True or false, and why? (x2)
When using aptitude and personality tests for employee recruitment in Australia, the tests need to have content validity with regard to the requirements of the job

Answer

A

True
…and not the person in the abstract
(i.e. content validity as well as criterion validity needed)

Question 11

Q

True or false, and why? (x2)

In principle, it is possible for a test to have good utility even if it does not have decent reliability and validity

Answer

A

True
Might happen when a test is being used for a purpose other than generating a meaningful test score
Eg lie detector tests might be useful for putting pressure on individuals regardless of whether the tests actually work or not

Question 12

Q

One group of people scores higher on a test designed to predict job performance than another group of people. Overall the test was found to be valid. On a scatterplot (job performance versus test score), the two groups are best modelled with a single regression line. What does this mean? (x1)

Answer

A

Either the test is not biased or the test score and job performance measure are both biased

Question 13

Q

Imagine I used a psychological test to decide which of 30 applicants to hire for 15 potential job positions. If the test had a trivially small (but positive) criterion validity coefficient with respect to job performance, what would be the most likely outcome? (x1 plus explain x2)

Answer

A

I could obtain my 15 applicants but it would be unlikely that I had selected the best people for the job
If the validity coefficient is close to zero, then I would effectively be selecting applicants at chance. Hence it would be unlikely that I would end up selecting the best people for the job

Question 14

Q

If we discover that one group of people score higher on a test than another group of people, what are the possible underlying reasons for this? (x2)

Answer

A

The test is biased, or

There are actual systematic differences between certain groups

Question 15

Q

If our test is biased then what are our options for dealing with this? (x4 plus eg/issue for each)

Answer

A

Have an accommodation within existing tests (e.g. extra time, points correction) - difficulties in measuring fairness
Redevelop the test - but you might be throwing out useful items…
Develop an alternative special test for the group - eg that’s language-appropriate
Avoid testing altogether (but are the alternatives likely to be less biased?) - eg teacher judgement over objective test

Question 16

Q

Describe how we can use regression lines to model test bias (x4)
What psychometric property are we actually evaluating here? (x1)

Answer

A

Create scatterplot:
Test scores on the x-axis
Criterion scores on the y-axis (eg RL job performance)
Find correlation/line of best fit
Criterion validity
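
The steps above can be sketched in Python. This is a minimal illustration only: the data are made up, and the function name `fit_line` is an assumption rather than anything from the lecture. It fits a least-squares line of best fit and a Pearson correlation to test scores (x-axis) and job performance (y-axis).

```python
from math import sqrt

def fit_line(test_scores, criterion_scores):
    """Least-squares line (criterion = slope * test + intercept) plus Pearson r."""
    n = len(test_scores)
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion_scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # criterion validity coefficient
    return slope, intercept, r

# Hypothetical data: test score (x-axis) and job performance rating (y-axis)
test = [10, 20, 30, 40, 50]
performance = [3, 4, 6, 7, 10]

slope, intercept, r = fit_line(test, performance)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

The correlation `r` is the criterion validity coefficient; fitting the same line separately for each group is what allows the slopes and intercepts to be compared when checking for bias.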

Question 17

Q

What scenario would be described by a criterion validity scatterplot with a single straight regression line, where one group (B) are all at the bottom left, and another (A) at the top right, of the line? (x3)
What are the implications? (x2)

Answer

A

The test is distinguishing between two groups:
A doing well on both test scores, and performance
B doing worse on both
Either the test isn’t biased or
The job performance measure and the test score are both equivalently biased.

Question 18

Q

Some people have argued that there are race differences in intelligence test scores and that these are due to genetics – what are the arguments for this idea? (x2)

Answer

A

The Bell Curve - no differences due to interventions, so must be purely genes
Lynn, 1994 - because black children raised in white families with white education tend to still do worse at school and score lower on IQ tests, this is evidence of innate race differences.

Question 19

Q

What is the Pygmalion Effect? (x1)
Describe the experiment that Rosenthal and Jacobson carried out to demonstrate it (x5)
What does this demonstrate? (x2)

Answer

A

When teacher expectations affect pupils’ IQ gains
Kids given a non-verbal IQ test.
Teachers told the test was predictor of “blooming”/“spurting”.
Then given list of children who performed in top 20% (actually just random)
For earliest grades, “bloomers” scored significantly higher at the end of the year
(no effect from third years onward)
Purely environmental, not genetic, effect on IQ
But, failures to replicate reported

Question 20

Q

Describe the experiment that Steele and Aronson carried out to demonstrate the effect of self-stereotyping (x7)

Answer

A

Students completed the Graduate Record Examination
Group 1 told test measured intellectual ability
Group 2 told test was about problem-solving
Random assignment to either group
African-Americans did worse than White Americans when in Group 1 but not in Group 2 (where they did not differ).
So discrimination entirely at the level of the individual, not the test or experimenter

Question 21

Q

Describe the experiment that Shih et al. carried out to demonstrate the effect of self-stereotyping (x8)

Answer

A

Gave Asian American women a maths test (“quantitative knowledge”)
Before giving them the test, randomly assigned to groups
One primed them with ideas of ethnic identity
Second with questions on gender
Third group was control
Those in the ethnic identity prime group scored higher, conforming to the Californian stereotype that Asians are better at maths
Gender priming worsened scores due to local stereotypes that women are worse at math
Canadian replication, different stereotypes, failed to replicate effects

Question 22

Q

Discuss the issues that have arisen when the use of psychological testing has been taken to court on the grounds of bias (x3)
Plus one eg (x2)

Answer

A

Aptitude tests and personality tests must be designed so they relate directly to the genuine inherent requirements of the job - not the person in the abstract
(i.e. content validity as well as criterion validity needed)
Australian Industrial Relations Commission vs. Coms21 (1999) -
Coms21 lost because fired staff over personality testing in the absence of skill or competency tests

Question 23

Q

What is wrong with selecting people for a job based on an interview or CV?

Answer

A

They have low criterion validity
ie don’t predict actual performance
So you get big spread on your scatterplot
And your selection tool is no better than selecting at random

Question 24

Q

What psychometric property is an expectancy table analysing?

Answer

A

Criterion validity

Question 25

Q

Some people have argued that there are race differences in intelligence test scores and that these are due to genetics – what are the arguments against this idea? (x6)

Answer

A

Construct of race is primarily social - no biological meaning, especially in societies like the US (Freeman & Payne, 2000)
Not as simple as (eg Lynn) pointed out –
Minority group member may still face many disadvantages in such a situation, especially where group membership may be superficially obvious
Eg black kids in predominantly white schools is not a blind experiment - everyone can see their skin, may = differential treatment
School achievement and even IQ is partly a function of teacher expectation (e.g. the Pygmalion effect, see next slide).

Question 26

Q

What is utility analysis? (x1)
What questions does it answer? (x2)
What other application does it have (x1 plus e.g.)

Answer

A

Family of different techniques used to decide the usefulness of a test.
To answer questions like:
• Which test shall we use?
• Is adding a new test to an existing battery worthwhile?
Can also be applied to interventions (training, therapies) –
e.g. which training program is most preferable?

Question 27

Q

Imagine you are an HR manager – describe how you could use expectancy table techniques to deal with a high selection ratio (x2)
What risks would you be running ? (x2)

Answer

A

You would lower the pass mark/cut-off (move the vertical line to the left on x-axis)
To enable filling of lots of positions
Few good people would miss out (fewer false negatives), but more inappropriate ones would be hired
ie more false positives on the pass-side, as well as more correct hits

Question 28

Q

Imagine you are an HR manager – describe how you could use expectancy table techniques to deal with a low selection ratio (x2)
What risks would you be running ? (x2)

Answer

A

You would raise the pass mark/cut-off (move the vertical line to the right on the x-axis)
To reflect the need to select fewer people from more applicants
Much lower chance of hiring bad people (fewer false positives)
Get more false negatives (high performers, with insufficient test scores), but doesn’t matter to employer
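
The trade-off from lowering or raising the cut-off can be sketched as below. The applicant data, the performance threshold of 5, and the function name `error_counts` are all hypothetical, chosen only to show how the two kinds of error move in opposite directions.

```python
def error_counts(applicants, test_cut_off, performance_cut_off=5):
    """Count (false positives, false negatives) for a given test cut-off."""
    false_positives = sum(1 for test, perf in applicants
                          if test >= test_cut_off and perf < performance_cut_off)
    false_negatives = sum(1 for test, perf in applicants
                          if test < test_cut_off and perf >= performance_cut_off)
    return false_positives, false_negatives

# Hypothetical (test score, job performance) pairs
applicants = [(30, 2), (35, 6), (40, 3), (45, 4), (55, 3),
              (60, 7), (65, 8), (70, 5), (75, 9), (80, 8)]

# Low, middle and high cut-offs (moving the vertical line left to right)
for cut_off in (35, 50, 70):
    fp, fn = error_counts(applicants, cut_off)
    print(f"cut-off {cut_off}: {fp} false positives, {fn} false negatives")
```

With these made-up numbers, the low cut-off produces only false positives and the high cut-off produces only false negatives, which is the pattern the two HR manager cards describe.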

Question 29

Q

What are the disadvantages of using the expectancy table technique? (x3)

Answer

A

Assumes a linear relationship between job performance and test score.
Doesn’t take into account other factors, e.g. minority status, physical health of applicant, etc.
You wouldn’t use expectancy tables in real job, as is too simplistic

Question 30

Q

Give an example of how utility analysis has been used to save employers millions of dollars (no need to give figures – just explain the concepts) (x8)

Answer

A

Analysis of the Programmer Aptitude Test (PAT) used in selecting computer programmers
Supervisors rated $ value of poor, average, and good programmers
Employer (US govt) hired 600 new programmers annually, 4000 on the books
Average programmer retention = about 10 yrs
New test validity coefficient = .76.
$10 per applicant to administer the test
Compared with previous selection procedures (non-test procedures that had validities of between 0 and .5) -
New test saved $6 million/year (in 1979)
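
The card does not state which formula was used, so the following is only a sketch of one standard approach, a Brogden-Cronbach-Gleser style utility estimate. It assumes normally distributed test scores, top-down selection, and a zero-cost old procedure. The hiring numbers, tenure, validities and cost per applicant come from the card; the dollar standard deviation of performance and the selection ratio are hypothetical placeholders, so the result is not expected to reproduce the $6 million figure.

```python
from statistics import NormalDist

def utility_gain(n_hired, tenure_years, validity_new, validity_old,
                 sd_performance_dollars, selection_ratio, cost_per_applicant):
    """Estimated dollar gain from switching selection procedures.

    Assumes normally distributed test scores, top-down selection,
    and that the old procedure costs nothing to run.
    """
    standard_normal = NormalDist()
    z_cut = standard_normal.inv_cdf(1 - selection_ratio)
    # Mean standardised test score of the applicants who are selected
    mean_z_selected = standard_normal.pdf(z_cut) / selection_ratio
    benefit = (n_hired * tenure_years * (validity_new - validity_old)
               * sd_performance_dollars * mean_z_selected)
    # Everyone who applies must be tested, not just those hired
    testing_cost = (n_hired / selection_ratio) * cost_per_applicant
    return benefit - testing_cost

gain = utility_gain(
    n_hired=600, tenure_years=10,         # from the card
    validity_new=0.76, validity_old=0.5,  # from the card (old range was 0 to .5)
    sd_performance_dollars=10_000,        # hypothetical
    selection_ratio=0.5,                  # hypothetical
    cost_per_applicant=10,                # from the card
)
print(f"Estimated gain for one year's intake: ${gain:,.0f}")
```

The key term is the validity difference: if the new test were no more valid than the old procedure, the benefit term would vanish and only the testing cost would remain.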

Question 31

Q

How might norm-referenced testing set up conflicts that we need to address? (x3)

Answer

A

The whole point is to discriminate between groups -
To measure differences on a particular dimension
Opens up potential for bias

Question 32

Q

Give three examples of groups tests might be biased against

Answer

A

People with disabilities (e.g. vision, hearing, motor, cognitive disabilities)
Gender groups
Racial groups

Question 33

Q

Give an example of dealing with literacy/language bias in RL testing (x5)

Answer

A

Qld Transport - driving tests
Complex written instructions would filter out people with low literacy, but that would not reflect driving ability
Made a video with simple language and tested it on ESL people
Also tested their English reading skill and understanding of instructions,
In order to make sure literacy wasn’t the driver of low scores

Question 34

Q

What are the Black Intelligence Test for Cultural Homogeneity and The Chitling Test? (x1)
Which were developed based on arguments that… (x1)
But face problems of…

Answer

A

Alternative IQ tests specifically designed for African Americans
That other IQ tests were designed by white middle-class men to make them look cleverer than other groups
Psychometric validity -
Do get larger scores from target groups,
But no predictive validity demonstrated with RL outcomes (job performance, academic achievement)

Question 35

Q

What scenario would be described by a criterion validity scatterplot with two straight parallel regression lines, where one group (B) are all at the left, and another (A) at the right? (x2)
What are the implications? (x4)

Answer

A

The slopes of the regression lines of the groups are the same,
But they intercept the vertical axis at different places - intercept bias
Test is biased against Group B -
But bias is theoretically correctable as the test is still equally predictive of job performance for both groups
(but the groups can’t be compared).
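
A minimal sketch of intercept bias, using made-up data in which both groups have the same slope but Group B's line sits higher on the vertical axis. The data and the name `fit_line` are assumptions for illustration. The final step shows one sense in which the bias is "theoretically correctable": a constant added to Group B's test scores puts both groups on the same line.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line ys = slope * xs + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Hypothetical (test score, job performance) data for each group
a_test, a_perf = [50, 60, 70, 80], [5, 6, 7, 8]
b_test, b_perf = [20, 30, 40, 50], [4, 5, 6, 7]

slope_a, intercept_a = fit_line(a_test, a_perf)
slope_b, intercept_b = fit_line(b_test, b_perf)

# Same slope, different intercepts: parallel lines (intercept bias).
# Test points to add to Group B scores so both groups share Group A's line:
correction = (intercept_b - intercept_a) / slope_a
print(f"slopes: {slope_a:.2f} vs {slope_b:.2f}")
print(f"intercepts: {intercept_a:.2f} vs {intercept_b:.2f}")
print(f"points correction for Group B: {correction:.1f}")
```

In this example a Group B member with a given test score performs as well as a Group A member scoring higher, which is why raw scores cannot be compared across the groups without the correction.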

Question 36

Q

What is intercept bias? (x2)

What does it tell us? (x2)

Answer

A

When two regression lines are found in a scatterplot,
That intercept the vertical axis at different points
That our test is biased against one group over another -
It’s predicting differences in the same manner, but the two groups can’t be compared

Question 37

Q

What scenario would be described by a criterion validity scatterplot with two straight regression lines, where one group’s (B) is shallower, and would intersect with another (A) found further along the x-axis (ie not parallel, different intercepts)? (x2)
What are the implications? (x4)

Answer

A

Test is biased against Group B -
And also less predictive of performance for them
Test is differentially valid for the 2 groups.
Slopes of the regression lines of the groups are different - slope bias
The nightmare scenario…
• B are now all over the place…
• A still gives high correlation, test is good predictor of job performance
• But for group B, test is rubbish predictor
• So if we took group B on its own, test is lacking validity
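
Slope bias can be sketched the same way, by fitting each group separately and comparing slopes and correlations. The data below are invented so that Group A shows a strong relationship and Group B is "all over the place"; `slope_and_r` is a hypothetical helper name.

```python
from math import sqrt

def slope_and_r(xs, ys):
    """Return (least-squares slope, Pearson correlation) for one group."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Hypothetical (test score, job performance) data for each group
a_test, a_perf = [50, 60, 70, 80, 90], [5, 6, 8, 8, 10]
b_test, b_perf = [20, 30, 40, 50, 60], [6, 3, 7, 4, 6]

slope_a, r_a = slope_and_r(a_test, a_perf)
slope_b, r_b = slope_and_r(b_test, b_perf)

print(f"Group A: slope={slope_a:.2f}, r={r_a:.2f}")
print(f"Group B: slope={slope_b:.2f}, r={r_b:.2f}")
```

Here the test predicts performance well for Group A but hardly at all for Group B, so no constant correction would fix it: the test is differentially valid.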

Question 38

Q

What is slope bias? (x1)

What does it tell us? (x1)

Answer

A

When the slopes of regression lines are different for different groups
That the test is differentially valid for different groups

Question 39

Q

What do we mean by ‘differentially valid’? (x1)

Answer

A

That a relationship between test scores and performance is different for different groups under the same conditions

Question 40

Q

What pieces of legislation impact the use of psychological test for employment purposes? (x6)

Answer

A

Racial Discrimination Act 1975
Age Discrimination Act 2004 
Human Rights and Equal Opportunity Commission Act 1986.
Sex Discrimination Act 1984 
Disability Discrimination Act 1992
Fair Work Act 2009

Question 41

Q

The Disability Discrimination Act (Australia) defines disability broadly, to include… (x8)
How might an allegation of discrimination be overcome? (x1)

Answer

A

Physical
Intellectual
Psychiatric
Sensory
Neurological, and
Learning disabilities, as well as
Physical disfigurement, and
The presence in the body of disease-causing organisms.
• Eg in Mumbai, giving employees HIV tests, sacking those positive – never get away with that in Oz
Must demonstrate that the deficit is directly tied to an inherent requirement for the job

Question 42

Q

Give two US egs of successful legal action over biased intelligence testing (x3 and x2)

Answer

A

Diana vs. State Board of Education (1970) - children with Spanish surnames were overrepresented in classes for the mentally retarded - this changed when the children were tested in Spanish.
Larry P. v. Wilson Riles (1979) - “use of IQ tests which had a disproportionate effect on Black children violated [a number of laws] when used to place children in EMR [educable mentally retarded] classes”.

Question 43

Q

Describe the legal action (and result) over allegations of racial bias in WISC-R and Stanford-Binet tests (x1 and x1)

Answer

A

Parents in Action on Special Education v. Hannon (1979) -

Judge ruled that evidence that tests were racially biased was “unconvincing”

Question 44

Q

What bodies look/ed after discrimination cases in Australia? (x2)

Answer

A

AIRC: Australian Industrial Relations Commission (up to 2009)
Currently, it is the Fair Work Commission (Australia’s national workplace tribunal)

Question 45

Q

What are the implications of various court decisions on discriminatory testing? (x4)

Answer

A

Inconsistencies in court decisions are commonplace (Kaplan & Saccuzzo 2001 p.597).
On completing this course, I’ll probably be able to evaluate the evidence better than most judges.
Judges aren’t trained in making decisions on reliability, validity, test bias etc – people have argued that this is a problem…
“The validity and reliability of many psychological tests are currently being decided by the courts” Reitan, 1994

Question 46

Q

What is test utility? (x2)

Answer

A

The practical usefulness of a test (financial cost/benefit, savings in time, etc).
Does it lead to better decisions about something?

Question 47

Q

What is the relationship between test utility, validity and reliability? (x3)

Answer

A

Test may have fantastic reliability and validity but still poor utility (e.g. it costs $1,000,000 or takes 24 hrs to administer)
But unlikely to have utility (assuming utility includes interpreting test score) if crap reliability/validity
But may retain if purpose is other than score comparison (eg lie detectors - potential use in Ps thinking they work)

Question 48

Q

What is an expectancy table?

Answer

A

Simple utility analysis strategy that involves converting the criterion validity scatterplot of a test into a table

Question 49

Q

What four things do we need to know when working out an expectancy table?

Answer

A

False positive: test incorrectly identifies person as being good when they’re not – the test is wrong
False negative: test incorrectly identifies person as being no good when they’re good – the test is wrong
Selection ratio: the ratio between available job positions and number of applicants (2 positions: 4 applicants = selection ratio of .5)
Cut off: The minimum test score needed in the test to be hired (pass mark)
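
The selection ratio and cut-off can be illustrated with a short sketch. The scores and the function names are hypothetical; the 2:4 example is the one from the card. It also shows what happens when the cut-off is set too high for the number of positions available.

```python
def selection_ratio(positions, applicants):
    """Ratio of available job positions to number of applicants."""
    return positions / applicants

def hired(scores, cut_off, positions):
    """Applicants at or above the cut-off, best first, capped at the number of positions."""
    passed = sorted((s for s in scores if s >= cut_off), reverse=True)
    return passed[:positions]

print(selection_ratio(2, 4))  # the card's example: 2 positions, 4 applicants

# Hypothetical test scores for 10 applicants competing for 5 positions
scores = [35, 42, 48, 51, 55, 58, 60, 63, 67, 72]
positions = 5

moderate = hired(scores, cut_off=50, positions=positions)
very_high = hired(scores, cut_off=65, positions=positions)
print(f"cut-off 50 fills {len(moderate)} of {positions} positions")
print(f"cut-off 65 fills {len(very_high)} of {positions} positions")
```

With the very high cut-off only a couple of applicants are eligible, so positions go unfilled even though those who pass are the top scorers.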

Question 50

Q

What four categories do we get when dividing up a scatterplot for an expectancy table?
Which allow us to… (x1)

Answer

A

Gives:
• Correct negatives – test says they’ll be rubbish, and it turns out they are
• False negatives – test says they’ll be rubbish, but their performance was high
• Correct hits – test says they’ll be great, turns out they are
• False positives – test says they’ll be good, but crappy performance
Work it all out to give a 2 x 2 contingency table
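
As a rough sketch of how the four categories fall out of the scatterplot, the code below sorts made-up (test score, job performance) pairs into the 2 x 2 table using a vertical (test) cut-off and a horizontal (performance) cut-off. The data, cut-off values and function name are assumptions for illustration.

```python
from collections import Counter

def expectancy_table(applicants, test_cut_off, performance_cut_off):
    """Count applicants in each cell of the 2 x 2 contingency table."""
    counts = Counter()
    for test_score, performance in applicants:
        passed = test_score >= test_cut_off        # right of the vertical line
        good = performance >= performance_cut_off  # above the horizontal line
        if passed and good:
            counts["correct hits"] += 1
        elif passed and not good:
            counts["false positives"] += 1
        elif not passed and good:
            counts["false negatives"] += 1
        else:
            counts["correct negatives"] += 1
    return counts

# Hypothetical (test score, job performance) pairs
applicants = [(30, 2), (35, 6), (40, 3), (45, 4), (55, 3),
              (60, 7), (65, 8), (70, 5), (75, 9), (80, 8)]

table = expectancy_table(applicants, test_cut_off=50, performance_cut_off=5)
for cell in ("correct hits", "false positives",
             "false negatives", "correct negatives"):
    print(f"{cell}: {table[cell]}")
```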

Question 51

Q

What is the general procedure for an expectancy table? (x6)

Answer

A

Choose your selection procedures (tests, interviews) to maximise criterion-related validity.
Note that an expectancy table can have multiple categories rather than just two (good/bad or pass/fail).
Choose your cut off (pass mark) to get the selection ratio (% of applicants hired) that you want -
Draw that onto the scatterplot of scores (vertical)
Then decide a performance cut-off (horizontal line): those above it are defined as good at the job, those below as rubbish
Can get more complex, with multiple categories, eg different levels of good, or grading groups rather than pass/fail

Question 52

Q

What is statistical decision analysis? (x1 plus eg x2)

What is its purpose? (x1)

Answer

A

Same as utility analysis - for analysing the usefulness of tests
Eg in selecting and placing personnel
Eg medical tests to screen for various diseases, and in education
Research demonstrates that it can save employers buckets of cash

Question 53

Q

What is involved in more sophisticated utility analysis procedures (beyond expectancy tables…)? (x1)
And how can they be problematic (x1 plus eg x1)

Answer

A

Often analysis of financial cost/benefits
Involve complex statistics and equations and the estimation of some of the variables required can be difficult (e.g. need to estimate the dollar value of good and bad employees)