19 - Test Bias Flashcards
African Americans score about how many points lower than white Americans on standardized IQ tests?
15 points - one SD
If you were to administer the Stanford-Binet or the Wechsler scale to large random samples of African Americans and white Americans, it is likely you would get the same results
What is the dispute, then?
Why are there differences?
Dispute has not been over whether these differences exist but over why they do
Do the differences result from environmental factors or biological factors, and are they related to the general (g) factor measured by IQ tests?
Increasing number of people no longer report their race when asked. Why?
What does this affect?
4% of the test takers did not disclose their ethnicity
Another 4% did not find a standard category
As a result, it is difficult to determine why the performance gap is not narrowing
African American students, because of stereotype threat, perform more poorly on tests when they reveal their race
White students decline to report their race because they feel there is discrimination in favor of ethnic minorities
differential validity
Some psychologists argue that the tests are differentially valid for African Americans and whites.
Do differences between ethnic groups in test performance indicate test bias?
Differences among ethnic groups on test performance do not necessarily indicate test bias.
The question is whether the test has different meanings for different groups; validity defines the meaning of a test.
African American and white employees on subjective measures such as supervisor ratings versus objective measures based on more formal evaluations
Objective measures showed even larger differences between African American and white employees than the subjective evaluations did for measures of work quality, quantity, and absenteeism. Differences between Hispanic and white employees were not as large as those between African American and white employees.
Content-Related Evidence for Validity
Test constructors and users were accused of being biased because some children never have the opportunity to learn about some of the items; furthermore, members of ethnic groups might answer some items differently but still correctly.
Critics have argued that scores on intelligence tests are affected by language skills inculcated as part of a white, middle-class upbringing but foreign to inner-city children.
Flaugher (1978) concluded that many perceived test bias problems are based on misunderstandings about the way tests are usually interpreted
Flaugher on Content-Related Evidence for Validity
Flaugher argued that the purpose of aptitude and achievement tests is to measure performance on items sampled from a wide range of information.
Not particularly concerned about individual items, test developers focus on test performance, making judgments about it based on correlations between the tests and external criteria.
Content-Related Evidence for Validity
Many test critics, though, focus attention on specific items
Owen (1985) reported that several intelligent and well-educated people had difficulty with specific items on the SAT and Law School Admission Test (LSAT) examinations.
Some items on standardized tests are familiar only to those with a middle-class background.
Test developers are indifferent to people’s opportunities to learn the test information. Again, the meaning they eventually assign to the tests comes from correlations of test scores with other variables.
Some evidence suggests that the linguistic bias in standardized tests does not cause the observed differences
Quay (1971) administered the Stanford-Binet test to 100 children in an inner-city Head Start program.
Half of the children took a version of the test that used African American dialect, while the others took the standard version
The dialect version produced less than a 1-point increase in test scores.
African American children can comprehend standard English about as well as they can comprehend African American dialect
Some studies have failed to demonstrate that biased items in well-known standardized tests account for the differences in scores among ethnic groups
many attempts to “purify” tests using this approach have not eliminated differences between groups.
In one study, 16% of the items in an elementary reading test were eliminated after experts reviewed them and labeled them as potentially biased toward the majority group.
However, when the “purged” version of the test was used, the differences between the majority and the minority school populations were no smaller than they had been originally
Another approach to the same problem is to find those classes of items that are most likely to be missed by members of a particular minority group
Such studies are important: if they identify certain types of items that discriminate among groups, then these types of items can be avoided on future tests.
Results have not been encouraging; studies have not clearly identified such categories of items.
Differential Item Functioning (DIF) Analysis
Developed by the Educational Testing Service (ETS)
ETS creates and administers a variety of aptitude tests, including the Graduate Record Examination (GRE), the SAT, and the LSAT
Performance of white test takers differs significantly from the performances of other racial and ethnic groups on verbal and analysis measures
DIF analysis attempts to identify items that are specifically biased against any ethnic, racial, or gender group
DIF analysis steps
- Equates groups on the basis of overall score: finds subgroups of test takers who obtain equivalent scores.
- Evaluates differences in performance between men and women on particular items.
- Items that differ significantly between the groups are thrown out and the entire test is rescored.
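The DIF steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the ETS procedure: the function names, the toy response data, and the cutoff are all hypothetical, and a fixed cutoff on the difference in proportion correct stands in for a real significance test.

```python
from collections import defaultdict


def flag_dif_items(responses, groups, threshold=0.10):
    """Flag items whose proportion correct differs between two groups
    after matching test takers on overall score.

    responses: list of 0/1 lists, one per test taker.
    groups: one group label per test taker (exactly two distinct labels).
    threshold: illustrative cutoff (an assumption, not a significance test).
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("expected exactly two groups")
    ref, focal = labels
    totals = [sum(r) for r in responses]  # overall score used for matching

    flagged = []
    for item in range(len(responses[0])):
        # strata[total][group] = [number correct on this item, number of takers]
        strata = defaultdict(lambda: {ref: [0, 0], focal: [0, 0]})
        for r, g, t in zip(responses, groups, totals):
            cell = strata[t][g]
            cell[0] += r[item]
            cell[1] += 1

        weighted_diff, weight_sum = 0.0, 0
        for cells in strata.values():
            (rc, rn), (fc, fn) = cells[ref], cells[focal]
            if rn == 0 or fn == 0:
                continue  # no matched test takers at this score level
            # difference in proportion correct, weighted by focal-group size
            weighted_diff += fn * (fc / fn - rc / rn)
            weight_sum += fn

        if weight_sum and abs(weighted_diff / weight_sum) > threshold:
            flagged.append(item)
    return flagged


def rescore(responses, dropped):
    """Recompute total scores with the flagged items thrown out."""
    dropped_set = set(dropped)
    keep = [i for i in range(len(responses[0])) if i not in dropped_set]
    return [sum(r[i] for i in keep) for r in responses]


# Toy data: every test taker has an overall score of 3 out of 5.
responses = [
    [1, 1, 1, 0, 0], [1, 1, 0, 1, 0], [1, 0, 1, 1, 0], [0, 1, 1, 0, 1],  # group A
    [1, 1, 0, 0, 1], [1, 0, 1, 0, 1], [0, 1, 0, 1, 1], [0, 0, 1, 1, 1],  # group B
]
groups = ["A"] * 4 + ["B"] * 4

flagged = flag_dif_items(responses, groups, threshold=0.3)
rescored = rescore(responses, flagged)
print(flagged)   # [4]
print(rescored)
```

In the toy data both groups have identical overall scores, yet the last item is answered correctly by every member of group B and by only one member of group A, so it is the only item flagged and dropped before rescoring.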
There is some evidence that test items that depict people do not accurately portray the distribution of genders and races in the population.
White male characterization occurred with disproportionate frequency.