Test Bias and Fairness Flashcards

1
Q

3 aspects/facets of the definition of bias

A
1. The presence, in scores, of construct-irrelevant variance
   ○ Systematic variation in the scores that is due to construct-irrelevant variables
   ○ Depends on the definition of the target construct
   ○ Not error variance / random variation / measurement error; it is systematic, and due to construct-irrelevant variables
2. Varies as a function of group membership
   ○ The amount/extent of construct-irrelevant variance in the scores changes as a function of group membership (it will be higher in some groups and lower in others)
   ○ What could those groups be? Gender, ethnicity, age, or other groupings
3. Scores wind up systematically over- or under-estimating the target construct for a particular group
   ○ The estimate can be too high or too low - it is not always under-estimation (it can be over-estimation too)
2
Q

Definition of bias (1 sentence)

A

Bias is the presence of construct-irrelevant variance in scores, based on group membership, such that the scores wind up over- or under-estimating the target construct for that particular group.
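A minimal sketch (not from the course material, with made-up numbers) of what this definition looks like in simulated data: the construct-irrelevant component depends on group membership, so one group's scores systematically over-estimate the construct, while ordinary random measurement error does not produce this effect.

```python
# Toy simulation: scores contain construct-irrelevant variance whose size
# depends on group membership, so one group's scores systematically
# over-estimate the target construct.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

true_ability = rng.normal(100, 15, size=n)      # target construct (same distribution in both groups)
group = rng.integers(0, 2, size=n)              # 0 = reference group, 1 = focal group
irrelevant = np.where(group == 1, 5.0, 0.0)     # systematic, construct-irrelevant component
noise = rng.normal(0, 3, size=n)                # ordinary measurement error (random, unbiased)

observed = true_ability + irrelevant + noise

for g in (0, 1):
    over_estimation = (observed[group == g] - true_ability[group == g]).mean()
    print(f"group {g}: mean over-estimation = {over_estimation:.2f}")
# Group 1's scores over-estimate the construct by ~5 points; group 0's do not.
```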

3
Q

How can we determine that a certain factor constitutes bias in a test?

A

If some facet of group membership (age, gender, etc.) influences the test scores BEYOND the true construct of interest, in a systematic fashion

4
Q

2 myths about bias

A

It’s only bias if it places a group at a disadvantage (not true - can also put groups at an advantage)
It’s always against minority members (not true - it can be against anyone)

5
Q

Definition of fairness

A

Whether the outcomes of administering a test in a particular context are described as being equitable and socially just

6
Q

Relation between bias and fairness

A

If, based on the data, a test is judged to be biased, it will almost always also be considered unfair
BUT a test without bias can still be judged to be unfair

7
Q

Caveat about interpreting/noticing bias

A

Simply observing that members of different groups obtain different average scores on a particular test is not, by itself, evidence of bias - especially if those groups differ on variables that are relevant to the target construct

8
Q

Item/content bias

A

refers to whether the wording of the items (instructions, the items themselves, multiple-choice options, etc.) contains terms that everyone will understand - if not, there is item/content bias
• There will always be people who will not understand, but there are ways to maximize the understanding in our selected population
○ Avoid idioms, slang
○ Try not to use the same slang as a client, since it will not be perceived well (failed attempt to ingratiate yourself with the client and appropriate a culture that is not yours)

9
Q

Predictive bias

A

concerns whether predictions made from test scores vary, due to construct-irrelevant variables, across different groups

10
Q

Construct bias

A

concerns whether the test measures the same theoretical domain(s) for members of different groups

11
Q

Method bias

A

Special because the test is involved, but it is more of an issue in individual testing
• The impact of the examiner on people from different groups may not be the same, for many reasons unrelated to the construct measured (e.g., reactions to a mental health professional, a doctor, a professor, etc.)
• The impact that an authority figure has on the examinees during testing (the stimulus value of the authority figure)
• Some cultures might be more reluctant to let strangers handle sensitive matters like health or mental health, and therefore might react more strongly to the tester
• Some examinees might also be worried about what will happen to their results (whether they will be shared with others, etc.)

12
Q

2 main types of educational assessment

A
• Classroom assessment - teachers use various assessment strategies to support ongoing teaching and learning, and to report on the achievement of learning
• External assessment - includes standardized tests and large-scale assessments, developed commercially, which are used to determine individual levels of achievement in reference to a norm group
13
Q

Comparison of fairness with reliability and validity

A

• Fairness is similar to validity/reliability in that it is not dichotomous; it is a matter of degree

Unlike validity and reliability, fairness is not a technical quality, but it is affected by technical quality

14
Q

Historical events that impacted the view of fairness

A

Emerged in the 20th century as a result of 2 earlier events
• Interest in the mind
• Compulsory public education made the number of students rise dramatically

Assessment methods at the time allowed for a lot of subjectivity (essays and oral examinations) - which influenced fairness
The first concerns relating to fairness were about reliability
• Standardization was an interesting solution
• Intelligence tests after WWI were highly racist - faced a lot of criticism
• “Culture-free” tests failed
• Fair and unbiased were synonyms at the time, but the definition of bias was refined with the advancement of statistics, so they became two separate concepts

End of 20th century
• There was a shift of focus towards validity issues
• Ownership of the consequences of testing was emphasized
Still a debate whether validity is about ethics or measurement

15
Q

3 conditions for fairer educational assessment

A
• Opportunity to learn: can simply mean exposure to test content or the alignment between curriculum and assessment; can also relate to the availability of learning resources (teachers, tutoring, etc.)
• Constructive environment: one that respectfully encourages students to fully participate throughout the assessment process - requires the assessment to be perceived as useful and the teachers to be perceived as trustworthy and competent
• Evaluative thinking: involves asking questions, identifying assumptions, seeking evidence, and considering different explanations (AKA critically evaluating assessment practices) - also includes self-reflection by teachers
16
Q

Strategies to be used for increased fairness in educational testing

A

• Transparency: students should know how their work will be judged before an assessment begins (clear instructions)
• Opportunity to demonstrate learning: students should have many opportunities to demonstrate their learning (increases reliability) - varied assessment methods can prevent any type of student from being advantaged over another
• Balance between care and respect: making sure that learning opportunities are engaging without being superficial, and challenging without being impossible

17
Q

Construct bias in the context of cultural testing

A

concerns the degree of overlap in definitions of the construct across cultures (bias arises when the definitions do not fully overlap)

18
Q

Item bias in the context of inter-cultural testing

A

arises from poor translation, ambiguous wording, low familiarity of the content for certain groups, or the influence of culture-specific factors on the understanding of an item

19
Q

What is a possible origin to the test bias controversy?

A

American society strongly believes in equal opportunity for every person
This leads to resistance to labelling and alternative placement (especially in education): we want to believe that everyone has the same opportunities and potential for great achievement

20
Q

Intervening variable in a test

A

a psychological process that is treated only as a component of a system and has no properties beyond the ones that operationally define it

21
Q

Hypothetical construct

A

a construct that is thought to exist and to have properties beyond its defining ones
Intelligence and personality are hypothetical constructs

Measurement of hypothetical constructs requires lengthy processes of test development, and the scores obtained through testing can be interpreted in different ways; this is not the same as absolute measurement (like height or weight), which contains very little uncertainty

22
Q

How can we find bias in a test?

A

Finding bias starts with observing the mean difference in scores between groups
• But those differences can be due to many factors other than bias

23
Q

Steelean effects

A

the fear of confirming a stereotype impedes minorities’ performance

24
Q

7 possible sources of bias

A
1. Inappropriate content
   ○ Tests are constructed according to majority values
   ○ Correct responses depend on material that is unfamiliar to minority individuals
2. Inappropriate standardization samples
   ○ Minority representation in norming samples is proportionate but insufficient to allow minorities any influence over test development
3. Examiners’ and language bias
   ○ Minorities can be intimidated by White examiners who speak standard English
4. Inequitable social consequences
   ○ Ethnic minority individuals are already disadvantaged by stereotyping and past discrimination; the labelling effects of testing add to that discrimination
5. Measurement of different constructs
   ○ Tests based on the majority culture measure different characteristics for minority groups
6. Differential predictive validity
   ○ Tests built for the majority do not predict relevant behaviours for minority members
7. Qualitatively distinct aptitude and personality
   ○ Suggests that minority and majority ethnic groups possess characteristics of different types, so test development must begin with different definitions for majority and minority groups
25
Q

Difference between bias and fairness

A

Fairness: moral, philosophical or legal issue on which reasonable people can disagree
Bias: empirical property of a test (statistically estimated)
• To calculate bias, we need scores
• Test reviewers therefore cannot estimate bias by inspection alone

26
Q

Cultural loading

A

(often associated with culture fairness) the degree to which a test or item is specific to a particular culture (greater potential for bias when administered to another group)
Cultural loading does not by itself render a test biased or offensive, but it creates the potential for both

27
Q

Cultural test bias hypothesis (CTBH)

A

differences in mean performance between members of different ethnic groups do not reflect real differences among the groups but are artifacts of the tests or of the measurement process - they depend on the test content and do not accurately reflect internal abilities
For example: an intelligence test containing material more familiar to high-SES children than to low-SES children will yield higher scores for high-SES children

Most studies have refuted this hypothesis - item biases account for <1% to 5% of the variation in test scores and are usually counterbalanced across groups

28
Q

Influence of the presence of a certain group in the norming samples on that group’s capacity to score higher on the test

A

Harrington observed that the low presence of minorities in norming samples made it impossible for those groups to have any influence on the results of a test
• Experiment with rats (from different strains, analogous to ethnic groups): he hypothesized that mazes (the analogue of intelligence tests) normed on populations dominated by a certain strain would yield higher scores for that strain - the hypothesis was confirmed
○ BUT: other authors have observed that some groups (like Asians), despite being underrepresented in norming samples, score consistently higher than other groups - therefore there must be a cultural component to the differences observed in scores (an argument backed up by empirical studies)
○ Harrington’s results were an overgeneralization - rats and humans are not the same in terms of intelligence

29
Q

3 fallacious assumptions that impede the scientific study of test bias

A
1. The egalitarian fallacy
   Assuming that all groups are equal in the characteristic measured by a test, so that any difference in scores must result from bias
2. The culture-bound fallacy
   Assuming that reviewers can assess the culture loadings of items through casual inspection
3. The standardization fallacy
   Assuming that a test is necessarily biased when used with any group not included in large numbers in the norming sample
30
Q

Why is assuming no difference just as detrimental to research as assuming a difference?

A

Assuming no difference is just as detrimental to research as assuming a difference
Ex: Black children have been found to have higher creativity than White children in some contexts, but assuming that those differences are due to test bias would be detrimental to Black children’s access to special education classes

31
Q

4 limitations of mean differences

A

• Mean differences do not always tell us about the typical score
  ○ After re-evaluating mean differences in scores, a researcher found that the groups differed more in terms of variation than in their typical score
• Mean differences provide no information as to why the two groups differ - they are only a starting point for asking that question
• The mean can be an inaccurate measure of center (if the distribution is skewed, the mean can be pulled away from the center)
  ○ Symmetry should never be assumed, even with big norming samples
• A mean is a point estimate: the variation in scores within one group can be so large that the mean is no longer representative
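A hypothetical illustration of two of these limitations - skew pulling the mean away from the typical score, and groups differing more in spread than in center - using made-up distributions:

```python
# Two groups with nearly equal means: B is skewed (mean != median) and far
# more variable than A, so a mean difference alone would hide all of this.
import numpy as np

rng = np.random.default_rng(1)

group_a = rng.normal(50, 5, size=5000)                       # symmetric scores
group_b = 40 + rng.gamma(shape=2.0, scale=5.0, size=5000)    # right-skewed, mean ~50, larger SD

for name, scores in (("A", group_a), ("B", group_b)):
    print(f"group {name}: mean={scores.mean():.1f}  median={np.median(scores):.1f}  sd={scores.std():.1f}")
# B's median sits below its mean (skew) and B varies far more than A.
```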

32
Q

Bias in predictive validity

A

constant error in an inference or prediction as a function of membership in a particular group, or error in prediction that exceeds the smallest feasible random error
Results in an erroneous regression equation for the group for which the test is biased, and therefore in inaccurate predictions (see the sketch below)
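One common way to check this, sketched below on simulated data (variable names and numbers are made up, not from the source): fit a single regression of the criterion on the test score and look for systematic prediction errors by group.

```python
# Sketch of a regression-based check for predictive bias (hypothetical data):
# fit a common regression of the criterion on the test score, then see whether
# prediction errors are systematically positive or negative for one group.
import numpy as np

rng = np.random.default_rng(2)
n = 4000

group = rng.integers(0, 2, size=n)
test = rng.normal(100, 15, size=n)

# Simulated criterion: same true relation in both groups, but the test score
# over-estimates the construct by 5 points in group 1 (construct-irrelevant
# variance), so a common equation over-predicts the criterion for that group.
true_score = test - np.where(group == 1, 5.0, 0.0)
criterion = 0.5 * true_score + rng.normal(0, 5, size=n)

# Common (pooled) regression: criterion ~ test
slope, intercept = np.polyfit(test, criterion, deg=1)
residuals = criterion - (intercept + slope * test)

for g in (0, 1):
    print(f"group {g}: mean residual = {residuals[group == g].mean():+.2f}")
# Systematically positive residuals for one group and negative for the other
# indicate constant (intercept) prediction error as a function of group membership.
```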

33
Q

Bias in construct validity

how to detect it

A
• How to detect it:
  ○ Factor analysis (most often exploratory), performed separately in each group - if the factor structure underlying the scores differs across groups, that is an indication of construct bias
  ○ Researchers then calculate a correlation (Pearson) between the groups’ factor loadings, called factor congruence/invariance (see the sketch below)
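A small sketch of that congruence calculation on hypothetical factor loadings: the Pearson correlation between loadings as described above, plus Tucker's coefficient of congruence as a closely related index (the loading values here are made up).

```python
# Factor-congruence check (hypothetical loadings): factor-analyze the test
# separately in each group, then compare the two groups' loadings on the
# corresponding factor.
import numpy as np

# Loadings of 8 items on the first factor, estimated separately per group (made up).
loadings_group1 = np.array([0.72, 0.65, 0.70, 0.58, 0.61, 0.69, 0.55, 0.63])
loadings_group2 = np.array([0.70, 0.62, 0.68, 0.55, 0.64, 0.71, 0.50, 0.60])

pearson = np.corrcoef(loadings_group1, loadings_group2)[0, 1]
tucker_phi = loadings_group1 @ loadings_group2 / (
    np.linalg.norm(loadings_group1) * np.linalg.norm(loadings_group2)
)

print(f"Pearson correlation of loadings: {pearson:.3f}")
print(f"Tucker congruence coefficient:   {tucker_phi:.3f}")
# Values near 1 suggest the factor has a similar meaning in both groups;
# low values point toward construct bias.
```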
34
Q

Situational bias

A

influences in the test situation, but independent of the test itself, that may bias test scores
Ex: characteristics of test setting, instructions, examiners, etc
These influences are not test bias because they are not part of the test itself, but they should still be considered

35
Q

Content validity

how to detect content validity bias

A

the extent to which the content of a test is a representative sample of the behaviour to be measured
• Items with content bias behave differently from group to group for people with the same level of ability
• Ex: the item contains information that is unfamiliar to minority individuals
• Differences have been shown on some items of the SAT and GRE between African American and White examinees
• How to detect item bias (see the sketch below):
  ○ DIF (differential item functioning) analyses
  ○ CT (contingency table) approaches
  ○ Partial correlation procedure to estimate DIF
  ○ Chi-square technique
  ○ Correlations between P decrements
  ○ Partial correlation between an item score and a nominal variable (ex: ethnic group)
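As one reading of the contingency-table / chi-square approaches listed above, a sketch on made-up data: match examinees on total score, cross-tabulate group membership against item correctness, and test the association.

```python
# Contingency-table / chi-square DIF check (hypothetical data): within a band
# of examinees matched on total score, test whether group membership is
# associated with answering the studied item correctly.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(3)
n = 5000

group = rng.integers(0, 2, size=n)                 # 0 = reference, 1 = focal
ability = rng.normal(0, 1, size=n)
total_score = np.clip(np.round(ability * 4 + 20), 0, 40)

# Simulated item: equally able examinees in the focal group find it harder (DIF).
p_correct = 1 / (1 + np.exp(-(ability - 0.8 * (group == 1))))
correct = (rng.random(n) < p_correct).astype(int)

# Matched band: examinees with mid-range total scores.
band = (total_score >= 18) & (total_score <= 22)
table = np.array([
    [np.sum((group[band] == 0) & (correct[band] == 1)), np.sum((group[band] == 0) & (correct[band] == 0))],
    [np.sum((group[band] == 1) & (correct[band] == 1)), np.sum((group[band] == 1) & (correct[band] == 0))],
])
chi2, p_value, dof, _ = chi2_contingency(table)
print(table)
print(f"chi-square = {chi2:.1f}, p = {p_value:.4f}")
# A significant association between group and correctness, among examinees
# matched on total score, flags the item for possible content bias.
```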

36
Q

Internal consistency reliability

A

the extent to which all items of a test are measuring the same construct
If the reliabilities are similar from group to group, the test is said to be unbiased
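A sketch of that comparison on simulated data (the item responses and group labels are made up): compute Cronbach's alpha separately in each group and check that the values are similar.

```python
# Compare internal consistency (Cronbach's alpha) across two groups.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: examinees x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(4)

def simulate_group(n, loading):
    """Simulate n examinees on a 10-item test with a single common factor."""
    factor = rng.normal(size=(n, 1))
    return loading * factor + rng.normal(scale=1.0, size=(n, 10))

group_a = simulate_group(500, loading=1.0)   # stronger common factor -> higher alpha
group_b = simulate_group(500, loading=0.6)   # weaker common factor  -> lower alpha

print(f"alpha (group A): {cronbach_alpha(group_a):.2f}")
print(f"alpha (group B): {cronbach_alpha(group_b):.2f}")
# Similar alphas across groups are consistent with an absence of bias in
# internal consistency; a clearly lower alpha in one group is a warning sign.
```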

37
Q

Helms’ 7 forms of cultural equivalence

A
1. Functional equivalence: the extent to which test scores have the same meaning for different cultural groups
2. Conceptual equivalence: whether test items have the same meaning and familiarity in different groups
3. Linguistic equivalence: whether tests have the same linguistic meaning for different groups
4. Psychometric equivalence: the extent to which tests measure the same thing for different groups
5. Testing condition equivalence: whether groups are equally familiar with testing procedures and view testing as a means of assessing ability
6. Contextual equivalence: the extent to which a cognitive ability is assessed similarly across the different contexts in which people behave
7. Sampling equivalence: whether comparable samples of each cultural group are available at the test development, validation, and interpretation stages
38
Q

3 reasons to translate/adapt tests

A
• Facilitate comparative ethnic studies
• Allow individuals to be tested in their own language
• Reduce the time and cost of developing new tests
39
Q

When are tests from different languages/cultures considered equivalent?

A

• When members of each linguistic/cultural group who have the same standing on the construct measured by the tests have the same probability of selecting the correct item response

40
Q

Judgmental designs used to establish item equivalence

A

• Judgmental design: rely on a person’s / group’s decision regarding the degree of translation equivalence of an item
○ Forward translation: translators adapt or translate a test to the target culture/language, other translators then assess the equivalency between the 2 versions
○ Back translation: translators adapt or translate a test to the target culture/language, other translators then readapt the items back to the original culture or language - the two versions must match

41
Q

Statistical designs used to establish item equivalence

A

• Statistical designs: depend on the characteristics of the sample
○ Bilingual examinees: bilingual individuals take both the original and translated test
○ Source and target language monolinguals design: monolinguals in the original language take the original or the back-translated version
○ Monolinguals design: monolinguals in the original language take the original and the back-translated version
Afterwards, statistical procedures are done to assess DIF (factor analysis, item response theory, logistic regression, Mantel-Haenszel technique)
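A sketch of the logistic-regression DIF procedure mentioned above, on simulated data (variable names and effect sizes are made up): regress item correctness on the matching score, group, and their interaction; a group effect beyond the matching score suggests uniform DIF, and a significant interaction suggests non-uniform DIF.

```python
# Logistic-regression DIF check (hypothetical data); requires numpy and statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 3000

group = rng.integers(0, 2, size=n)           # 0 = source-language version, 1 = target-language version
matching_score = rng.normal(0, 1, size=n)    # e.g. standardized total score

# Simulated item: same relation to ability in both groups, but harder in the translated version.
p = 1 / (1 + np.exp(-(1.2 * matching_score - 0.8 * (group == 1))))
correct = (rng.random(n) < p).astype(int)

X = sm.add_constant(np.column_stack([matching_score, group, matching_score * group]))
model = sm.Logit(correct, X).fit(disp=False)
print(model.params)  # [intercept, score, group (uniform DIF), score*group (non-uniform DIF)]
# A clearly non-zero group coefficient (here around -0.8) flags uniform DIF for the item.
```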

42
Q

4 guidelines to ensure equitable assessment

A
1. Investigate possible referral source bias, because evidence suggests that people are not always referred for services on impartial, objective grounds
2. Inspect the test developer’s data for evidence that sound statistical analyses for bias have been completed
3. Conduct assessments with the most reliable measure available
4. Assess multiple abilities and use multiple methods