Test Bias and Fairness Flashcards
3 aspects/facets of the definition of bias
- The presence, in scores, of construct-irrelevant variance
○ Systematic variation in the scores that is due to construct-irrelevant variables
○ Depends on the definition of the target construct
○ Not error variance / random variation / measurement error; it’s systematic, and due to construct-irrelevant variables
- Varies as a function (f) of group membership
○ The amount/extent of construct-irrelevant variance in the scores changes as a function of group membership (it will be higher in some groups and lower in others)
§ What could those groups be?
□ Gender, ethnicity, age, other
- Scores wind up systematically over- or under-estimating the target construct for a particular group
○ The estimation could be high or low - it’s not always under-estimation (it can be over-estimation too)
Definition of bias (1 sentence)
Bias is the presence of construct-irrelevant variance in scores that varies with group membership, such that scores wind up over- or under-estimating the target construct for a particular group
How can we determine that a certain factor constitutes bias in a test?
If some facet of group membership (age, gender, etc.) systematically influences the test scores BEYOND the true construct of interest
2 myths about bias
It’s only bias if it places a group at a disadvantage (not true - can also put groups at an advantage)
It’s always against minority members (not true - it can be against anyone)
Definition of fairness
Whether the outcomes of administering a test in a particular context are described as being equitable and socially just
Relation between bias and fairness
If a test is judged, based on the data, to be biased, then it would almost always be considered unfair
BUT, a test without bias could also be judged as being unfair
Caveat about interpreting/noticing bias
Simply observing that members of different groups obtain different average scores on a particular test is not, by itself, evidence of bias, especially if those groups differ on variables that are relevant to the target construct
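A minimal simulation sketch of this caveat. Everything here is hypothetical (the group means, the error sizes, and the function names are made up for illustration), and it assumes we can see each person's true level on the construct, which is never possible with real data. It only shows why a gap in average scores looks the same whether it comes from a real difference on the construct or from bias.

```python
import random
import statistics

def simulate_group(n, true_mean, bias, rng):
    """Return (true_levels, scores) for one hypothetical group.

    score = true level + group-specific bias + random measurement error
    """
    true_levels = [rng.gauss(true_mean, 10) for _ in range(n)]
    scores = [t + bias + rng.gauss(0, 5) for t in true_levels]
    return true_levels, scores

def mean_estimation_error(true_levels, scores):
    """Average of (score - true level); near 0 means no systematic
    over- or under-estimation of the construct for that group."""
    return statistics.mean(s - t for t, s in zip(true_levels, scores))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Scenario 1: the groups really differ on the construct; the test is unbiased.
true_a, scores_a = simulate_group(5000, true_mean=100, bias=0, rng=rng)
true_b, scores_b = simulate_group(5000, true_mean=90, bias=0, rng=rng)
gap_unbiased = statistics.mean(scores_a) - statistics.mean(scores_b)
err_a = mean_estimation_error(true_a, scores_a)
err_b = mean_estimation_error(true_b, scores_b)

# Scenario 2: the groups have the same true level, but the test
# systematically under-estimates group D by 10 points.
true_c, scores_c = simulate_group(5000, true_mean=100, bias=0, rng=rng)
true_d, scores_d = simulate_group(5000, true_mean=100, bias=-10, rng=rng)
gap_biased = statistics.mean(scores_c) - statistics.mean(scores_d)
err_d = mean_estimation_error(true_d, scores_d)

print(f"Scenario 1: score gap {gap_unbiased:.1f}, estimation error A {err_a:.2f}, B {err_b:.2f}")
print(f"Scenario 2: score gap {gap_biased:.1f}, estimation error D {err_d:.2f}")
```

Both scenarios produce a score gap of roughly 10 points, but only in the second one are scores systematically off from the true level for one group. The gap alone cannot tell the two apart.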
Item/content bias
Refers to whether the wording of the test (instructions, the items themselves, choices in MC, etc.) uses terms that everyone will understand - if not, there is item/content bias
• There will always be people who will not understand, but there are ways to maximize the understanding in our selected population
○ Avoid idioms, slang
○ Try not to use the same slang as a client, since it will not be perceived well (failed attempt to ingratiate yourself with the client and appropriate a culture that is not yours)
Predictive bias
Whether predictions made from the test scores vary across groups because of construct-irrelevant variables
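One common way to look for predictive bias is to fit a regression line (criterion predicted from test score) in each group and compare the lines. The sketch below is an illustration under made-up assumptions, not a procedure taken from these notes: the data are simulated, the names `fit_line` and `make_group` are hypothetical, and the "focal" group's scores are shifted down 5 points for reasons unrelated to the construct.

```python
import random
import statistics

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

rng = random.Random(1)  # fixed seed so the run is repeatable

def make_group(n, score_shift):
    """Hypothetical data: the criterion depends only on the true construct
    level, while test scores are shifted by score_shift for reasons
    unrelated to the construct."""
    true_levels = [rng.gauss(50, 10) for _ in range(n)]
    scores = [t + score_shift + rng.gauss(0, 3) for t in true_levels]
    criterion = [0.8 * t + rng.gauss(0, 4) for t in true_levels]
    return scores, criterion

scores_ref, crit_ref = make_group(4000, score_shift=0)
scores_focal, crit_focal = make_group(4000, score_shift=-5)

slope_ref, int_ref = fit_line(scores_ref, crit_ref)
slope_focal, int_focal = fit_line(scores_focal, crit_focal)

# Use the reference group's line to predict the focal group's criterion,
# then check whether those predictions are systematically off.
predicted = [int_ref + slope_ref * s for s in scores_focal]
mean_pred_err = statistics.mean(c - p for c, p in zip(crit_focal, predicted))

print(f"reference: slope {slope_ref:.2f}, intercept {int_ref:.2f}")
print(f"focal:     slope {slope_focal:.2f}, intercept {int_focal:.2f}")
print(f"mean (actual - predicted) for focal group: {mean_pred_err:.2f}")
```

In this setup the slopes are about the same but the intercepts differ, so a single line fitted on the reference group under-predicts the focal group's criterion on average. Differences in slope would be another sign that prediction does not work the same way across groups.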
Construct bias
Whether the test measures the same theoretical domains for members of different groups
Method bias
Special because the test is involved, but it’s more of an issue in individual testing
• The impact of the examiner on people from different groups may not be the same for many reasons unrelated to the construct measured (ex: reaction to a mental health professional, a doctor, a professor, etc)
• The impact that an authority figure has on the examinees during testing (the stimulus value of the authority figure)
• Some cultures might be more reluctant to let strangers handle sensitive matters like health or mental health, and therefore might react more strongly to the tester
• Some might also be worried about what will happen to their results (if they will be shared with others, etc)
2 main types of educational assessment
- Classroom assessment - teachers use various assessment strategies to support ongoing teaching and learning, and to report on the achievement of learning
- External assessment - includes standardized tests and large-scale assessments developed commercially which are used to determine individual levels of achievement in reference to a norm group
Comparison of fairness with reliability and validity
• Fairness is similar to validity/reliability in that it’s not dichotomous, it’s determined by degree
• Unlike validity and reliability, fairness is not a technical quality, but it’s affected by technical quality
Historical events that impacted the view of fairness
Emerged in the 20th century as a result of 2 earlier events
• Interest in the mind
• Compulsory public education made the number of students rise dramatically
Assessment methods at the time allowed for a lot of subjectivity (essays and oral examinations) - which influenced fairness
The first concerns relating to fairness were about reliability
• Standardization was an interesting solution
• Intelligence tests after WWI were highly racist - faced a lot of criticism
• “Culture-free” tests failed
• Fair and unbiased were synonyms at the time, but the definition of bias was refined with the advancement of statistics, so they became two separate concepts
End of 20th century
• There was a shift of focus towards validity issues
• Ownership of the consequences of testing was emphasized
• There is still a debate over whether validity is about ethics or measurement
3 conditions for fairer educational assessment
- Opportunity to learn: can simply mean exposure to test content or the alignment between curriculum and assessment, can also relate to the availability of learning resources (teachers, tutoring, etc)
- Constructive environment: one that respectfully encourages students to fully participate through the assessment process - requires the assessment to be perceived as useful and the teachers to be perceived as trustworthy and competent
- Evaluative thinking: involves asking questions, identifying assumptions, seeking evidence and considering different explanations (AKA critically evaluating assessment practices) - also includes self-reflection in teachers
Strategies to be used for increased fairness in educational testing
• Transparency: students should know how their work will be judged before an assessment begins (clear instructions)
• Opportunity to demonstrate learning: students should have many opportunities to demonstrate their learning (increases reliability) - varied assessment methods can prevent any type of student from being advantaged over another
• Balance between care and respect: making sure that learning opportunities are engaging without being superficial, and challenging without being impossible