Week 4 Flashcards

Test Bias vs Test Fairness

1
Q

What's the difference between Test Bias and Test Fairness?

A

Test Bias = Statistical Concept
Test Fairness = Social Values Concept

2
Q

Technical Meaning of Test Bias

A

An empirical question that can ONLY be examined by test validation studies

3
Q

What is Differential Validity?

A

Differences in the relationship between a test and a criterion when the test is administered to different groups of people.

e.g., a vocabulary test administered to non-English speakers vs English speakers.

4
Q

Test Bias in Content Validity + when it occurs

A

The most common criticism levelled against tests.

Occurs when:
- Test items ask for information that certain groups have not had the chance to learn
- Certain groups are penalised for answers that are correct in their own culture
- Wording is unfamiliar or difficult for certain groups

Can only be demonstrated through empirical research (not by a panel of experts).

5
Q

What is Differential Item Functioning Analysis?

A

Attempts to identify items that are biased against any group.

Subgroups are identified and their performance on each item is compared; items on which performance differs significantly are thrown out and the test is rescored.
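
A minimal sketch of how such an analysis might run, assuming the Mantel-Haenszel procedure (one common DIF method; the card does not name a specific one) and NumPy arrays of examinee data. Examinees are matched on total test score, and a common odds ratio compares the two groups' odds of answering the item correctly within each score stratum:

import numpy as np

def mh_dif_delta(item_correct, group, total_score):
    """item_correct: 0/1 array; group: array of 'ref'/'focal'; total_score: matching variable."""
    num, den = 0.0, 0.0
    for k in np.unique(total_score):          # stratify by matched total score
        s = total_score == k
        a = np.sum((group[s] == "ref") & (item_correct[s] == 1))    # reference, correct
        b = np.sum((group[s] == "ref") & (item_correct[s] == 0))    # reference, incorrect
        c = np.sum((group[s] == "focal") & (item_correct[s] == 1))  # focal, correct
        d = np.sum((group[s] == "focal") & (item_correct[s] == 0))  # focal, incorrect
        t = a + b + c + d
        if t > 0:
            num += a * d / t
            den += b * c / t
    alpha = num / den                 # common odds ratio; 1.0 means no DIF
    return -2.35 * np.log(alpha)      # ETS delta scale; large |delta| flags the item

Items flagged this way would be reviewed and, as the card says, thrown out and the test rescored.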

6
Q

Test Bias in Criterion-Related Validity

A

An unbiased test will predict future performance equally well for all subgroups.

7
Q

What should the regression line look like for an unbiased test?

A

Both groups fall on the same regression line (the groups may differ in their average criterion scores, but they need to lie on the same line).

8
Q

What should the regression look like for intercept bias?

A

The slopes are the same, so the lines are parallel, but the y-intercepts differ.

Using one line discriminates in favour of one group over another

9
Q

What should the regression look like for slope bias?

A

The slopes (gradients) differ; the lines are not parallel.
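
Cards 7-9 can be summarised in one moderated regression, criterion = b0 + b1*test + b2*group + b3*(test*group): b2 away from zero indicates intercept bias, b3 away from zero indicates slope bias, and both near zero is the single-line unbiased case. A minimal sketch (variable names are my own):

import numpy as np

def bias_gaps(test, criterion, group):
    """test, criterion: float arrays; group: 0/1 coding of the two subgroups."""
    X = np.column_stack([np.ones_like(test), test, group, test * group])
    b, *_ = np.linalg.lstsq(X, criterion, rcond=None)
    return b[2], b[3]   # intercept gap (card 8) and slope gap (card 9)

In practice both gaps would be tested for statistical significance rather than merely inspected.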

10
Q

What is test bias in construct validity?

A

When a test measures the same trait/construct but with different degrees of accuracy for different groups.

Demonstrated when a test is shown to measure different traits/constructs for one group than for another.

11
Q

How can you show there is no test bias in construct validity?

A

When the test is shown to have the same factor structure for the two groups, and when the rank ordering of item difficulties within the test is highly similar across the groups.
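
A minimal sketch of the second check, assuming 0/1-scored item responses stored as examinee x item matrices: compute each item's difficulty (proportion correct) per group and compare the rank orders with a Spearman correlation.

import numpy as np
from scipy.stats import spearmanr

def difficulty_rank_similarity(group_a, group_b):
    """group_a, group_b: examinee x item 0/1 score matrices, one per group."""
    p_a = group_a.mean(axis=0)     # item difficulties in group A
    p_b = group_b.mean(axis=0)     # item difficulties in group B
    rho, _ = spearmanr(p_a, p_b)
    return rho                     # values near 1 = highly similar rank order

(The first check, comparing factor structures, would typically use factor analysis and is beyond a short sketch.)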

12
Q

What are the three philosophies of Test Fairness?

A

Unqualified Individualism
Qualified Individualism
Quotas

13
Q

What does Unqualified Individualism refer to?

A

Selection decisions are made on the basis of the BEST QUALIFIED applicants; however, if AGE, GENDER, or OTHER DEMOGRAPHIC CHARACTERISTICS are found to be VALID PREDICTORS of PERFORMANCE, these variables should also be considered.

Direct opposite of Qualified Individualism

14
Q

What are some issues with Unqualified Individualism?

A

More emphasis could be placed on demographic characteristics than on test scores.

15
Q

What does Qualified Individualism refer to?

A

The process of selecting the best qualified individual based SOLELY ON TEST ABILITIES.

Other demographic characteristics are not considered.

Opposite of Unqualified Individualism

16
Q

What are some issues with Qualified Individualism?

A

Selection is based exclusively on test abilities, without considering other variables that could affect performance.

17
Q

What does Quotas refer to?

A

Using separate selection procedures for the various subgroups in the community

For example, if in a particular location the population comprises 35% Indigenous Australians and 65% non-Indigenous Australians, then selection procedures will be used to select candidates in approximately the same ratio: one procedure selects the best available Indigenous Australian applicants, while another selects the best available non-Indigenous Australians.
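
A minimal sketch of the arithmetic in this example (function and variable names are hypothetical): allot places in proportion to population shares, then pick the best scorers within each subgroup separately.

def quota_select(applicants, shares, n_places):
    """applicants: {group: list of scores}; shares: {group: population proportion}."""
    selected = {}
    for grp, scores in applicants.items():
        k = round(shares[grp] * n_places)                 # this group's quota of places
        selected[grp] = sorted(scores, reverse=True)[:k]  # best available within the group
    return selected

With shares of 0.35 and 0.65 and 20 places, the two procedures would fill 7 and 13 places respectively.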

18
Q

What is the issue with Quotas?

A

Those selected do not necessarily have the highest test scores.

19
Q

What are the 5 Major Steps in test development?

A
  1. Test Conceptualisation
  2. Test Construction
  3. Test Try-out
  4. Item Analysis
  5. Test Revision
20
Q

What is required in Test Conceptualisation?

A

Requires clear specification of the construct to be measured and of what is known about it.

21
Q

What is the statement of purpose in Test Conceptualisation?

A

- Test development begins with a statement of the purpose of the test and the construct or construct domain to be measured.
- The statement is simple and identifies the trait(s) to be measured and the target audience for the test.

22
Q

What is the Literature Check stage for Test Conceptualisation?

A

Checking whether an appropriate test already exists for that purpose.

23
Q

What is the preliminary design issues stage of Test Conceptualisation?

A

Test developers need to make some preliminary decisions about the design of the test:

  • Mode of administration
  • Length of the test
  • Number of scores
  • Response format
  • Score reports
  • Administrator training
  • Norm-referenced or criterion-referenced.
24
Q

What is an Operational Definition?

A

A specification of the observable characteristics that will be measured and the process for assigning a value to the concept.

Example 1:
Intention to quit smoking
- Individual’s rating of the probability that they will stop smoking, on a scale from 1 = very unlikely to 5 = very likely

Example 2:
Construct of “intelligence” as measured by the WAIS-IV
- “the capacity of the individual to act purposefully, to think rationally, and to deal effectively with his environment”

  • “Intelligence is a global construct (g) which can also be categorized by the sum of many specific abilities” ‐‐ Wechsler, 1944
  • Comprised of subtests with specific administration and scoring protocols
25
Q

What comprises a test item?

A

1. Stimulus
2. Response Format (conditions governing responses)
3. Scoring Procedures

26
Q

What is the Stimulus/Item Stem?

A

The introductory statement or question that sets the context for a specific question or problem.
- Can also be a picture (TAT) or apparatus (coloured blocks) with instructions for their use.

27
Q

What is the Response Format for an Item?

A

The ways in which respondents can answer questions, which can be structured or unstructured. Includes factors such as whether the item will be multiple-choice, short-answer, etc.

28
Q

What are the Conditions Governing Response?

A

Any factors that could influence responses, such as:
- Whether there is a time limit for responding
- Whether the administrator can probe ambiguous responses
- Exactly how the responses will be recorded (e.g., answer sheet, test booklet)

29
Q

What is the Scoring Procedure?

A

How an item is scored. The scoring procedure for each item is clearly specified and easily understood.

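Cards 25-29 together describe the anatomy of a test item. A minimal sketch gathering those four components into one data structure (the field names are my own, not from the source):

from dataclasses import dataclass
from typing import Callable

@dataclass
class TestItem:
    stem: str                       # introductory statement, question, picture, or apparatus
    response_format: str            # e.g., "multiple-choice", "short-answer", "essay"
    conditions: dict                # e.g., {"time_limit_s": 60, "probing_allowed": False}
    score: Callable[[str], float]   # clearly specified scoring procedure

item = TestItem(
    stem="Which city is the capital of Australia?",
    response_format="multiple-choice",
    conditions={"time_limit_s": 60, "probing_allowed": False},
    score=lambda response: 1.0 if response == "Canberra" else 0.0,
)
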
30
Q

What are the 2 types of Test Items?

A

Selected-Response Format or Constructed-Response Format

31
Q

What is the Selected-Response Format?

A

A test structure where test-takers select a response from a set of given alternatives.

Includes: Multiple-Choice Items and Rating Scales (Likert, Comparative, Guttman).

32
Q

What are the 3 elements of Multiple-Choice Items?

A

1. Stem
2. Correct Alternative
3. Distractors or Foils

33
Q

What are Rating Scales? Name 3 forms.

A

Rating scales use a system of ordered numerical, verbal and/or pictorial descriptors to record judgements.

Forms: Likert, Comparative, Guttman.

34
Q

What are Comparative Scales?

A

In the comparative scaling approach, judgements are made using either sentences, printed cards, drawings, photographs, or objects. It is an ordinal (rank-order) scale that can also be referred to as a non-metric scale. Respondents evaluate two or more objects at one time, and the objects are directly compared with one another as part of the measuring process.

For example, you could ask someone whether they prefer listening to MP3s through a Zune or an iPod, then take it a step further and add other MP3 player brands to the comparison: Do you prefer A, B, or C?

35
Q

What is the Guttman Scale and its purpose?

A

A series of increasingly extreme one-dimensional questions designed to see where the test-taker stops endorsing them. The respondent agrees up to a point, and their score is measured to the point where they stop agreeing. For this reason, questions are often formatted as dichotomous yes/no responses: the survey starts with a question that is easy to agree with and becomes increasingly sensitive until the respondent starts to disagree.

For example, an early question may ask whether you like music (yes); four questions later it may ask whether you like music without a soul, produced by shady record labels only out to make money (no).

E.g.,
1. Do you want immigrants in your country?
2. In your community?
3. In your neighbourhood?
4. Next door?
5. Would you live with an immigrant?

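Scoring follows directly from the card: the score is the point at which endorsement stops. A minimal sketch:

def guttman_score(responses):
    """responses: booleans ordered from least to most extreme item."""
    score = 0
    for endorsed in responses:
        if not endorsed:
            break               # stop at the first item the respondent rejects
        score += 1
    return score

# guttman_score([True, True, True, False, False]) -> 3
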
36
Q

What is the Constructed-Response Format?

A

A test structure where test-takers create or construct a response, rather than merely selecting one.

37
Q

What are 3 forms of Constructed-Response Format?

A

- Fill-in-the-Blank
- Short-Answer Questions (SAQ)
- Essay Questions (LAQ)

Projective tests also use the constructed-response format, e.g., TAT, Rorschach.

38
Q

How do you score selected-response items?

A

Straightforward: numbers are assigned to the different responses, and the final score is obtained by summing the numbers across all items (Summative Score).

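A minimal sketch of summative scoring for the two common cases (the data shapes are assumptions, not a prescribed API):

def summative_score_keyed(responses, key):
    """Correct/incorrect items: 1 point per match with the scoring key."""
    return sum(1 for r, k in zip(responses, key) if r == k)

def summative_score_likert(ratings):
    """Likert-type items: the assigned numbers are summed directly."""
    return sum(ratings)    # e.g., sum([4, 5, 3, 2]) == 14
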
39
Q

What are problems with scoring constructed-response items?

A

Challenging because responses are diverse. Requires considerable judgement:
- Time-consuming
- Expensive
- Susceptible to low inter-rater reliability

40
Q

What is Qualitative Evaluation in Test Construction?

A

Once items have been written, they are often subject to review from several perspectives prior to the formal test tryout:
1. Conformity with various item-writing rules
2. Content Relevance
3. Sensitivity Review
4. Informal Tryouts

41
Q

What is the "Double-Barrelled" error in item construction?

A

An item that asks about two separate topics or issues within a single question but allows for only one response.

42
Q

What are the general guidelines for Item Construction (MCQ)?

A

1. Use plausible distractors
2. Use the question format over incomplete sentences
3. Balance/randomise the placement of the correct answer across items
4. Make sure options are mutually exclusive, with only ONE correct answer
5. Only use MCQ if there are no other appropriate formats

43
Q

What are other general considerations for Item Construction?

A

- Reading and comprehension levels of the test-taker(s)
- Impact or influence of ethnic/cultural factors
- Use the simplest possible language and give clear instructions

44
Q

What are 3 methods to analyse Content Relevance?

A

- Expert Panel Review
- Sensitivity Review
- Informal Tryout

45
Q

What is an Expert Panel Review?

A

A group of experts may be called on to review items for content relevance or correctness.
- For example, if the construct is "anxiety", a group of clinical psychologists might be called in to review test items.
- In recent years, more emphasis has also been placed on involving end-users in the review of items.

46
Q

What is a Sensitivity Review?

A

A review of all items for possible gender, racial, or ethnic bias.
- Illustration: the developers of the Stanford Achievement Test employed an advisory panel of 12 minority group members.
- The panel identified several potential forms of content bias that might find their way into achievement tests:
1. Status
2. Stereotyping
3. Familiarity
4. Offensive choice of words

47
Q

What is the purpose of an Informal Tryout?

A

An informal tryout is often used to ensure that test items are "working" as intended.
- Test-takers completing informal tryouts are asked to comment on the items and the test directions.
- Test-takers may be asked to "think aloud" while answering the items.

Informal tryouts help the test developer identify ambiguous wording, unexpected interpretations of an item, confusion about methods for responding, and so on.

48
Q

What is involved in the Formal Test Tryout step?

A

1. Administration of the item pool to a representative sample of test-takers.
2. Conducted under conditions identical to those under which the standardised test will be administered (identical time limits, instructions, environment, etc.).
3. Administered to a large sample, generally several hundred test-takers when using Classical Item Analysis procedures.
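
A minimal sketch of two statistics the Item Analysis step (step 4 of card 19) would compute from this tryout data; the statistics shown are standard classical ones, not taken from the card itself:

import numpy as np

def item_statistics(scores):
    """scores: examinee x item matrix of 0/1 tryout responses."""
    difficulty = scores.mean(axis=0)                 # proportion correct per item
    discrimination = []
    for j in range(scores.shape[1]):
        rest = scores.sum(axis=1) - scores[:, j]     # total score excluding item j
        discrimination.append(np.corrcoef(scores[:, j], rest)[0, 1])
    return difficulty, np.array(discrimination)

Items that are too easy, too hard, or poorly discriminating would then be revised or dropped in the Test Revision step.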