Chapter 11 - Assessing Psychometric Quality Flashcards

Q

Item Analysis

A
  • How developers evaluate the performance of each test item.
Q

Quantitative Item Analysis

A
  • Statistical analyses of the responses test takers gave to individual items.
Q

Item Difficulty

A
  • The percentage of test takers who respond to an item correctly.
  • We calculate each item’s difficulty, or p-value, by dividing the number of persons who answered the question correctly by the total number of persons who responded to it.
  • This information comes from the pilot test.
  • Test developers disregard questions that are too hard or too easy.
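The p-value calculation described above can be sketched in Python; the responses and the `item_difficulty` helper name are hypothetical:

```python
# Item difficulty (p-value): proportion of respondents answering correctly.
# Responses are coded 1 = correct, 0 = incorrect; data are hypothetical.

def item_difficulty(responses):
    """Return the p-value for one item given a list of 0/1 responses."""
    return sum(responses) / len(responses)

# Hypothetical pilot-test responses for one item (8 test takers, 6 correct).
pilot_responses = [1, 1, 0, 1, 1, 1, 0, 1]
p = item_difficulty(pilot_responses)  # 6 / 8 = 0.75
```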
Q

Discrimination Index

A
  • Compares the performance of those who obtained very high test scores with the performance of those who obtained very low test scores.
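One common way to compute a discrimination index is to subtract the proportion correct in the low-scoring group from the proportion correct in the high-scoring group (often defined as the top and bottom 27% of total scores). A sketch, with hypothetical data and helper name:

```python
# Discrimination index: difference between the proportion of high scorers
# and the proportion of low scorers who answered an item correctly.
# Group assignments and responses are hypothetical.

def discrimination_index(upper_group, lower_group):
    """D = p(upper) - p(lower) for one item, given 0/1 responses."""
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower

# Responses to one item from the top and bottom scorers on the whole test.
upper = [1, 1, 1, 0, 1]   # 80% of high scorers answered correctly
lower = [0, 1, 0, 0, 1]   # 40% of low scorers answered correctly
d = discrimination_index(upper, lower)  # 0.8 - 0.4 = 0.4
```

A positive index means the item separates strong from weak test takers; an index near zero (or negative) flags an item for revision.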
Q

Item-Total Correlation

A
  • Another way to assess the ability of individual test items to discriminate high-scoring individuals from lower-scoring ones.
  • This is a measure of the strength and direction of the relationship between the way test takers responded to one item and the way they responded to all of the items as a whole.
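The item-total correlation is ordinarily a Pearson correlation between the 0/1 responses to one item and the total test scores. A minimal sketch with hypothetical data:

```python
import statistics

# Item-total correlation: Pearson correlation between responses to one
# item (0/1) and total scores on the whole test. Data are hypothetical.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

item = [1, 0, 1, 1, 0, 1]          # responses to one item
totals = [18, 9, 15, 20, 11, 16]   # total scores on the whole test
r = pearson(item, totals)          # positive r: item discriminates well
```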
Q

Interitem Correlation Matrix

A
  • Displays the correlation of each item with every other item.
  • Usually, each item has been coded as a dichotomous variable: correct = 1, incorrect = 0.
Q

Phi Coefficients

A
  • The result of correlating two dichotomous variables.
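The phi coefficient can be computed from the 2x2 table of joint responses to two dichotomous items, using φ = (ad − bc) / √((a+b)(c+d)(a+c)(b+d)). A sketch with hypothetical counts:

```python
import math

# Phi coefficient: correlation between two dichotomous (0/1) items,
# computed from the 2x2 table of joint response counts. Data hypothetical.

def phi_coefficient(a, b, c, d):
    """a = both correct, b = only item 1 correct,
    c = only item 2 correct, d = both incorrect."""
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den

phi = phi_coefficient(a=40, b=10, c=10, d=40)  # strong positive association
```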
Q

Empirically Based Tests

A
  • Tests designed so that test scores can be used to sort individuals into two or more categories based on their scores on the criterion measure.
Q

Subtle Questions

A
  • Questions that have no apparent relation to the criterion.
Q

Item Response Theory (IRT)

A
  • This theory relates the performance of each item to a statistical estimate of the test taker’s ability on the construct being measured.
  • A measure of the relationship between an individual’s performance on one test item and that individual’s level of performance on the overall measure of the construct the test is measuring.
Q

Item Characteristic Curves (ICCs)

A
  • The line that results when we graph the probability of answering an item correctly against the level of ability on the construct being measured.
  • The ICC provides a picture of the item’s difficulty and how well it discriminates high performers from lower performers.
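One standard way to model an ICC is the two-parameter logistic (2PL) IRT model, in which the item's difficulty sets the curve's location and its discrimination sets the steepness. A sketch with illustrative parameter values:

```python
import math

# Item characteristic curve under a two-parameter logistic (2PL) IRT model:
# P(correct | ability theta) = 1 / (1 + exp(-a * (theta - b))),
# where b is item difficulty and a is item discrimination.
# Parameter values here are illustrative only.

def icc(theta, difficulty, discrimination):
    """Probability of a correct answer at ability level theta."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# At theta == difficulty, the probability of a correct answer is 0.50;
# a steeper (higher-discrimination) curve separates ability levels better.
p_at_difficulty = icc(theta=0.0, difficulty=0.0, discrimination=1.2)  # 0.5
```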
Q

Computerized Adaptive Testing (CAT)

A
  • All test takers start with the same small set of questions.
  • As the test progresses, the computer software chooses and presents each test taker with harder or easier questions depending on how well the test taker answered previous questions.
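The adaptive loop described above can be sketched with a simple rule: step the working difficulty up after a correct answer and down after a miss. Real CAT systems choose items by estimating ability with an IRT model; this rule-based version is only illustrative, and all names are hypothetical:

```python
# Minimal sketch of computerized adaptive item selection.
# Real CAT software estimates ability with an IRT model; here we just
# step the target difficulty level up or down after each answer.

def next_difficulty(current, answered_correctly, step=1):
    """Move to a harder item after a correct answer, easier after a miss."""
    return current + step if answered_correctly else current - step

# Hypothetical test taker: right, right, wrong, right.
level = 0
for correct in [True, True, False, True]:
    level = next_difficulty(level, correct)
# level ends at 2 after three steps up and one step down
```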
Q

Item Bias

A
  • When an item is easier for one group than for another group.
Q

Acculturation

A
  • The degree to which an immigrant or a minority member has adapted to a country’s mainstream culture.
Q

Qualitative Item Analysis

A
  • Non-statistical means of evaluating test items, which normally refers to the analysis of text.
  • Used when test developers ask test takers for verbal or written feedback about test questions.
Q

Construct Bias

A
  • Arises when items do not have the same meaning from one culture or subculture to another.
Q

Method Bias

A
  • Arises when the mechanics of the test work differently for various cultural groups.
Q

Differential Item Functioning

A
  • Arises when test takers from different cultures have the same ability level on the test construct, but the item or test yields very different scores for the two cultures.
Q

Why is revision important in test development?

A
  • Test developers use different kinds of analysis in order to pick the questions that best fulfill the test’s goal.
Q

How are the final items chosen?

A
  • Choosing the items that make up the final test requires the test developer to weigh each item’s evidence of validity, item difficulty and discrimination, interitem correlation, and bias.
Q

What is the first part of the validation process?

A
  • Establishing evidence of validity based on test content.
  • Carried out as the test is developed.
Q

Generalizable

A
  • Meaning the test can be expected to produce similar results even though it has been administered in different locations.
Q

Replication

A
  • The process of replication involves a final round of test administration to another sample of test takers representative of the target audience.
Q

Cross-Validation

A
  • This process breaks the original sample used in the original validation study into two parts.
  • This can be done without having to administer the test to a second group of test takers.
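The split described above can be sketched as a random division of the original sample, with weights derived on one half and checked on the other; the record representation and helper name here are hypothetical:

```python
import random

# Cross-validation sketch: split the original validation sample in two,
# derive scoring weights on one half, and check the validity coefficient
# on the held-out half. Records are hypothetical stand-ins.

def split_sample(records, seed=0):
    """Randomly split a list of test-taker records into two halves."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

records = list(range(100))  # stand-ins for 100 test-taker records
derivation, holdout = split_sample(records)
# Weights would be estimated on `derivation` and evaluated on `holdout`,
# with no need to administer the test to a second group of test takers.
```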
Q

Measurement Bias

A
  • When the scores on a test taken by different subgroups in the population need to be interpreted differently because of some characteristic of the test not related to the construct being measured.
Q

Predictive Bias

A
  • Occurs when the predictions made about a criterion score based on a test score are different for subsets of test takers.
Q

Differential Validity

A
  • When a test yields significantly different validity coefficients for subgroups.
Q

Single-Group Validity

A
  • The test is valid for one group but not for another group.
Q

Slope Bias

A
  • Occurs when the slopes of the separate regression lines relating the predictor to the criterion are not the same for one group as for another.
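A simple way to check for slope bias is to fit a separate least-squares line relating test score to criterion for each subgroup and compare the slopes. A sketch with hypothetical data and helper names:

```python
import statistics

# Slope bias check: fit a separate least-squares line relating the
# predictor (test score) to the criterion for each subgroup and compare
# the slopes. All data are hypothetical.

def slope(xs, ys):
    """Least-squares regression slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

group_a_scores, group_a_criterion = [1, 2, 3, 4], [2, 4, 6, 8]  # slope 2.0
group_b_scores, group_b_criterion = [1, 2, 3, 4], [2, 3, 4, 5]  # slope 1.0
slope_gap = (slope(group_a_scores, group_a_criterion)
             - slope(group_b_scores, group_b_criterion))
# A large gap suggests the test predicts the criterion differently per group.
```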
Q

Why is test fairness important?

A
  • Psychological tests are used to compare individuals; their purpose is to identify or illuminate the differences among individuals.
  • Any differences in results should be based only on the trait or characteristic being measured.
Q

Accessibility

A
  • Pertains to the opportunity test takers have to demonstrate their standing on the constructs the test is designed to measure.
Q

Universal Design

A
  • The idea behind universal design is that tests should be constructed from the outset in such a way that accessibility is maximized for all individuals who may take the test in the future.
Q

Cut Scores

A
  • Decision points for dividing test scores into pass-fail groupings.
Q

What is the purpose of test norms?

A
  • To provide a reference point or structure for understanding one person’s score.
Q

Subgroup Norms

A
  • Statistics that describe subgroups of the target audience.