11 Flashcards
item analysis
developers evaluate the performance of each test item
quantitative item analysis
statistical analyses of the responses test takers gave to individual items
item difficulty
the percentage of test takers who respond correctly
dividing the number of persons who answered correctly by the total number of persons who responded to the question
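A minimal Python sketch of the p value calculation. It assumes each item is scored 1 (correct) or 0 (incorrect) and that `None` marks a test taker who did not respond; the function name and the data are hypothetical.

```python
from typing import Optional, Sequence


def item_difficulty(responses: Sequence[Optional[int]]) -> float:
    """Return the p value of one item.

    responses: 1 = correct, 0 = incorrect, None = no response (assumed coding).
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no test taker responded to this item")
    # Persons answering correctly divided by persons who responded
    return sum(answered) / len(answered)


# Hypothetical responses from 10 test takers; one skipped the item.
item = [1, 1, 0, 1, None, 0, 1, 1, 0, 1]
print(round(item_difficulty(item), 2))  # 0.67 (6 of 9 respondents correct)
```

Test takers who skipped the item are left out of the denominator, matching "the total number of persons who responded to the question."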
items with difficulty levels or p values of __ yield distributions of test scores with the most variation
.5
this is the target p value
discard or rewrite items with extreme p values __ to __ and __ to __
0, .2; .9, 1
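Why .5 is the target: an item scored right/wrong has variance p(1 - p), which is largest at p = .5 and shrinks toward 0 and 1. A short Python sketch of that fact plus the screening rule; treating the .2 and .9 boundaries as inclusive is an assumption, and the item names and p values are hypothetical.

```python
def item_variance(p: float) -> float:
    """Variance of an item scored 1/0 whose difficulty (p value) is p."""
    return p * (1 - p)


def flag_extreme_items(p_values: dict, low: float = 0.2, high: float = 0.9) -> list:
    """Return the items whose p value falls in 0 to .2 or .9 to 1 (bounds inclusive)."""
    return [name for name, p in p_values.items() if p <= low or p >= high]


# Hypothetical p values for four items
p_values = {"item1": 0.15, "item2": 0.48, "item3": 0.62, "item4": 0.95}
print(flag_extreme_items(p_values))  # ['item1', 'item4'] -> discard or rewrite

# Variance peaks at p = .5 and falls off toward the extremes
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(item_variance(p), 2))
```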
discrimination index
compares the performance of those who obtained very high test scores (upper group [U]) with the performance of those who obtained very low test scores (lower group [L]) on each item
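A minimal Python sketch of the discrimination index as D = p(upper) - p(lower), the difference between the proportions of the upper and lower groups who answered the item correctly. The card does not say how large the groups are; using the top and bottom 27% of total scores is an assumed convention, and ties in total scores are broken arbitrarily. All names and data are hypothetical.

```python
def discrimination_index(total_scores, item_scores, fraction=0.27):
    """Return D = p(upper group) - p(lower group) for one item.

    total_scores: each test taker's total test score
    item_scores:  the same test takers' 1/0 scores on the item
    fraction:     share of test takers in each extreme group (assumed 27%)
    """
    n = len(total_scores)
    k = max(1, round(n * fraction))
    if 2 * k > n:
        raise ValueError("upper and lower groups would overlap")
    # Indices of test takers, ordered from lowest to highest total score
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data for 10 test takers (k = 3 per group)
totals = [12, 25, 18, 30, 9, 22, 27, 15, 20, 28]
item = [0, 1, 0, 1, 0, 1, 1, 0, 1, 0]
print(round(discrimination_index(totals, item), 2))  # 0.67
```

A positive D means high scorers answered the item correctly more often than low scorers; a D near zero or negative marks an item that does not separate the two groups.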
interitem correlation matrix
displays the correlation of each item with every other item
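A minimal Python sketch of an interitem correlation matrix built from Pearson correlations, using only the standard library. The item data are hypothetical; an item that everyone passes or everyone fails has no variance, so its correlations are undefined.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Raises ZeroDivisionError if an item has no variance (all right or all wrong)
    return sxy / math.sqrt(sxx * syy)


def interitem_matrix(items):
    """items: one list of scores per item -> r for every pair of items."""
    return [[pearson(a, b) for b in items] for a in items]


# Hypothetical 1/0 scores of five test takers on three items
items = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 1],
]
for row in interitem_matrix(items):
    print([round(r, 2) for r in row])
```

The diagonal is always 1 (each item with itself) and the matrix is symmetric, so only one triangle needs to be inspected.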
phi coefficients
results of correlating two dichotomous variables
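For two items scored 1/0, phi can be computed from the four cells of their 2 x 2 table as (ad - bc) / sqrt((a + b)(c + d)(a + c)(b + d)); it gives the same value as a Pearson correlation computed on the 1/0 codes. A minimal Python sketch with hypothetical data:

```python
import math


def phi_coefficient(x, y):
    """Phi for two items scored 1/0, from the counts of their 2 x 2 table."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # both correct
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # only x correct
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # only y correct
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # both incorrect
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical 1/0 scores of five test takers on two items
x = [1, 1, 0, 0, 1]
y = [1, 1, 0, 0, 0]
print(round(phi_coefficient(x, y), 2))  # 0.67
```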
empirically based tests
test designed so that test scores can be used to sort individuals into two or more categories based on their scores on the criterion measure
subtle questions
questions that have no apparent relation to the criterion
item response theory (IRT)
provides estimates of test takers' ability that are independent of the difficulty of the items presented, as well as estimates of item difficulty and discrimination that are independent of the ability of the test takers
item characteristic curves (ICCs)
the line that results when we graph the probability of answering an item correctly against the level of ability on the construct being measured
the higher the ability level at which the curve reaches a .5 probability of answering correctly, the more difficult the item
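The cards do not name an IRT model; the sketch below assumes the two-parameter logistic (2PL) model, in which `b` (difficulty) is the ability level where the probability of a correct answer is .5 and `a` (discrimination) sets how steep the curve is. The parameter values are hypothetical. At every ability level, the harder item (higher `b`) has the lower probability of a correct answer.

```python
import math


def icc(theta, a=1.0, b=0.0):
    """P(correct) under an assumed 2PL model.

    theta: ability on the construct; a: discrimination; b: difficulty
    (the ability at which P = .5).
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Hypothetical easy item (b = -1) and hard item (b = +1)
for theta in (-2, -1, 0, 1, 2):
    easy, hard = icc(theta, b=-1.0), icc(theta, b=1.0)
    print(f"ability {theta:+d}: easy item P = {easy:.2f}, hard item P = {hard:.2f}")
```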
item bias
when an item is easier for one group than for another group
acculturation
the degree to which an immigrant or minority member has adapted to a country’s mainstream culture
qualitative test analysis
When test developers ask test takers to complete a questionnaire about how they viewed the test and how they answered the questions.
first part of the validation process
establishing evidence of validity based on test content
generalizable
the test can be expected to produce similar results even when it has been administered in different locations
replication
Administration of a test to a second, different, sample of test takers representative of the target audience as part of the test validation process.
cross-validation
Administering a test another time following a validation study to confirm the results of the validation study; because of chance factors that contribute to random error, this second administration can be expected to yield lower correlations with criterion measures.
regression equation
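Y = a + bX, where Y is the predicted criterion score, X is the test score, a is the intercept, and b is the slope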
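A minimal Python sketch that estimates a and b by ordinary least squares and uses them to predict a criterion score from a test score. The test scores and criterion ratings are hypothetical and were chosen to fall exactly on a line so the result is easy to check.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b


# Hypothetical test scores (X) and criterion ratings (Y)
test_scores = [50, 60, 70, 80, 90]
ratings = [2.0, 2.5, 3.0, 3.5, 4.0]
a, b = fit_line(test_scores, ratings)
print(f"Y = {a:.2f} + {b:.2f}X")
print(f"predicted rating for X = 75: {a + b * 75:.2f}")  # 3.25
```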
measurement bias
f
differential validity
a test yields significantly different validity coefficients for subgroups
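Whether two subgroup validity coefficients are "significantly different" is a statistical question. The cards do not say how to test it; one common method, assumed here, is Fisher's r-to-z test for correlations from two independent samples. The coefficients and sample sizes below are hypothetical.

```python
import math


def compare_validity(r1, n1, r2, n2):
    """Fisher r-to-z test for validity coefficients from two independent subgroups.

    Returns (z, two-tailed p). Requires n1 > 3 and n2 > 3.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher transformation of each r
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p under the standard normal
    return z, p


# Hypothetical validity coefficients for two subgroups
z, p = compare_validity(0.50, 200, 0.20, 150)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With a conventional .05 cutoff (also an assumption), p below .05 would indicate differential validity for these hypothetical subgroups.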
single-group validity
a test is valid for one group but not for another group
predictive bias
occurs when the predictions made about a criterion score based on a test score are different for subsets of test takers (e.g., minority vs. majority, men vs. women)
slope bias
the slopes of the separate regression lines that relate the predictor to the criterion are not the same for one group as for another
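Predictive bias and slope bias can be illustrated by fitting a separate least-squares line Y = a + bX for each subgroup and comparing slopes, intercepts, and predicted criterion scores at the same test score. A minimal Python sketch with hypothetical data; comparing the numbers by eye is not a significance test, which a formal analysis would also need.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical predictor (test) and criterion scores for two subgroups
groups = {
    "group A": ([40, 50, 60, 70, 80], [2.0, 2.5, 3.0, 3.5, 4.0]),
    "group B": ([40, 50, 60, 70, 80], [2.5, 2.8, 3.1, 3.4, 3.7]),
}
lines = {name: fit_line(x, y) for name, (x, y) in groups.items()}
for name, (a, b) in lines.items():
    # Different slopes = slope bias; different predictions at the same X = predictive bias
    print(f"{name}: Y = {a:.2f} + {b:.3f}X, predicted Y at X = 70: {a + b * 70:.2f}")
```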
accessibility
pertains to the opportunity test takers have to demonstrate their standing on the construct(s) the test is designed to measure.
universal design
tests should be constructed from the outset in such a way that accessibility is maximized for all individuals who may take the test in the future.
cut scores
decision points for dividing test scores into pass/fail groupings
subgroup norms
statistics that describe subgroups of the target audience
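A minimal Python sketch of subgroup norms: the number of test takers, mean, and standard deviation for each subgroup. Using the sample SD (n - 1 denominator) is an assumption, and the subgroup labels and scores are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def subgroup_norms(records):
    """records: (subgroup, score) pairs -> {subgroup: (n, mean, SD)}.

    Uses the sample SD (n - 1 denominator); each subgroup needs at least 2 scores.
    """
    scores = defaultdict(list)
    for group, score in records:
        scores[group].append(score)
    return {g: (len(s), mean(s), stdev(s)) for g, s in scores.items()}


# Hypothetical scores tagged with subgroup labels
records = [("A", 20), ("A", 24), ("A", 22), ("B", 30), ("B", 26)]
for group, (n, m, sd) in subgroup_norms(records).items():
    print(f"{group}: n = {n}, mean = {m:.1f}, SD = {sd:.2f}")
```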
two approaches to setting cut scores
1) a panel of expert judges decides the minimum number of items a qualified person should answer correctly
2) use the correlation between the test and an outside criterion to predict the test score that a person who performs at the minimum level of acceptability is likely to make
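A minimal Python sketch of both approaches and the resulting pass/fail decision. For approach 1, averaging the judges' estimates is an assumption (panels can combine judgments in other ways). For approach 2, the regression equation Y = a + bX is solved for the test score X that predicts the minimum acceptable criterion score. The estimates, coefficients, and function names are hypothetical.

```python
from statistics import mean


def judged_cut_score(judge_estimates):
    """Approach 1: average the panel's estimates of the minimum number of
    items a qualified person would answer correctly (averaging is assumed)."""
    return mean(judge_estimates)


def empirical_cut_score(a, b, min_criterion):
    """Approach 2: solve Y = a + bX for the test score X that predicts the
    minimum acceptable criterion score. Assumes a positive slope b."""
    if b <= 0:
        raise ValueError("expected a positive test-criterion relationship")
    return (min_criterion - a) / b


def classify(score, cut):
    """Pass/fail decision at the cut score."""
    return "pass" if score >= cut else "fail"


# Hypothetical panel estimates and regression coefficients
cut_1 = judged_cut_score([34, 36, 35, 37, 33])  # 35
cut_2 = empirical_cut_score(a=-0.5, b=0.05, min_criterion=3.0)  # about 70
print(cut_1, classify(36, cut_1), round(cut_2, 1), classify(68, cut_2))
```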