Writing and Evaluating Test Items Flashcards
Writing test items can be difficult. DeVellis (2016) provided several simple guidelines for item writing:
Define clearly what you want to measure.
Generate an item pool.
Avoid exceptionally long items.
Keep the level of reading difficulty appropriate for those who will complete the scale.
Avoid “double-barreled” items that convey two or more ideas at the same time.
Consider mixing positively and negatively worded items.
When writing items, you need to be sensitive to ethnic and cultural differences.
For example, items on the CES-D (Center for Epidemiologic Studies Depression Scale) concerning appetite, hopefulness, and social interactions may have a different meaning for African American respondents than for white respondents.
Tests may become obsolete.
The Armed Services Vocational Aptitude Battery was studied over a 16-year period. Approximately 12% of the items became less reliable over this time.
Items that retained their reliability were more likely to focus on ____; items that lost reliability focused on ______
Skills, while those that lost reliability focused on more abstract concepts.
dichotomous format
two alternatives for each item
advantages of the dichotomous format
Advantages of true-false items include their obvious simplicity, ease of administration, and quick scoring. Another attractive feature is that true-false items require absolute judgment: the test taker must declare one of the two alternatives.
Dichotomous items also make the scoring of subscales easy. All that a tester needs to do is count the number of items a person endorses from each subscale.
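The counting step can be sketched in a few lines of Python. This is only a minimal illustration. It assumes a hypothetical answer key in which each item belongs to one subscale and is keyed "T" or "F". The subscale names, item numbers, and keyed directions are invented and do not come from any real test.

```python
# Hypothetical answer key (illustration only): subscale -> {item number: keyed response}.
SUBSCALE_KEY = {
    "subscale_A": {1: "T", 4: "T", 7: "F"},
    "subscale_B": {2: "F", 5: "T", 8: "T"},
}

def score_subscales(responses, key=SUBSCALE_KEY):
    """Count how many items a test taker endorses in the keyed direction
    for each subscale.

    responses: dict mapping item number -> "T" or "F".
    """
    return {
        name: sum(1 for item, keyed in items.items() if responses.get(item) == keyed)
        for name, items in key.items()
    }

answers = {1: "T", 2: "F", 3: "T", 4: "F", 5: "T", 6: "F", 7: "F", 8: "T"}
print(score_subscales(answers))  # {'subscale_A': 2, 'subscale_B': 3}
```

Scoring each subscale takes nothing more than a comparison and a count, and that is why dichotomous formats are so quick to score.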
disadvantages of the dichotomous format
MEMORIZE: True-false items encourage students to memorize material, making it possible for students to perform well on a test that covers material they do not really understand.
COMPLEXITY: “Truth” often comes in shades of gray, and true-false tests do not allow test takers the opportunity to show they understand this complexity.
MANY ITEMS: The mere chance of getting any item correct is 50%. Thus, to be reliable, a true-false test must include many items.
LESS RELIABLE/PRECISE: Overall, dichotomous items tend to be less reliable, and therefore less precise, than some of the other item formats.
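The "many items" point is easier to see with numbers. The sketch below assumes a simple binomial model: items are independent, and a pure guesser has a 50% chance on each one. It computes the probability of reaching a 70% cutoff by chance alone. The 70% cutoff is an arbitrary choice made for illustration.

```python
from math import comb

def prob_at_least(n_items, min_correct, p=0.5):
    """Probability of answering at least `min_correct` of `n_items`
    correctly by pure guessing (binomial model, independent items)."""
    return sum(
        comb(n_items, k) * p**k * (1 - p) ** (n_items - k)
        for k in range(min_correct, n_items + 1)
    )

# Chance of reaching 70% correct by coin-flipping on short vs. long tests.
for n in (10, 50, 100):
    cutoff = -(-7 * n // 10)  # ceiling of 0.7 * n using integer arithmetic
    print(n, round(prob_at_least(n, cutoff), 4))
```

On a 10-item test, a guesser reaches 70% about 17% of the time (176/1024). On longer tests that probability falls sharply, so luck has much less influence on the total score.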
polytomous format
each item has more than two alternatives
a point is given for the selection of one of the alternatives, and no point is given for selecting any other choice
advantage of the polytomous format
A major advantage of this format is that it takes little time for test takers to respond to a particular item because they do not have to write. Thus, the test can cover a large amount of information in a relatively short time.
issues in the construction and scoring of multiple-choice tests
How many distractors should a test have? Psychometric theory suggests that adding more distractors should increase the reliability of the items. However, in practice, adding distractors may not actually increase the reliability because it is difficult to find good ones. The reliability of an item is not enhanced by distractors that no one would ever select.
Ineffective distractors actually may hurt the reliability of the test because they are time consuming to read and can limit the number of good items that can be included in a test.
It is usually best to develop three or four good distractors for each item.
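Distractors that no one selects add nothing to reliability. One simple way to find them is to tally how often each option was chosen. The Python sketch below does this for a single item. The response data, the option letters, and the `distractor_report` helper are all hypothetical.

```python
from collections import Counter

def distractor_report(choices, options, key):
    """Tally how often each option of one multiple-choice item was chosen.

    choices: list of options selected by test takers.
    options: all options offered for the item, e.g. "ABCDE".
    key: the correct option.
    Returns (counts per option, list of distractors nobody chose).
    """
    counts = Counter(choices)
    full = {opt: counts.get(opt, 0) for opt in options}
    unused = [opt for opt in options if opt != key and full[opt] == 0]
    return full, unused

# Hypothetical responses from 12 test takers to one item keyed "B".
responses = list("BBABCBBABBCB")
counts, unused = distractor_report(responses, "ABCDE", key="B")
print(counts)  # {'A': 2, 'B': 8, 'C': 2, 'D': 0, 'E': 0}
print(unused)  # ['D', 'E']
```

In this invented example, options D and E are candidates for rewriting or removal. The item would then have the two working distractors A and C.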
Guessing
Because test takers can get some “correct” answers simply by guessing, a correction for guessing is sometimes used.
If a correction for guessing is used, then random guessing will do you no good. Some speeded tests are scored so that the correction-for-guessing formula includes only the items that were attempted; that is, items that were not attempted are counted as neither right nor wrong. In this case, random guessing and leaving the items blank have the same expected effect.
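A widely used correction formula is R − W/(n − 1). Here R is the number right, W is the number wrong, and n is the number of alternatives per item. Omitted items count as neither right nor wrong. The Python sketch below assumes this formula. Exact fractions are used so that the zero expected gain from blind guessing shows up without rounding error.

```python
from fractions import Fraction

def corrected_score(right, wrong, n_alternatives):
    """Correction for guessing: R - W / (n - 1).
    Omitted items are not passed in, so they count as neither right nor wrong."""
    return right - Fraction(wrong, n_alternatives - 1)

def expected_gain_from_random_guess(n_alternatives):
    """Expected change in the corrected score from guessing blindly on one item:
    +1 with probability 1/n, -1/(n - 1) with probability (n - 1)/n."""
    p = Fraction(1, n_alternatives)
    return p * 1 - (1 - p) * Fraction(1, n_alternatives - 1)

print(corrected_score(right=30, wrong=12, n_alternatives=4))  # 26
print(expected_gain_from_random_guess(4))                     # 0
```

On average, the penalty for wrong answers exactly cancels the points gained by lucky guesses. That is why blind guessing and leaving the item blank come out the same.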
How about cases where you do not know the right answer but can eliminate one or two of the alternatives?
Guessing is advisable in that case.
The correction formula assumes that you are equally likely to respond to each of the four alternatives. For a four-choice item, it would estimate your chance of getting the item correct by chance alone to be 1 in 4. However, if you can eliminate two alternatives, then the chances are actually 1 in 2. This gives you a slight advantage over the correction formula.
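The size of that advantage can be worked out directly. The sketch below assumes the R − W/(n − 1) correction on a four-choice item. It also assumes the test taker rules out some wrong options and then guesses at random among the options that remain.

```python
from fractions import Fraction

def expected_gain_after_elimination(n_alternatives, eliminated):
    """Expected change in the corrected score (R - W / (n - 1)) when the
    test taker rules out `eliminated` wrong options and then guesses at
    random among the options that remain."""
    remaining = n_alternatives - eliminated
    p_right = Fraction(1, remaining)
    penalty = Fraction(1, n_alternatives - 1)
    return p_right - (1 - p_right) * penalty

for k in range(3):
    print(k, expected_gain_after_elimination(4, k))
# 0 0
# 1 1/9
# 2 1/3
```

Eliminating two of the four options raises the expected gain from 0 to about one third of a point per item. The correction formula does not account for this.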
students are more likely to guess when …
they anticipate a low grade on a test than when they are more confident
discourage guessing by
giving students partial credit for items left blank
guessing threshold
describes the chances that a low-ability test taker will obtain each score
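The sketch below shows one way to compute such a set of chances. It assumes that a low-ability test taker guesses blindly on every item. It also assumes the items are independent, so the score follows a binomial distribution. The 10-item, four-choice test is a hypothetical example.

```python
from math import comb

def chance_score_distribution(n_items, n_alternatives):
    """P(score = k) for k = 0..n_items, for a test taker who guesses blindly
    on every item (binomial model with success probability 1/n_alternatives)."""
    p = 1 / n_alternatives
    return [comb(n_items, k) * p**k * (1 - p) ** (n_items - k) for k in range(n_items + 1)]

dist = chance_score_distribution(n_items=10, n_alternatives=4)
for score, prob in enumerate(dist):
    print(f"{score:2d}  {prob:.3f}")
```

Under these assumptions the most likely chance score is 2 out of 10. Scores well above that are unlikely to come from guessing alone.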
Likert format, the category scale, and the Q-sort
do not judge any response as “right” or “wrong.” Rather, they attempt to quantify the characteristics of the response.
essay
commonly used in classroom evaluation, and the Educational Testing Service now uses a writing sample as a component of its testing programs.
The reliability of the scoring procedure should be assessed by determining the association between two scores provided by independent scorers.
In practice, however, the psychometric properties of essay exams are rarely evaluated.
Likert format
Respondents indicate the degree of agreement with a particular attitudinal question. Commonly, five alternatives are offered: strongly disagree, disagree, neutral, agree, and strongly agree.
In some applications, six options are used to avoid allowing the respondent to be neutral.
Scoring requires that any negatively worded items be reverse scored and that the responses then be summed. This format is especially popular in measurements of attitude.
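Reverse scoring and summing can be sketched as follows. The sketch assumes responses coded 1 to n_points. A reversed item maps x to (n_points + 1) − x, so on a 5-point scale a 1 becomes 5 and a 2 becomes 4. The four-item scale and its responses are hypothetical.

```python
def score_likert(responses, reverse_items, n_points=5):
    """Sum Likert responses after reverse-scoring negatively worded items.

    responses: dict mapping item id -> response on a 1..n_points scale.
    reverse_items: set of negatively worded item ids.
    """
    total = 0
    for item, x in responses.items():
        if not 1 <= x <= n_points:
            raise ValueError(f"item {item}: response {x} outside 1..{n_points}")
        # Negatively worded items are flipped so that high always means "more" of the attitude.
        total += (n_points + 1 - x) if item in reverse_items else x
    return total

# Hypothetical 4-item attitude scale; items 2 and 4 are negatively worded.
answers = {1: 5, 2: 1, 3: 4, 4: 2}
print(score_likert(answers, reverse_items={2, 4}))  # 5 + 5 + 4 + 4 = 18
```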
When Likert responses are subjected to factor analysis, test developers can find groups of items that go together.
issues with Likert
Some have challenged the appropriateness of using traditional parametric statistics to analyze Likert responses because the data are at the ordinal rather than the interval level.
category format
similar to the Likert format but uses an even greater number of choices
for example, 10-point rating systems