Test 2: Chapters 6, 7, and 9 Flashcards
What three things must you consider before you begin writing test items?
- Type of test (multiple choice, essay, etc.)
- The responses you want (The type of test you write depends on the responses you want)
- The objectives of the test
List the test item writing techniques that are recommended in the ch. 6 outline.
- Define clearly what you want to measure
- read a great deal of theory to know what you want to measure, and write questions as specifically as possible
- Generate an item pool
- Write 3 or 4 similar items then select the one that is the most specific and captures the measure
- Make sure the reading level is appropriate
- Mix positively and negatively worded items to counter the “acquiescence response set” (the tendency to agree with every item)
- Always take into account cultural, racial, ethnic, and gender differences
- Constantly reevaluate tests because they can lose reliability and validity over time.
What are things that you should avoid when writing test items?
- Avoid redundancy
- Generating an item pool prevents this
- Avoid lengthy items
- These can be both confusing and misleading
- Avoid “double-barreled” items
* Do not put two or more questions in the same item
What are the 6 different types of item formats?
- The dichotomous format
- The polytomous format
- The Likert format
- The category format
- Checklists
- Q-Sorts
What is the dichotomous format?
Is it used often?
2 alternatives for each item (true/false)
*Not used as often due to its limitations
What are pros and cons of the dichotomous format?
Pros:
- easy to construct
- easy to administer
- easy to score
- easy to paraphrase lines out of a textbook or lecture notes
Cons:
- require absolute judgement (only one answer is right) even though sometimes both alternatives may have truth to them
- 50% chance of guessing correctly, so a correct answer does not demonstrate mastery
- encourages memorization rather than true comprehension of the material
What is the polytomous format?
*Is it used often?
More than two alternatives; only one alternative is correct (multiple choice tests)
*very popular in the educational setting
What are pros and cons of the polytomous format?
Pros:
- easy to score
- can cover a large amount of material in a short time
- correct answers are less likely to be the result of chance than with a dichotomous item (ex. 25% chance with four alternatives rather than 50% chance with the dichotomous format)
- good distractors (incorrect choices) can increase the reliability of a test
Cons:
- ineffective distractors decrease reliability and validity
When is it okay to guess if a correction for guessing formula is used?
When you have narrowed your responses to two choices.
*Tests that use correction-for-guessing formulas deduct an extra penalty for each incorrect answer. Therefore, a question that is left blank loses fewer points than a question that is answered incorrectly.
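*The cards do not state the formula. One common form, assumed here, is corrected score = R - W / (n - 1), where R is the number right, W the number wrong, and n the number of alternatives per item. A minimal Python sketch under that assumption (the function names are illustrative) shows why guessing between two remaining choices pays off on average, while blind guessing does not:

```python
def corrected_score(right, wrong, n_choices):
    """Correction for guessing, ASSUMING the common form R - W / (n - 1).

    Items left blank are neither rewarded nor penalized.
    """
    return right - wrong / (n_choices - 1)


def expected_gain_from_guess(remaining_choices, n_choices):
    """Expected change in corrected score from guessing on one item
    after narrowing it down to `remaining_choices` alternatives."""
    p_correct = 1 / remaining_choices   # chance the guess is right
    penalty = 1 / (n_choices - 1)       # points lost if the guess is wrong
    return p_correct * 1 - (1 - p_correct) * penalty


# Four-alternative item narrowed to two choices: positive expected gain.
narrowed = expected_gain_from_guess(remaining_choices=2, n_choices=4)

# Blind guess among all four alternatives: expected gain of about zero.
blind = expected_gain_from_guess(remaining_choices=4, n_choices=4)
```

Under this assumed formula a blind guess breaks even on average, so guessing only helps once some alternatives have been ruled out.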
How would you assess reliability and validity of an essay exam?
- Inter-rater agreement: lets the test maker know whether their questions are subjective
- Correlate the essay with other tests for validity
* Note: When writing essay questions, have clear instructions, clearly express grading criteria, and ask a peer to review your questions
Describe the Likert format.
*This format is typically used in what types of scales?
- Test-taker evaluates agreement or disagreement
- Five alternatives: 1. strongly disagree 2. disagree 3. neutral 4. agree 5. strongly agree
*This format is often used for personality and attitude scales.
What is one pro and one criticism of the Likert format?
Pro: very easy to subject to factor analysis (find item groups that cluster together)
Criticism: Some believe that parametric statistics should not be used for this format because the data are ordinal and not at the interval level.
Describe the category format. Provide an example.
The category format usually involves 10-point rating scales
- 1 is usually low and 10 is high
- The scale does not have to run from 1 to 10.
Ex. on a scale of 1 to 10, rate your level of pain.
Ratings change depending on _________.
Context.
ex. I would get a 1 if compared to Michael Jordan and a 5 if compared to Cici in basketball.
How can you improve discriminability in tests that use the category format?
- as a test administrator, give people an idea of what 1 means and what 10 means (ex. show them a film)
- as a test taker, be more invested in what you are rating
What is the visual analogue scale? Is it used very often?
In Visual Analogue Scales, test-takers are asked to place a mark on a line to rate something.
It is NOT used very often
What is a checklist? What are the cons of the checklist format? Is this format used very often?
Checklist: people select adjectives from a list.
Cons:
- people may define adjectives differently
- people may behave differently depending on context
- usually there are only two adjectives to choose from (ex. brave vs afraid, shy vs outgoing)
*adjective checklists are popular in personality measurement, but checklists are falling out of favor
What is a Q-sort? What do scored items look like on a graph? Which item responses are of interest to test administrators?
- With Q-sorts, test-takers are given statements, and are instructed to place each statement in 1 of 9 piles.
- Items look like a bell-shaped curve; most items fall in the middle categories (4 and 5)
- Test administrators are interested in item responses that fall in the extreme categories (1 & 2, 8 & 9).
What is a primary difference between checklists and Q-sorts?
Q-sorts discriminate; checklists do not discriminate (Q-sorts increase the number of categories)
What is item analysis?
How you evaluate your items (ex. item difficulty, discriminability, etc.)
Item difficulty depends on what two factors?
- The use of the test
- The types of items
What is considered the optimal difficulty for a test item?
Should this level of difficulty apply to all test items? Why or why not?
.625 (62.5% correctly answer the item)
No. To increase validity (distinguishing those who study and comprehend the material from those who do not), a test should include items at a range of difficulty levels.
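*The cards do not say where .625 comes from. A common textbook rule of thumb, assumed here, places the optimal difficulty about halfway between chance-level performance and 100% correct. For a four-alternative item, chance is .25, which gives .625. A short Python sketch of that assumed rule (the function name is illustrative):

```python
def optimal_difficulty(n_choices):
    """Midpoint between chance-level success and a perfect score.

    ASSUMPTION: optimal difficulty = (1.0 + chance) / 2, where chance
    is the probability of guessing a single item correctly.
    """
    chance = 1 / n_choices
    return (1.0 + chance) / 2


four_choice = optimal_difficulty(4)   # multiple choice with 4 alternatives
true_false = optimal_difficulty(2)    # dichotomous item
```

Under this rule the value depends on the item format, which is one reason a single target difficulty does not fit every item.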
A difficulty range between ___ and ___ discriminates between students.
.30 and .70 (.30 = harder items, .70 = easier items)
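*As a rough illustration, item difficulty can be computed as the proportion of test-takers who answer the item correctly and then checked against the .30 to .70 range. The response data and function names below are hypothetical, made up only for this sketch:

```python
def item_difficulty(responses):
    """Proportion of test-takers who answered the item correctly.

    `responses` is a list of 1 (correct) and 0 (incorrect).
    """
    return sum(responses) / len(responses)


def in_discriminating_range(p, low=0.30, high=0.70):
    """True if the item difficulty falls between the two cutoffs."""
    return low <= p <= high


# Hypothetical responses from eight test-takers to three items.
easy_item = [1, 1, 1, 1, 1, 1, 1, 0]      # 7 of 8 correct
medium_item = [1, 1, 0, 1, 0, 0, 1, 1]    # 5 of 8 correct
hard_item = [0, 0, 1, 0, 0, 0, 0, 0]      # 1 of 8 correct

for name, item in [("easy", easy_item), ("medium", medium_item), ("hard", hard_item)]:
    p = item_difficulty(item)
    print(name, p, in_discriminating_range(p))
```

Only the medium item falls inside the range in this made-up data; the other two are answered correctly by nearly everyone or nearly no one, so they say little about differences between students.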
The statements, “If I want to raise self-esteem, I may make an easier test.” and “If I am selecting medical school candidates, then the items are going to be harder.” are examples of __________ (which need to be considered when deciding item difficulty.)
Human Factors
What effect will adding a few easier items have on a test?
A few easier items may reduce test anxiety and increase reliability and validity.
*This is especially true when the easier items are placed at the beginning of the test.
What would a test maker look at when determining the discriminability of test items?
They would look at the relationship between performance on the test item and performance on the whole test.
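*One common way to quantify that relationship, assumed here because the cards do not name a method, is to correlate scores on each item (1 = correct, 0 = incorrect) with total test scores. The sketch below uses a plain Pearson correlation and a small hypothetical score matrix:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def item_total_correlations(score_matrix):
    """Correlate each item with the total score.

    `score_matrix` has one row per test-taker and one 0/1 column per item.
    Note: the total here includes the item itself, which inflates the
    correlation somewhat on short tests.
    """
    totals = [sum(row) for row in score_matrix]
    n_items = len(score_matrix[0])
    return [
        pearson([row[j] for row in score_matrix], totals)
        for j in range(n_items)
    ]


# Hypothetical data: five test-takers, three items.
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 0],
]
correlations = item_total_correlations(scores)
```

Items with a higher item-total correlation separate high scorers from low scorers more cleanly; in this made-up data the first item discriminates better than the third.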