Writing and Evaluating Test Items Flashcards
What’s the most commonly used method in psychological testing?
Questionnaires
Questionnaire characteristics (3)
(1) Written series of questions
(2) Structured stimuli (i.e. questions)
(3) Structured responses (i.e. response format)
Questionnaire Advantages (4)
(1) Presentation of stimuli is well controlled
(2) Scoring highly reliable
(3) Efficient to administer to large numbers
(4) Inexpensive
Example of Questionnaires with Dichotomous Choice Formats
True-False and Agree-Disagree
Dichotomous formats are often seen in ______ tests
Personality, e.g. MMPI
Compared to other formats, the dichotomous format is ________
less reliable
Rating Scale Formats (def)
Category Format. Ask responder to evaluate/rate something along a defined continuum.
What’s the problem with rating scales formats?
Number of points? How many options?
More options = more variability
At what point is it too much?
Middle point? Acknowledges that people might NOT have an opinion, but can be an easy way out
Often 10 points; between 4 and 7 is good.
How many points are optimal for a likert scale?
If likert scale, 7 = optimal
Forced Choice Formats
Person is presented with 2 to 4 stimuli and asked to choose among them.
Q-sort Formats
Forced distribution of items into categories
E.g. Give the person a list of 100 characteristics. Group the characteristics according to how like the person the characteristics are:
Four Steps in the Question-Answer Process
(1) Comprehension: Attending to questions and instructions
(2) Retrieval: Retrieval of relevant information
(3) Judgment: Integration of retrieved information
(4) Response: Mapping the judgment on the response category
Issues for Questionnaires (3)
RESPONSE SET: Tendency for people to respond to questions in a way that paints a certain picture of themselves instead of providing honest answers
(1) Acquiescence = Tendency to agree, say true, say often.
(2) Social desirability = Tendency to present self in a socially favorable manner
(3) Random responding = Ignoring or paying insufficient attention to item content
Errors in completing a questionnaire = ____ (def)
Formally valid answers that do NOT reflect true scores, undermining data quality
How can we combat Acquiescence bias?
Use reverse-scored items
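A minimal sketch of how reverse-keyed items could be scored, assuming a numeric rating scale. The function names, the 1-to-5 default range, and the item positions are hypothetical and chosen only for illustration.

```python
def reverse_score(response, scale_min=1, scale_max=5):
    """Flip a response on a numeric rating scale (e.g. 1 <-> 5, 2 <-> 4)."""
    if not scale_min <= response <= scale_max:
        raise ValueError("response is outside the scale range")
    return scale_max + scale_min - response


def score_scale(responses, reverse_keyed, scale_min=1, scale_max=5):
    """Sum item responses, flipping the items whose index is in reverse_keyed."""
    total = 0
    for index, response in enumerate(responses):
        if index in reverse_keyed:
            total += reverse_score(response, scale_min, scale_max)
        else:
            total += response
    return total


# Hypothetical 4-item scale where items 1 and 3 (0-based) are reverse-keyed.
# A respondent who agrees with everything lands mid-scale, not at the top.
all_agree = [5, 5, 5, 5]
total = score_scale(all_agree, reverse_keyed={1, 3})
```

With half the items reverse-keyed, blanket agreement no longer produces an extreme total, which is why this design helps against acquiescence.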
What are the 2 components of Social desirability?
Impression management & Self deception
How can we combat social desirability? (3)
(1) Measure influence: assess discriminant validity
-> Discriminant validity evaluates whether a test measures what it’s supposed to (its intended construct) and not something else (like social desirability).
-> If your measure overlaps too much with social desirability, it indicates the test may be influenced by this bias rather than the true construct.
(2) Marlowe-Crowne social desirability scale
(3) Change response format (forced choice; Q-sort)
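One way to "measure influence" is to correlate scores on your test with scores on a social desirability scale. The sketch below computes a plain Pearson correlation. The function name and the score lists are made up for illustration, and no cutoff for "too much overlap" is implied.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return covariation / sqrt(ss_x * ss_y)


# Hypothetical data: one row per respondent.
test_scores = [12, 15, 9, 20, 14, 17]
social_desirability_scores = [8, 11, 10, 9, 13, 7]
overlap = pearson_r(test_scores, social_desirability_scores)
```

A correlation near zero would be consistent with good discriminant validity. A large correlation would suggest the test is picking up social desirability along with the intended construct.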
Random responding occurs when respondents ____ (3)
(1) Do not read an item
(2) Do not understand an item
(3) Are unmotivated to think about the item
How can we detect random responding? (4)
(1) INSTRUCTED response items: Ask for a specific answer (e.g. “choose strongly disagree”)
(2) BOGUS items: Ask about impossible or improbable scenarios (e.g. “I was born before 1920”)
(3) SELF-REPORT items: Ask participants about their care and engagement DURING the survey
(4) RESPONSE TIME: Computed after data collection but must be considered before starting
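A rough sketch of how three of these checks (instructed items, bogus items, response time) could be combined when screening data. The item IDs, answer codes, data layout, and the time threshold are all hypothetical. Self-report items are not covered here.

```python
def flag_careless(respondent, instructed_key, bogus_key, min_seconds):
    """Return a list of reasons a respondent looks like a random responder.

    respondent:     {"answers": {item_id: answer}, "seconds": completion time}
    instructed_key: {item_id: the answer the item told people to choose}
    bogus_key:      {item_id: set of answers that endorse the bogus statement}
    min_seconds:    completion times below this are treated as too fast
    """
    reasons = []
    answers = respondent["answers"]
    for item, required in instructed_key.items():
        if answers.get(item) != required:
            reasons.append(f"failed instructed item {item}")
    for item, implausible in bogus_key.items():
        if answers.get(item) in implausible:
            reasons.append(f"endorsed bogus item {item}")
    if respondent["seconds"] < min_seconds:
        reasons.append("completed too quickly")
    return reasons


# Hypothetical coding: 1 = strongly disagree ... 5 = strongly agree.
instructed_key = {"q7": 1}    # "choose strongly disagree"
bogus_key = {"q12": {4, 5}}   # "I was born before 1920"
careless = {"answers": {"q7": 3, "q12": 5}, "seconds": 45}
flags = flag_careless(careless, instructed_key, bogus_key, min_seconds=120)
```

The time threshold has to be chosen before data collection, which matches the point that response time must be planned for in advance even though it is computed afterwards.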
Writing Good items (12)
- Single idea per item stem
- Write each item in a clear and direct manner
- Avoid long items
- Avoid double negatives
- Reading level appropriate for intended test-takers
- Avoid slang or colloquial language
- Make all items independent
- Ask someone else to review items to reduce ambiguity and inaccuracies
- Make all responses similar in length and detail
- Make sure the item has only one best answer
- Avoid words such as “always” and “never”
- Avoid overlapping responses