Writing And Evaluating Test Items Flashcards
6 guidelines for item writing
Define clearly what you want to measure
Generate an item pool
Avoid long items
Keep reading difficulty appropriate for those who will complete the scale
Avoid double-barreled items that convey two or more ideas
Consider mixing positively and negatively worded items
6 item formats
Dichotomous
Polychotomous
Likert
Category
Checklist
Q sort
Dichotomous format pros and cons
Positives
Easy to mark
Absolute judgement
Simplicity, ease of administration, quick scoring
Disadvantages
Encourages memorization
No room for shades of grey
Can get 50% by chance
Polychotomous format pros and cons
example: multiple choice
More than two alternatives
Positives
Ease of scoring, quick to do, decreases the probability of guessing correctly
Distractors are important
Three-alternative multiple choice items are as valid and reliable as five-alternative items
If distractors are too easy, reliability and validity decrease
Guessing is possible: for a four-choice item there is a 25% probability of getting it correct by chance
Likert format pros and cons
Positives
Allows researchers to determine how much people endorse items
Can use Factor analysis and find groups of items that go together
Familiar and easy to use
Negative
Data are at the ordinal level, so parametric statistics shouldn’t be used to analyse them
Category format pros and cons
Usually a 10-point rating system, but can have more or fewer than 10 points
Negative
People change ratings depending on context
Positive
Clearly labelled options increase reliability and validity.
Checklists definition
forced choice on lists of items
Q sort explanation
Many cards with statements related to the criterion
Sort cards into piles (forced choice)
Pile one: not at all like me
Pile ten: very like me
Item Analysis (3 Methods)
Item Difficulty
Discriminability
Item Characteristics
Item Difficulty
definition
formula
guideline
percentage of people who get item correct converted to a decimal
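optimum difficulty = chance performance level + (100% - chance performance level) divided by two, i.e. halfway between chance and 100%. For a four-choice item: 0.25 + (1.00 - 0.25) / 2 = 0.625.
many tests need items with a variety of difficulty levels
most tests: 0.30 to 0.70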
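A minimal sketch of the two calculations on this card, assuming responses are coded 1 for correct and 0 for incorrect and that optimum difficulty is taken as the point halfway between chance and 100%. The function names and sample data are hypothetical.

```python
def item_difficulty(responses):
    """Proportion of test takers who got the item correct, as a decimal."""
    return sum(responses) / len(responses)


def optimum_difficulty(n_alternatives):
    """Halfway point between chance performance and 100% correct."""
    chance = 1.0 / n_alternatives
    return chance + (1.0 - chance) / 2


# Hypothetical item: one entry per test taker, 1 = correct, 0 = incorrect.
responses = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

print(item_difficulty(responses))   # 0.7
print(optimum_difficulty(4))        # 0.625 for a four-choice item
print(optimum_difficulty(2))        # 0.75 for a dichotomous item
```

An item difficulty of 0.7 sits at the upper end of the 0.30 to 0.70 guideline.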
Item Discriminability definition and methods
Have people who did well on the item also done well on the whole test?
Extreme Group
Point Biserial correlation
Extreme Group Method of Item Discriminability
Compares people who did well on the test with people who did poorly.
Find the proportion of each group who got each item correct.
The difference between the proportions is the discrimination index.
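The extreme group method can be sketched as below. The card does not say how large the extreme groups should be; the top and bottom third used here is an assumption, as are the function name and sample data.

```python
def discrimination_index(item_correct, total_scores, fraction=1 / 3):
    """Proportion correct in the top group minus proportion correct in the bottom group."""
    # Rank test takers from lowest to highest total test score.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
    # Size of each extreme group (assumed: one third of test takers).
    k = max(1, round(len(order) * fraction))
    bottom, top = order[:k], order[-k:]
    p_top = sum(item_correct[i] for i in top) / k
    p_bottom = sum(item_correct[i] for i in bottom) / k
    return p_top - p_bottom


# Hypothetical data: total test scores and one item's results (1 = correct).
total_scores = [10, 12, 15, 18, 20, 22, 25, 27, 30]
item_correct = [0, 0, 1, 0, 1, 1, 1, 1, 1]

print(round(discrimination_index(item_correct, total_scores), 2))
```

An index near zero means the item does not separate high and low scorers; a negative index means low scorers did better on the item than high scorers.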
Item Discriminability Point Biserial Method
Correlation between performance on the item and performance on the whole test
A correlation between a dichotomous variable and a continuous variable
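A minimal sketch of the point biserial correlation, assuming the item is coded 1 for correct and 0 for incorrect and that the population standard deviation of the total scores is used. The function name and data are hypothetical.

```python
from math import sqrt


def point_biserial(item_correct, total_scores):
    """Correlation between a dichotomous item (0/1) and continuous total scores."""
    n = len(total_scores)
    mean_all = sum(total_scores) / n
    # Population standard deviation of the total scores.
    sd_all = sqrt(sum((x - mean_all) ** 2 for x in total_scores) / n)

    passed = [t for c, t in zip(item_correct, total_scores) if c == 1]
    failed = [t for c, t in zip(item_correct, total_scores) if c == 0]
    p = len(passed) / n          # proportion who got the item correct
    q = 1 - p                    # proportion who got the item wrong

    # Undefined if everyone passed or everyone failed (a group would be empty).
    mean_pass = sum(passed) / len(passed)
    mean_fail = sum(failed) / len(failed)
    return (mean_pass - mean_fail) / sd_all * sqrt(p * q)


# Hypothetical data: one item's results and each person's total test score.
item_correct = [0, 0, 1, 0, 1, 1]
total_scores = [2, 3, 5, 4, 6, 8]

print(round(point_biserial(item_correct, total_scores), 3))
```

This gives the same value as an ordinary Pearson correlation computed on the 0/1 item scores and the totals.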
Pictures of item characteristics
Item characteristic curve
A good curve increases as a function of the total test score.
A poor curve is flat.
A curve that dips at the end indicates a fault with the alternative designated correct: good students know it isn’t completely correct.
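The points of an item characteristic curve can be tabulated as the proportion of test takers getting the item correct at each total score. This is a sketch with hypothetical names and data; it groups by exact total score, whereas a real analysis might group scores into bands.

```python
from collections import defaultdict


def item_characteristic_points(item_correct, total_scores):
    """Return (total score, proportion correct) pairs sorted by total score."""
    # total score -> [number correct on the item, number of test takers]
    tally = defaultdict(lambda: [0, 0])
    for correct, score in zip(item_correct, total_scores):
        tally[score][0] += correct
        tally[score][1] += 1
    return [(score, c / n) for score, (c, n) in sorted(tally.items())]


# Hypothetical data: a good item, where proportion correct rises with total score.
total_scores = [1, 1, 2, 2, 3, 3, 4, 4]
item_correct = [0, 0, 0, 1, 1, 1, 1, 1]

for score, proportion in item_characteristic_points(item_correct, total_scores):
    print(score, proportion)
```

Plotting proportion correct against total score gives the curve: rising for a good item, flat for a poor one.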
Item Response theory advantages
Provides information on item functioning, the value of specific items, and the reliability of the scale
A test taker’s score is defined by the level of difficulty of the items they can get correct.
can easily adapt for computer administration
means test takers don’t spend time on items that are too easy or too hard
reduces bias against slow test takers
identifies test takers with unusual response patterns
copes with items in different formats
Criterion Referenced test definition
compares performance with a clearly defined criterion
Constructing a criterion referenced test
specify objectives to learn
develop test items
give the test to students who should meet the objective (have done the learning) and to students who shouldn’t.
The frequency polygon of the combined scores should be V-shaped.
the bottom of the v is the antimode (least frequent score)
The antimode is the cut score for meeting the objective.
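Finding the antimode can be sketched as below, assuming whole-number scores and that the group without the learning has the lower modal score. The function name, the tie-breaking rule (lowest score wins) and the data are hypothetical.

```python
from collections import Counter


def antimode_cut_score(untrained_scores, trained_scores):
    """Least frequent score between the two groups' modes in the combined distribution."""
    combined = Counter(untrained_scores) + Counter(trained_scores)
    low_mode = Counter(untrained_scores).most_common(1)[0][0]
    high_mode = Counter(trained_scores).most_common(1)[0][0]
    # Search only between the two peaks, i.e. the bottom of the V.
    # A Counter returns 0 for scores nobody obtained.
    candidates = range(low_mode, high_mode + 1)
    return min(candidates, key=lambda score: combined[score])


# Hypothetical scores for students without and with the learning.
untrained = [2, 2, 2, 3, 3, 4]
trained = [4, 6, 6, 7, 7, 7, 8]

print(antimode_cut_score(untrained, trained))  # 5
```

With these scores the two peaks are at 2 and 7, and no one scored 5, so 5 is the cut score.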
Limitations of item analysis
It enables the test constructor to separate students but doesn’t help students learn.
It doesn’t always identify weaknesses or knowledge gaps.
Example of using Q Sort from Carl Rogers evaluating self-concept
method
meaning of large discrepancy
The person receives a set of cards with appropriate statements and sorts the cards from most to least descriptive.
first sort: who they are
second sort: ideal self
Large discrepancies reflect poor adjustment and low self-esteem
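One way to put a number on the discrepancy between the two sorts is sketched below. The card does not say how the discrepancy is scored; the mean absolute difference in pile number is an assumption made for illustration, and the statements and pile numbers are hypothetical.

```python
def q_sort_discrepancy(actual_self, ideal_self):
    """Mean absolute difference in pile number between two sorts of the same cards."""
    differences = [abs(actual_self[card] - ideal_self[card]) for card in actual_self]
    return sum(differences) / len(differences)


# Hypothetical sorts: card statement -> pile (1 = not at all like me, 10 = very like me).
actual_self = {"I am confident": 3, "I am anxious": 8, "I like myself": 4}
ideal_self = {"I am confident": 9, "I am anxious": 2, "I like myself": 9}

print(round(q_sort_discrepancy(actual_self, ideal_self), 2))
```

A value of 0 would mean the two sorts are identical; larger values mean a wider gap between who the person is and their ideal self.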
Information given on items in item response theory
Provides information on item functioning, the value of specific items, and the reliability of the scale
Identifies unusual response patterns
Steps to test construction
Cougars do dudes during Dawn
Conduct review
Describe use and interpretation
Decide who and why
Need for measures of dissimulation
Measures of dissimulation
MMPI
Social desirability
Fake bad
Response set
Test manual needs:
Normals do outrageous orgasms in london
Norms, sampling
Development of construction, items, norms, scoring
Reliability
Validity
Special Out groups
Parts of test