Wk 6 - Test construction Flashcards
The item discrimination index, where the Upper and Lower Groups are defined by overall test score, can tell us how that single item contributes to a test’s… (x1)
Why? (x1)
Internal consistency
Question is describing the Item Discrimination Index for reliability - which tells you the extent to which the item is yielding a similar response to the other items in the scale.
An examination has an average item difficulty index of .95. Which of the following is most likely given this information? (x1)
Why? (x2)
The examination is very easy.
Item difficulty index tells you what percentage of test-takers got a question correct.
In this e.g., on average, 95% of test-takers were getting the questions correct
The following frequency data refers to a question on a four option multiple-choice examination; the options are denoted (i), (ii), (iii), (iv) below. Students are divided into an Upper Group (top third of class based on overall exam score) and a Lower Group (bottom third of class based on overall exam score).
Upper Group: (i) 12, (ii) 35, (iii) 0, (iv) 3.
Lower Group: (i) 23, (ii) 21, (iii) 0, (iv) 6.
For example, this data indicates that 12 people from the Upper Group chose Option (i). What is the item discrimination index for this item if the correct answer is Option (ii)?
And how calculated?
.28
Because:
U = 35
L = 21
nU = 50 (summing across the top row: 12 + 35 + 0 + 3)
nL = 50 (summing across the bottom row: 23 + 21 + 0 + 6)
d = (35/50) - (21/50) = .28
True or false, and why? (x2)
On a norm-referenced aptitude test, the ‘optimal difficulty’ of an item is defined as a point halfway between chance and everyone getting the answer WRONG.
False
Optimal difficulty is actually halfway between chance and everyone getting the answer CORRECT
That’s why the optimal difficulty calculation requires you to add the chance of guessing to 100% before dividing by 2 to find the halfway point.
True or false, and why? (x1)
The item discrimination index for validity tells us the extent to which the item contributes to the scale’s correlation with a relevant criterion measure.
True
Higher the item discrimination index for validity, the more that item is contributing to the scale’s relationship with the external criterion measure in question.
What is the optimal item difficulty index for two option multiple-choice questions in a norm-referenced achievement test?
How calculated? (x2)
.75
Chance for a 2 option multiple choice is 50%. (50% + 100%)/2 = 75%.
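The same halfway-between-chance-and-perfect rule generalises to any number of options; a minimal sketch (function name is mine):

```python
def optimal_difficulty(n_options):
    """Optimal item difficulty for a norm-referenced multiple-choice item:
    halfway between chance (1 / n_options) and everyone correct (1.0)."""
    chance = 1 / n_options
    return (chance + 1.0) / 2

print(optimal_difficulty(2))  # 0.75  (two options: chance = .50)
print(optimal_difficulty(4))  # 0.625 (four options: chance = .25)
```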
True or false?
The item discrimination index for reliability tells you the extent to which people are responding to that item in the same way as they are responding to the other items in the scale.
True
True or false, and why? (x2)
If the item discrimination index for reliability is -1 (minus one) then it means that the item in question cannot distinguish between high and low scorers IN ANY WAY.
False.
If that were true, d would equal 0
d = -1 means the item IS discriminating - just in the opposite direction to what you would predict (it was probably a reverse-scored item that you forgot to recode)
The following frequency data refers to a question on a four option multiple-choice examination; the options are denoted (i), (ii), (iii), (iv) below. Students are divided into an Upper Group (top third of class based on overall exam score) and a Lower Group (bottom third of class based on overall exam score).
Upper Group: (i) 34, (ii) 89, (iii) 21, (iv) 3.
Lower Group: (i) 67, (ii) 43, (iii) 7, (iv) 29.
For example, this data indicates that 34 people from the Upper Group chose Option (i). What is the item discrimination index for this item if the correct answer is Option (ii)?
n in Upper Group = 147 (34 + 89 + 21 + 3)
n in Lower Group = 146 (67 + 43 + 7 + 29)
U = 89
L = 43
Applying the Item Discrimination Index formula: (89/147) - (43/146) = .61 - .29 = .32
The following frequency data refers to a question on a four option multiple-choice examination; the options are denoted (i), (ii), (iii), (iv) below. Students are divided into an Upper Group (top third of class based on overall exam score) and a Lower Group (bottom third of class based on overall exam score). Option (ii) is designated as the correct answer.
Upper Group: (i) 7, (ii) 12, (iii) 0, (iv) 31.
Lower Group: (i) 9, (ii) 35, (iii) 0, (iv) 6.
For example, this data indicates that 12 people from the Upper Group chose Option (ii).
True or false, and why? (x2)
This item appears to have a redundant distractor.
True
Nobody in either the Upper or Lower Group chose option (iii),
Suggesting that it was obviously incorrect and might be a redundant distractor (in that it failed to distract anyone)
The following frequency data refers to a question on a four option multiple-choice examination; the options are denoted (i), (ii), (iii), (iv) below. Students are divided into an Upper Group (top third of class based on overall exam score) and a Lower Group (bottom third of class based on overall exam score). Option (ii) is designated as the correct answer.
Upper Group: (i) 7, (ii) 12, (iii) 0, (iv) 31.
Lower Group: (i) 9, (ii) 35, (iii) 0, (iv) 6.
For example, this data indicates that 12 people from the Upper Group chose Option (ii).
True or false, and why? (x4)
There are grounds for suspecting that this item might contain a scoring error or be worded in a misleading way
True
Most people in the Upper Group chose option (iv) rather than the option designated as “correct” (i.e. option ii),
Despite most people in the Lower Group choosing option (ii).
This raises the suspicion that there might be a scoring error or problem with the question - and hence that it should be double-checked.
This issue would also be flagged by the fact that this item would yield a negative item discrimination index
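Both checks just described - a distractor nobody chose, and a negative discrimination index - can be sketched as a small screening function. This is a minimal illustration, assuming each group's responses are given as per-option frequency counts; all names are mine:

```python
def flag_item(upper, lower, correct):
    """Screen one multiple-choice item for the two problems above.
    `upper` / `lower` map each option to its choice frequency in the
    Upper and Lower Groups; `correct` names the keyed option."""
    flags = []
    # Redundant distractor: an incorrect option that nobody chose
    for opt in upper:
        if opt != correct and upper[opt] + lower[opt] == 0:
            flags.append(f"redundant distractor: {opt}")
    # Possible scoring error / misleading wording: negative d
    d = (upper[correct] / sum(upper.values())
         - lower[correct] / sum(lower.values()))
    if d < 0:
        flags.append("negative discrimination: double-check key and wording")
    return flags

# Item from the example above (correct answer keyed as option ii)
upper = {"i": 7, "ii": 12, "iii": 0, "iv": 31}
lower = {"i": 9, "ii": 35, "iii": 0, "iv": 6}
print(flag_item(upper, lower, "ii"))
```

Here the function flags both option (iii) as a redundant distractor and the negative discrimination index (12/50 - 35/50 = -.46).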
The following frequency data refers to a question on a four option multiple-choice examination; the options are denoted (i), (ii), (iii), (iv) below. Students are divided into an Upper Group (top third of class based on overall exam score) and a Lower Group (bottom third of class based on overall exam score).
Upper Group: (i) 16, (ii) 6, (iii) 5, (iv) 7.
Lower Group: (i) 3, (ii) 23, (iii) 4, (iv) 4.
For example, this data indicates that 16 people from the Upper Group chose Option (i).
How would you best describe the data, assuming the examiner has designated Option (i) to be the correct answer? (x1)
Why? (x2)
It is a difficult item, but not problematic.
Because, in the upper group, a greater proportion of people are choosing the right answer.
The fact that the lower group are going for a different option is fine – they are probably being appropriately misled as a result of not knowing the material as well
Is it appropriate to use item discrimination and item difficulty for speed tests? (x1)
Why? (x2)
No
Because what discriminates between people is not how many they get correct but how many they complete
(as compared with power tests where the focus is on which items are correct)
True or false, and why?
For power tests, it is appropriate to calculate both Item Discrimination and Item Difficulty Indices.
True
It is the difficulty of the questions that is doing the job of separating high scorers from low scorers
What are the five steps involved in creating and evaluating a test?
Test conceptualisation - what, why and how
Create the materials needed - e.g. Likert scale items
Design/run studies to assess validity, reliability, standardisation and item quality
Test revision - if it all works, consider improvements
Release it into the wild!
Give examples of some of the issues to be considered when conceptualizing a test (x4)
Why is it worth creating a new test? What has been done before? What does your test offer beyond that?
Who will be using it?
Context - how many test takers/yr?
Length - reliability vs practicality
Give an example of the sort of practicalities that might need to be considered when conceptualizing a test (x5)
Training/skill of administrator - e.g. a real IQ test requires a full training course for proper competence
Test used by older adults – text big enough to read
How big is your budget?
How much time do you have?
How many people are you testing? (costs etc)
Need internet access?
What sort of ethical issues might you need to consider when conceptualizing a test?
Anonymity - e.g. Qualtrics now accessible by CIA
Sensitive/offensive content - e.g. sex/impulsivity, or age when single participants could be identified that way