Wk 11 - Item Response Theory Flashcards

1
Q

Under what circumstances would we need to use Signal Detection Theory? (x2)

A

Whenever you’ve got a task that involves discriminating between two stimuli.
Useful if you’re going into research or clinical work

2
Q

In a recognition memory task, what are the four possible outcomes?

A
Correct hit (said “yes” and was correct)
False positive/false alarm (said “yes” and was incorrect)
Correct negative/correct rejection (said “no” and was correct)
False negative/miss (said “no” and was incorrect)
3
Q

If we did not use signal detection theory, how could someone cheat a recognition memory test? (x1)

A

Just say yes every time

4
Q

What is sensitivity, in the context of signal detection theory? (x1 plus e.g.)
And why do we need to use detection theory to find it? (x1)

A

Ability to discriminate between stimuli (e.g. words that you heard previously and those you didn’t)
It gets confounded by response bias if you only look at the number of correct hits

5
Q

What is response bias, in the context of signal detection theory?
And what effect does this have (e.g. x2)

A

Criterion for saying yes
Eg older adults tend to be more conservative (which may lead to fewer correct hits)
Another person with exactly the same memory but a liberal response bias will get more correct hits than a person with a conservative response bias
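A minimal sketch of this point, assuming the standard equal-variance Gaussian model (noise centred at -d'/2, signal at +d'/2, "yes" whenever the evidence exceeds a criterion). The function name `rates` and the values d' = 1.5 and criterion = ±0.5 are illustrative assumptions, not from the lecture.

```python
from statistics import NormalDist

def rates(d_prime, criterion):
    """Return (hit_rate, false_alarm_rate) under an equal-variance Gaussian model.

    Assumption: noise ~ N(-d'/2, 1), signal ~ N(+d'/2, 1); respond "yes"
    when the evidence exceeds `criterion` (positive = conservative).
    """
    phi = NormalDist().cdf
    hit_rate = phi(d_prime / 2 - criterion)
    false_alarm_rate = phi(-d_prime / 2 - criterion)
    return hit_rate, false_alarm_rate

# Two people with identical memory (same d') but different response bias.
liberal_hits, liberal_fas = rates(d_prime=1.5, criterion=-0.5)
conservative_hits, conservative_fas = rates(d_prime=1.5, criterion=0.5)
```

The liberal responder ends up with more correct hits, but also more false alarms, even though sensitivity is identical.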

6
Q

How does signal detection theory “deconfound” sensitivity and response bias?

A

By looking at false positives as well as correct hits -
We can enter hit rate and false alarm rate into one formula to get sensitivity,
And into another formula to get response bias
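For reference, the two formulas are commonly written as below, where H is the hit rate, F the false alarm rate and z the inverse of the standard normal cumulative distribution (what SPSS calls PROBIT). The d' formula matches the SPSS expression PROBIT(hit_rate) - PROBIT(false_positive_rate) given later in this deck. The criterion c is one common response bias index and is an assumption here, since the cards do not say which bias formula the course uses.

```latex
d' = z(H) - z(F), \qquad c = -\tfrac{1}{2}\,\bigl[z(H) + z(F)\bigr]
```

With this convention, a positive c indicates a conservative criterion and a negative c a liberal one.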

7
Q

What is d’ (d prime)? (x1)
What do different values of d’ indicate? (x1)
How do you then use it? (x1)

A

A measure of sensitivity independent of response bias.
The distance in SDs between the signal (words from the original list) and noise (words not on the original list) distributions
Use it in place of correct hits in all further analysis

8
Q

What is item response theory posited to ‘replace’? (x1)

What does Item Response Theory involve? (x2)

A

Classical Test/True Score Theory
Rather than summing responses (as in Signal Detection),
Gives THETA (θ) -
a function of the “examinee’s response interacting with the characteristics of the items” (Hogan, 2007, p.75), derived using complex equations

9
Q

What is Latent Trait Theory?

A

Another term for Item Response Theory

10
Q

What are Item Characteristic Curves? (x4)

A

Used in Item Response Theory:
A plot of ability (= level on some trait = Theta) on x-axis,
Versus probability of getting a particular question right (ie item difficulty index = % of people who get the item correct) on y-axis
Remember - NOT item discrimination
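As a rough illustration of what gets plotted, the sketch below builds an empirical curve for one item by grouping test takers into ability bands and taking the proportion correct in each band. The function name, the band width and the data are hypothetical; real IRT software estimates Theta from the responses rather than taking it as given.

```python
from collections import defaultdict

def empirical_icc(abilities, correct, band_width=1.0):
    """Proportion correct on one item within each ability band.

    abilities: ability estimate (theta) per test taker
    correct:   1 if that test taker got the item right, else 0
    Returns {band_lower_edge: proportion_correct}, sorted by band.
    """
    totals = defaultdict(int)
    rights = defaultdict(int)
    for theta, is_right in zip(abilities, correct):
        band = (theta // band_width) * band_width  # lower edge of the band
        totals[band] += 1
        rights[band] += is_right
    return {band: rights[band] / totals[band] for band in sorted(totals)}

# Hypothetical data: eight test takers, one item.
thetas = [-1.5, -1.2, -0.8, -0.4, 0.3, 0.6, 1.1, 1.8]
scores = [0, 0, 0, 1, 1, 0, 1, 1]
curve = empirical_icc(thetas, scores)
```

Plotting the band edges on the x-axis against the proportions on the y-axis gives the rising curve described above.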

11
Q

What do Item Characteristic curves have to do with Item Difficulty Indices? (x1)

A

Item difficulty is what is plotted on the y-axis

12
Q

What is the link between Item Response Theory and Item Characteristic Curves?

A

In general, we want higher ability people to be more likely to get the item correct,
But specific shape of the curve gives useful additional information
ie item discrimination at a glance

13
Q

Detail the steps you would go through to find the best equation to represent your item characteristic curve (x4)

A

Make educated guess as to which equation will fit the best (e.g. s-shaped curves can be created using equations known as logistic functions, e.g. Rasch model is commonly used)
Use some software that estimates the parameters for your chosen equation to get a curve that is as close as possible to the actual curve
Then do a “goodness of fit” test to see how well your equation actually fits your data – just as you do with linear regression
If it doesn’t fit, try again with another equation
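The steps above can be sketched in a few lines, under heavy simplification. Real IRT software typically estimates parameters by maximum likelihood from individual responses and uses a formal goodness-of-fit test. This sketch swaps in a least-squares grid search over a two-parameter logistic curve, and uses the leftover squared error as a crude fit check. All names and data are hypothetical.

```python
import math

def logistic(theta, a, b):
    """Two-parameter logistic curve: a = discrimination, b = difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def sum_squared_error(points, a, b):
    """Misfit between the candidate curve and observed (theta, proportion) points."""
    return sum((p - logistic(theta, a, b)) ** 2 for theta, p in points)

def fit_by_grid_search(points):
    """Step 2: try many (a, b) pairs and keep the pair with the smallest misfit."""
    candidates = [(a / 10, b / 10) for a in range(1, 31) for b in range(-30, 31)]
    return min(candidates, key=lambda ab: sum_squared_error(points, *ab))

# Step 1: we guessed a logistic (s-shaped) equation.
# Hypothetical observed curve: proportion correct in five ability bands.
observed = [(-2, 0.08), (-1, 0.20), (0, 0.50), (1, 0.80), (2, 0.92)]
a_hat, b_hat = fit_by_grid_search(observed)

# Step 3: crude fit check (smaller is better). Step 4 would be to try
# another equation if this value were unacceptably large.
misfit = sum_squared_error(observed, a_hat, b_hat)
```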

14
Q

What is the three parameter model? (x1)

What are the three?

A
Commonly used s-shape producing logistic function in Item Response Theory
Item discrimination (how steep the slope is at its steepest point)
Item difficulty (level of ability needed to get the item right 50% of the time) and 
Level of guessing (chances of getting it right with no prior knowledge)
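A minimal sketch of the usual textbook form of the three parameter logistic function. The function and argument names, and the example item, are illustrative assumptions rather than anything specified in the lecture.

```python
import math

def three_parameter_logistic(theta, discrimination, difficulty, guessing):
    """Probability of a correct answer under the three-parameter logistic model."""
    s_curve = 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))
    return guessing + (1.0 - guessing) * s_curve

# Hypothetical four-option multiple-choice item (guessing = 0.25).
p_very_low = three_parameter_logistic(-10.0, 1.2, 0.0, 0.25)
p_at_difficulty = three_parameter_logistic(0.0, 1.2, 0.0, 0.25)
```

For a very low ability the probability flattens out at the guessing level. Note that with a non-zero guessing level, the probability at Theta = difficulty sits halfway between the guessing level and 1 (0.625 here); the "50% of the time" reading holds exactly when guessing is zero.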
15
Q

What is a logistic function? (x1 plus e.g.)

A

Those that produce the s-shaped curves found in Item Response Theory
e.g. Rasch model is commonly used

16
Q

Once we have modelled an item characteristic curve then what can we use it to do?

A

Gets us beyond crude right/wrong (linear) distinctions -

Every item a P completes tells us something about their individual ability level

17
Q

Give four disadvantages of item response theory compared with traditional psychometric methods.

A

Very difficult to understand and implement.
Software that is currently available is not user-friendly.
Requires large samples to get stable estimates of required parameters.
Some of the theoretical assumptions could be critiqued (requires more assumptions than Classical Test Theory).

18
Q

Where was signal detection theory first used? (x2)

A

During near/psychophysics perception studies -

Looking for sensory thresholds, telling signal from noise

19
Q

What is one modern application of signal detection theory? (x1 plus explain x3)

A

Wechsler memory test –
List memory: listing words heard and also
Recognition: yes or no to whether you heard the word before
• Needs signal detection theory, to figure out good interpretation

20
Q

What is one example of a recognition memory task? (x1 plus explain x2)

A

California Verbal Learning Test
Ps attempting to discriminate between words that were on an original list they were read and words that were not
Tester: Was ‘airplane’ on the original list of words you heard? Or ‘screwdriver’? (P gives yes/no)

21
Q

What is the difference between conservative and liberal response bias? (x2)

A

Conservative responders need close to 100% certainty before saying, e.g., that they remember a previously heard word
While liberal responders just need a vague sense that the word is familiar in order to say yes (hit)

22
Q

How do you do signal detection analysis in SPSS? (x7)

A

Work out hit rate using COMPUTE command (=hit_rate)
Work out the false alarm rate (= false_positive_rate).
Using COMPUTE command, type this into the “numeric expression window”:
PROBIT(hit_rate) - PROBIT(false_positive_rate)
Type “d_prime” into the “target variable” box.
Click OK
New variable, d’ (d prime), is a measure of sensitivity independent of response bias.
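For readers without SPSS, the same calculation can be sketched in Python. This is an illustrative equivalent, not part of the course procedure: `NormalDist().inv_cdf` from the standard library plays the role of PROBIT, and the function name and counts are hypothetical.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_negatives):
    """Sensitivity: z(hit rate) - z(false alarm rate).

    Rates of exactly 0 or 1 have no finite z-score, so this sketch raises
    an error instead of silently applying a correction.
    """
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_negatives)
    if not (0 < hit_rate < 1 and 0 < false_alarm_rate < 1):
        raise ValueError("rates must be strictly between 0 and 1")
    z = NormalDist().inv_cdf  # same role as PROBIT in SPSS
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical recognition test: 20 old words, 20 new words.
sensitivity = d_prime(hits=16, misses=4, false_alarms=4, correct_negatives=16)
guessing = d_prime(hits=10, misses=10, false_alarms=10, correct_negatives=10)
```

The first person gets a clearly positive d', while the second, with equal hit and false alarm rates, gets a d' of zero.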

23
Q

How do you interpret d’ (d prime)? (x3)

A

Positive - person is recognising words from the original list to some degree.
0 - person is guessing (can’t distinguish new words from old words).
Negative - person is recognising words he didn’t see and not recognising words he did see (stuffed scoring - distinguishing, but backwards…)

24
Q

How was signal detection used in psychophysics experiments? (x6)
Describe what the correct hits, false alarms, correct negatives, and false negatives would correspond to.

A

Naylor & Lawshe, 1958 - Ps asked to indicate whether word was present on screen or not
Words presented in faint ink - see how faint was too faint to detect
Higher thresholds for taboo words, suggesting some level of pre-conscious content processing of the word
Concluded it was evidence for pre-conscious processing - your brain protects you from rude words
However, false alarms are more embarrassing for swear words, and people don’t expect them in this context,
So if you only counted up the correct hits, the whole experiment would be a dud, due to response biases
Correct hit - said yes to word present
False alarms - said yes to word absent
Correct negatives - said no to word absent
False negatives - said no to word present

25
Q

How was signal detection used in diagnosing mental or physical illness? (x1)
Describe what the correct hits, false alarms, correct negatives, and false negatives would correspond to.

A

No diagnostic test is perfect, whether psych or medical – measurement principles are identical
Correct hit - diagnosing an illness present
False alarm - diagnosing an illness not present in reality
Correct negative - correctly diagnosing absence of illness
False negative - incorrectly diagnosing absence of an illness that is actually present

26
Q

How was signal detection used in analysing jury decision making? (x2)
Describe what the correct hits, false alarms, correct negatives, and false negatives would correspond to.

A

Deciding whether a defendant is guilty or innocent is a detection task.
Signal detection analysis found that instructions that you give jurors regarding the definition of ‘reasonable doubt’ affect response bias (the willingness to convict) rather than sensitivity (the ability to distinguish guilty from innocent defendants) (Thomas & Hogue, 1976).
Correct hit - jailing the guilty
False alarm - jailing the innocent
Correct negative - letting the innocent go
False negative - letting the guilty go

27
Q

How was signal detection used in industrial inspection? (x6)

Describe what the correct hits, false alarms, correct negatives, and false negatives would correspond to.

A

Quality control inspectors in industry found to detect fewer faulty items as shift progresses

Eg the factory makes Barbie dolls, along a conveyor belt
• Someone’s job is to detect substandard dolls…
• As shift progresses, they detect fewer faulty dolls
Correct hit - pulling faulty doll off the line
False alarm - pulling good doll off
Correct negative - leaving a good doll to pass
False negative - leaving a faulty doll on the line

The drop is due to response bias, not to a reduced ability to detect a bad Barbie (e.g. through tiredness)
This matters because each cause calls for a different intervention

28
Q

How was signal detection used in collision anticipation tasks? (x3)
Describe what the correct hits, false alarms, correct negatives, and false negatives would correspond to.

A

Hazard perception test involves drivers detecting hazards.
If we just counted correct detections, then you could cheat by clicking continuously all over the screen
Also possible that some items in scene could be ambiguously identified as a hit
Correct hit - ID present hazard
False alarm - ID ambiguous item as hazard
Correct negative - ignore non-hazardous event
False negative - ignore hazard

29
Q

What are two strategies Mark used to deal with liberal response bias/cheaters or ambiguous hazards in his hazard perception test? (x3 and x4)

A

Precise about what counts as hazard:
o Initially said ‘click on everything you perceive to be a hazard’ – those that were highly perceptive/paranoid would identify many more things
o So needed to be more explicit – “a traffic conflict with another road user”

Modified task: Hazard Change rather than Detection
o Two images, flashed alternately, differ in single element
o Gets around response bias, by removing reference to hazards etc – just look for change
o Someone who’s good at hazard perception, will spot the change much more quickly

30
Q

What are the advantages of Item Response/Latent Trait Theory over signal detection? (x5)

A

Takes into account item characteristics
Eg the difficulty of questions (could hypothetically use other characteristics too, e.g. discrimination) -
Which matters, especially if people complete different items
Such as in Question Horde, or in adaptive testing
Can examine other-than-linear relationships

31
Q

What do we know if an item characteristic curve is quite a flat s-shape? (x2)

A

Item has low discrimination -

Function tells us that, compared with a steeper curve, even those with high ability are less likely to be correct, and those with low ability are more likely

32
Q

What do we know if an item characteristic curve is quite a steep/upright s-shape? (x2)

A

Item has a very high discrimination index -

Function tells us that those with low ability do much worse than those with high

33
Q

What do we know if items produce vertically parallel versions of the same characteristic s-shape?
e.g. if top one labelled A, then B, then C as we go down (x2)

A

Same discrimination, but A is easier than C;

Tells us in what way are they easy or hard, as we can see across whole range

34
Q

What do we know if an item characteristic curve is a reversed/backward s-shape? (x2)

A

First assumption would be that answer key was entered incorrectly, or maybe lecture material was wrong
• Dump the item or teach it better

35
Q

What do we know if items produce horizontally parallel versions of the same characteristic s-shape?
e.g. if left one labelled A, then B, then C as we go to the right (x2)

A
Different items are “sensitive” across different ranges of Theta
Easier one (A) tells apart low and medium ability but rubbish at telling medium from high
Harder one (C) is more sensitive to differences between moderate and high ability
36
Q

What do we know if two groups produce different item characteristic curves?
e.g. if males and females start at similar point on y-axis (difficulty), but female curve is steeper and ends up well above male as ability increases (x3)

A

At some levels of ability, females have an advantage (if the difference is statistically significant)
Shows us a group-bias issue with the question,
Due to differences between those with same ability level at

37
Q

What could the Item Characteristic Curve look like if p = .92 (difficulty), and d = .09 (discrimination) (x1)
What would this mean? (x2)

A

Pretty flat line across top of graph area
92% got question right, and almost no discrimination
Question has hit ceiling, apart from the very very bottom of class

38
Q

What could the Item Characteristic Curve look like if p = .27 (difficulty), and d = .21 (discrimination) (x1)
What would this mean? (x2)

A

Flat start, but still rising from about half way
Only 27% getting it right, but discrimination is moderate –
Only really teases apart the top couple of grade bands

39
Q

What could the Item Characteristic Curve look like if p = .60 (difficulty), and d = .25 (discrimination) (x2)
What would this mean? (x1)

A

Could have improvements as ability increases across x-axis,
But still big dip at the end
What happened to the top of the class?

40
Q

What could the Item Characteristic Curve look like if p = .63 (difficulty), and d = .45 (discrimination) (x1)
What would this mean? (x1)

A

Starting to look more like the desired s-shape

Question is doing its job…

41
Q

As far as relationship equations go, what is the difference between Classical Test and Item Response Theory? (x3)

A

Classical assumes linear relationship - hence the use of correlations
So just need 2 parameters (slope and intercept) to describe it
Item Response can work with curvilinear relationships - and so better fits much real-life data

42
Q

What would we need to implement in order to have good adaptive testing?

A

Item Response Theory/Item Characteristic Curves
Limit testing time,
By starting them on medium questions,
Then harder/easier depending on whether right/wrong
Can’t do this with Classical Test Theory
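The start-medium, then harder-or-easier logic can be sketched as a simple staircase. This is a deliberate simplification: operational adaptive tests usually pick each item using the current Theta estimate and do not repeat items. The function name, the item bank and the simulated test taker are all hypothetical.

```python
def run_adaptive_test(item_difficulties, answers_correctly, max_items=5):
    """Simple staircase: start at the median item, step harder after a
    correct answer and easier after an incorrect one.

    item_difficulties: difficulties sorted from easiest to hardest
    answers_correctly: callable(difficulty) -> bool, stands in for the test taker
    Returns the list of (difficulty, was_correct) pairs administered.
    """
    position = len(item_difficulties) // 2  # start on a medium question
    administered = []
    for _ in range(max_items):
        difficulty = item_difficulties[position]
        was_correct = answers_correctly(difficulty)
        administered.append((difficulty, was_correct))
        step = 1 if was_correct else -1
        # Stay inside the item bank at either extreme.
        position = min(max(position + step, 0), len(item_difficulties) - 1)
    return administered

# Hypothetical item bank and a test taker who can answer items up to difficulty 1.0.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
log = run_adaptive_test(bank, lambda difficulty: difficulty <= 1.0)
```

The test quickly settles around the difficulty level where the person switches from right to wrong, which is how testing time is kept short.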

43
Q

True or false, and why? (x1) Signal Detection Theory was originally developed for use in recognition memory tasks.

A

False

Was for perception tasks

44
Q

True or false, and why? (x1) Recognition memory tasks involve people making discrimination judgements

A

True

It involves discrimination between words seen before and not

45
Q

Imagine you are given a test of hearing where you are played a series of beeps of different volumes over headphones. Some of these beeps are too quiet to hear: the test is designed to find the quietest volume you can hear. Your task is to indicate when you can hear a tone. If you want to try and pretend your hearing is better than it might be and you know that the test is NOT employing signal detection theory, should you apply a liberal or conservative response criterion?
Why? (x3)

A

Liberal
“Signal” is the tone and “noise” is no tone -
You are trying to discriminate between the presence or absence of a tone.
A liberal criterion (saying “yes” even when you have doubts that a tone was presented) would maximize correct hits

46
Q

True or false, and why? (x2)

If d’ (d prime) is zero then the signal and noise cannot be distinguished

A

True
Means people can’t tell signal presence from absence
eg have gone blind/deaf…

47
Q

True or false, and why? (x2)

In Item Response Theory, you can use sophisticated linear correlation coefficients to model performance

A

False
Linear is for Classical Test Theory
Item Response needs curvilinear equations

48
Q

True or false, and why? (x2)

Signal Detection Theory is useful when investigating jurors’ judgements of the innocence or guilt of a defendant

A

True
Jurors have to discriminate between innocent and guilty defendants.
Hence it’s a discrimination task

49
Q

True or false, and why? (x1)
For quality control inspectors in a factory, the rate of detecting faulty products decreased over the duration of a shift (Davies & Parasuraman, 1982). This was found to be due to a change in their sensitivity rather than their response bias

A

False

It’s response bias, not sensitivity

50
Q

True or false, and why? (x2)
Someone who has a conservative response bias in a recognition memory task will be less likely to remember the task items than someone who has a liberal response bias

A

False
Level of memory would be measured by sensitivity not response bias.
That is, differences in response bias would not be expected to affect level of memory

51
Q

True or false, and why? (x1)
When scoring memory accuracy in a recognition memory task, we need to measure both the correct hit rate and the false positive rate

A

True
If, for example, we only measured correct hit rate, then our measurement of memory (sensitivity) would be confounded by response bias

52
Q

In Item Response Theory, what is Theta? (x2)

A

A measure of a test taker’s overall level of ability

assuming they’re completing some sort of achievement or aptitude test

53
Q

True or false and why? (x2)
When using Item Response Theory, it is possible that two people could get the same number of questions correct in some aptitude test but nonetheless end up with completely different test scores

A

True
Adaptive testing could mean that different test takers end up answering different questions.
So the person who was answering more difficult questions would get a higher score to reflect the fact they were able to get more difficult questions correct
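A minimal sketch of how this can happen, assuming a Rasch model and a grid-search maximum-likelihood estimate of Theta. The item difficulties and response patterns are hypothetical, and real software uses more refined estimation.

```python
import math

def rasch_probability(theta, difficulty):
    """Rasch model: chance of a correct answer given ability and item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def estimate_theta(responses):
    """Grid-search maximum-likelihood estimate of theta.

    responses: list of (item_difficulty, was_correct) pairs
    """
    def log_likelihood(theta):
        total = 0.0
        for difficulty, was_correct in responses:
            p = rasch_probability(theta, difficulty)
            total += math.log(p if was_correct else 1.0 - p)
        return total

    grid = [step / 100 for step in range(-400, 401)]  # theta from -4 to 4
    return max(grid, key=log_likelihood)

# Both hypothetical test takers get 2 of 4 items correct, on different items.
easy_items = [(-2.0, True), (-1.5, True), (-1.0, False), (-0.5, False)]
hard_items = [(0.5, True), (1.0, True), (1.5, False), (2.0, False)]
theta_easy = estimate_theta(easy_items)
theta_hard = estimate_theta(hard_items)
```

Both people have the same number correct, but the one who answered the harder items gets the higher Theta estimate.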

54
Q

Is trait score a parameter in the three parameter model used in Item Response Theory? (x1)
Why? (x3)

A

No

Because the three parameters are discrimination, difficulty and level/chance of guessing (all properties of the item, whereas trait score belongs to the test taker)