Lecture 3 - Item Analysis Flashcards
What is the objective of item analysis and what is the problem that is present?
• The purpose of item analysis is to find items that form an internally consistent scale and to eliminate those items that do not.
• A scale is only useful if it is able to detect differences among individuals on the measured construct.
• A good item is one that contributes variance to the test score.
• Problem: There is no yardstick to evaluate whether the item variance is small or large…
What do we examine in a psychometric evaluation of items?
Item Means
Inter-item correlation
Corrected item-scale (or item-total) correlation
Coefficient alpha
What do extreme item means tell us?
• Extreme item means suggest that there may be a problem with the item. Either:
The item is worded too strongly or too weakly (hence, most respondents agree/disagree); OR
The event in question always (or never) occurs for most respondents.
• Items with means close to the middle of the score range (i.e., around 4 on a 7-pt scale) are therefore ideal, though we would also want to retain a range of items with different means/distributions.
What is the item difficulty index?
In objective items (e.g., MCQs) with correct/incorrect responses, or in binary items scored 0/1, the item mean is the proportion of respondents who answered the item correctly:
p = N_c/N
N — total number of respondents
N_c— number of respondents who answered correctly
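A minimal sketch of the difficulty index p = N_c/N. The response matrix is made-up data (rows are respondents, columns are items; 1 = correct, 0 = incorrect), and the function name is only illustrative.

```python
# Hypothetical data: 5 respondents x 4 binary items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 0],
]

def item_difficulty(responses):
    """Return p = N_c / N for each item (column)."""
    n = len(responses)              # N: total number of respondents
    n_items = len(responses[0])
    # N_c for item j is the column sum, because correct answers are coded 1.
    return [sum(row[j] for row in responses) / n for j in range(n_items)]

p_values = item_difficulty(responses)
print(p_values)  # [1.0, 0.6, 0.2, 0.6]
```

In this toy data the first item has p = 1.0: everyone answered it correctly, so it cannot separate respondents at all.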
Why is it preferable to have items with p = 0.5?
Variance = p(1-p)
Variance is highest when p = 0.5
Even so, item difficulties across an entire test should spread over the full range (i.e., include some easy and some difficult items, so that the strong test takers can be told apart).
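A quick numerical check of why p = 0.5 gives the most variance for a binary item. The p values in the loop are arbitrary illustrations.

```python
def binary_item_variance(p):
    """Variance of a 0/1 item whose proportion correct is p."""
    return p * (1 - p)

# Variance rises towards p = 0.5 and falls off symmetrically on either side.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}  variance = {binary_item_variance(p):.2f}")
```

The variance is 0.25 at p = 0.5 and shrinks to 0 as p approaches 0 or 1, which is why very easy and very hard items contribute little to the total score variance.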
• Why is inter-item correlation important?
Items with low variance correlate poorly with other items.
Items that inter-correlate with each other contribute additionally to the variance of the total scale. Why?
s^2_composite = s^2_i + s^2_j +2r_ij s_i s_j
• Clark & Watson (1995) argued that one should examine the range and distribution of all inter-item correlations because the mean inter-item correlations may be misleading.
- They recommended that all of the individual inter-item correlations fall within the range of .15 to .50, depending on the construct measured.
- If the construct is broad, the mean inter-item correlation should be around .15 to .20
- If the construct is narrow, the mean inter-item correlation should be around .40 to .50
If inter-item correlation is greater than .70, item redundancy is a problem
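The variance formula for a two-item composite can be checked numerically. This is a small sketch on made-up scores. It uses population variances throughout (the identity holds for sample variances too, as long as the same convention is used on both sides), and `pearson_r` is a helper written here, not a library function.

```python
from statistics import mean, pstdev, pvariance

def pearson_r(x, y):
    """Pearson correlation using population (divide-by-N) moments."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical scores of six respondents on two items.
item_i = [1, 2, 3, 4, 5, 6]
item_j = [2, 1, 4, 3, 6, 5]

r_ij = pearson_r(item_i, item_j)
composite = [a + b for a, b in zip(item_i, item_j)]

# Left side: variance of the summed score.
lhs = pvariance(composite)
# Right side: s^2_i + s^2_j + 2 r_ij s_i s_j
rhs = pvariance(item_i) + pvariance(item_j) + 2 * r_ij * pstdev(item_i) * pstdev(item_j)

print(round(r_ij, 3), round(lhs, 3), round(rhs, 3))
```

The larger r_ij is, the larger the last term, so positively correlated items add more to the composite variance than uncorrelated ones. With r_ij near .83, this toy pair would also be flagged as redundant under the .70 guideline.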
We want the corrected item-total correlation, where the item being evaluated is correlated with the sum of the remaining items in the scale (excluding the item itself). Why?
Including the item in the total artificially inflates the item-total correlation, because the item is then partly correlated with itself. The inflation is largest when there are very few items in the scale.
How high should the item-total correlation be for an item to be included in the scale?
There is no clear rule of thumb. Studies vary in the item-total correlation cutoff they use for eliminating items, from .35 to .50.
Some studies first decide that the scale should have m items, then choose the m items with the largest item-scale correlations.
Item-total correlation is usually considered together with coefficient alpha.
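A sketch comparing the uncorrected and corrected item-total correlations on a made-up 3-item scale. The data and function names are illustrative only.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation using population (divide-by-N) moments."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def item_total_correlations(data):
    """Return (uncorrected, corrected) item-total correlations per item."""
    results = []
    for j in range(len(data[0])):
        item = [row[j] for row in data]
        total = [sum(row) for row in data]          # includes item j
        rest = [sum(row) - row[j] for row in data]  # excludes item j
        results.append((pearson_r(item, total), pearson_r(item, rest)))
    return results

# Hypothetical responses: 6 respondents x 3 items on a 7-pt scale.
data = [
    [5, 4, 5],
    [2, 3, 2],
    [4, 4, 3],
    [1, 2, 2],
    [3, 3, 4],
    [5, 5, 4],
]

correlations = item_total_correlations(data)
for j, (uncorrected, corrected) in enumerate(correlations, start=1):
    print(f"item {j}: uncorrected = {uncorrected:.2f}, corrected = {corrected:.2f}")
```

With only three items, each item makes up a large share of the total, so the gap between the two values is easy to see.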
What does coefficient alpha reflect?
• Coefficient alpha reflects internal consistency reliability.
• Alpha normally takes on values from 0 to 1. When alpha is negative, something is wrong (such as negative inter-item correlations).
• Common guidelines on how alpha value is judged:
DeVellis: minimally acceptable alpha: .70 (p.95)
Clark & Watson (1995): min. acceptable alpha: .80
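The notes do not write out a formula for alpha. The sketch below assumes the standard one, alpha = k/(k-1) × (1 − Σ s²_i / s²_total), where k is the number of items, and applies it to made-up data.

```python
from statistics import pvariance

def cronbach_alpha(data):
    """Coefficient alpha from a respondents x items score matrix.

    Assumes the standard formula: k/(k-1) * (1 - sum of item variances / total variance).
    """
    k = len(data[0])
    item_variances = [pvariance([row[j] for row in data]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in data])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 6 respondents x 3 items on a 7-pt scale.
data = [
    [5, 4, 5],
    [2, 3, 2],
    [4, 4, 3],
    [1, 2, 2],
    [3, 3, 4],
    [5, 5, 4],
]

alpha = cronbach_alpha(data)
print(f"alpha = {alpha:.3f}")  # alpha = 0.913
```

By either guideline above (.70 or .80), this toy scale would count as acceptably reliable.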
What affects alpha?
Number of items in the scale
Interitem correlations
• With a longer scale, the resulting alpha has a narrower C.I. compared to alpha based on a shorter scale. This means that the precision of the alpha estimate increases with the number of items.
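To see how the two factors listed above drive alpha, one option is the standardized form of alpha, alpha = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ is the mean inter-item correlation. This formula is an assumption brought in for illustration; it is not stated in these notes, and it describes the size of alpha rather than the width of its confidence interval.

```python
def standardized_alpha(k, mean_r):
    """Standardized alpha from the number of items k and the mean inter-item correlation."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Illustrative grid: mean correlations for broad (.15) to narrow (.50) constructs,
# and scale lengths from 5 to 35 items.
for mean_r in (0.15, 0.30, 0.50):
    row = ", ".join(f"k={k}: {standardized_alpha(k, mean_r):.2f}" for k in (5, 10, 20, 35))
    print(f"mean r = {mean_r:.2f} -> {row}")
```

Alpha climbs with either more items or higher inter-item correlations, so a long scale can reach a high alpha even when its items are only weakly related.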
• What is the trade-off of focusing on maximizing alpha?
Attenuation paradox - increasing internal consistency of a test, beyond a certain point, will not enhance construct validity and may even occur at the expense of validity
Strongly correlated items are likely to be highly redundant, and contribute little more construct information than any one item individually
• How many items should one aim for in a scale?
Depends on the dimensionality and whether the construct has a broad/narrow domain.
For broad concepts, the maximum should be approximately 35 items.
Strive for a balance between reliability and brevity
- In the case of research instrument, there is little need to strive for higher reliability once a .80 reliability is obtained (Clark & Watson, 1995; Netemeyer, Bearden, & Sharma, 2003).
The minimum is for each factor or dimension to include at least 3 items.
• What does it mean when there are negative item-scale correlations?
Check for errors (e.g., failure to reverse score)
The item may be poorly written (e.g., ambiguous)
Item was inappropriate for the current sample of respondents
If it is none of the above, go back to the conceptualization of the construct to consider if construct has been properly defined.
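A sketch of the first check, failure to reverse score. It assumes a 7-pt scale, where a reversed score is (1 + 7) − x, and uses made-up data in which the third item is negatively worded. All names are illustrative.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation using population (divide-by-N) moments."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def corrected_item_total(data, j):
    """Correlate item j with the sum of the remaining items."""
    item = [row[j] for row in data]
    rest = [sum(row) - row[j] for row in data]
    return pearson_r(item, rest)

def reverse_score(x, low=1, high=7):
    """Flip a response on a low..high scale (7 -> 1, 6 -> 2, ...)."""
    return low + high - x

# Hypothetical raw responses; the third item is negatively worded and not yet reversed.
raw = [
    [5, 4, 3],
    [2, 3, 6],
    [4, 4, 5],
    [1, 2, 6],
    [3, 3, 4],
    [5, 5, 4],
]

before = corrected_item_total(raw, 2)
fixed = [[a, b, reverse_score(c)] for a, b, c in raw]
after = corrected_item_total(fixed, 2)
print(f"before reversing: {before:.2f}, after reversing: {after:.2f}")
```

The correlation simply changes sign once the item is reversed, so a negative value of this kind points to a scoring error rather than a bad item.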
What is a caveat regarding alpha?
• Reliability indices such as Cronbach’s alpha do not tell us anything about the dimensionality (homogeneity) of the scale.
• Homogeneity of a scale refers to whether the scale items assess a single underlying construct.
• Alpha assumes homogeneity and checks on whether the items in a scale are sufficiently inter-related (internal consistency).
Concept of Dimensionality
What is dimensionality concerned with?
• Dimensionality is concerned with homogeneity of items:
A set of items is homogeneous when responses to all the items are a function of the same psychological attribute.
Such a test is considered unidimensional because it appraises one and only one construct or trait.
• Establishing dimensionality of constructs is an important part of the scale development process.