Module 2: Norms and Reliability Flashcards
Classical Test Theory (CTT):
is a model for understanding measurement.
CTT is based on the True Score Model. See Notes.
True score:
is a person’s actual true ability level (i.e., measured without error).
Error:
is the component of the observed score unrelated to the test-taker's true ability or the trait being measured.
True variance and error variance:
refer, respectively, to the true-score and error components of variability in a collection/population of test scores.
Reliability:
refers to consistency in measurement. See Notes.
Systematic error:
is the same for everyone. E.g., being in a noisy classroom made everyone perform 10 points worse.
Random error:
is unrelated to the person's true score or the testing environment; it varies unpredictably, and there is nothing you can do about it except be aware that it is there. Reliability estimates are concerned with random error, because a systematic error affects everyone equally and so treats everyone the same.
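The true score model can be illustrated with a short simulation; the population mean, standard deviations, and sample size below are all arbitrary, assumed values:

```python
import random
import statistics

random.seed(0)

# Hypothetical simulation of the true score model: X = T + E.
# Assumed values: population mean 100, true-score SD 15, error SD 5.
true_scores = [random.gauss(100, 15) for _ in range(1000)]
errors = [random.gauss(0, 5) for _ in range(1000)]  # random error: mean 0, unrelated to T
observed = [t + e for t, e in zip(true_scores, errors)]

# Under CTT, reliability is the proportion of observed-score variance that is
# true-score variance; the theoretical value here is 15**2 / (15**2 + 5**2) = 0.9.
reliability = statistics.variance(true_scores) / statistics.variance(observed)
print(round(reliability, 2))
```

The larger the random error relative to true-score differences, the lower the reliability.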
Sources of Measurement Error:
- Test Construction: Variation due to differences in items on the same test or between tests (i.e., item/content sampling).
- Test Administration: Variation due to testing environment.
• Test taker variables (e.g., arousal, stress, physical discomfort, lack of sleep, drugs, medication).
• Examiner variables (e.g., physical appearance, demeanour).
- Test scoring and interpretation: Variation due to differences in scoring and interpretation.
- Sampling error: Variation due to representativeness of sample.
• The larger the sample, the smaller the sampling error.
- Methodological errors: Variation due to poor training, unstandardized administration, unclear questions, biased questions.
Item Response Theory (IRT).
IRT provides a way to model the probability that a person with X ability level will correctly answer a question that is “tuned” to that ability level.
IRT incorporates considerations of item Difficulty and Discrimination.
o Difficulty: relates to an item not being easily accomplished, solved, or comprehended.
o Discrimination: refers to the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or construct being measured.
You want questions varying in degree of difficulty (e.g., receiving a speeding fine vs. using heroin) that also discriminate between levels of the trait (endorsing the speeding-fine item indicates a low level of risk taking; endorsing the heroin item indicates a high level).
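The two-parameter logistic (2PL) model is one common IRT formulation; a sketch with hypothetical difficulty and discrimination values for the two risk-taking items above:

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT model: probability that a person with ability/trait level
    theta endorses an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Assumed parameters: the speeding-fine item is "easy" (b = -1.0), the
# heroin item is "hard" (b = 2.0); both discriminate moderately (a = 1.5).
for theta in (-1.0, 0.0, 2.0):
    speeding = p_correct(theta, a=1.5, b=-1.0)
    heroin = p_correct(theta, a=1.5, b=2.0)
    print(f"theta={theta:+.1f}: speeding={speeding:.2f}, heroin={heroin:.2f}")
```

When theta equals an item's difficulty b, the endorsement probability is exactly 0.5; a larger discrimination a makes the curve steeper around that point.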
CTT True Score Model vs. Alternatives
- The True Score Model of measurement (based on CTT) is simple and intuitive, and thus widely used.
- Another widely used model of measurement is Item Response Theory (IRT).
o CTT's assumptions are more readily met than IRT's, and CTT assumes only two components to measurement.
o But CTT assumes all items on a test have equal ability to measure the underlying construct of interest.
o E.g., a test measures risk taking with two items: Have you ever had a speeding fine? Have you ever used heroin? A speeding fine is far more common than heroin use, yet under Classical Test Theory the two items would be treated as equivalent indicators of risk-taking behaviour. In reality, that is not a fair assumption.
o Item Response Theory addresses this by examining each item individually to see how it performs in measuring the construct of interest.
Reliability Estimates:
Because a person's true score is unknown, we use different mathematical methods to estimate the reliability of tests.
Common examples include:
• Test-retest reliability
• Parallel and alternate forms reliability
• Internal consistency reliability
o E.g., split half, item correlation, Cronbach’s alpha
• Interrater/interscorer reliability
Test-retest reliability:
is an estimate of reliability over time.
- Obtained by correlating pairs of scores from the same people on administrations of the same test at different times.
- Appropriate for stable variables (e.g., personality traits, NOT mood); if the construct is meant to be stable over the retest interval (say, one week), the method is fine.
- Estimates tend to decrease as the time between administrations increases.
Parallel forms:
Two versions of a test are parallel if, in both versions, the means and variances of test scores are equal. E.g., in neuropsychology we might want to test the same ability twice; we can't use the exact same test because the participant might remember the answers. The two forms must be equivalent for this to work, or else the separate tests measure separate things.
• The stricter standard of the two.
Alternate forms:
there is an attempt to create two forms of a test, but they do not meet the strict requirements of parallel forms.
• Reliability is obtained by correlating the scores of the same people measured with the different forms.
Internal consistency measures:
Split half reliability
Inter-item consistency/correlation
Kuder-Richardson formula 20
Coefficient alpha
Split half reliability:
obtained by correlating pairs of scores from equivalent halves of a single test administered once.
Entails 3 steps:
1. Divide the test into two equivalent halves.
2. Correlate scores on the two halves of the test.
3. Adjust the half-test reliability up to the full-test reliability using the Spearman-Brown formula.
Inter-item consistency/correlation:
the degree of relatedness of items on a test. Able to gauge the homogeneity of a test.
Kuder-Richardson formula 20:
statistic of choice for determining the inter-item consistency of dichotomous items.
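A sketch of KR-20 on made-up dichotomous responses (whether sample or population variance is used for the total scores varies between textbooks; population variance is assumed here):

```python
import statistics

# Made-up dichotomous (0/1) responses: rows = people, columns = five items.
responses = [
    [1, 1, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 0, 1],
]

k = len(responses[0])  # number of items
n = len(responses)     # number of people

# Sum of p*q across items, where p = proportion passing an item, q = 1 - p.
pq_sum = sum((sum(col) / n) * (1 - sum(col) / n) for col in zip(*responses))

# Variance of the total scores (population variance assumed).
total_var = statistics.pvariance([sum(row) for row in responses])

# KR-20 = (k / (k - 1)) * (1 - sum(p*q) / total-score variance)
kr20 = (k / (k - 1)) * (1 - pq_sum / total_var)
print(round(kr20, 3))
```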
Coefficient alpha:
the mean of all possible split-half correlations, corrected by the Spearman-Brown formula. The most popular reliability estimate. Values range from 0 to 1.
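In practice, coefficient alpha is computed directly from item and total-score variances rather than by averaging split-halves; a sketch with made-up Likert-style item scores:

```python
import statistics

# Made-up item scores: rows = people, columns = four items.
scores = [
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [1, 2, 2, 1],
    [4, 3, 4, 4],
    [2, 2, 1, 2],
]

k = len(scores[0])  # number of items
item_variances = [statistics.variance(item) for item in zip(*scores)]
total_variance = statistics.variance([sum(row) for row in scores])

# alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance)
alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
print(round(alpha, 3))
```

When items covary strongly (people high on one item are high on the others), the total-score variance is large relative to the summed item variances and alpha approaches 1.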