Module 2: Norms and Reliability Flashcards
What is Classical Test Theory (CTT)?
CTT is a model for understanding measurement
CTT is based on the True Score Model…
… for each person, their observed score on a test is made up of a true score plus an error component:
- Observed score (X) = True score (T) + Error (E)
What is a true score?
True score is a person’s actual level of the ability or trait being measured (i.e., the score they would obtain if it could be measured without error).
What is error?
Error is the component of the observed score unrelated to the test taker’s true ability or the trait being measured.
True variance and error variance thus refer to the portions of the variability in a collection/population of test scores attributable to true scores and to error, respectively.
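Under the true score model, and assuming error is uncorrelated with true scores, the observed-score variance splits into these two parts:

$$
\sigma^2_X = \sigma^2_T + \sigma^2_E
$$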
What is reliability?
Reliability refers to consistency in measurement.
- According to CTT, reliability is the proportion of the total variance attributable to true variance
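In symbols (a standard CTT expression, with σ² denoting variance), the reliability coefficient is:

$$
r_{XX} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}
$$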
What is test administration error?
Test administration: variation due to the testing environment
- Testtaker variables (e.g., arousal, stress, physical discomfort, lack of sleep, drugs, medication)
- Examiner variables (e.g., physical appearance, demeanour)
What is test scoring and interpretation error?
Test scoring and interpretation:
Variation due to differences in scoring and interpretation
What are methodological errors?
Variation due to poor training, unstandardized administration, unclear questions, biased questions.
CCT True-score Model vs. Alternative
- The True Score Model of measurement (based on CTT) is simple, intuitive, and thus widely used
- Another widely used model of measurement is Item Response Theory (IRT)
- CTT’s assumptions are more readily met than IRT’s, and it assumes only two components of measurement (true score and error)
- But CTT assumes all items on a test have an equal ability to measure the underlying construct of interest.
Item Response Theory (IRT)
- IRT provides a way to model the probability that a person with X ability level will correctly answer a question that is ‘tuned’ to that ability level.
What does IRT incorporate and consider?
- IRT incorporates considerations of item difficulty and discrimination (see the model sketch after this list)
o Difficulty relates to an item not being easily accomplished, solved, or comprehended.
o Discrimination refers to the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or construct being measured.
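A common way to write such a model (the specific form is not named on this card, so take the two-parameter logistic, 2PL, model as one illustrative option) is:

$$
P(\text{correct} \mid \theta) = \frac{1}{1 + e^{-a(\theta - b)}}
$$

Here θ is the person’s ability, b is the item’s difficulty (the ability level at which the probability of a correct answer is 50%), and a is the item’s discrimination (how steeply that probability rises around b).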
Reliability estimates
Because a person’s true score is unknown, we use different mathematical methods to estimate the reliability of tests.
Common examples include:
- Test-retest reliability
- Parallel and alternate forms reliability
- Internal consistency reliability
o E.g., split-half, inter-item correlation, Cronbach’s alpha
- Inter-rater/inter-scorer reliability
Test-retest reliability
Test-retest reliability is an estimate of reliability over time
- Obtained by correlating pairs of scores from the same people on the same test administered at different times (see the code sketch below)
- Appropriate for stable variables (e.g., personality)
- Estimates tend to decrease as time passes
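A minimal sketch of how such an estimate could be computed, using made-up scores and Python’s NumPy (not part of the original notes):

```python
# Sketch: test-retest reliability as the Pearson correlation between two
# administrations of the same test to the same people (made-up scores).
import numpy as np

time1 = np.array([12, 15, 9, 20, 17, 14, 11, 18])   # first administration
time2 = np.array([13, 14, 10, 19, 18, 13, 12, 17])  # same people, later retest

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal entry
# is the test-retest reliability estimate.
r_tt = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability estimate: {r_tt:.2f}")
```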
Parallel and Alternate Forms Reliability
- Parallel forms: two versions of a test are parallel if, in both versions, the means and variances of test scores are equal
- Alternate forms: there is an attempt to create two forms of a test, but they do not meet the strict requirements of parallel forms
- Obtained by correlating the scores of the same people measured with the different forms.
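A similar sketch for alternate-forms reliability (made-up data), which also eyeballs whether the two forms look parallel in means and variances:

```python
# Sketch: alternate-forms reliability as the correlation between Form A and
# Form B scores from the same people (made-up data).
import numpy as np

form_a = np.array([22, 30, 25, 28, 19, 33, 27, 24])
form_b = np.array([23, 29, 26, 27, 20, 31, 28, 25])

# Parallel forms would additionally require (approximately) equal means
# and variances across the two forms.
print("Means:", form_a.mean(), form_b.mean())
print("Variances:", form_a.var(ddof=1), form_b.var(ddof=1))

r_ab = np.corrcoef(form_a, form_b)[0, 1]  # alternate-forms reliability estimate
print(f"Alternate-forms reliability estimate: {r_ab:.2f}")
```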
Split half reliability
Obtained by correlating the two sets of scores from equivalent halves of a single test administered once.
Entails three steps:
- Step 1: Divide the test into two halves
- Step 2: Correlate scores on the two halves of the test.
- Step 3: Generalise the half-test reliability to the full-test reliability using the Spearman-Brown formula.
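A minimal sketch of the three steps, assuming an odd-even split of a made-up people-by-items score matrix:

```python
# Sketch: split-half reliability with the Spearman-Brown correction.
import numpy as np

# Rows = test takers, columns = items (made-up 0/1 item scores).
scores = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1, 1, 1, 1],
    [0, 1, 1, 0, 1, 0, 1, 0],
])

# Step 1: divide the test into two halves (here, odd- vs. even-numbered items).
half1 = scores[:, 0::2].sum(axis=1)
half2 = scores[:, 1::2].sum(axis=1)

# Step 2: correlate scores on the two halves.
r_hh = np.corrcoef(half1, half2)[0, 1]

# Step 3: Spearman-Brown formula to estimate full-test reliability:
#   r_SB = 2 * r_hh / (1 + r_hh)
r_sb = (2 * r_hh) / (1 + r_hh)
print(f"Half-test r = {r_hh:.2f}; Spearman-Brown full-test estimate = {r_sb:.2f}")
```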
Inter-item correlation
The degree of relatedness of items on a test; used to gauge the homogeneity of a test.
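A sketch of how inter-item relatedness might be summarised (mean inter-item correlation, plus Cronbach’s alpha from the list of internal consistency estimates above), using made-up Likert-style data:

```python
# Sketch: mean inter-item correlation and Cronbach's alpha as indices of
# test homogeneity / internal consistency (made-up data).
import numpy as np

# Rows = test takers, columns = items.
scores = np.array([
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 3, 3, 2],
    [4, 4, 5, 4],
], dtype=float)

k = scores.shape[1]

# Mean inter-item correlation: average of the off-diagonal item correlations.
item_corr = np.corrcoef(scores, rowvar=False)
mean_r = (item_corr.sum() - k) / (k * (k - 1))

# Cronbach's alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance)
item_vars = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(f"Mean inter-item correlation = {mean_r:.2f}; Cronbach's alpha = {alpha:.2f}")
```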