Classical Test Theory Flashcards
What is the central statistic in classical test theory?
The sum score
True Score - Definition
The score that would be obtained if a perfect measurement instrument were used
Classical Test Theory - Core Assumptions
1) Observed score = true score + measurement error
2) Measurement error is random
What are the implications of measurement error being random?
1) The mean of the measurement error is 0
(it cancels out because it is random)
2) There is no correlation between the true score & measurement error
3) All of the observed variance can be explained by true score variance & measurement error variance (there are no other sources of noise)
Classical Test Theory - Definition
Measurement theory that defines the conceptual basis of reliability & outlines procedures for estimating the reliability of psychological test scores
Measurement Error - Definition
Extent to which other characteristics contribute random noise to the differences in observed scores
Reliability - Definition
- Measure of whether something is consistent (stays the same)
- Results are considered to be reliable if they are similar each time they are carried out using the same design, procedures, measurements
- Extent to which differences in respondents' observed scores are consistent with differences in their true scores
What are two ways of defining reliability?
1) As a proportion of variance
2) As a proportion of shared variance
Reliability (as a proportion of variance) - Formula
true score variance / observed score variance
= 1 - (error score variance / observed score variance)
Reliability (as a proportion of variance) - Definition
The proportion of observed score variance that is attributable to true score variance
Reliability (as a proportion of shared variance) - Definition
The proportion of variance shared between the true scores & observed scores
Reliability (as a proportion of shared variance) - Formula
(correlation between observed & true scores)²
= 1 - (correlation between observed & error scores)²
Note: squaring a correlation gives you the amount of variance shared by those variables
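The two definitions above can be checked numerically. A minimal simulation sketch (NumPy; the means and SDs 50, 10, and 5 are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true = rng.normal(50, 10, n)    # true scores
error = rng.normal(0, 5, n)     # random error: mean 0, uncorrelated with true
observed = true + error         # CTT: observed = true + error

# Definition 1: proportion of variance
rel_var = true.var() / observed.var()

# Definition 2: proportion of shared variance (squared correlation)
rel_shared = np.corrcoef(observed, true)[0, 1] ** 2

# Both approximate the theoretical reliability 10² / (10² + 5²) = 0.8
print(rel_var, rel_shared)
```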
What are the four test models of reliability?
1) Parallel Test Model
2) Tau Equivalent Test Model
3) Essential Tau Equivalent Test Model
4) Congeneric Test Model
List the test models of reliability from most to least restrictive
1) Parallel Test Model
2) Tau Equivalent Test Model
3) Essential Tau Equivalent Test Model
4) Congeneric Test Model
Parallel Test Model - Assumptions
True Scores:
= equal means
= equal variances
Observed Scores:
= equal means
= equal variances
Error Scores:
= equal variances
Correlation:
= equal correlation between true & observed scores
Reliability:
= equal reliability (test 1 & test 2)
Parallel Test Model - Assumptions True Scores
= equal means
= equal variances
Parallel Test Model - Assumptions Observed Scores
= equal means
= equal variances
Parallel Test Model - Assumptions Error Scores
= equal variances
Parallel Test Model - Assumptions Reliability
= equal reliability
Parallel Test Model - Tests
1) Alternate Forms
2) Split-Halves
3) Test-Retest
Tau Equivalent Test Model - Assumptions
True Scores:
= equal means
= equal variances
Observed Scores:
= equal means
Tau Equivalent Test Model - Assumptions True Scores
= equal means
= equal variances
Tau Equivalent Test Model - Assumptions Observed Scores
= equal means
Tau Equivalent Test Model - Assumptions Error Scores
none (error variances may differ)
Tau Equivalent Test Model - Assumptions Reliability
none (reliabilities may differ)
Tau Equivalent Test Model - Tests
1) Alpha
Essential Tau Equivalent Test Model - Assumptions
True Scores:
= equal variances
Essential Tau Equivalent Test Model - Assumptions True Scores
= equal variances
Essential Tau Equivalent Test Model - Assumptions Observed Scores
none
Essential Tau Equivalent Test Model - Assumptions Error Scores
none
Essential Tau Equivalent Test Model - Assumptions Reliability
none
Essential Tau Equivalent Test Model - Tests
Cronbach’s Alpha
Congeneric Test Model - Assumptions
none
Congeneric Test Model - Assumptions True Scores
none
Congeneric Test Model - Assumptions Observed Scores
none
Congeneric Test Model - Assumptions Error Scores
none
Congeneric Test Model - Assumptions Reliability
none
Congeneric Test Model - Tests
Omega
Parallel Test Model - True Score Formula
Xt2 = Xt1
Tau Equivalent Test Model - True Score Formula
Xt2 = Xt1
Essential Tau Equivalent Test Model - True Score Formula
Xt2 = a + Xt1
Congeneric Test Model - True Score Formula
Xt2 = a + bXt1
Parallel Test Model - Observed Score Formula
Xo1 = Xt1 + Xe1
Xo2 = Xt1 + Xe2
Tau Equivalent Test Model - Observed Score Formula
Xo1 = Xt1 + Xe1
Xo2 = Xt1 + Xe2
Essential Tau Equivalent Test Model - Observed Score Formula
Xo1 = Xt1 + Xe1
Xo2 = a + Xt1 + Xe2
Congeneric Test Model - Observed Score Formula
Xo1 = Xt1 + Xe1
Xo2 = a + bXt1 + Xe2
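The true-score relations above can be made concrete in a short sketch (the intercept a = 2.0 and slope b = 1.5 are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
t1 = rng.normal(0, 1, n)          # true scores on test 1
a, b = 2.0, 1.5                   # illustrative intercept & slope

# True-score relations implied by each model:
models = {
    "parallel":       t1,             # Xt2 = Xt1
    "tau_equivalent": t1,             # Xt2 = Xt1
    "essential_tau":  a + t1,         # Xt2 = a + Xt1 (shifted mean, same variance)
    "congeneric":     a + b * t1,     # Xt2 = a + bXt1 (mean AND variance may differ)
}

for name, t2 in models.items():
    # All models imply perfectly correlated true scores; they differ only
    # in whether means and variances are allowed to change between tests.
    r = np.corrcoef(t1, t2)[0, 1]
    print(f"{name}: mean={t2.mean():.2f}, var={t2.var():.2f}, r={r:.2f}")
```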
What are the main methods of reliability estimation?
1) Alternate Forms
2) Test-Retest
3) Internal Consistency
What are the different tests in the alternate forms method?
2 alternate forms/versions of the same test
What test model does the alternate forms method follow?
parallel test model
What is the reliability in the alternate forms method?
reliability is the correlation (between the 2 test versions)
What are limitations of the alternate forms method?
- It is very difficult in practice to create two versions of a test that are unique yet still parallel
- carryover effects
Carryover Effects - Definition
An effect of being tested in one condition on participants' behaviour in later conditions
What are the different tests in the test-retest method?
the same person takes the same test on more than one occasion
What test model does the test-retest method follow?
Parallel test model
What is the reliability in the test-retest method?
Reliability is the correlation (of the two test taking occasions)
What are limitations of the test-retest method?
- Difficult to do for constructs that naturally fluctuate over time (change in true scores)
- Carryover effects
- People might not want to take the test a second time
What are the different tests that fall under the general internal consistency method?
1) Split-Half
2) Cronbach’s Alpha
3) Omega
What are limitations of the general internal consistency method?
Carryover effects can cause a correlation between the error scores of different items
What are the different tests in the internal consistency method?
(Blocks of) items are treated as separate tests
What test model does the internal consistency method follow?
Parallel test model OR essential tau equivalent model
What are the different reliability measures under Cronbach’s Alpha?
- Raw Alpha
- Standardised Alpha
- KR20
What test model does Cronbach’s Alpha follow?
Essential tau equivalent model
What are limitations of Cronbach’s Alpha?
- Its assumptions are hardly ever met in reality
- Cronbach's Alpha is a lower bound to the reliability (it will underestimate the reliability)
What are the different tests in Cronbach’s Alpha?
1) Raw Alpha
2) Standardised Alpha
3) KR20
When should you use Raw Alpha?
For tests with items that do not substantially differ in their variances
Raw Alpha - Consistency Index
Sum of all covariances amongst items (Σcᵢⱼ)
When should you use Standardised Alpha?
For tests with items that substantially differ in their variances, which causes the test scores to only reflect items with very high variances
Standardised Alpha - Consistency Index
Average of all correlations amongst items (r̄ᵢⱼ)
When should you use KR20?
For binary items
KR20 - Consistency Index
Sum of item variances (Σpq, where p = proportion correct & q = 1 - p)
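The three alpha variants can be sketched in code (a rough illustration; the function names, the simulated roughly tau-equivalent data, and the 0-threshold used to create binary items are all illustrative choices):

```python
import numpy as np

def raw_alpha(items):
    """Cronbach's raw alpha from an (n_persons, k_items) score matrix,
    built from item variances and the total-score variance."""
    k = items.shape[1]
    item_vars = items.var(axis=0)          # population variances
    total_var = items.sum(axis=1).var()
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def standardized_alpha(items):
    """Alpha based on the average inter-item correlation."""
    k = items.shape[1]
    corr = np.corrcoef(items, rowvar=False)
    mean_r = corr[np.triu_indices(k, 1)].mean()   # average off-diagonal r
    return k * mean_r / (1 + (k - 1) * mean_r)

def kr20(items):
    """KR20: raw alpha specialised to binary (0/1) items, where each
    item's variance is p*q with p = proportion answering correctly."""
    k = items.shape[1]
    p = items.mean(axis=0)
    total_var = items.sum(axis=1).var()
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

# Six roughly tau-equivalent items: a shared latent trait plus item noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 1))
items = latent + rng.normal(size=(2000, 6))
print(raw_alpha(items), standardized_alpha(items))
```

On binary data, KR20 and raw alpha coincide, since p*q is exactly the population variance of a 0/1 item.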
What are the different tests in Omega?
Each item of the test is considered to be a separate test
What test model does Omega follow?
Congeneric test model
What is reliability in Omega?
Reliability is the:
true score variance/observed score variance
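Under the congeneric model, omega is typically computed from the loadings and error (unique) variances of a one-factor model: common-factor variance of the sum score over total variance. A sketch with made-up loadings and error variances:

```python
import numpy as np

def omega_total(loadings, error_vars):
    """McDonald's omega: (sum of loadings)² is the common-factor (true score)
    variance of the sum score; adding the error variances gives the
    observed-score variance."""
    loadings = np.asarray(loadings, dtype=float)
    error_vars = np.asarray(error_vars, dtype=float)
    common = loadings.sum() ** 2
    return common / (common + error_vars.sum())

# Congeneric items: loadings AND error variances may all differ
print(omega_total([0.9, 0.8, 0.7, 0.6], [0.4, 0.5, 0.6, 0.7]))  # 9 / 11.2 ≈ 0.80
```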
What reliability estimation method is omega a part of?
Internal Consistency
What reliability estimation method is split-halves a part of?
Internal Consistency Method
What reliability estimation method is alpha a part of?
Internal Consistency Method
What are the different tests in split-halves?
Test is split into 2 parts
What test model does split-halves follow?
Parallel test model
What is the consistency index in split-halves?
The correlation (between the two test halves)
What are the limitations of split-halves?
- Reliability is heavily influenced by the type of split done
- It cannot be used for speeded tests, as you will almost always get a correlation close to 1.0. This is because response speeds are consistent throughout the entire test.
- Other methods utilise more information about the test
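A split-half sketch (using an odd-even split, one common choice, and the standard Spearman-Brown step-up from the half-test correlation to the full-test reliability; the simulated data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
latent = rng.normal(size=(2000, 1))
items = latent + rng.normal(size=(2000, 8))   # 8 roughly parallel items

# Odd-even split (the cards note the choice of split matters)
half1 = items[:, ::2].sum(axis=1)
half2 = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(half1, half2)[0, 1]

# Spearman-Brown step-up: full-test reliability from the half-test correlation
split_half_rel = 2 * r_half / (1 + r_half)
print(r_half, split_half_rel)
```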
What are factors that affect reliability?
1) Test Length
2) Sample Heterogeneity
3) Reliability of Difference Scores
How does test length influence the reliability of a test?
Reliability increases as more (comparable) items are added
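The standard Spearman-Brown prophecy formula quantifies this effect of test length (assuming the added items are comparable to the existing ones):

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened (or shortened)
    by `length_factor`, assuming the new items behave like the old ones."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Doubling a test with reliability .70:
print(spearman_brown(0.70, 2))   # 1.4 / 1.7 ≈ 0.82
```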
How does sample heterogeneity influence the reliability of a test?
In homogeneous samples, the reliability is lower because of lower true score variance. In heterogeneous samples, the reliability is higher because of greater true score variance. Neither effect is desirable, as reliability should be a property of the test, not a property of the sample being examined.
Reliability Generalisation Study - Definition
Study intended to reveal the degree to which a test produces differing reliability estimates across different kinds of research uses & populations (aka how sample characteristics affect the reliability of test scores)
Difference Score - Formula
posttest score - pretest score
What influences the reliability of difference scores?
The correlation between pretest & posttest. If the correlation is very high, the reliability is low.
What is the difference score sensitive to?
The variance between pretest & posttest. However, this is not as relevant for pretest-posttest designs, as the difference will never be that large
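The standard CTT formula for the reliability of a difference score (not shown on the cards above) makes the high-correlation problem concrete:

```python
def difference_score_reliability(rel_x, rel_y, r_xy, sd_x=1.0, sd_y=1.0):
    """Standard CTT formula for the reliability of a difference score
    D = Y - X, from each score's reliability, their correlation,
    and their standard deviations."""
    num = sd_x**2 * rel_x + sd_y**2 * rel_y - 2 * r_xy * sd_x * sd_y
    den = sd_x**2 + sd_y**2 - 2 * r_xy * sd_x * sd_y
    return num / den

# Two reliable tests (.80 each) that correlate highly (.75): the
# difference score is far less reliable than either test on its own.
print(difference_score_reliability(0.80, 0.80, 0.75))   # ≈ 0.20
```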
What are the approaches to true score estimation?
1) True score estimate is the summed item score
2) True score estimate is the summed item score, corrected for regression to the mean. The lower the reliability is, the more the true score estimate is corrected to be closer to the mean.
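The second approach corresponds to Kelley's formula; a minimal sketch:

```python
def estimated_true_score(observed, group_mean, reliability):
    """Kelley's formula: regress the observed score toward the group mean.
    The lower the reliability, the stronger the pull toward the mean."""
    return group_mean + reliability * (observed - group_mean)

# Observed score 80, group mean 50:
print(estimated_true_score(80, 50, 0.9))   # 77.0 (mild correction)
print(estimated_true_score(80, 50, 0.5))   # 65.0 (strong correction)
```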
Regression to the mean - Definition
If one sample of a variable is extreme, the next sampling is likely to be closer to the mean.
What does the standard error of measurement (SEM) represent?
The average size of error scores
What is the relation between reliability & standard error?
The higher the reliability, the lower the standard error, and vice versa
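This relation is the standard SEM formula, SEM = SD_observed × √(1 − reliability):

```python
import math

def sem(sd_observed, reliability):
    """Standard error of measurement: the typical size of the error scores."""
    return sd_observed * math.sqrt(1 - reliability)

# Higher reliability -> smaller SEM, and vice versa:
print(sem(15, 0.91))   # ≈ 4.5
print(sem(15, 0.75))   # 7.5
```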
Attenuation - Definition
Lessening/weakening in the intensity, value, or quality of a stimulus. In terms of reliability, attenuation refers to the fact that effect sizes/correlations based on observed scores will always be smaller than those based on true scores, because observed scores include measurement error.
What is the effect of reliability on statistical significance?
Reliability has a direct effect on statistical significance. With high reliability, larger observed effect sizes are possible, which increases the likelihood of a significant result.
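A related standard result, Spearman's correction for attenuation (not stated on the cards), estimates the true-score correlation from the observed one:

```python
import math

def correct_for_attenuation(r_observed, rel_x, rel_y):
    """Spearman's correction: estimated correlation between TRUE scores,
    given the observed correlation and each measure's reliability."""
    return r_observed / math.sqrt(rel_x * rel_y)

# An observed r of .30 between two imperfect measures (.70 and .80 reliable)
print(correct_for_attenuation(0.30, 0.70, 0.80))   # ≈ 0.40
```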
Point estimate - Definition
Specific value that is interpreted to be the best estimate of an individual's standing on a particular psychological attribute
Internal Consistency - Definition
Degree to which differences amongst participants' responses to one item are consistent with differences amongst their responses to other items on the test (how consistent the test items are with each other)
Item Discrimination - Definition
Degree to which an item differentiates people who score high on the total test from those who score low on the total test
Item-total Correlation - Definition
Degree to which differences amongst participants' responses to the item are consistent with differences in their total test score. If the correlation is high, then the item is highly consistent with the total test scores.
Corrected Item-total Correlation - Definition
The consistency between an item and the other items on a test (correlation between responses to item one and the sum of all other items on the test)
How to interpret “Cronbach’s Alpha if Item Deleted” on an Item-Total Statistics Table?
Items that increase alpha when dropped should be removed. However, only remove these items if the increase in reliability is deemed important enough.
How to interpret “Corrected Item-Total Correlation” on an Item-Total Statistics Table?
This table tells you the consistency between that item & other items on the test. Items with a high corrected item-total correlation also have high item discrimination.
When do you use the Discrimination Index?
When analysing the Internal Consistency of a test with binary items
What does the Discrimination Index show?
The proportion of high test scorers that answered the item correctly compared with the proportion of low test scorers that also answered the item correctly.
How do you interpret a Discrimination Index value?
Higher DI values are indicative of greater internal consistency, as high and low test scorers differ significantly in the likelihood of answering the item correctly
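A sketch of the discrimination index computation (the function name is illustrative; comparing the top and bottom 27% of total scorers is a common convention, not from the cards):

```python
import numpy as np

def discrimination_index(item_correct, total_scores, fraction=0.27):
    """DI = proportion correct among the top scorers minus the proportion
    correct among the bottom scorers (27% groups by convention)."""
    item_correct = np.asarray(item_correct)
    order = np.argsort(total_scores)          # persons sorted by total score
    n = max(1, int(len(order) * fraction))
    low, high = order[:n], order[-n:]
    return item_correct[high].mean() - item_correct[low].mean()

# Synthetic check: an item answered correctly only by the top half
totals = np.arange(100)
item = (totals >= 50).astype(int)
print(discrimination_index(item, totals))   # 1.0 (perfect discrimination)
```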
What is COTAN?
A Dutch committee involved in the evaluation of (new) psychological tests
What are the COTAN guidelines for high impact inferences at the individual level?
good = r ≥ .90
satisfactory = .80 ≤ r < .90
insufficient = r < .80
What are the COTAN guidelines for lower impact inferences at the individual level?
good = r ≥ .80
satisfactory = .70 ≤ r < .80
insufficient = r < .70
What are the COTAN guidelines for inferences at the group level?
good = r ≥ .70
satisfactory = .60 ≤ r < .70
insufficient = r < .60