Reliability Flashcards
What is Classical Test Theory (CTT)?
A theoretical framework for reliability defined by assumptions describing how measurement errors influence observed test scores.
What is the fundamental equation of reliability theory? (Assumption 1 of CTT)
X = T + E, where X is the observed test score, T is the true score, and E is the random error score.
What does ε(E) = 0 signify in CTT?
The average error score across repeated testing is zero, meaning positive and negative errors cancel each other out.
What are the implications of ε(X) = T in CTT?
Combined with X = T + E, it implies that ε(E) must be 0, i.e., measurement errors are random rather than systematic.
What does ‘E’ represent in the context of reliability?
Unsystematic, or random, measurement error that deviates an examinee’s observed score from the true score.
What are the assumptions of CTT regarding error scores? (Assumption 3)
Errors are random and do not correlate with true scores: ρ_ET = 0
What does it mean if two tests are parallel according to CTT?
They satisfy Assumptions 1 through 5, are tau-equivalent (they measure the construct equally well, T = T’), and have the same level of error variance.
How is the reliability coefficient defined?
It is the proportion of observed score variance attributable to true-score variance.
What does a reliability coefficient of 1 indicate?
Observed-score variance reflects entirely true-score variance, indicating perfect reliability.
What does a reliability coefficient of 0 indicate?
Observed-score variance reflects entirely error-score variance, indicating zero reliability.
What is the significance of the equation σ²_X = σ²_T + σ²_E?
It means observed-score variance is equal to the sum of true-score variance and error-score variance.
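The decomposition can be checked numerically. The following is a minimal simulation sketch, not part of the original cards: it assumes normally distributed true scores and errors, and the means, standard deviations, sample size, and function names are all hypothetical choices made for illustration.

```python
import random

def variance(values):
    """Population variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def simulate_scores(n_people=20000, sd_true=10.0, sd_error=5.0, seed=0):
    """Draw hypothetical true scores T and random errors E; return (T, E, X)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, sd_true) for _ in range(n_people)]
    errors = [rng.gauss(0.0, sd_error) for _ in range(n_people)]  # mean-zero error
    observed = [t + e for t, e in zip(true_scores, errors)]       # X = T + E
    return true_scores, errors, observed

T, E, X = simulate_scores()
var_t, var_e, var_x = variance(T), variance(E), variance(X)

# Reliability as the share of observed-score variance due to true scores.
reliability = var_t / var_x
print(round(var_x, 1), round(var_t + var_e, 1), round(reliability, 3))
```

With these assumed standard deviations the theoretical reliability is 100 / (100 + 25) = 0.8. The simulated values only approximate it, because the sample covariance between T and E is close to zero but not exactly zero.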
What is the implication of having a heterogeneous sample for reliability estimation?
Greater variability among people increases true-score variance, which enhances reliability.
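A small arithmetic sketch makes the effect of sample heterogeneity concrete. The variances below are made-up numbers, and `reliability` is a hypothetical helper name; the only thing taken from the cards is the definition of reliability as true-score variance over observed-score variance.

```python
def reliability(var_true, var_error):
    """Reliability as true-score variance over observed-score variance."""
    return var_true / (var_true + var_error)

# Same error variance in both samples; only the spread of true scores differs.
homogeneous = reliability(var_true=4.0, var_error=4.0)     # 4 / 8
heterogeneous = reliability(var_true=16.0, var_error=4.0)  # 16 / 20
print(homogeneous, heterogeneous)
```

Holding error variance fixed, the more heterogeneous sample yields the larger coefficient (0.8 versus 0.5), which is why a reliability estimate is specific to the sample it was computed on.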
Fill in the blank: According to CTT, if two tests are essentially tau-equivalent, they have true scores that are the same except for an _______.
additive constant.
True or False: Congeneric measures have perfectly correlated true scores.
True.
What does a higher reliability indicate about estimating true scores from observed scores?
The higher the reliability, the more confidently we can estimate true scores from observed scores.
What does it mean when reliability falls between 0 and 1?
Observed-score variance includes some true-score variance and some error-score variance.
What is the relevance of error variance in relation to the reliability coefficient?
Reliability reflects the degree to which error variance is minimal compared to the variance of observed scores.
What is a primary challenge in estimating reliability based on CTT?
There is no way of knowing the true scores or the error associated with test responses.
What is the difference between parallel tests and essentially tau-equivalent tests?
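Parallel tests have identical true scores (T = T’) and equal error variances; essentially tau-equivalent tests have true scores that may differ by an additive constant and need not have equal error variances.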
What are the assumptions that must be satisfied for two tests to be considered parallel?
They must meet Assumptions 1 through 5 and measure the construct equally well.
How does the reliability coefficient relate to observed and true scores?
It equals the squared correlation between observed and true scores (ρ²_XT = σ²_T / σ²_X).
What does the assumption ε(X) = T imply in terms of test scores?
It implies that the observed scores reflect the true scores without systematic error.
What is the relationship between true-score variance and observed score variance in parallel tests?
The correlation between scores on two parallel forms of a test is equal to the ratio of true-score variance to observed score variance.
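How is reliability derived in the context of parallel tests?
Expand cov(X, X’) using X = T + E and X’ = T’ + E’. Because errors are random and uncorrelated (Assumptions 3 to 5), only cov(T, T’) = σ²_T remains, and because parallel tests have equal observed-score variance, ρ_XX’ = σ²_T / σ²_X.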
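The derivation can be written out step by step. This sketch assumes the parallel-test conditions of Assumption 6 (T = T’ and equal error variances) and uses Assumptions 3 to 5 to drop every covariance term that involves an error score.

```latex
\begin{align*}
\operatorname{cov}(X, X') &= \operatorname{cov}(T + E,\; T' + E') \\
 &= \operatorname{cov}(T, T') + \operatorname{cov}(T, E') + \operatorname{cov}(E, T') + \operatorname{cov}(E, E') \\
 &= \operatorname{cov}(T, T) + 0 + 0 + 0 = \sigma_T^2 \\[4pt]
\sigma_{X'}^2 &= \sigma_{T'}^2 + \sigma_{E'}^2 = \sigma_T^2 + \sigma_E^2 = \sigma_X^2 \\[4pt]
\rho_{XX'} &= \frac{\operatorname{cov}(X, X')}{\sigma_X \, \sigma_{X'}} = \frac{\sigma_T^2}{\sigma_X^2}
\end{align*}
```

The last line is what makes reliability estimable in practice: the true-score variance cannot be observed, but the correlation between two parallel forms can.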
Fill in the blank: The correlation between scores on two parallel forms of a test is equal to the ratio of _______ to observed score variance.
[true-score variance]
What is cov(X, X) in relation to the covariance of scores?
cov(X, X) is equal to the variance of X.
What does the notation ρ^2 represent?
ρ^2 represents the ratio of true-score variance to observed score variance.
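Assuming ρ here denotes the correlation between observed and true scores (written ρ_XT below, a subscript the card itself does not show), a short derivation shows where the ratio comes from. It uses only X = T + E and the assumption that errors are uncorrelated with true scores.

```latex
\begin{align*}
\operatorname{cov}(X, T) &= \operatorname{cov}(T + E,\; T) = \sigma_T^2 + \operatorname{cov}(E, T) = \sigma_T^2 \\
\rho_{XT} &= \frac{\operatorname{cov}(X, T)}{\sigma_X \, \sigma_T} = \frac{\sigma_T^2}{\sigma_X \, \sigma_T} = \frac{\sigma_T}{\sigma_X} \\
\rho_{XT}^2 &= \frac{\sigma_T^2}{\sigma_X^2}
\end{align*}
```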
True or False: Errors in parallel tests are correlated.
False.
What assumptions are made about errors in the calculation of reliability?
Errors are assumed to be random and uncorrelated.
What does the formula cov(T, T’) represent?
It represents the covariance between true scores T and T’.
List the components of the covariance formula for true and error scores.
- cov(T, T)
- cov(T, E)
- cov(E, T)
- cov(E, E)
What does the notation σ represent in this context?
σ represents the standard deviation of the scores.
What is the significance of Assumption 6 in this context?
Assumption 6 is used to derive relationships between the variances and covariances of scores.
True or False: The observed score variance is equal to the sum of true score variance and error variance.
True.
Fill in the blank: The formula for the correlation coefficient is _______.
[cov(X, X’) / (σ_X σ_X’), which for parallel tests reduces to cov(T, T’) / σ²_X]
Assumptions 1-5 of CTT
X = T + E (1)
ε(X) = T (2)
ρ_ET = 0 (3)
ρ_E1T2 = 0 (4)
ρ_E1E2 = 0 (5)
Assumption 6 of CTT (parallel tests)
Observed scores X and X’ satisfy Assumptions (1) to (5)
Tau-equivalent condition: The two tests measure the construct equally well, thus T = T’ (i.e., T1 = T2)
The tests have the same level of error variance (σ²_E = σ²_E’)
Assumption (7): Two tests are considered essentially tau-equivalent if (when tau-equivalent cannot be met):
- They have observed scores X and X’ that satisfy Assumptions (1) to (5)
- They have true scores that are the same except for an additive constant (i.e., T1 = T2 + c)
What is test-retest reliability?
- Correlate the scores obtained from the same test administered on two different occasions, based on the assumptions:
o (1) that the true scores are stable across the two occasions
o (2) that the error variance of the first testing equals that of the second testing.
- This provides an estimate of the stability of the test scores.
Problems due to assumptions of test-retest reliability
Assumptions
o (1) that the true scores are stable across the two occasions
o (2) the error variance of the first testing equals that of the second testing.
Problems:
o Length of test-retest interval is critical.
Too short: possibility of carryover effects, e.g., memory/practice effects
Too long: true scores may change over time, so they are no longer stable across the two occasions
o Estimate is inappropriate if the construct measured is not stable over time (e.g., a state-like construct).
The error variance of measurement also needs to be kept equal across the two occasions
o Hence, only use test-retest reliability if the construct is known to be stable across time and is less susceptible to carryover effects, e.g., visual acuity
Assumptions of test-retest reliability
o (1) that the true scores are stable across the two occasions
o (2) the error variance of the first testing equals that of the second testing.
Advantages of using Internal consistency reliability rather than test-retest reliability
- A useful practical alternative because it requires respondents to complete only one test at only one point in time, thus avoiding the problems associated with repeated testing.
o The premise is that different parts of a test can be treated as different forms of the test, and correlating the scores for these different parts provides a reliability estimate. This mitigates the issue that the construct needs to be stable across time.
How is internal consistency reliability obtained?
- This reliability provides an estimate of how consistent the different parts of the test are with each other.
- The premise is that different parts of a test can be treated as different forms of the test, and correlating the scores for these different parts provides a reliability estimate.
- Three different approaches to the internal consistency method of estimating reliability:
o (1) split-half
o (2) Cronbach’s alpha
o (3) standardized alpha
assumptions of using internal consistency reliability (and their problem)
o The two halves are as good as each other (same validity)
Problem: the two halves may not be parallel if their validity is not equal
assumptions of split-half approach
o Items added/removed are parallel (Test items are good)
Split-half approach (how it is done)
The test is split into two halves (e.g., odd/even or first/second half), and scores for the two halves (Y and Y’) are correlated.
As the correlation is only a measure of the reliability of one half of the test, the reliability of the entire test (X = Y + Y’) would be greater and is estimated using the Spearman-Brown formula.
Reliability is a function of test length (longer test → higher reliability)
If we just correlate the two halves → underestimation of reliability, as the actual test is longer → need to estimate the reliability of the entire test using the Spearman-Brown formula
ρ_XX’ = (2ρ_YY’) / (1 + ρ_YY’), where ρ_YY’ is the correlation between the split halves Y and Y’, the halves Y and Y’ are assumed to be parallel, and X is the full (real) test
Hence, the split halves should adhere to Assumption 6 of CTT (i.e., be parallel)
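The split-half procedure can be sketched in a few lines. This is an illustrative example only: the response matrix is made up, the function names are hypothetical, and an odd/even split is assumed (a first/second-half split would generally give a different number).

```python
def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

def split_half_reliability(item_scores):
    """Odd/even split-half correlation and its Spearman-Brown step-up.

    item_scores: one list of item scores per respondent.
    """
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_halves = pearson(odd_half, even_half)              # reliability of one half
    r_full = (2 * r_halves) / (1 + r_halves)             # estimate for the full test
    return r_halves, r_full

# Made-up responses: 6 respondents x 4 items, each scored 1 to 5.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [4, 5, 5, 4],
    [5, 5, 4, 5],
]
r_halves, r_full = split_half_reliability(responses)
print(round(r_halves, 3), round(r_full, 3))
```

For any positive half-test correlation below 1, the stepped-up value is larger than the raw correlation, reflecting that the full test is twice as long as each half.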
disadvantage of split half approach
- There are different ways to split the test (odd-even; first-second half), and the resulting reliability estimates are not the same
spearman brown formula
refer to notes
ρ_XX’ = (Nρ_YY’) / [1 + (N - 1)ρ_YY’]
where N is the factor by which the test length is increased (for a test split into halves, N = 2) and ρ_YY’ is the correlation based on the split halves
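A worked example of the general formula, as a minimal sketch: the function name `spearman_brown` and the starting correlation of 0.60 are hypothetical values chosen for illustration.

```python
def spearman_brown(r_yy, n):
    """Predicted reliability when a test is lengthened by a factor of n."""
    return (n * r_yy) / (1 + (n - 1) * r_yy)

# Hypothetical split-half correlation of 0.60.
doubled = spearman_brown(0.60, 2)  # full test built from two halves
tripled = spearman_brown(0.60, 3)  # test three times as long as one half
print(round(doubled, 3), round(tripled, 3))
```

By hand, N = 2 gives 1.2 / 1.6 = 0.75 and N = 3 gives 1.8 / 2.2, roughly 0.818. Setting N = 1 returns the original correlation unchanged, and the prediction holds only if the added parts are parallel to the existing ones.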
what is cronbach’s alpha and how does it split tests
- Alpha is a measure of internal consistency, which refers to the interrelatedness of a set of items.
- Splits test to the item-level
o Such that we get a best estimate of reliability
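The cards do not give the formula for alpha, so the sketch below uses the standard textbook form, alpha = k / (k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function names and the response data are hypothetical.

```python
def variance(values):
    """Sample variance (n - 1 denominator)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a respondents x items table of scores."""
    k = len(item_scores[0])                     # number of items
    items = list(zip(*item_scores))             # one tuple of scores per item
    totals = [sum(row) for row in item_scores]  # total score per respondent
    sum_item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_var / variance(totals))

# Made-up responses: 6 respondents x 4 items, each scored 1 to 5.
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 3, 4, 4],
    [4, 5, 5, 4],
    [5, 5, 4, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Because every item enters the calculation separately, the result does not depend on an arbitrary choice of how to split the test into halves.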