Week 3: Flashcards
Reliability and Validity
Variance Model (CTT)
Observed variance = True Variance + Error Variance
Reliability Defined (Variance)
Proportion of observed score variance attributable to true score variance
Reliability Coefficient
Tells us what proportion of the observed variance is non-error
- A coefficient of .75 indicates that 75% of the variance in test scores for the group is due to true differences and 25% of the variance in test scores is due to error.
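The .75 example above can be sketched with hypothetical variance numbers:

```python
# CTT variance decomposition sketch (hypothetical numbers).
true_variance = 75.0    # variance due to real differences between people
error_variance = 25.0   # variance due to measurement error
observed_variance = true_variance + error_variance

# Reliability = proportion of observed variance that is true variance.
reliability = true_variance / observed_variance
print(reliability)  # 0.75
```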
Test-Retest Reliability
Correlation between the scores obtained by the same persons on an identical test administered on two separate occasions
Shows the extent to which scores on a test can be generalised over different occasions
What is the Inter-Test Interval
Time between test administrations
Error Test-Retest addresses and does not address
Only appropriate for stable characteristics
Addresses test-taker variables
- e.g., fatigue
Also influenced by test administration errors (e.g., weather) and by scoring + interpretation errors.
Test-Retest Limitations
Content Sampling Error
Inconvenient to obtain data (requires testing the same people twice)
Practice Effect
Alternate Form Reliability
Use of 2 separate forms of the test (similar items, time limits, content specifications, etc.)
Correlation between scores obtained on the two test forms represents the reliability coefficient
Error Sources of Alternate Forms
The unsystematic error addressed depends on the inter-test interval:
- Administered immediately in succession
→ Addresses content sampling
- Few days to weeks apart
→ Addresses content sampling and test-taker variables (e.g., fatigue)
Both are subject to test administration and scoring + interpretation errors.
Limitations of Alternate Forms
Most tests do not have an alternate form
Inter-Scorer Reliability
Degree of agreement or consistency between 2 or more scorers/raters
Reliability is determined by the correlation between different raters' scores for the same persons
Error Sources & Limitations of Inter-Scorer Reliability
Addresses errors from scoring + interpretation.
No info on any other sources of error
How is Internal Consistency calculated
Reliability is determined by examining the relationships among the items on one test at a single point in time
Are the items in a measure internally consistent with each other?
- Degree to which items are related to each other
What is Split Half method
Involves correlating one half of a test with the other half
What is the Spearman Brown Correction
Allows the estimation of the reliability of the whole test from a correlation of the 2 half-tests (obtained from split-half method)
What is Kuder-Richardson Method
KR-20 and KR-21
Used for dichotomous items (true/false, right/wrong)
KR-20 gives a coefficient for any test that is equal to the average of all possible split-half coefficients
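A KR-20 sketch from its definition (k/(k−1))·(1 − Σpq/σ²), using a hypothetical matrix of right/wrong (1/0) responses:

```python
# KR-20 for dichotomous items; rows = people, columns = items (hypothetical data).
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0],
]

k = len(responses[0])                     # number of items
n = len(responses)                        # number of people
totals = [sum(row) for row in responses]  # total score per person

# Item difficulty p (proportion correct) and p*q per item.
p = [sum(row[j] for row in responses) / n for j in range(k)]
sum_pq = sum(pj * (1 - pj) for pj in p)

# Population variance of total scores.
mean_t = sum(totals) / n
var_t = sum((t - mean_t) ** 2 for t in totals) / n

kr20 = (k / (k - 1)) * (1 - sum_pq / var_t)
print(round(kr20, 2))  # 0.67
```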
What is Cronbach’s Alpha
Coefficient Alpha (rα) is a more general model of KR‐20 that does not require dichotomously scored items (e.g., agree/not sure/disagree)
When items are scored dichotomously, rα = rKR20
The most popular coefficient for reporting internal consistency
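Coefficient alpha replaces Σpq with the sum of the item variances, so it works for polytomous items. A sketch with hypothetical 1-5 Likert ratings:

```python
from statistics import pvariance

# Hypothetical Likert ratings: rows = people, columns = items.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]

k = len(ratings[0])                                    # number of items
item_vars = [pvariance([row[j] for row in ratings]) for j in range(k)]
total_var = pvariance([sum(row) for row in ratings])   # variance of total scores

# Cronbach's alpha = (k/(k-1)) * (1 - sum of item variances / total variance).
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # 0.94
```

With dichotomous (0/1) items, each item variance is p(1−p), so this formula reduces to KR-20.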
Error Sources for Internal Consistency
Addresses unreliability due to content sampling
Subject to changes in test administration, test-taker variables, scoring + interpretation
Limitations of Internal Consistency
If a test is developed to have high internal consistency by using items with highly similar content, its content sampling may be so constricted as to be trivial
- Inappropriate for some speed tests, such as tests of clerical speed or reading rate.
Interpreting Reliability Coefficients
.90’s = high reliability
.80’s = moderate to high reliability
.70’s = low to moderate reliability
.60’s = unacceptably low reliability
How to improve reliability
- Increase the number of items
- Discard low-reliability items
- Estimate correlation without measurement error
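The first strategy (adding items) can be quantified with the Spearman-Brown prophecy formula, sketched here with hypothetical values:

```python
# Spearman-Brown prophecy: predicted reliability after making a test
# m times longer with comparable items (hypothetical values).
r_current = 0.70   # current reliability
m = 2              # e.g., double the number of items

r_new = (m * r_current) / (1 + (m - 1) * r_current)
print(round(r_new, 2))  # 0.82
```

So doubling a test with reliability .70 is predicted to raise it to about .82, assuming the new items are comparable in quality.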
Standard Error of Measurement vs Reliability Coefficient
Reliability coefficient is used to make judgements about the overall value of a particular test
SEM is used to make judgements about individual scores obtained with the test
Does the individual’s observed score on a test provide a good indication of their true score?
CTT: OS = TS + E (Observed Score = True Score + Error)
SEM is used to gauge how big or small E is
What is Standard Error of Measurement
The SEM indicates the precision of our estimate of an individual’s true score.
Assuming errors are normally distributed, the SEM is the SD of the error distribution around the true score: SEM = SD × √(1 − r)
Lower SEM = Higher Reliability
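A sketch of the SEM formula SEM = SD × √(1 − r), with hypothetical values on an IQ-style scale; note how higher reliability shrinks the SEM:

```python
import math

# Hypothetical values: IQ-style scale, reliability .89.
sd = 15.0
reliability = 0.89

sem = sd * math.sqrt(1 - reliability)
print(round(sem, 2))  # 4.97
```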
What is the % score threshold in relation to the SEM
68% of observed scores fall within ±1 SEM of the true score
95% fall within ±1.96 SEM
99% fall within ±2.58 SEM
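These thresholds give confidence bands around an individual's observed score; a sketch with hypothetical values:

```python
import math

# Hypothetical observed score and test properties.
observed = 110
sd, reliability = 15.0, 0.91
sem = sd * math.sqrt(1 - reliability)   # = 4.5

# 95% band: observed score ± 1.96 SEM.
ci_95 = (observed - 1.96 * sem, observed + 1.96 * sem)
print(ci_95)  # roughly (101.18, 118.82)
```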