Lecture 4: Essentials of Reliability Flashcards
Reliability
- suggests trustworthiness
- quality of test scores that suggests they are sufficiently consistent and free from measurement error
- consistency and precision of the results of the measurement process
Measurement error
any fluctuation in scores that results from factors related to the measurement process that are irrelevant to what is being measured
- reliable scores should be free of measurement error
True score
- hypothetical entities that would result from error-free measurement
- goal of reliability analysis: to estimate true scores
Individual’s true score
the average score in a hypothetical distribution of scores that would be obtained if the individual took the same test an infinite number of times
observed score
derived from tests (= scores that the individuals actually obtain)
any observed score (Xo) is made up of two components
- the true score component
- the error score component
True score component (Xtrue)
is construed to be that portion of the observed score that reflects whatever ability, trait, or characteristic the test assesses
error score component (Xerror)
difference between the observed score and the true score
- any other factor that may enter into the observed score as a consequence of the measurement process
sample variance (true scores in group data)
the average amount of variability in a group of scores
sample variance consists of (two components)
- a portion that is true variance
- a portion that is error variance
True variance
differences among the scores of individuals within a group that reflect their standing or position in whatever characteristic the test assesses
error variance
differences among test scores that reflect factors irrelevant to what the test assesses
- score reliability increases as the error component decreases
Reliability coefficient (reliability)
defined as the ratio of true-score variance to total test score variance
- if test score variance = true variance (reliability = 1)
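The ratio above can be illustrated with a minimal Python sketch (not part of the lecture; all scores are invented): observed scores are built from hypothetical true scores plus error that is uncorrelated with them, so total variance splits into a true and an error portion.

```python
# Sketch of the reliability coefficient as the ratio of true-score
# variance to total observed-score variance. All numbers are invented
# for illustration; the errors are chosen to be uncorrelated with the
# true scores so the variances add up cleanly.
from statistics import pvariance

true_scores = [50, 55, 60, 65, 70]            # hypothetical error-free scores
errors      = [2, -2, 0, -2, 2]               # hypothetical measurement error
observed    = [t + e for t, e in zip(true_scores, errors)]

var_true  = pvariance(true_scores)            # true variance
var_total = pvariance(observed)               # total test score variance

reliability = var_true / var_total
print(round(reliability, 3))
```

If the error variance were zero, var_total would equal var_true and the reliability would be exactly 1, matching the statement above.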
Two-step process (Evaluation of reliability)
1. What are possible sources of error?
2. What is the magnitude of those errors?
The relativity of reliability
- tests cannot be reliable; test scores are reliable!
- score might be unreliable (due to test taker, testing situation)
3 sources of error which can enter the test score
- context in which testing takes place
- test taker
- specific characteristics of the test itself
Random measurement error vs. systematic measurement error
- some of the errors can be minimized (through proper testing practice etc.)
- others cannot be eliminated but may be detected by various types of checks built into the test
Sources of error
- Interscorer difference
- Time sampling error
- Content sampling error
- Interitem inconsistency
- Interitem inconsistency and content heterogeneity
- Time and content sampling error
Interscorer difference
- errors entering into scores whenever the element of subjectivity influences scoring
- refers to the variations in scores that stem from differences in the subjective judgements of the scorers
Scorer Reliability
- method for estimating error due to interscorer differences
- 2 independent scorers (two independent scores are generated)
- correlation between the set of scores
(for metric variables)
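A minimal Python sketch of this procedure (not from the lecture; the ratings are invented): two independent scorers rate the same test takers, and scorer reliability is the Pearson correlation between the two sets of scores.

```python
# Sketch of scorer reliability: the Pearson correlation between the
# scores that two independent raters assign to the same test takers.
# All ratings are invented for illustration.
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

scorer_a = [12, 15, 9, 20, 17]   # hypothetical ratings, scorer A
scorer_b = [11, 16, 10, 19, 18]  # hypothetical ratings, scorer B

print(round(pearson(scorer_a, scorer_b), 3))
```

The same correlation procedure underlies test-retest and alternate-form reliability below; only the source of the two score sets changes.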
Time sampling error
variability in test scores as a function of the fact that they are obtained at one point in time rather than at another
Concept of time sampling error
- hinges on two related notions:
1. constructs/behaviors are liable to fluctuate over time
2. constructs/behaviors change at different paces over time
Test-retest reliability
- test is administered twice on two different occasions to one or more groups of individuals
- correlation between the scores obtained from the two administrations
= test-retest reliability coefficient (crucial: the length of the time interval!)
Content sampling error
term used to label the trait-irrelevant variability that can enter into test scores as a result of fortuitous factors related to the content of the specific items included in a test
Content sampling error can be due to..
1. faulty test construction
2. specific content which favors some test takers
alternate-form reliability
- intended to estimate the amount of error in test scores that is attributable to content sampling error
- two or more forms of a test (different in specific content) need to be prepared and administered to the same group of subjects
- scores are correlated (alternate-form reliability)
Split-half reliability
- administering a test to a group of individuals and creating two scores for each person by splitting the test into two halves
- the scores of the two halves are then correlated (split-half reliability coefficient)
Spearman-Brown (S-B) formula
- based on the notion that, all things being equal, a score based on a longer test will be closer to the true score than one based on a shorter test
- the formula estimates the effect
Spearman Brown formula (does what?)
the formula estimates the effect: that lengthening a test by any amount, or shortening a test to any fraction of its original size, will have on the obtained coefficient
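The formula itself can be sketched in Python (the coefficients below are invented examples): given a reliability coefficient r, it projects the reliability of a test whose length is multiplied by a factor n.

```python
# Sketch of the Spearman-Brown formula: r_new = (n * r) / (1 + (n - 1) * r),
# where r is the obtained reliability and n is the factor by which the
# test is lengthened (n > 1) or shortened (n < 1).
def spearman_brown(r, n):
    """Projected reliability when test length is multiplied by n."""
    return (n * r) / (1 + (n - 1) * r)

# Hypothetical values: stepping up a half-length coefficient (n = 2),
# and projecting the effect of halving a test (n = 0.5).
print(round(spearman_brown(0.70, 2), 3))
print(round(spearman_brown(0.90, 0.5), 3))
```

The n = 2 case is the standard correction applied to a split-half coefficient, since each half is only half the length of the full test.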
Some solutions to the Problem of How to split a test in halves …
1. odd-even split into two halves
2. for speed tests: two-trial reliability
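The odd-even split can be sketched as follows (a minimal illustration, not from the lecture; the item responses are invented): each person's responses are divided into odd- and even-numbered items, yielding two half-test scores per person.

```python
# Sketch of an odd-even split on dichotomous (1 = right, 0 = wrong)
# item data. Each row is one test taker's responses to six items;
# all data are invented for illustration.
responses = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 0],
]

odd_halves  = [sum(row[0::2]) for row in responses]  # items 1, 3, 5
even_halves = [sum(row[1::2]) for row in responses]  # items 2, 4, 6

print(odd_halves, even_halves)
```

The two half-test score lists would then be correlated to obtain the split-half coefficient, which is stepped up to full-test length with the Spearman-Brown formula.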
Interitem inconsistency
error in scores that results from fluctuations in items across an entire test (low correlations among test items)
What is interitem inconsistency due to?
1. content sampling
2. content heterogeneity
Content heterogeneity
inclusion of items or sets of items that tap content knowledge or psychological functions that differ from those tapped by other items in the same test
(only when the test should be homogeneous)
Internal consistency measures
statistical procedures designed to assess the extent of inconsistency across test items
(split-half reliability coefficients accomplish this to some extent)
- formulas that take into account the interitem correlation
interitem correlation
the correlation between performance on all the items within a test
Kuder-Richardson formula (KR-20) and coefficient alpha (Cronbach's alpha)
- function of two factors
- number of items in the test
- the ratio of variability in test takers' performance across all the items in the test to total test score variance
most frequently used formulas to calculate internal consistency:
Kuder-Richardson formula (KR-20) and coefficient alpha (Cronbach's alpha)
Kuder-Richardson formula (KR-20)
applied to tests whose items are scored as right or wrong (dichotomous)
- dependent on the interitem variability within a test
Coefficient alpha (Cronbach's alpha)
used for tests whose items have multiple possible responses
- dependent on the interitem variability within a test
Hetero vs. Homogeneity (Used in reference to the composition of …)
1. the behavior samples (items) of a test
2. the group of test takers
Time sampling and content sampling error combined
both can be estimated in a combined fashion for tests which require both stability and consistency
Delayed Alternate-Form Reliability
these coefficients can be calculated when two or more alternate forms of the same test are administered on two different occasions
- additional source of error: practice effects!