Lecture 4: Essentials of Reliability Flashcards
Reliability
- suggests trustworthiness
- quality of test scores that suggests they are sufficiently consistent and free from measurement error
- consistency and precision of the results of the measurement process
Measurement error
any fluctuation in scores that results from factors related to the measurement process that are irrelevant to what is being measured
- reliable scores should be free of measurement error
True score
- hypothetical entities that would result from error-free measurement
- goal of reliability analysis: to estimate true scores
Individual’s true score
the average score in a hypothetical distribution of scores that would be obtained if the individual took the same test an infinite number of times
observed score
derived from tests (= scores that the individuals actually obtain)
any observed score (Xo) is made up of two components
- the true score component
- the error score component
True score component (Xtrue)
is construed to be that portion of the observed score that reflects whatever ability, trait, or characteristic the test assesses
error score component (Xerror)
difference between the observed score and the true score
- any other factor that may enter into the observed score as a consequence of the measurement process
sample variance (true scores in group data)
the average amount of variability in a group of scores
sample variance consists of (two components)
- a portion that is true variance
- a portion that is error variance
True variance
differences among the scores of individuals within a group that reflect their standing or position in whatever characteristic the test assesses
error variance
differences among test scores that reflect factors irrelevant to what the test assesses
- score reliability increases as the error component decreases
Reliability coefficient (reliability)
defined as the ratio of true-score variance to total test score variance
- if test score variance = true variance (reliability = 1)
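The ratio above can be illustrated with a minimal Python sketch (not part of the lecture; all scores are invented): observed scores are built from hypothetical true scores plus error that is uncorrelated with them, so total variance splits into a true and an error portion.

```python
# Sketch of the reliability coefficient as the ratio of true-score
# variance to total observed-score variance. All numbers are invented
# for illustration; the errors are chosen to be uncorrelated with the
# true scores so the variances add up cleanly.
from statistics import pvariance

true_scores = [50, 55, 60, 65, 70]            # hypothetical error-free scores
errors      = [2, -2, 0, -2, 2]               # hypothetical measurement error
observed    = [t + e for t, e in zip(true_scores, errors)]

var_true  = pvariance(true_scores)            # true variance
var_total = pvariance(observed)               # total test score variance

reliability = var_true / var_total
print(round(reliability, 3))
```

If the error variance were zero, var_total would equal var_true and the reliability would be exactly 1, matching the statement above.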
Two-step process (Evaluation of reliability)
1. What are possible sources of error?
2. What is the magnitude of those errors?
The relativity of reliability
- tests cannot be reliable; test scores are reliable!
- score might be unreliable (due to test taker, testing situation)
3 sources of error which can enter the test score
- context in which testing takes place
- test taker
- specific characteristics of the test itself
Random measurement error vs. systematic measurement error
- some of the errors can be minimized (through proper testing practice etc.)
- others cannot be eliminated but may be detected by various types of checks built into the test
Sources of error
- Interscorer difference
- Time sampling error
- Content sampling error
- Interitem inconsistency
- Interitem inconsistency and content heterogeneity
- Time and content sampling error
Interscorer difference
- errors entering into scores whenever the element of subjectivity influences scoring
- refers to the variations in scores that stem from differences in the subjective judgements of the scorers
Scorer Reliability
- method for estimating error due to interscorer differences
- 2 independent scorers (two independent scores are generated)
- correlation between the set of scores
(for metric variables)
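A minimal Python sketch of this procedure (not from the lecture; the ratings are invented): two independent scorers rate the same test takers, and scorer reliability is the Pearson correlation between the two sets of scores.

```python
# Sketch of scorer reliability: the Pearson correlation between the
# scores that two independent raters assign to the same test takers.
# All ratings are invented for illustration.
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

scorer_a = [12, 15, 9, 20, 17]   # hypothetical ratings, scorer A
scorer_b = [11, 16, 10, 19, 18]  # hypothetical ratings, scorer B

print(round(pearson(scorer_a, scorer_b), 3))
```

The same correlation procedure underlies test-retest and alternate-form reliability below; only the source of the two score sets changes.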
Time sampling error
variability in test scores as a function of the fact that they are obtained at one point in time rather than at another
Concept of time sampling error
- hinges on two related notions:
1. constructs/behaviors are liable to fluctuate over time
2. constructs/behaviors change at different paces over time
Test-retest reliability
- test is administered twice on two different occasions to one or more groups of individuals
- correlation between the scores obtained from the two administrations
= test-retest reliability coefficient (crucial: the length of the time interval!)
Content sampling error
term used to label the trait-irrelevant variability that can enter into test scores as a result of fortuitous factors related to the content of the specific items included in a test
Content sampling error can be due to..
1. faulty test construction
2. specific content which favors some test takers
alternate-form reliability
- intended to estimate the amount of error in test scores that is attributable to content sampling error
- two or more forms of a test (different in specific content) need to be prepared and administered to the same group of subjects
- scores are correlated (alternate-form reliability)
Split-half reliability
- administering a test to a group of individuals and creating two scores for each person by splitting the test into two halves
- the scores of the two halves are then correlated (split-half reliability coefficient)
Spearman-Brown (S-B) formula
- based on the notion that, all things being equal, a score based on a longer test will be closer to the true score than one based on a shorter test
- the formula estimates the effect
Spearman Brown formula (does what?)
the formula estimates the effect: that lengthening a test by any amount, or shortening a test to any fraction of its original size, will have on the obtained coefficient
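The formula itself can be sketched in Python (the coefficients below are invented examples): given a reliability coefficient r, it projects the reliability of a test whose length is multiplied by a factor n.

```python
# Sketch of the Spearman-Brown formula: r_new = (n * r) / (1 + (n - 1) * r),
# where r is the obtained reliability and n is the factor by which the
# test is lengthened (n > 1) or shortened (n < 1).
def spearman_brown(r, n):
    """Projected reliability when test length is multiplied by n."""
    return (n * r) / (1 + (n - 1) * r)

# Hypothetical values: stepping up a half-length coefficient (n = 2),
# and projecting the effect of halving a test (n = 0.5).
print(round(spearman_brown(0.70, 2), 3))
print(round(spearman_brown(0.90, 0.5), 3))
```

The n = 2 case is the standard correction applied to a split-half coefficient, since each half is only half the length of the full test.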
Some solutions to the Problem of How to split a test in halves …
1. odd-even split into two halves
2. for speed tests: two-trial reliability
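The odd-even split can be sketched as follows (a minimal illustration, not from the lecture; the item responses are invented): each person's responses are divided into odd- and even-numbered items, yielding two half-test scores per person.

```python
# Sketch of an odd-even split on dichotomous (1 = right, 0 = wrong)
# item data. Each row is one test taker's responses to six items;
# all data are invented for illustration.
responses = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 0],
]

odd_halves  = [sum(row[0::2]) for row in responses]  # items 1, 3, 5
even_halves = [sum(row[1::2]) for row in responses]  # items 2, 4, 6

print(odd_halves, even_halves)
```

The two half-test score lists would then be correlated to obtain the split-half coefficient, which is stepped up to full-test length with the Spearman-Brown formula.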
Interitem inconsistency
error in scores that results from fluctuations in items across an entire test (low correlations among test items)
What is interitem inconsistency due to?
1. content sampling
2. content heterogeneity
Content heterogeneity
inclusion of items or sets of items that tap content knowledge or psychological functions that differ from those tapped by other items in the same test
(only when the test should be homogeneous)
Internal consistency measures
statistical procedures designed to assess the extent of inconsistency across test items
(split-half reliability coefficients accomplish this to some extent)
- formulas that take into account the interitem correlation
interitem correlation
the correlation between performance on all the items within a test
Kuder-Richardson formula (KR-20) and coefficient alpha (Cronbach's alpha)
- function of two factors
- number of items in the test
- the ratio of variability in test takers' performance across all the items in the test to total test score variance
most frequently used formulas to calculate internal consistency:
Kuder-Richardson formula (KR-20) and coefficient alpha (Cronbach's alpha)
Kuder-Richardson formula (KR-20)
applied to tests whose items are scored as right or wrong (dichotomous)
- dependent on the interitem variability within a test
Coefficient alpha (Cronbach's alpha)
used for tests whose items have multiple possible responses
- dependent on the interitem variability within a test
Hetero vs. Homogeneity (Used in reference to the composition of …)
1. the behavior samples (items) of a test
2. the group of test takers
Time sampling and content sampling error combined
both can be estimated in a combined fashion for tests which require both stability and consistency
Delayed Alternate-Form Reliability
these coefficients can be calculated when two or more alternate forms of the same test are administered on two different occasions
- additional source of error: practice effects!