Reliability Flashcards

1
Q

Different versions of the same test or measure; contrast with parallel forms

A

Alternate forms

2
Q

A statistic widely employed in test construction and used to assist in deriving an estimate of reliability; more technically, it is equal to the mean of all possible split-half reliabilities

A

Coefficient alpha (also known as Cronbach’s alpha or simply alpha)
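
For reference, a standard expression for coefficient alpha (assuming a test of k items, with \sigma_i^2 the variance of item i and \sigma_x^2 the variance of total test scores) is:

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_x^2}\right)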

3
Q

An estimate of parallel-forms reliability or alternate-forms reliability

A

Coefficient of equivalence

4
Q

In generalizability theory, an index of the influence that particular facets have on a test score

A

Coefficient of generalizability

5
Q

An estimate of test-retest reliability obtained when the interval between testings is six months or longer

A

Coefficient of stability

6
Q

A range or band of test scores that is likely to contain the “true score”

A

Confidence interval
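
For example, under the true score model a 95% confidence interval around an observed score X is often approximated using the standard error of measurement (SEM) as:

95\% \text{ CI} = X \pm 1.96 \times \text{SEM}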

7
Q

The variety of the subject matter contained in the items; frequently referred to in the context of the variation between individual test items in a test or between test items in two or more tests

A

Content sampling or item sampling

8
Q

In the true score model, the component of variance attributable to random sources irrelevant to the trait or ability the test purports to measure in an observed score or distribution of scores. Common sources include those related to test construction (including item or content sampling), test administration, and test scoring and interpretation

A

Error variance
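
In the true score model, an observed score X is treated as the sum of a true component and an error component, so observed variance partitions into true variance and error variance:

X = T + E, \qquad \sigma^2 = \sigma^2_{tr} + \sigma^2_{e}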

9
Q

Also referred to as domain sampling theory, a system of assumptions about measurement that includes the notion that a test score, and even a response to an individual item, is composed of a relatively stable component that actually is what the test or individual item is designed to measure, and relatively unstable components that collectively can be accounted for as error

A

Generalizability theory

10
Q

A phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is inflated by the sampling procedure used, and the resulting correlation coefficient tends to be higher as a consequence; contrast with restriction of range

A

Inflation of range or inflation of variance

11
Q

An estimate of how consistently the items of a test measure a single construct, obtained from a single administration of a single form of the test by measuring the degree of correlation among all of the test items

A

Internal consistency or inter-item consistency

12
Q

An estimate of the degree of agreement or consistency between two or more scorers (or judges or raters or observers)

A

Inter-scorer reliability (also referred to as inter-rater reliability, observer reliability, judge reliability, or scorer reliability)

13
Q

A system of assumptions about measurement, including the assumption that the trait being measured by a test is unidimensional, and about the extent to which each test item measures that trait

A

Item response theory (IRT) or latent-trait theory or the latent-trait model

14
Q

The variety of the subject matter contained in the items; frequently referred to in the context of the variation between individual test items in a test or between test items in two or more tests

A

Item sampling or content sampling

15
Q

A measure of inter-scorer reliability originally designed for use when scorers make ratings using nominal scales of measurement

A

Kappa statistic
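
For reference, kappa is commonly computed from the observed proportion of agreement p_o and the proportion of agreement expected by chance p_e:

\kappa = \frac{p_o - p_e}{1 - p_e}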

16
Q

A series of equations developed by G. F. Kuder and M. W. Richardson designed to estimate the inter-item consistency of tests

A

Kuder-Richardson formulas
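
The best known of these, KR-20, applies to dichotomously scored items; assuming k items, with p_j the proportion answering item j correctly, q_j = 1 - p_j, and \sigma^2 the total score variance, it is commonly written as:

r_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum p_j q_j}{\sigma^2}\right)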

17
Q

An estimate of split-half reliability of a test, obtained by assigning odd-numbered items to one-half of the test and even-numbered items to the other half

A

Odd-even reliability

18
Q

Two or more versions or forms of the same test when, for each form, the means and variances of observed test scores are equal; contrast with alternate forms

A

Parallel forms

19
Q

A test, usually of achievement or ability, with (1) either no time limit or such a long time limit that all test takers can attempt all items, and (2) some items so difficult that no test taker can obtain a perfect score; contrast with speed test

A

Power test

20
Q

The extent to which measurements are consistent or repeatable; also, the extent to which measurements differ from occasion to occasion as a function of measurement error

A

Reliability

21
Q

General term for an index of reliability or the ratio of true score variance on a test to the total variance

A

Reliability coefficient
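
Symbolically, with \sigma^2_{tr} the true score variance and \sigma^2 the total observed variance:

r_{xx} = \frac{\sigma^2_{tr}}{\sigma^2}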

22
Q

A phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is restricted by the sampling procedure used, and the resulting correlation coefficient tends to be lower as a consequence; contrast with inflation of range

A

Restriction of range or restriction of variance

23
Q

Now outdated, an equation once used to estimate internal consistency reliability

A

Rulon formula
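
For reference, the formula is based on the variance of the differences between scores on the two half-tests (\sigma^2_d) and the variance of total scores (\sigma^2_x):

r = 1 - \frac{\sigma^2_d}{\sigma^2_x}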

24
Q

An equation used to estimate internal consistency reliability from a correlation of two halves of a test, or more generally to estimate the reliability of a test that has been lengthened or shortened; inappropriate for use with heterogeneous tests or speed tests

A

Spearman-Brown formula
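
In its general form, with r_{xy} the obtained reliability (e.g., the correlation between two halves) and n the factor by which test length is changed:

r_{SB} = \frac{n\, r_{xy}}{1 + (n - 1)\, r_{xy}}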

25
Q

A test, usually of achievement or ability, with a time limit; speed tests usually contain items of uniform difficulty level

A

Speed test

26
Q

An estimate of the internal consistency of a test obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once

A

Split-half reliability

27
Q

In true score theory, a statistic designed to estimate the extent to which an observed score deviates from a true score

A

Standard error of measurement or standard error of a score
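
For reference, it is commonly estimated from the standard deviation of test scores \sigma and the reliability coefficient r_{xx}:

\sigma_{meas} = \sigma \sqrt{1 - r_{xx}}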

28
Q

A statistic designed to aid in determining how large a difference between two scores should be before it is considered statistically significant

A

Standard error of the difference
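
Assuming the two scores are on the same scale with standard deviation \sigma and come from measures with reliabilities r_1 and r_2, it is commonly estimated as:

\sigma_{diff} = \sigma \sqrt{2 - r_1 - r_2}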

29
Q

The extent to which individual test items do not measure a single construct but instead measure different factors; contrast with test homogeneity

A

Test heterogeneity

30
Q

The extent to which individual test items measure a single construct; contrast with test heterogeneity

A

Test homogeneity

31
Q

An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test

A

Test-retest reliability

32
Q

In the true score model, the component of variance attributable to true differences in the ability or trait being measured, inherent in an observed score or distribution of scores

A

True variance

33
Q

A measure of variability equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean

A

Variance
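
Symbolically, for N scores X with mean \bar{X}:

\sigma^2 = \frac{\sum (X - \bar{X})^2}{N}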