Ch. 5 - Reliability Flashcards

1
Q

alternate forms

A

different versions of the same test or measure;

contrast with parallel forms

2
Q

alternate-forms reliability

A

an estimate of the extent to which item sampling and other errors have affected scores on two versions of the same test;

contrast with parallel-forms reliability

3
Q

bias

A

a factor inherent within a test that systematically prevents accurate, impartial measurement

4
Q

classical test theory (CTT)

A

aka ‘true score theory / model’ …

system of assumptions about measurement that includes the notion that a test score (and even a response to an individual item) is composed of a relatively stable component that actually is what the test or individual item is designed to measure, as well as a component that is error.

5
Q

coefficient alpha

A

aka ‘Cronbach’s alpha’ and alpha…

a statistic widely employed in test construction and used to assist in deriving an estimate of reliability

more technically, equal to the mean of all possible split-half reliabilities
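
A minimal sketch, not taken from the card itself: one common computational form of coefficient alpha is alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `coefficient_alpha` and the sample data are hypothetical.

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """Coefficient alpha for rows of item scores (one row per test-taker)."""
    k = len(item_scores[0])  # number of items on the test
    # Variance of each item (column) across test-takers.
    item_variances = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of each test-taker's total score.
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 4 test-takers, 3 items.
scores = [[2, 3, 3], [1, 1, 2], [3, 3, 3], [0, 1, 1]]
alpha = coefficient_alpha(scores)  # about 0.96 for this sample
```

When every item ranks test-takers identically, as in `[[1, 1], [0, 0]]`, the function returns 1.0.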

6
Q

coefficient of equivalence

A

an estimate of parallel-forms reliability or alternate-forms reliability

7
Q

coefficient of generalizability

A

index of the influence that particular facets have on a test score

8
Q

coefficient of inter-scorer reliability

A

an index of the degree of consistency among scorers in the scoring of a test

9
Q

coefficient of stability

A

estimate of test-retest reliability obtained during time intervals of six months or longer

10
Q

confidence interval

A

range or band of test scores that is likely to contain the “true score”
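
A hedged sketch of how such a band is often built: observed score plus or minus z times the standard error of measurement, with SEM = SD * sqrt(1 - reliability). The normal-error assumption behind z = 1.96 (roughly 95%), the function name, and the sample values are assumptions, not part of the card.

```python
import math

def confidence_interval(observed_score, sd, reliability, z=1.96):
    """Band around an observed score; z=1.96 gives roughly 95% if errors are normal."""
    sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
    return observed_score - z * sem, observed_score + z * sem

# Hypothetical test with SD = 15 and reliability = .91 (SEM = 4.5).
low, high = confidence_interval(100, sd=15, reliability=0.91)  # roughly 91.2 to 108.8
```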

11
Q

content sampling

A

variety of the subject matter contained in the items;

aka item sampling, in context of variation between individual test items in a test or between test items in two or more tests

12
Q

criterion-referenced test

A

aka ‘domain-referenced testing’ and ‘content-referenced testing’

method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard (or criterion)

contrast with norm-referenced testing and assessment

13
Q

decision study

A

conducted at the conclusion of a generalizability study, this research is designed to explore the utility and value of test scores in making decisions

14
Q

dichotomous test item

A

test item or question that can be answered with only one of two response options (true/false, yes/no)

15
Q

discrimination

A

in IRT, degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured by a test

16
Q

domain sampling theory

A

a system of assumptions about measurement that includes the notion that a test score (and even a response to an individual item) consists of a relatively stable component that actually is what the test or individual item is designed to measure, as well as relatively unstable components that collectively can be accounted for as error

17
Q

dynamic characteristic

A

a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences

contrast with static characteristic

18
Q

error variance

A

in true score model…

component of variance attributable to random sources irrelevant to the trait or ability the test purports to measure in an observed score or distribution of scores

common sources of error variance include those related to test construction (including item or content sampling), test administration, and test scoring and interpretation

19
Q

estimate of inter-item consistency

A

an estimate of the reliability of a test obtained from a measure of inter-item consistency

20
Q

facet

A

in generalizability theory…

variables of interest in the universe, including the number of items in the test, the amount of training the test scorers have had, the purpose of the test administration, etc.

21
Q

generalizability theory

A

aka domain sampling theory

system of assumptions about measurement that includes the notion that a test score (and even a response to an individual item) consists of a relatively stable component that actually is what the test or individual item is designed to measure, as well as relatively unstable components that collectively can be accounted for as error

22
Q

generalizability study

A

in context of generalizability theory…

research conducted to explore the impact of different facets of the universe on a test score

23
Q

heterogeneity

A

more generally, having diverse contents

a heterogeneous test measures multiple factors

24
Q

homogeneity

A

describes degree to which a test measures a single trait

25
inflation of range/variance
a reference to a phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is inflated by the sampling procedure used, so the resulting correlation coefficient tends to be higher
contrast with restriction of range
26
information function
27
inter-item consistency
the consistency or homogeneity of the items of a test, estimated by techniques such as the split-half method
28
internal consistency estimate of reliability
an estimate of the reliability of a test obtained from a measure of inter-item consistency
29
inter-scorer reliability
aka inter-rater reliability, observer reliability, judge reliability, and scorer reliability
an estimate of the degree of agreement or consistency between two or more scorers (or judges, raters, observers)
30
item response theory (IRT)
aka latent-trait theory / model
system of assumptions about measurement (including the assumption that a trait being measured by a test is unidimensional) and the extent to which each test item measures the trait
31
item sampling
aka content sampling
variety of the subject matter contained in the items
frequently referred to in the context of the variation between individual test items in a test or between test items in two or more tests
32
latent-trait theory
aka latent-trait model
system of assumptions about measurement, including the assumption that a trait being measured by a test is unidimensional, and the extent to which each test item measures the trait
33
measurement error
refers to the inherent uncertainty associated with any measurement, even after care has been taken to minimize preventable mistakes
34
odd-even reliability
estimate of split-half reliability of a test, obtained by assigning odd-numbered items to one half of the test and even-numbered items to the other half
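
A minimal sketch of the odd-even split described above, assuming item scores are stored one row per test-taker. The half-test correlation is then adjusted with the Spearman-Brown formula, which is the usual follow-up step for split-half estimates but is not stated on this card; all names and data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)

def odd_even_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown adjusted estimate)."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    return r_half, 2 * r_half / (1 + r_half)

# Hypothetical data: 4 test-takers, 4 dichotomous items.
scores = [[1, 1, 1, 0], [0, 0, 0, 0], [1, 1, 1, 1], [1, 0, 0, 0]]
r_half, r_full = odd_even_reliability(scores)  # about 0.82 and 0.90
```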
35
parallel forms
two or more versions or forms of the same test where, for each form, the means and variances of observed test scores are equal
contrast with alternate forms
36
parallel-forms reliability
an estimate of the extent to which item sampling and other errors have affected test scores on two versions of the same test when, for each form of the test, the means and variances of observed test scores are equal
contrast with alternate-forms reliability
37
polytomous test item
a test item or question with three or more alternative responses, where only one alternative is scored correct or scored as being consistent with a targeted trait or other construct
38
power test
a test, usually of achievement or ability, with (1) either no time limit or such a long time limit that all test-takers can attempt all items and (2) some items so difficult that no test-taker can obtain a perfect score
contrast with speed test
39
random error
a source of error in measuring a targeted variable, caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process
contrast with systematic error
40
Rasch model
reference to an IRT model with very specific assumptions about the underlying distribution
41
reliability
the extent to which measurements are consistent or repeatable
also, the extent to which measurements differ from occasion to occasion as a function of measurement error
42
reliability coefficient
general term for an index of reliability or the ratio of true score variance on a test to the total variance
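
As a small illustration of the ratio in this definition, assuming (as in classical test theory) that total variance is the sum of true variance and error variance. The function name and numbers are hypothetical.

```python
def reliability_coefficient(true_variance, error_variance):
    """Ratio of true score variance to total (true + error) variance."""
    total_variance = true_variance + error_variance
    return true_variance / total_variance

# Hypothetical test: 80 units of true variance, 20 units of error variance.
r = reliability_coefficient(true_variance=80, error_variance=20)  # 0.8
```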
43
replicability crisis
low replication rates commonly found in psychological research
44
restriction of range/variance
aka restriction of variance
phenomenon associated with reliability estimates wherein the variance of either variable in a correlational analysis is restricted by the sampling procedure used, so the resulting correlation coefficient tends to be lower
contrast with inflation of range
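
A rough demonstration of the effect with made-up data: the same two variables are correlated once over the full range of x and once over only the top half of x. The data, the noise pattern, and the helper name `pearson_r` are all hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)

# Full sample: x runs from 1 to 10; y is x plus a fixed alternating "error".
x = list(range(1, 11))
noise = [2 if v % 2 else -2 for v in x]
y = [v + e for v, e in zip(x, noise)]
r_full = pearson_r(x, y)

# Restricted sample: only cases with x of 6 or more were selected.
r_restricted = pearson_r(x[5:], y[5:])
```

With these values the full-range correlation is about .79, while the restricted-range correlation drops to about .59.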
45
Spearman-Brown formula
equation used to estimate internal consistency reliability from a correlation of two halves of a test that has been lengthened or shortened
inappropriate for use with heterogeneous tests or speed tests
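
The general form of the formula can be sketched as r_SB = n * r / (1 + (n - 1) * r), where r is the reliability of the existing test (or the half-test correlation) and n is the ratio of the new length to the old length. The function name and example values below are hypothetical.

```python
def spearman_brown(r, n):
    """Estimated reliability after changing test length by a factor of n."""
    return n * r / (1 + (n - 1) * r)

# Two halves correlate .70; estimate for the whole test (n = 2).
whole_test = spearman_brown(0.70, 2)   # about 0.82

# A test with reliability .90 is cut to half its length (n = 0.5).
shortened = spearman_brown(0.90, 0.5)  # about 0.82
```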
46
speed test
test, usually of achievement or ability, with a time limit
speed tests usually contain items of uniform difficulty level
47
split-half reliability
estimate of the internal consistency of a test obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
48
standard error of a score
in true score theory, aka SEM…
a statistic designed to estimate the extent to which an observed score deviates from a true score
49
standard error of measurement
SEM, aka standard error of a score; in true score theory…
a statistic designed to estimate the extent to which an observed score deviates from a true score
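
A minimal sketch of the usual computation, SEM = SD * sqrt(1 - r), where SD is the standard deviation of test scores and r is the test's reliability coefficient. The function name and sample values are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM from the score standard deviation and a reliability coefficient."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test with SD = 15 and reliability = .91.
sem = standard_error_of_measurement(sd=15, reliability=0.91)  # about 4.5
```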
50
standard error of the difference
a statistic designed to aid in determining how large a difference between two scores should be before it is considered statistically significant
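
One common way to compute it, sketched under the assumption that both scores are on the same scale with the same standard deviation: SE_diff = SD * sqrt(2 - r1 - r2), which equals sqrt(SEM1^2 + SEM2^2). The function name and values are hypothetical.

```python
import math

def standard_error_of_difference(sd, r1, r2):
    """SE of the difference between two scores on the same scale (shared SD)."""
    return sd * math.sqrt(2 - r1 - r2)

# Hypothetical tests: SD = 15, reliabilities .91 and .84.
se_diff = standard_error_of_difference(sd=15, r1=0.91, r2=0.84)  # about 7.5
```

Under a normal-error assumption, a difference larger than about 1.96 times this value would be treated as significant at roughly the .05 level.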
51
static characteristic
a trait, state, or ability presumed to be relatively unchanging over time
contrast with dynamic characteristic
52
systematic error
a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
contrast with random error
53
test-retest reliability
estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
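
A small sketch of the correlation step, assuming scores from the two administrations are listed in the same test-taker order. The helper name `pearson_r` and the scores are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ssx * ssy)

# Same five people tested on two occasions (hypothetical scores).
time_1 = [10, 12, 14, 16, 18]
time_2 = [11, 12, 15, 15, 19]
test_retest_r = pearson_r(time_1, time_2)  # about 0.96
```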
54
transient error
source of error attributable to variations in the test-taker's feelings, moods, or mental state over time
55
true score
a value that, according to classical test theory, genuinely reflects an individual's ability (or trait) level as measured by a particular test
56
true variance
in the true score model…
component of variance attributable to true differences in the ability or trait being measured that are inherent in an observed score or distribution of scores
57
universe
in generalizability theory…
the total context of a particular test situation, including all the factors that lead to an individual test-taker's score
58
universe score
in generalizability theory…
a test score corresponding to the particular universe being assessed or evaluated
59
variance
a measurement of variability equal to the arithmetic mean of the squares of the differences between the scores in a distribution and their mean
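
The definition translates directly into a few lines; this sketch divides by the number of scores (the arithmetic mean of squared deviations, as worded above) rather than by n - 1. The function name and scores are hypothetical.

```python
def variance(scores):
    """Mean of the squared differences between each score and the mean."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)

# Hypothetical distribution with mean 5.
var = variance([2, 4, 4, 4, 5, 5, 7, 9])  # 4.0
```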