Unit 5: Reliability Flashcards

1
Q

measurement processes that alter what is measured

A

Carryover effects

2
Q

Portion of variability in test scores that is due to factors unrelated to the construct being measured.

A

error variance

3
Q

allows us to estimate, with a specific level of confidence, the range in which the true score is likely to exist

A

Standard Error of Measurement (SEM)

4
Q

the degree to which a measure predictably overestimates or underestimates a quantity

refers to the degree to which systematic error influences the measurement

A

bias

5
Q

an estimate of the extent to which these different forms of the same test have been affected by item sampling error, or other error

A

alternate forms reliability

6
Q

refers to the portion of variability in test scores that reflects actual differences in the trait, ability, or characteristic the test is designed to measure

A

true variance

7
Q

An estimate of test-retest reliability may be most appropriate in gauging the reliability of tests that employ outcome measures such as r___

A

reaction time or perceptual judgments

8
Q

If interested in the truth independent of measurement, psychologists look for the

A

construct score

9
Q

a person’s standing on a theoretical variable independent of any particular measurement

A

construct score

10
Q

In ability tests, __ are carryover effects in which the test itself provides an opportunity to learn and practice the ability being measured

A

practice effects

11
Q

a statistic that quantifies reliability, ranging from 0 (not at all reliable) to 1 (perfectly reliable)

A

reliability coefficient

12
Q

It provides an estimate of the amount of error inherent in an observed score or measurement

A

Standard Error of Measurement (SEM)

13
Q

based on the idea that a person’s test scores vary from testing to testing because of variables in the testing situation

A

generalizability theory

14
Q

to evaluate the relationship between different forms of a measure

A

alternate forms

15
Q

an estimate of the reliability of a test can be obtained without developing an alternate form of the test and without having to administer the test twice to the same people

A

Internal consistency estimate of reliability or estimate of inter-item consistency

16
Q

provides a measure of the precision of an observed test score

A

Standard Error of Measurement (SEM)

17
Q

if the test is __ in items, an estimate of internal consistency might be low relative to a more appropriate estimate of test-retest reliability

A

heterogeneous

18
Q

terms that refer to variation among items within a test as well as to variation among items between tests

A

item sampling or content sampling

19
Q

If two scores each contain error such that in each case the true score could be higher or lower, then we would want the two scores to be further apart before we conclude that there is a significant difference between them

A

standard error of difference

20
Q

designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective

A

criterion-referenced test

21
Q

The tool used to estimate or infer the extent to which an observed score deviates from a true score

A

Standard Error of Measurement (SEM)

22
Q

Scores on criterion-referenced tests tend to be interpreted in __

A

pass–fail (or, perhaps more accurately, “master–failed-to-master”) terms

23
Q

The universe is described in terms of its facets, which include considerations such as:

A

the number of items in the test,
the amount of training the test scorers have had, and
the purpose of the test administration

24
Q

In general, a primary objective in splitting a test in half for the purpose of obtaining a split-half reliability estimate is to create what might be called __

A

mini-parallel-forms

25
Q

nature of the test

A

(1) The test items are homogeneous or heterogeneous in nature;

(2) The characteristic, ability, or trait being measured is presumed to be dynamic or static;

(3) The range of test scores is or is not restricted;

(4) The test is a speed or a power test; and

(5) The test is or is not criterion-referenced

26
Q

Refers to the degree of correlation among all the items on a scale

A

inter-item consistency

27
Q

often used when coding nonverbal behavior

A

inter-scorer reliability

28
Q

It is a specific application of a more general formula to estimate the reliability of a test that is lengthened or shortened by any number of items

A

Spearman-Brown formula
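
As a rough illustration of the general formula this card refers to, the sketch below predicts reliability after a test is lengthened or shortened. The function name `spearman_brown` and the example values (a reliability of 0.70, doubling and halving the length) are assumptions made up for this example, not figures from the deck.

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when test length is multiplied by n.

    r: reliability of the existing test
    n: new length divided by old length (2 doubles the test, 0.5 halves it)
    """
    return (n * r) / (1 + (n - 1) * r)


# Hypothetical test with reliability 0.70
doubled = spearman_brown(0.70, 2)    # lengthening raises the estimate
halved = spearman_brown(0.70, 0.5)   # shortening lowers it
print(round(doubled, 3), round(halved, 3))
```

With n = 2 this reduces to the familiar split-half correction, 2r / (1 + r).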

29
Q

The extent to which a testtaker’s score is affected by the content sampled on a test and by the way the content is sampled (i.e., the way in which the item is constructed) is a source of error variance

A

test construction

30
Q

A reliability estimate of a __ test should be based on performance from two independent testing periods

A

speed

31
Q

purpose: to evaluate the stability of a measure

A

test-retest reliability

32
Q

In many tests, the advent of computer scoring and a growing reliance on objective, computer-scorable items have virtually eliminated error variance caused by scorer difference

A

test scoring and interpretation

33
Q

it accurately measures internal consistency under highly specific conditions that are rarely met in real measures

A

Cronbach’s alpha

34
Q

The relationship between the SEM and the reliability of a test is _

A

inverse

the higher the reliability of a test (or individual subtest within a test), the lower the SEM
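
The inverse relationship can be seen numerically with the usual SEM formula, SEM = SD × √(1 − reliability). This is a minimal sketch; the function name and the standard deviation of 15 are assumptions chosen for illustration.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical test with a standard deviation of 15:
# as reliability rises, the SEM shrinks.
for r in (0.50, 0.75, 0.91, 0.99):
    print(r, round(standard_error_of_measurement(15, r), 2))
```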

35
Q

its use is typically to evaluate the homogeneity of a measure (that is, whether all items tap a single construct)

A

internal consistency

36
Q

exist when, for each form of the test, the means and the variances of observed test scores are equal

A

parallel forms

37
Q

By determining the reliability of one half of a test, a test developer can use the __ to estimate the reliability of a whole test

A

Spearman-Brown formula

38
Q

Computation of a coefficient of split-half reliability:

A

Step 1: Divide the test into equivalent halves.

Step 2: Calculate a Pearson r between scores on the two halves of the test

Step 3: Adjust the half-test reliability using the Spearman–Brown formula
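
The three steps above can be sketched as follows. This is only one possible implementation: it assumes an odd-even split as the way of forming equivalent halves, and the item data and function names are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    # Step 1: divide the test into halves (odd-even split, an assumption)
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    # Step 2: Pearson r between scores on the two halves
    r_half = pearson_r(odd_half, even_half)
    # Step 3: Spearman-Brown adjustment to full test length
    return 2 * r_half / (1 + r_half)


# Made-up data: 5 testtakers x 6 dichotomous items (1 = correct)
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
reliability = split_half_reliability(scores)
print(round(reliability, 3))
```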

39
Q

Tests designed to measure one factor, such as one ability or one trait

A

homogeneous

40
Q

If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be __

A

lower

41
Q

Valid tests give scores that closely approximate __ scores

A

construct

42
Q

allows a test developer or user to estimate internal consistency reliability from a correlation between two halves of a test

A

Spearman-Brown formula

43
Q

two types of variance

A

True variance: variance from true differences

Error variance: variance from irrelevant, random sources

44
Q

most frequently used measure of internal consistency, but has several well-known limitations

A

Cronbach’s alpha

45
Q

seek to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score

A

domain sampling theory

46
Q

If test developers or users wish to shorten a test, the __ may be used to estimate the effect of the shortening on the test’s reliability

A

Spearman–Brown formula

47
Q

refers to the proportion of the total variance attributed to true variance

A

reliability
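
Written as a ratio, this definition might look like the following. The symbols $r_{xx}$, $\sigma^2_{\text{true}}$, and $\sigma^2_{\text{error}}$ are notation assumed here, and the numbers are made up for illustration.

```latex
r_{xx} = \frac{\sigma^2_{\text{true}}}{\sigma^2_{\text{true}} + \sigma^2_{\text{error}}},
\qquad \text{e.g. } \frac{80}{80 + 20} = 0.80
```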

48
Q

an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal

A

parallel forms reliability

49
Q

test items or questions that can be answered with only one of two alternative responses, such as true–false, yes–no, or correct–incorrect questions

A

dichotomous test items

50
Q

a reference to an IRT model with specific assumptions about the underlying distribution

A

Rasch model

51
Q

can be used to set the confidence interval for a particular score or to determine whether a score is significantly different from a criterion (such as a cutoff score of 70)

A

Standard Error of Measurement (SEM)

52
Q

__ are carryover effects in which repeated testing reduces overall mental energy or motivation to perform on a test

A

Fatigue effects

53
Q

consists of unpredictable fluctuations and inconsistencies of other variables in the measurement process;
sometimes referred to as “noise”;

A

random error

54
Q

the goal of psychological assessment is to

A

maximize true variance

minimize error variance

55
Q

statistical procedure of test-retest

A

Pearson r or Spearman rho

56
Q

trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences

A

dynamic characteristic

57
Q

The influence of particular facets on the test score is represented by _

A

coefficients of generalizability

58
Q

approach to reliability evaluation

A

test-retest method

59
Q

trait, state, or ability presumed to be relatively unchanging, such as intelligence

A

static characteristic

60
Q

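The total variance in an observed distribution of test scores ($\sigma^2$)

A

equals the sum of the true variance ($\sigma^2_{\text{true}}$) and the error variance ($\sigma^2_{\text{error}}$): $\sigma^2 = \sigma^2_{\text{true}} + \sigma^2_{\text{error}}$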

61
Q

Given the exact same conditions of all the facets in the universe, the exact same test score should be obtained. This test score is the __, and it is, as Cronbach noted, analogous to a true score in the true score model

A

universe score

62
Q

an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test

A

test-retest reliability

63
Q

A __ of behavior, or the universe of items that could conceivably measure that behavior, can be thought of as a hypothetical construct

A

domain

64
Q

obtaining estimates of alternate-forms reliability and parallel-forms reliability is similar in two ways to obtaining an estimate of test-retest reliability:

A

(1) Two test administrations with the same group are required

(2) Test scores may be affected by factors such as motivation, fatigue, or intervening events such as practice, learning, or therapy

65
Q

the degree of the relationship between various forms of a test

A

Coefficient of equivalence

66
Q

generally contains items of uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all testtakers should be able to complete all the test items correctly

A

speed test

67
Q

Could also be used to determine the number of items needed to attain a desired level of reliability

A

Spearman–Brown formula
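
A minimal sketch of that use, obtained by solving the Spearman–Brown formula for the length factor n. The function name and the example numbers (a 20-item test with reliability 0.70, a target of 0.90) are assumptions for illustration, and the calculation presumes the added items are comparable to the existing ones.

```python
import math


def items_needed(current_items: int, current_r: float, desired_r: float) -> int:
    """Number of items needed to reach desired_r, per Spearman-Brown.

    Solves r_desired = n*r / (1 + (n - 1)*r) for the length factor n.
    """
    n = (desired_r * (1 - current_r)) / (current_r * (1 - desired_r))
    return math.ceil(current_items * n)


# Hypothetical 20-item test with reliability 0.70, aiming for 0.90
total = items_needed(20, 0.70, 0.90)
print(total)
```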

68
Q

Widely used as a measure of reliability, in part because it requires only one administration of the test

A

coefficient alpha
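
Because only one administration is needed, alpha can be computed from a single persons-by-items score table. The sketch below uses the standard formula α = k/(k − 1) × (1 − Σ item variances / total-score variance); the function name and the data are invented for illustration.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha from rows of per-person item scores."""
    k = len(item_scores[0])  # number of items
    # Variance of each item across people
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of the total scores across people
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Made-up data: 4 testtakers x 3 items rated 1-5
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```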

69
Q

Internal consistency estimates of reliability, such as that obtained by use of the Spearman–Brown formula, are inappropriate for measuring the reliability of

A

heterogeneous and speed tests

70
Q

If the variance of either variable in a correlational analysis is inflated by the sampling procedure, then the resulting correlation coefficient tends to be __

A

higher

71
Q

A statistic useful in describing sources of test score variability is the

A

variance

72
Q

This source of error fluctuates from one testing situation to another with no discernible pattern that would systematically raise or lower scores; it increases or decreases test scores unpredictably

A

random error

73
Q

to evaluate the extent to which items on a scale relate to one another

A

internal consistency

74
Q

sources of error variance of alternate forms

A

test construction or administration

75
Q

simplest way of determining the degree of consistency among scorers in the scoring of a test

A

coefficient of inter-scorer reliability

76
Q

they influence test scores in a consistent direction; either consistently inflate scores or consistently deflate scores

A

systematic error

77
Q

test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct

A

polytomous test items

78
Q

use: when assessing the stability of various personality traits

A

test-retest reliability

79
Q

obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once

A

split-half reliability

80
Q

When measuring something repeatedly, two influences interfere with accurate measurement:

A

(1) Time elapses between measurements

(2) The act of measurement can alter what is being estimated

81
Q

when a time limit is long enough to allow testtakers to attempt all items, and if some items are so difficult that no testtaker is able to obtain a perfect score

A

power test

83
Q

A statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant

A

standard error of difference
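
One common way to compute it, for two scores on the same scale, is SD × √(2 − r₁ − r₂), which equals √(SEM₁² + SEM₂²). The sketch below assumes that formula; the function names, the 1.96 cutoff for roughly 95% confidence, and the example values are all illustrative assumptions.

```python
import math


def standard_error_of_difference(sd: float, r1: float, r2: float) -> float:
    """SD * sqrt(2 - r1 - r2), equal to sqrt(SEM1**2 + SEM2**2)."""
    return sd * math.sqrt(2 - r1 - r2)


def is_significant_difference(score1, score2, sd, r1, r2, z=1.96):
    """True if the gap between the scores exceeds z standard errors."""
    return abs(score1 - score2) > z * standard_error_of_difference(sd, r1, r2)


# Hypothetical tests sharing SD = 10, with reliabilities 0.90 and 0.85
se_diff = standard_error_of_difference(10, 0.90, 0.85)
print(round(se_diff, 2))
print(is_significant_difference(60, 52, 10, 0.90, 0.85))
print(is_significant_difference(60, 48, 10, 0.90, 0.85))
```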

84
Q

calculated to help answer questions about how similar sets of data are

A

coefficient alpha

85
Q

A test is said to be __ in items if it is functionally uniform throughout

A

homogeneous

86
Q

signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured

A

discrimination

87
Q

sources of error variance of test-retest

A

administration

88
Q

a value that according to CTT genuinely reflects an individual’s ability (or trait) level as measured by a particular test

A

true score

89
Q

It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice (because of factors such as time or expense)

A

split-half reliability

90
Q

A range or band of test scores that is likely to contain the true score

A

confidence interval
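
A minimal sketch of how such a band might be built from the SEM: observed score ± z × SEM. The function name, the z value of 1.96 (roughly 95% confidence), and the example test values are assumptions for illustration.

```python
import math


def confidence_interval(observed: float, sd: float, reliability: float, z: float = 1.96):
    """Return (low, high) = observed -/+ z * SEM, with SEM = SD * sqrt(1 - r)."""
    sem = sd * math.sqrt(1 - reliability)
    margin = z * sem
    return observed - margin, observed + margin


# Hypothetical observed score of 100 on a test with SD 15 and reliability 0.91
low, high = confidence_interval(observed=100, sd=15, reliability=0.91)
print(f"{low:.2f} to {high:.2f}")
```

Under these made-up values the SEM is 4.5, so the band runs from roughly 91 to 109.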

91
Q

tied to the measurement instrument used

A

true score

92
Q

the inherent uncertainty associated with any measurement, even after care has been taken to minimize preventable mistakes

A

measurement error

93
Q

in a homogeneous test, it is reasonable to expect a high degree of

A

internal consistency

94
Q

The procedures of this provide a way to model the probability that a person with X ability will be able to perform at a level of Y

A

Item Response Theory (IRT)

95
Q

typically designed to be equivalent with respect to variables such as content and level of difficulty

A

alternate forms

96
Q

refers both to preventable mistakes and to aspects of measurement imprecision that are inevitable

A

error

97
Q

Reliable tests give scores that closely approximate __ scores

A

true

98
Q

The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure

A

Inter-scorer reliability

99
Q

Of the three types of estimates of reliability, __ are perhaps the most compatible with domain sampling theory

A

measures of internal consistency

100
Q

Focuses on the degree of difference that exists between item scores

A

average proportional distance (APD)