Chapter 5 Flashcards

1
Q

Refers to consistency in measurement.

Not an all-or-none matter. A test may be reliable in one context and unreliable in another.

A

RELIABILITY

2
Q

Index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance.

A

Reliability coefficient

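In symbols (a standard way of stating this, consistent with the X = T + E notation in the next card):

r_xx = σ²_T / σ²_X

where σ²_T is the true score variance and σ²_X is the total variance of observed scores.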
3
Q

Score on an ability test is presumed to reflect not only the testtaker’s true score on the ability being measured but also error.

X = T + E

A

Classical test theory

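Because true scores and error are assumed to be uncorrelated, the variances add in the same way (a standard consequence of the model):

σ²_X = σ²_T + σ²_E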
4
Q

A statistic useful in describing sources of test score variability

A

Variance

5
Q

2 types of measurement error

A

Systematic
Random

6
Q

Source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.

A

Random error

7
Q

Source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.

Doesn’t affect score consistency

A

Systematic error

8
Q

Source of variance which refers to variation among items within a test as well as to variation among items between tests.

Differences are sure to be found in the way the items are worded and in the exact content sampled.

A

Item sampling
or Content sampling

9
Q

Sources of error variance in test administration (the testtaker’s reactions to these influences are the source of one kind of error variance).

A

Test environment - physical setting where a test is administered

Testtaker variables - factors related to the individual taking the test

Examiner-related variables - factors related to the person administering the test

10
Q

2 possible sources of error variance under test scoring and interpretation

A

Scorers - subjectivity involved in scoring
Scoring systems - technical glitches when scoring is done by computer

11
Q

3 potential sources of nonsystematic error in an assessment situation

A

Forgetting
Failing to notice abusive behavior
Misunderstanding instructions regarding reporting

12
Q

Estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.

A

Test-retest reliability

13
Q

The estimate of test-retest reliability when the interval between testing is greater than six months.

A

Coefficient of stability

14
Q

The degree of the relationship between various forms of a test can be evaluated by means of _______________; often referred to as the coefficient of equivalence.

A

Alternate-forms or parallel-forms coefficient of reliability

15
Q

Exist when, for each form of the test, the means and the variances of observed test scores are equal.

A

Parallel forms

16
Q

Simply different versions of a test that have been constructed so as to be parallel.

A

Alternate forms

17
Q

Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.

A

Internal consistency estimate of reliability

or

Estimate of interitem consistency

18
Q

Step 1. Divide the test into equivalent halves.

Step 2. Calculate a Pearson r between scores on the two halves of the test.

Step 3. Adjust the half-test reliability using the Spearman-Brown formula.

A

3 steps in the computation of coefficient of split-half reliability

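A minimal Python sketch of the three steps (illustrative only; it assumes an odd-even split, one of the acceptable splitting methods covered below, and a small made-up score matrix):

import numpy as np

# rows = testtakers, columns = items scored 1 (correct) or 0 (incorrect)
scores = np.array([[1, 1, 0, 1, 1, 0],
                   [1, 0, 0, 1, 0, 0],
                   [1, 1, 1, 1, 1, 1],
                   [0, 1, 0, 0, 1, 0],
                   [1, 1, 1, 0, 1, 1]])

# Step 1: divide the test into equivalent halves (here, odd-even)
odd_half = scores[:, 0::2].sum(axis=1)    # items 1, 3, 5
even_half = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6

# Step 2: Pearson r between scores on the two halves
r_hh = np.corrcoef(odd_half, even_half)[0, 1]

# Step 3: Spearman-Brown adjustment to estimate full-test reliability
r_sb = (2 * r_hh) / (1 + r_hh)
print(round(r_sb, 3))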
19
Q

3 acceptable ways to split a test

A

Randomly assign
Odd-even reliability
By content

20
Q

Assign odd-numbered items to one half of the test and even-numbered items to the other half.

A

Odd-even reliability

21
Q

Each half contains items equivalent with respect to content and difficulty.

A

By content

22
Q

Randomly assign items to one or the other half of the test.

A

Randomly assign

23
Q

Formula that allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.

A

Spearman-Brown
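
In its general form (a standard statement of the formula, where n is the factor by which the test is lengthened or shortened and r_xy is the observed correlation):

r_SB = (n * r_xy) / (1 + (n - 1) * r_xy)

With n = 2 (two halves combined into a whole test), this reduces to r_SB = 2 r_hh / (1 + r_hh), the adjustment used in the split-half steps above.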

24
Q

Degree of correlation among all the items on a scale.

A

Inter-item consistency

25
Q

Degree to which a test measures a single factor.

A

Homogeneity

26
Q

Degree to which a test measures different factors.

A

Heterogeneity

27
Q

Statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong.

A

Kuder-Richardson formula 20, or KR-20
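
The standard formula, where k is the number of items, p_j is the proportion of testtakers answering item j correctly, q_j = 1 - p_j, and σ² is the variance of total test scores:

KR-20 = [k / (k - 1)] * [1 - (Σ p_j q_j) / σ²]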
28
Q

May be thought of as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula. Typically ranges in value from 0 to 1.

A

COEFFICIENT ALPHA
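
The standard formula, where k is the number of items, σ²_i is the variance of item i, and σ²_X is the variance of total test scores (KR-20 is the special case of alpha for dichotomous items):

α = [k / (k - 1)] * [1 - (Σ σ²_i) / σ²_X]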
29
Q

A measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.

A

AVERAGE PROPORTIONAL DISTANCE (APD)

30
Q

Step 1: Calculate the absolute difference between scores for all of the items.

Step 2: Average the difference between scores.

Step 3: Obtain the APD by dividing the average difference between scores by the number of response options on the test, minus one.

A

Steps in getting AVERAGE PROPORTIONAL DISTANCE (APD)
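
A minimal Python sketch of the three steps (illustrative only; it reads "the absolute difference between scores for all of the items" as the average absolute difference over every pair of items, averaged across testtakers):

import numpy as np
from itertools import combinations

def average_proportional_distance(scores, n_options):
    # scores: 2-D array, rows = testtakers, columns = items
    # Step 1: absolute differences between scores for every pair of items
    diffs = [np.abs(scores[:, i] - scores[:, j]).mean()
             for i, j in combinations(range(scores.shape[1]), 2)]
    # Step 2: average the differences between scores
    avg_diff = float(np.mean(diffs))
    # Step 3: divide by the number of response options minus one
    return avg_diff / (n_options - 1)

# Example: 4 testtakers, 3 items rated on a 5-point scale
scores = np.array([[5, 4, 5],
                   [3, 3, 4],
                   [4, 4, 4],
                   [2, 3, 2]])
print(average_proportional_distance(scores, n_options=5))  # lower = more consistent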
31
Q

Degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.

A

Inter-scorer reliability

32
Q

1. The test items are homogeneous or heterogeneous in nature.
2. The characteristic, ability, or trait being measured is presumed to be dynamic or static.
3. The range of test scores is or is not restricted.
4. The test is a speed or a power test.
5. The test is or is not criterion-referenced.

A

5 considerations regarding the nature of a test (relevant to choosing a reliability estimate)
33
Q

Homogeneity vs. heterogeneity of a test

A

Homogeneity of test items - all the items on a test are very similar in content, difficulty, and format, essentially measuring the same construct

Heterogeneity of test items - items vary significantly in content, difficulty, and format, potentially assessing multiple related but distinct aspects of a concept

34
Q

Trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.

A

Dynamic characteristic

35
Q

Trait, state, or ability presumed to be relatively unchanging, such as intelligence.

A

Static characteristic
36
Q

When a time limit is long enough to allow testtakers to attempt all items, and some items are so difficult that no testtaker is able to obtain a perfect score

A

Power test

37
Q

Generally contains items of a uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all testtakers should be able to complete all the test items correctly

A

Speed test

38
Q

A reliability estimate of a speed test should be based on performance from two independent testing periods, using one of the following three methods.

A

Test-retest reliability
Alternate-forms reliability
Split-half reliability (from two separately timed half tests)
39
Q

Designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective.

A

Criterion-referenced test

40
Q

Any theoretical and statistical framework describing how respondents generate their answers to items on a scale or instrument and explaining associated sources of error.

A

PSYCHOMETRIC MODELS

41
Q

3 major approaches to psychometric models

A

Classical test theory
Generalizability theory
Item response theory

42
Q

A framework of principles and assumptions about how to determine the reliability of a set of data.

A

GENERALIZABILITY THEORY

43
Q

Include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration.

A

Facets
44
Q

Examines how generalizable scores from a particular test are if the test is administered in different situations.

A

Generalizability study

45
Q

Represents the influence of particular facets on the test score, which is similar to the reliability coefficient in the true score model.

A

Coefficients of generalizability

46
Q

Involves the application of information from the generalizability study. Developers examine the usefulness of test scores in helping the test user make decisions.

A

Decision study

47
Q

States that the probability that an item will be answered correctly is a function of an underlying trait or ability that is not directly observable; that is, a latent trait. Also referred to as latent-trait theory.

A

ITEM RESPONSE THEORY
48
Q

Test items or questions that can be answered with only one of two alternative responses, such as true-false, yes-no, or correct-incorrect questions.

A

Dichotomous test items

49
Q

Test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct.

A

Polytomous test items

50
Q

Provides a measure of the precision of an observed test score.

A

Standard error of measurement
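
The standard formula, where σ is the standard deviation of test scores and r_xx is the reliability coefficient:

SEM = σ * sqrt(1 - r_xx)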
51
Q

A range or band of test scores that is likely to contain the true score; can be used to determine whether a score is significantly different from a criterion.

A

Confidence interval
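
For example, using the SEM from the previous card, an approximate 95% confidence interval around an observed score X is X ± 1.96 * SEM.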
52
Q

Statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant.

A

Standard error of the difference
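
The standard formula (assuming the two scores are on the same scale with standard deviation σ, with reliability coefficients r_1 and r_2):

σ_diff = sqrt(SEM_1² + SEM_2²) = σ * sqrt(2 - r_1 - r_2)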
53
Q

4 sources of error variance

A

Test construction
Test administration
Test scoring and interpretation
Other sources of error
54
Q

2 other sources of error variance

A

Sampling error - the sample studied is not truly representative of the population of data
Methodological error - a mistake or flaw in the design or execution of a research study