Chapter 5 Flashcards

1
Q

Refers to consistency in measurement.

Not an all-or-none matter.

A test may be reliable
in one context and unreliable in another.

A

RELIABILITY

2
Q

Index of reliability, a proportion
that indicates the ratio between the true score variance on a
test and the total variance.

A

Reliability coefficient
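A minimal sketch of the ratio (the variance values below are illustrative, not from the deck):

```python
# Reliability coefficient as the ratio of true score variance to
# total (observed) score variance. Example numbers are made up.

def reliability_coefficient(true_variance, total_variance):
    """Proportion of total score variance attributable to true scores."""
    return true_variance / total_variance

# If 16 of 20 variance units reflect true scores, reliability is .80:
r_xx = reliability_coefficient(true_variance=16.0, total_variance=20.0)
print(r_xx)  # 0.8
```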

3
Q

Score on an ability test is presumed to reflect
not only the testtaker’s true score on the ability being measured but also error.

X = T + E

A

Classical test theory
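The X = T + E relationship can be sketched with simulated scores (a hypothetical illustration; the true score and error spread below are made up):

```python
import random

# Classical test theory: each observed score X is a fixed true score T
# plus random error E. With error symmetric around zero, observed
# scores average toward the true score. Illustrative values only.
random.seed(0)

T = 100                                   # true score
observed = [T + random.gauss(0, 5) for _ in range(10_000)]  # X = T + E

mean_X = sum(observed) / len(observed)
print(round(mean_X, 1))  # close to T, since random error cancels out
```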

4
Q

A statistic useful in describing sources of test score variability

A

Variance

5
Q

2 types of measurement error

A

Systematic
Random

6
Q

Source of error in measuring a targeted variable
caused by unpredictable fluctuations and inconsistencies of other
variables in the measurement process.

A

Random error

7
Q

Source of error in measuring a variable that
is typically constant or proportionate to what is presumed to be the true
value of the variable being measured.

Doesn’t affect score consistency

A

Systematic error

8
Q

Source of variance which refers to variation among items
within a test as well as to variation among items between tests.

Differences are sure to be found in the way the items are worded
and in the exact content sampled.

A

Item sampling
or Content sampling

9
Q

Sources of error variance in test administration; the testtaker’s reactions to these influences are the source of one kind of error variance.

A

Test environment - physical setting where a test is administered

Testtaker variables - factors related to the individual taking the test

Examiner-related variables - factors related to the person administering the test

10
Q

2 possible sources of error variance under test scoring and interpretation

A

Scorers - subjectivity may enter into scoring
Scoring systems - technical glitches may occur when tests are computer scored

11
Q

3 potential sources of nonsystematic error in an assessment situation

A

Forgetting
Failing to notice abusive behavior
Misunderstanding instructions regarding reporting

12
Q

Estimate of reliability obtained by
correlating pairs of scores from the same people on two different
administrations of the same test.

A

Test-retest reliability

13
Q

The estimate of test-retest reliability when the interval between testings is greater than six months

A

Coefficient of stability

14
Q

The degree of the relationship between various forms of a test can be
evaluated by means of _______________; often referred to as coefficient of equivalence.

A

Alternate-forms or parallel-forms
coefficient of reliability

15
Q

Exist when, for each form of the test, the
means and the variances of observed test scores are equal.

A

Parallel forms

16
Q

Simply different versions of a test that have
been constructed so as to be parallel.

A

Alternate forms

17
Q

Obtained by correlating pairs of scores obtained from equivalent halves of a single test administered once.

A

Internal consistency estimate of reliability

or

Estimate of interitem consistency

18
Q

Step 1. Divide the test into equivalent halves.

Step 2. Calculate a Pearson r between scores on the two halves
of the test.

Step 3. Adjust the half-test reliability using the Spearman-Brown formula

A

3 steps in the computation of coefficient of split-half reliability
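The three steps can be sketched in code using the odd-even split (the scores below are illustrative, not from the deck):

```python
# Split-half reliability in three steps: split the test, correlate
# the halves, then apply the Spearman-Brown adjustment.

def pearson_r(x, y):
    """Step 2: Pearson correlation between two sets of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_brown(r_half):
    """Step 3: adjust the half-test correlation to full-test length."""
    return 2 * r_half / (1 + r_half)

# Step 1: half-test scores for five testtakers (odd vs. even items).
odd_half = [10, 12, 9, 14, 11]
even_half = [11, 13, 9, 15, 10]

r_hh = pearson_r(odd_half, even_half)          # Step 2
split_half_reliability = spearman_brown(r_hh)  # Step 3
print(round(split_half_reliability, 3))
```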

19
Q

3 acceptable ways to split a test

A

Randomly assign
Odd-even reliability
By content

20
Q

Assign odd-numbered
items to one half of the test and even-numbered items to the
other half

A

Odd-even reliability

21
Q

Each half contains items equivalent with respect to content and
difficulty

A

By content

22
Q

Randomly assign items
to one or the other half of the test.

A

Randomly assign

23
Q

Formula that allows a test developer or user to
estimate internal consistency reliability from a correlation of two halves of
a test.

A

Spearman-Brown
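A sketch of the general formula, r' = nr / (1 + (n - 1)r), where n is the factor by which test length changes (the reliability values below are illustrative):

```python
# Spearman-Brown: estimated reliability of a test n times the
# original length, given reliability r. Example values are made up.

def spearman_brown(r, n=2.0):
    return n * r / (1 + (n - 1) * r)

# Doubling a half-test whose halves correlate .70:
print(round(spearman_brown(0.70), 3))       # 0.824
# Tripling a test with reliability .60:
print(round(spearman_brown(0.60, n=3), 3))  # 0.818
```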

24
Q

Degree of correlation among all
the items on a scale.

A

Inter-item consistency

25
Q

Degree to which a test measures a
single factor.

A

Homogeneity

26
Q

Degree to which a test measures
different factors.

A

Heterogeneity

27
Q

Statistic of choice for determining the inter-item consistency
of dichotomous items, primarily those items that can be scored right or wrong

A

Kuder–Richardson formula 20 (KR-20)
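A sketch of KR-20, which equals (k / (k - 1)) × (1 - Σpq / σ²) for k right/wrong items; the score matrix below is illustrative:

```python
# KR-20 for dichotomous items. Rows are testtakers, columns are
# items scored 1 (right) or 0 (wrong). Data are made up.

def kr20(scores):
    k = len(scores[0])                    # number of items
    n = len(scores)                       # number of testtakers
    totals = [sum(row) for row in scores]
    mean_t = sum(totals) / n
    var_total = sum((t - mean_t) ** 2 for t in totals) / n
    # Sum of p*q over items, p = proportion answering the item right.
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

data = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
print(round(kr20(data), 3))  # 0.872
```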

28
Q

May be thought of as the mean of all possible split-half
correlations, corrected by the Spearman-Brown formula.

Typically ranges in value from 0 to 1.

A

COEFFICIENT ALPHA
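A sketch of coefficient alpha, α = (k / (k - 1)) × (1 - Σ item variances / total variance), for items that need not be scored right/wrong (the ratings below are illustrative):

```python
# Coefficient (Cronbach's) alpha: generalizes KR-20 to items with
# more than two response values. The rating matrix is made up.

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def coefficient_alpha(scores):
    k = len(scores[0])
    item_vars = sum(variance([row[j] for row in scores]) for j in range(k))
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

ratings = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [3, 2, 2],
    [4, 4, 5],
]
alpha = coefficient_alpha(ratings)
print(round(alpha, 3))  # falls in the typical 0-to-1 range
```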

29
Q

A measure used to evaluate the internal consistency of a test that
focuses on the degree of difference that exists between item scores.

A

AVERAGE PROPORTIONAL DISTANCE (APD)

30
Q

Step 1: Calculate the absolute difference between scores for all of the
items.

Step 2: Average the difference between scores.

Step 3: Obtain the APD by dividing the average difference between
scores by the number of response options on the test, minus one

A

Steps in getting AVERAGE PROPORTIONAL DISTANCE (APD)
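The three steps can be sketched for a single testtaker's responses on a 5-point scale (the response values below are illustrative):

```python
from itertools import combinations

# APD in three steps for one testtaker's item scores. With a
# 5-point scale, the divisor in Step 3 is 5 - 1 = 4. Data made up.

def average_proportional_distance(item_scores, n_options):
    # Step 1: absolute difference between scores for all item pairs.
    diffs = [abs(a - b) for a, b in combinations(item_scores, 2)]
    # Step 2: average those differences.
    avg_diff = sum(diffs) / len(diffs)
    # Step 3: divide by the number of response options minus one.
    return avg_diff / (n_options - 1)

responses = [4, 5, 4, 3, 5]   # one testtaker, five items
apd = average_proportional_distance(responses, n_options=5)
print(apd)  # 0.25
```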

31
Q

Degree of agreement or
consistency between two or more scorers (or judges or raters)
with regard to a particular measure.

A

Inter-scorer reliability

32
Q
1. The test items are homogeneous or heterogeneous in nature
2. The characteristic, ability, or trait being measured is presumed to be dynamic or static
3. The range of test scores is or is not restricted
4. The test is a speed or a power test
5. The test is or is not criterion-referenced

A

5 considerations regarding the nature of the test (relevant to choosing a reliability estimate)

33
Q

Homogeneity vs heterogeneity of a test

A

Homogeneity of test items - all the items on a test are very similar in content, difficulty, and format, essentially measuring the same construct

Heterogeneity of test items - items vary significantly in content, difficulty, and format, potentially assessing multiple related but distinct aspects of a concept

34
Q

Trait, state, or ability presumed to be
ever-changing as a function of situational and cognitive
experiences.

A

Dynamic characteristic

35
Q

Trait, state, or ability presumed to be
relatively unchanging such as intelligence.

A

Static characteristic

36
Q

When a time limit is long enough to allow testtakers to attempt all items, but some items are so difficult that no testtaker is able to obtain a perfect score

A

Power test

37
Q

Generally contains items of uniform
level of difficulty (typically uniformly low) so that, when given
generous time limits, all testtakers should be able to complete all
the test items correctly

A

Speed test

38
Q

A reliability estimate of a speed test should be based on performance from two independent testing periods, using one of the following: ___________

A

Test-retest reliability
Alternate-forms reliability
Split-half reliability (from two separately timed half tests)

39
Q

Designed to provide an indication of where a
testtaker stands with respect to some variable or criterion, such as an educational
or a vocational objective.

A

Criterion-referenced test

40
Q

Any theoretical and statistical framework describing how
respondents generate their answers to items on a scale or
instrument and explaining associated sources of error

A

PSYCHOMETRIC MODELS

41
Q

3 major approaches to psychometric models:

A

Classical test theory
Generalizability theory
Item response theory

42
Q

A framework of principles and assumptions about how to
determine the reliability of a set of data.

A

GENERALIZABILITY THEORY

43
Q

Include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration.

A

Facets

44
Q

Examines how generalizable scores from a
particular test are if the test is administered in different situations

A

Generalizability study

45
Q

Represents the influence of particular facets on the test score; analogous to the reliability coefficient in the true score model.

A

Coefficients of generalizability

46
Q

Involves the application of information
from the generalizability study

Developers examine the usefulness of test scores in
helping the test user make decisions.

A

Decision study

47
Q

States that
the probability that an item will be answered correctly is a function
of an underlying trait or ability that is not directly observable; that is,
a latent trait; also referred to as latent-trait theory

A

ITEM RESPONSE THEORY

48
Q

Test items or questions that can be
answered with only one of two alternative responses, such as true–
false, yes–no, or correct–incorrect questions

A

Dichotomous test items

49
Q

Test items or questions with three or more
alternative responses, where only one is scored correct or scored as
being consistent with a targeted trait or other construct

A

Polytomous test items

50
Q

Provides a measure of the precision of an observed test score

A

Standard error of measurement
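A common form of the formula, SEM = SD × √(1 - r), can be sketched as follows (the SD and reliability values below are illustrative):

```python
import math

# Standard error of measurement: SEM = SD * sqrt(1 - r_xx), where SD
# is the test's standard deviation and r_xx its reliability
# coefficient. Example values are made up.

def standard_error_of_measurement(sd, reliability):
    return sd * math.sqrt(1 - reliability)

sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
print(round(sem, 2))  # 4.5

# An approximate 95% confidence band around an observed score of 100:
low, high = 100 - 1.96 * sem, 100 + 1.96 * sem
print(round(low, 1), round(high, 1))  # 91.2 108.8
```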

51
Q

Used to determine whether a score is significantly different from a criterion; gives a range of scores likely to contain the true score

A

Confidence interval

52
Q

Statistical measure that can aid a test user in determining
how large a difference should be before it is considered statistically
significant.

A

Standard error of the difference
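One common form of the formula, assuming both scores come from scales with the same standard deviation, can be sketched as follows (values below are illustrative):

```python
import math

# Standard error of the difference between two scores:
# SE_diff = SD * sqrt(2 - r1 - r2), where r1 and r2 are the two
# tests' reliability coefficients. Example values are made up.

def standard_error_of_difference(sd, r1, r2):
    return sd * math.sqrt(2 - r1 - r2)

se_diff = standard_error_of_difference(sd=15.0, r1=0.90, r2=0.90)
print(round(se_diff, 2))  # 6.71

# Two scores should differ by roughly 1.96 * SE_diff before the
# difference is considered significant at about the .05 level:
print(round(1.96 * se_diff, 1))  # 13.1
```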

53
Q

4 sources of error variance

A

Test construction
Test administration
Test scoring and interpretation
Other sources of error

54
Q

2 other sources of error variance

A

Sampling error - the sample is not truly representative of the population of data
Methodological error - a mistake or flaw in the design or execution of a research study