Test Reliability Flashcards

1
Q

is an index of reliability, a proportion that indicates the ratio between the
true score variance on a test and the total variance

A

Reliability coefficient
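
Worked equation (standard psychometric notation, added for illustration; the symbols are not from the card itself):
$r_{xx} = \dfrac{\sigma^2_T}{\sigma^2_X} = \dfrac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}$
where $\sigma^2_T$ is true score variance, $\sigma^2_E$ is error variance, and $\sigma^2_X = \sigma^2_T + \sigma^2_E$ is the total observed-score variance.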

2
Q
  • a score on an ability test reflects not only the testtaker’s true score on the ability being
    measured but also error
A

Classical Test Theory (True Score Theory)
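
Worked equation (standard CTT notation, added for illustration):
$X = T + E$
where $X$ is the observed score, $T$ is the true score, and $E$ is the error component; taking variances gives $\sigma^2_X = \sigma^2_T + \sigma^2_E$.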

3
Q

3 Sources of Error Variance

A

Test Construction
Test Administration
Test Scoring and Interpretation

4
Q

variance is attributed to item/content sampling

A

Test Construction

5
Q

test environment, testtaker variables, examiner-related variables are factors that may
influence testtaker’s attention or motivation

A

Test Administration

6
Q

technical glitches, subjectivity of the scorer, human error, etc.

A

Test Scoring and Interpretation

7
Q

• obtained by correlating pairs of scores from the same people on two different administrations of the same test
• appropriate when evaluating a test measuring a construct that is relatively stable over time (e.g. personality)
• coefficient of stability
• source of error variance: the passage of time

A

Reliability Estimates (STABILITY)
TEST-RETEST RELIABILITY ESTIMATE
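
A minimal computational sketch, assuming made-up scores and NumPy (not part of the original card):

    # Test-retest (stability) coefficient = Pearson correlation between scores
    # from two administrations of the same test to the same people.
    import numpy as np

    time1 = np.array([12, 15, 9, 20, 17, 11])   # hypothetical scores, first administration
    time2 = np.array([13, 14, 10, 19, 18, 12])  # hypothetical scores of the same people at retest

    r_stability = np.corrcoef(time1, time2)[0, 1]
    print(f"coefficient of stability: {r_stability:.2f}")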

8
Q

• two test administrations with the same group of test takers
• coefficient of equivalence

A

Reliability Estimates (EQUIVALENCE)
PARALLEL-FORMS and ALTERNATE-FORMS RELIABILITY ESTIMATES

9
Q

of a test exist when, for each version of the test, the means and variances of observed test scores are equal.

A

Parallel-forms

10
Q

of a test are typically designed to be equivalent with respect to variables such as content and level of difficulty

A

Alternate-forms

11
Q

• obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once

A

SPLIT-HALF RELIABILITY ESTIMATE

12
Q

• used to estimate internal consistency reliability from the correlation between two halves of a test; also used to estimate the reliability of a test that has been lengthened or shortened

A

Spearman-Brown formula
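
Worked equation (standard form of the formula, added for illustration):
$r_{SB} = \dfrac{n\,r_{xy}}{1 + (n - 1)\,r_{xy}}$
where $r_{xy}$ is the existing reliability (e.g. the half-test correlation) and $n$ is the factor by which the test length changes; for the usual split-half correction, $n = 2$, so $r_{SB} = \dfrac{2 r_{hh}}{1 + r_{hh}}$.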

13
Q

Full meaning of KR 20 & 21

A

KUDER-RICHARDSON FORMULA 20 & 21

14
Q

used to determine the inter-item consistency of
dichotomous items - items that can be scored right or wrong (e.g.
Multiple-choice, Yes/No, True/False, Agree/Disagree)

A

KR-20
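
Worked equation (standard form, added for illustration):
$KR_{20} = \dfrac{k}{k - 1}\left(1 - \dfrac{\sum p_i q_i}{\sigma^2_X}\right)$
where $k$ is the number of items, $p_i$ is the proportion answering item $i$ correctly, $q_i = 1 - p_i$, and $\sigma^2_X$ is the variance of total test scores.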

15
Q

items that can be scored right or wrong (e.g.
Multiple-choice, Yes/No, True/False, Agree/Disagree)

A

dichotomous items -

16
Q

may be used if all the test items have approximately the
same degree of difficulty

A

KR-21
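
Worked equation (standard form, which assumes roughly equal item difficulty as the card notes; added for illustration):
$KR_{21} = \dfrac{k}{k - 1}\left(1 - \dfrac{\bar{X}(k - \bar{X})}{k\,\sigma^2_X}\right)$
where $k$ is the number of items, $\bar{X}$ is the mean total score, and $\sigma^2_X$ is the variance of total scores.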

17
Q

• most accepted and widely used reliability estimate
• provides a measure of reliability from a single test administration
• developed by Lee Joseph Cronbach, which is why it is also called Cronbach’s alpha
• appropriate for use on tests containing nondichotomous items

A

COEFFICIENT ALPHA
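
Worked equation (standard form, added for illustration):
$\alpha = \dfrac{k}{k - 1}\left(1 - \dfrac{\sum \sigma^2_i}{\sigma^2_X}\right)$
where $k$ is the number of items, $\sigma^2_i$ is the variance of item $i$, and $\sigma^2_X$ is the variance of total scores. A minimal computational sketch, assuming a respondents-by-items NumPy array (hypothetical helper, not from the card):

    import numpy as np

    def cronbach_alpha(item_scores):
        # item_scores: rows = respondents, columns = items (e.g. 1-5 Likert ratings)
        item_scores = np.asarray(item_scores, dtype=float)
        k = item_scores.shape[1]
        item_vars = item_scores.var(axis=0, ddof=1)       # per-item variances
        total_var = item_scores.sum(axis=1).var(ddof=1)   # variance of total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)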

18
Q

Coefficient alpha was developed by __________, which is why it is also called _________

A

Lee Joseph Cronbach
Cronbach’s alpha

19
Q

appropriate for use on tests containing ________ items
(e.g. Strongly Disagree to Strongly Agree)

A

nondichotomous

20
Q

• degree of agreement or consistency between two or more scorers with regard to a particular measure
• scorers must have sufficient training in standardized scoring
• source of error: scoring criteria
• coefficient of inter-scorer reliability

A

INTER-SCORER RELIABILITY ESTIMATE

21
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

• Never buy any form of assessment/measurement where there is

A

no reliability coefficient or where it is below 0.7

22
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

• Personality and similar measures: ___________ is often recommended as minimum

A

0.6 to 0.8 although above 0.7

23
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

• Ability, aptitude, IQ and other forms of reasoning tests should have coefficients ___________ has been recommended as an excellent value. Where the intention is to compare people’s scores, such as when selecting people for a job, values ______ should be the aim.

A

above 0.8. Above 0.85
& above 0.85

24
Q

Using and Interpreting a Reliability Coefficient
When purchasing tests:

• The sample size used for calculation of reliability should never be _____

25
Q

5 Reliability and Nature of the Test

A

Homogeneity vs. Heterogeneity of test items
Dynamic vs. Static characteristics
Restriction or Inflation of range
Speed tests vs Power tests
Criterion-referenced tests

26
Q
  • uniformity of test items
A

Homogeneous items

27
Q
  • various items measuring multiple
    constructs
A

Heterogeneous items

28
Q
  • changing trait, state, or ability (e.g. anxiety)
A

Dynamic characteristics

29
Q
  • stable/enduring trait, state, or ability
A

Static characteristics

30
Q

Variability of test scores is directly related to the correlation coefficient

A

Restriction or Inflation of range

31
Q
  • reliability estimate of speed tests should be based on performance
    from two independent testing periods
A

Speed tests vs Power tests

32
Q
  • traditional procedures of estimating reliability are usually not appropriate for use with __________, though there may be instances in which traditional estimates can be adopted
A

Criterion-referenced tests

33
Q

3 Alternatives to the True Score Theory or Classical Test Theory

A

DOMAIN SAMPLING THEORY
GENERALIZABILITY THEORY
ITEM RESPONSE THEORY

34
Q

• seek to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score
• posits that a test score is a sample from a larger, theoretical “domain” of possible items, and the reliability of a test increases with the number of items sampled from that domain

A

DOMAIN SAMPLING THEORY

35
Q

GENERALIZABILITY THEORY
• originally referred to as the ________, is a modified form of DST
• developed by _______

A

Domain Sampling Theory; GT
Cronbach and colleagues

36
Q

• a person’s test scores vary from testing to testing because of variables in the testing situation
• given the exact same conditions of all the facets in the universe, the exact same test score should be obtained
• test reliability does not reside within the test itself; rather, it is a function of the circumstances under which the test is developed, administered, and interpreted

A

GENERALIZABILITY THEORY

37
Q

• a theory of testing based on the relationship between an individual’s performance on a test item and the test taker’s level of performance on an overall measure of the ability the item was designed to measure.
• persons with lower ability have less of a chance, while persons with high ability are very likely to answer correctly; for example, students with higher math ability are more likely to get a math item correct.

A

ITEM RESPONSE THEORY

38
Q

ITEM RESPONSE THEORY
IRT models are often referred to as ____________. The term latent is used to emphasize
that discrete item responses are taken to be observable manifestations of hypothesized traits,
constructs, or attributes, not directly observed, but which must be inferred from the manifest
responses

A

latent trait models

39
Q

2 Reliability and Individual Scores

A

STANDARD ERROR OF MEASUREMENT
STANDARD ERROR OF THE DIFFERENCE

40
Q
  • a range or band of test scores that is likely to contain
    the true score
A

confidence interval

41
Q

often abbreviated as SEM or SEm, provides a measure of the precision of an observed score; an estimate of the amount of error inherent in an observed score
• SEM and the reliability of a test have an inverse relationship; that is, the higher the reliability of a test (or individual subtest within a test), the lower the SEM
• it can be used to set the confidence interval for a particular score or to determine whether a score is significantly different from a criterion.

A

STANDARD ERROR OF MEASUREMENT
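
Worked equation (standard form, added for illustration):
$SEM = \sigma_X \sqrt{1 - r_{xx}}$
where $\sigma_X$ is the standard deviation of test scores and $r_{xx}$ is the reliability coefficient. For example, with $\sigma_X = 15$ and $r_{xx} = 0.89$, $SEM \approx 15\sqrt{0.11} \approx 4.98$, so a 95% confidence interval around an observed score of 100 is roughly $100 \pm 1.96 \times 4.98$, i.e. about 90 to 110.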

42
Q

SEM

A

STANDARD ERROR OF MEASUREMENT

43
Q

used to determine how large a difference should be before it is considered statistically significant
• in cases such as recruitment and selection, ________ can be used to compare the test scores of applicants, which can help personnel officers in making hiring decisions

A

STANDARD ERROR OF THE DIFFERENCE
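
Worked equation (standard form, added for illustration):
$SE_{diff} = \sqrt{SEM_1^2 + SEM_2^2} = \sigma\sqrt{2 - r_1 - r_2}$
where $SEM_1$ and $SEM_2$ are the standard errors of measurement of the two scores being compared; the second form assumes both scores are on the same scale with standard deviation $\sigma$ and reliabilities $r_1$ and $r_2$.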