Test Reliability Flashcards
is an index of reliability, a proportion that indicates the ratio between the
true score variance on a test and the total variance
Reliability coefficient
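A minimal numeric sketch of the ratio above, using made-up variance components (the values are illustrative only):

    # Hypothetical variance components for a test
    true_score_variance = 40.0   # variance due to real differences in the attribute measured
    error_variance = 10.0        # variance due to measurement error
    total_variance = true_score_variance + error_variance

    # Reliability coefficient = true score variance / total variance
    reliability = true_score_variance / total_variance   # 0.8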
- a score on an ability test reflects not only the testtaker’s true score on the ability being
measured but also error
Classical Test Theory (True Score Theory)
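Classical test theory treats an observed score as the sum of a true score and error (X = T + E); a tiny sketch with made-up numbers:

    # Classical test theory: observed score (X) = true score (T) + error (E)
    true_score = 102                        # hypothetical error-free score on the ability
    error = -4                              # random error on this particular administration
    observed_score = true_score + error     # 98, the score the testtaker actually obtains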
3 Sources of Error Variance
Test Construction
Test Administration
Test Scoring and Interpretation
error variance is attributed to item/content sampling
Test Construction
test environment, testtaker variables, and examiner-related variables are factors that may
influence the testtaker's attention or motivation
Test Administration
technical glitches, scorer subjectivity, human error, etc.
Test Scoring and Interpretation
§ obtained by correlating pairs of scores from the same people on two different administrations of the same test
§ appropriate when evaluating a test measuring a construct that is relatively stable over time (e.g. personality)
§ coefficient of stability
§ source of error variance: the passage of time between administrations
Reliability Estimates (STABILITY)
TEST-RETEST RELIABILITY ESTIMATE
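A sketch of how a stability coefficient is obtained in practice, assuming hypothetical scores from the same ten testtakers on two administrations of the same test:

    import statistics

    # Scores for the same ten testtakers on two administrations (made-up data)
    first_administration = [12, 15, 18, 20, 22, 25, 27, 30, 31, 34]
    second_administration = [13, 14, 19, 19, 23, 24, 28, 29, 33, 33]

    # Coefficient of stability: the correlation between the two sets of scores
    # (statistics.correlation requires Python 3.10+)
    coefficient_of_stability = statistics.correlation(first_administration, second_administration)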
§ two test administrations with the same group of test takers
§ coefficient of equivalence
Reliability Estimates (EQUIVALENCE)
PARALLEL-FORMS and ALTERNATE-FORMS RELIABILITY ESTIMATES
of a test exist when, for each form of the test, the means and
variances of observed test scores are equal.
Parallel-forms
of a test are typically designed to be equivalent with
respect to variables such as content and level of difficulty
Alternate-forms
§obtained by correlating two pairs of scores obtained from
equivalent halves of a single test administered once
SPLIT-HALF RELIABILITY ESTIMATE
◦ used to estimate internal consistency reliability from the correlation between two
halves of a test; also used to estimate the reliability of a test that is lengthened or shortened
Spearman-Brown formula
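A sketch of the Spearman-Brown formula, assuming a hypothetical half-test correlation; n is the factor by which test length changes (n = 2 corrects a split-half correlation to full length, n < 1 models a shortened test):

    def spearman_brown(r, n=2.0):
        # Estimated reliability of a test whose length is changed by factor n,
        # given the correlation r between the existing (e.g. half-test) forms
        return (n * r) / (1 + (n - 1) * r)

    half_test_r = 0.70                                    # hypothetical correlation between two halves
    full_length_estimate = spearman_brown(half_test_r)    # about 0.82 for the full-length test
    shortened_estimate = spearman_brown(0.90, n=0.5)      # estimate if a reliable test were cut in half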
Full meaning of KR 20 & 21
KUDER-RICHARDSON FORMULA 20 & 21
used to determine the inter-item consistency of
dichotomous items - items that can be scored right or wrong (e.g.
Multiple-choice, Yes/No, True/False, Agree/Disagree)
KR-20
items that can be scored right or wrong (e.g.
Multiple-choice, Yes/No, True/False, Agree/Disagree)
Dichotomous items
may be used if all the test items have approximately the
same degree of difficulty
KR-21
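A sketch of both formulas on a small made-up matrix of dichotomous (0/1) item scores; the function names and data are illustrative only:

    import statistics

    def kr20(score_matrix):
        # score_matrix: one row of 0/1 item scores per testtaker
        k = len(score_matrix[0])                        # number of dichotomous items
        totals = [sum(row) for row in score_matrix]     # total score per testtaker
        total_variance = statistics.pvariance(totals)
        pq_sum = 0.0                                    # sum of p*q across items
        for item in range(k):
            p = sum(row[item] for row in score_matrix) / len(score_matrix)   # proportion correct
            pq_sum += p * (1 - p)
        return (k / (k - 1)) * (1 - pq_sum / total_variance)

    def kr21(k, mean_score, variance):
        # Simplification of KR-20 that assumes all items have roughly the same difficulty
        return (k / (k - 1)) * (1 - (mean_score * (k - mean_score)) / (k * variance))

    scores = [[1, 1, 0, 1], [1, 0, 0, 1], [1, 1, 1, 1], [0, 0, 0, 1]]   # 4 testtakers x 4 items
    totals = [sum(row) for row in scores]
    print(kr20(scores))
    print(kr21(k=4, mean_score=statistics.mean(totals), variance=statistics.pvariance(totals)))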
§most accepted and widely used reliability estimate
§Provides a measure of reliability from a single test administration
§developed by Lee Joseph Cronbach, which is why it is also called
Cronbach's alpha
appropriate for use on tests containing nondichotomous items
COEFFICIENT ALPHA
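A sketch of coefficient alpha on made-up ratings for nondichotomous (e.g. Likert-type) items; this is a plain-Python illustration, not any particular library's implementation:

    import statistics

    def coefficient_alpha(score_matrix):
        # score_matrix: one row of item ratings per respondent (nondichotomous items allowed)
        k = len(score_matrix[0])                        # number of items
        totals = [sum(row) for row in score_matrix]     # total score per respondent
        item_variances = [
            statistics.pvariance([row[i] for row in score_matrix]) for i in range(k)
        ]
        # alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores)
        return (k / (k - 1)) * (1 - sum(item_variances) / statistics.pvariance(totals))

    ratings = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]   # 4 respondents x 3 Likert items
    print(coefficient_alpha(ratings))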
Coefficient alpha was developed by __________, which is why it is also called _________
Lee Joseph Cronbach
Cronbach’s alpha
appropriate for use on tests containing ________ items
(e.g. items rated from Strongly Disagree to Strongly Agree)
nondichotomous
§degree of agreement or consistency between two or more scorers
with regard to a particular measure
§scorers must have sufficient training in standardized scoring
§source of error: scoring criteria
§coefficient of inter-scorer reliability
INTER-SCORER RELIABILITY ESTIMATE
Using and Interpreting a Reliability Coefficient
When purchasing tests:
§ Never buy any form of assessment/measurement where there is no reliability
coefficient or where it is below 0.7
Using and Interpreting a Reliability Coefficient
When purchasing tests:
§ Personality and similar measures: ___________ is often
recommended as a minimum
0.6 to 0.8, although above 0.7
Using and Interpreting a Reliability Coefficient
When purchasing tests:
§ Ability, aptitude, IQ, and other forms of reasoning tests should have coefficients
___________. ___________ has been recommended as an excellent value. Where the
intention is to compare people's scores, such as when selecting people for a job,
values ___________ should be the aim.
above 0.8; above 0.85
above 0.85
Using and Interpreting a Reliability Coefficient
When purchasing tests:
§ The sample size used for the calculation of reliability should never be _____
below 100
5 Reliability and Nature of the Test
Homogeneity vs. Heterogeneity of test items
Dynamic vs. Static characteristics
Restriction or Inflation of range
Speed tests vs Power tests
Criterion-referenced tests
- uniformity of test items
Homogeneous items
- various items measuring multiple
constructs
Heterogeneous items
- changing trait, state, or ability (e.g. anxiety)
Dynamic
- stable/enduring trait, state, or ability
Static
The variability of test scores in a sample is directly related to the size of the correlation coefficient
Restriction or Inflation of range
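A small illustration of the point above: restricting the range of scores shrinks the correlation, and with it the reliability estimate (made-up data; only the direction of the effect matters):

    import statistics

    # Hypothetical paired scores (e.g. test and retest) across a wide range of ability
    test = [50, 55, 60, 65, 70, 75, 80, 85, 90, 95]
    retest = [54, 52, 65, 61, 73, 70, 84, 82, 95, 91]

    full_range_r = statistics.correlation(test, retest)            # computed on the full range
    restricted_r = statistics.correlation(test[3:7], retest[3:7])  # only the middle of the range
    # restricted_r comes out lower than full_range_r, so a reliability estimate based on a
    # range-restricted sample will understate the test's reliability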
- reliability estimate of speed tests should be based on performance
from two independent testing periods
Speed tests vs Power tests
- traditional procedures of estimating reliability are usually not
appropriate for use with __________, though there may be instances in which traditional estimates can be adopted
Criterion-referenced tests
3 Alternatives to the True Score Theory or Classical Test Theory
DOMAIN SAMPLING THEORY
GENERALIZABILITY THEORY
ITEM RESPONSE THEORY
§ seek to estimate the extent to which specific sources of variation under
defined conditions are contributing to the test score
§ posits that a test score is a sample from a larger, theoretical “domain” of possible items, and the reliability of a test increases with the number of items sampled from that domain
DOMAIN SAMPLING THEORY
GENERALIZABILITY THEORY
§ originally referred to as the ________; ________ is a modified form of DST
§ developed by _______
Domain Sampling Theory; GT
Cronbach and colleagues
§ a person’s test scores vary from testing to testing because of variables in
the testing situation
§ given the exact same conditions of all the facets in the universe, the exact
same test score should be obtained
§ test reliability does not reside within the test itself, rather, it is a function of
the circumstances under which the test is developed, administered, and
interpreted
GENERALIZABILITY THEORY
§ a theory of testing based on the relationship between an individual’s performance on a test
item and the test taker’s level of performance on an overall measure of the ability the item
was designed to measure.
§ Persons with lower ability have less of a chance, while persons with high ability are very likely
to answer correctly; for example, students with higher math ability are more likely to get a
math item correct.
ITEM RESPONSE THEORY
ITEM RESPONSE THEORY
IRT models are often referred to as ____________. The term latent is used to emphasize
that discrete item responses are taken to be observable manifestations of hypothesized traits,
constructs, or attributes, not directly observed, but which must be inferred from the manifest
responses
latent trait models
2 Reliability and Individual Scores
STANDARD ERROR OF MEASUREMENT
STANDARD ERROR OF THE DIFFERENCE
- a range or band of test scores that is likely to contain
the true score
confidence interval
often abbreviated as SEM, provides a measure of the precision of an
observed score; an estimate of the amount of error inherent in an observed score
§ SEM and the reliability of a test have an inverse relationship; that is, the higher the
reliability of a test (or individual subtest within a test), the lower the SEM
§ it can be used to set the confidence interval for a particular score or to
determine whether a score is significantly different from a criterion.
STANDARD ERROR OF MEASUREMENT
SEM
STANDARD ERROR OF MEASUREMENT
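A sketch tying SEM to the confidence interval card above, using the commonly cited formula SEM = SD * sqrt(1 - reliability) and made-up numbers:

    import math

    sd = 15.0            # standard deviation of the test's scores (hypothetical)
    reliability = 0.91   # reliability coefficient of the test (hypothetical)

    # Standard error of measurement: the higher the reliability, the lower the SEM
    sem = sd * math.sqrt(1 - reliability)    # 15 * sqrt(0.09) = 4.5

    # 95% confidence interval around an observed score of 100
    observed = 100
    lower, upper = observed - 1.96 * sem, observed + 1.96 * sem   # roughly 91.2 to 108.8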
used to determine how large a difference should be before it is considered
statistically significant
§ in cases such as recruitment and selection, ________
can be used to compare the test scores of applicants, which can help
personnel officers make hiring decisions
STANDARD ERROR OF THE DIFFERENCE
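A sketch of the standard error of the difference computed from two SEMs (SED = sqrt(SEM1^2 + SEM2^2)); the applicants and numbers are hypothetical:

    import math

    sem_applicant_a = 4.5   # SEM of the test used for applicant A (hypothetical)
    sem_applicant_b = 4.5   # SEM of the test used for applicant B (same test here)

    # Standard error of the difference between the two observed scores
    sed = math.sqrt(sem_applicant_a ** 2 + sem_applicant_b ** 2)   # about 6.36

    # A difference smaller than about 1.96 * SED (roughly 12.5 points here) would not be
    # considered statistically significant at the .05 level
    score_a, score_b = 110, 104
    significant = abs(score_a - score_b) > 1.96 * sed   # False for a 6-point difference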