Chapter 5 Flashcards
Refers to consistency in measurement. Not an all-or-none matter; a test may be reliable in one context and unreliable in another.
RELIABILITY
Index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance.
Reliability coefficient
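As a formula (standard classical test theory notation, not part of the card itself): r_xx = σ²_true / σ²_total, where σ²_true is the true score variance and σ²_total is the total observed score variance.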
Score on an ability test is presumed to reflect not only the testtaker's true score on the ability being measured but also error.
X = T + E
Classical test theory
A statistic useful in describing sources of test score variability
Variance
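Under classical test theory, total test score variance is the sum of true score variance and error variance: σ² = σ²_true + σ²_error.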
2 types of measurement error
Systematic
Random
Source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
Random error
Source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured. Because it is constant, it does not affect score consistency.
Systematic error
Source of variance which refers to variation among items within a test as well as to variation among items between tests. Differences are sure to be found in the way the items are worded and in the exact content sampled.
Item sampling or Content sampling
The testtaker's reactions to influences surrounding the administration of a test are the source of one kind of error variance.
Sources of error in test administration:
Test environment - the physical setting where a test is administered
Testtaker variables - factors related to the individual taking the test
Examiner-related variables - factors related to the person administering the test
2 possible sources of error variance under test scoring and interpretation
Scorers - the subjectivity involved in human scoring
Scoring systems - technical glitches when scoring is done by computer
3 potential sources of nonsystematic error in an assessment situation
Forgetting
Failing to notice abusive behavior
Misunderstanding instructions regarding reporting
Estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
Test-retest reliability
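A minimal sketch of this estimate in Python, assuming two score arrays from the same testtakers (the data and variable names are illustrative):

```python
import numpy as np

# Scores for the same five testtakers on two administrations (illustrative data)
time1 = np.array([12, 15, 11, 18, 14])
time2 = np.array([13, 14, 12, 17, 15])

# Test-retest reliability is the Pearson r between the two administrations
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: {r_test_retest:.2f}")
```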
The estimate of test-retest reliability when the interval between testing is greater than six months.
Coefficient of stability
The degree of the relationship between various forms of a test can be evaluated by means of _______________; often referred to as the coefficient of equivalence.
Alternate-forms or parallel-forms coefficient of reliability
Exist when, for each form of the test, the means and the variances of observed test scores are equal.
Parallel forms
Simply different versions of a test that have been constructed so as to be parallel.
Alternate forms
Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
Internal consistency estimate of reliability or Estimate of interitem consistency
Step 1. Divide the test into two equivalent halves.
Step 2. Calculate a Pearson r between scores on the two halves of the test.
Step 3. Adjust the half-test reliability using the Spearman-Brown formula.
3 steps in the computation of the coefficient of split-half reliability
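A minimal sketch of the three steps in Python, assuming a testtaker-by-item score matrix and an odd-even split (the data are illustrative):

```python
import numpy as np

# Item scores: rows are testtakers, columns are items (illustrative data)
items = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 1, 0],
])

# Step 1: divide the test into equivalent halves (here, odd-even)
half1 = items[:, 0::2].sum(axis=1)  # 1st, 3rd, 5th, ... items
half2 = items[:, 1::2].sum(axis=1)  # 2nd, 4th, 6th, ... items

# Step 2: Pearson r between scores on the two halves
r_hh = np.corrcoef(half1, half2)[0, 1]

# Step 3: adjust the half-test r with the Spearman-Brown formula
r_sb = (2 * r_hh) / (1 + r_hh)
print(f"Half-test r = {r_hh:.2f}, Spearman-Brown corrected = {r_sb:.2f}")
```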
3 acceptable ways to split a test
Randomly assign
Odd-even reliability
By content
Assign odd-numbered items to one half of the test and even-numbered items to the other half.
Odd-even reliability
Each half contains items equivalent with respect to content and difficulty.
By content
Randomly assign items to one or the other half of the test.
Randomly assign
Formula that allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
Spearman-Brown formula
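In general form (standard notation): r_SB = n * r / (1 + (n - 1) * r), where n is the factor by which test length is changed; for two halves of one test, n = 2, giving r_SB = 2r_hh / (1 + r_hh).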
Degree of correlation among all the items on a scale.
Inter-item consistency
Degree to which a test measures a single factor.
Homogeneity
Degree to which a test measures different factors.
Heterogeneity
Statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong.
Kuder-Richardson formula 20 (KR-20)
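The formula (standard notation): KR-20 = (k / (k - 1)) * (1 - Σpq / σ²), where k is the number of items, p is the proportion of testtakers answering an item correctly, q = 1 - p, and σ² is the variance of total test scores.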
May be thought of as the mean of all possible split-half correlations, corrected by the Spearman-Brown formula. Typically ranges in value from 0 to 1.
COEFFICIENT ALPHA
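A minimal sketch of coefficient alpha in Python, computed directly from the textbook formula α = (k / (k - 1)) * (1 - Σσ²_item / σ²_total) on a testtaker-by-item matrix (the function name is illustrative):

```python
import numpy as np

def coefficient_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a testtakers-by-items matrix of item scores."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```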
A measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores.
AVERAGE PROPORTIONAL DISTANCE (APD)
Step 1: Calculate the absolute difference between scores for all of the items.
Step 2: Average the differences between scores.
Step 3: Obtain the APD by dividing the average difference between scores by the number of response options on the test, minus one.
Steps in getting the AVERAGE PROPORTIONAL DISTANCE (APD)
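A minimal sketch of the three steps in Python, under the assumption that the item differences are taken over all pairs of items (the function name is illustrative):

```python
import numpy as np
from itertools import combinations

def average_proportional_distance(items: np.ndarray, n_options: int) -> float:
    """APD for a testtakers-by-items matrix of ratings on an n_options-point scale."""
    # Step 1: absolute differences between scores for every pair of items
    diffs = [np.abs(items[:, i] - items[:, j])
             for i, j in combinations(range(items.shape[1]), 2)]
    # Step 2: average the differences between scores
    mean_diff = np.mean(diffs)
    # Step 3: divide by the number of response options minus one
    return mean_diff / (n_options - 1)
```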
Degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
Inter-scorer reliability
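For categorical ratings, one common chance-corrected index of inter-scorer agreement is Cohen's kappa; a minimal sketch (the function name is illustrative):

```python
import numpy as np

def cohens_kappa(rater1: np.ndarray, rater2: np.ndarray) -> float:
    """Chance-corrected agreement between two raters over categorical codes."""
    p_o = np.mean(rater1 == rater2)  # observed proportion of agreement
    # Expected chance agreement, from each rater's marginal proportions
    categories = np.union1d(rater1, rater2)
    p_e = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories)
    return (p_o - p_e) / (1 - p_e)
```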
- the test items are homogeneous or heterogeneous in nature
- the characteristic, ability, or trait being measured is presumed to be dynamic or static
- the range of test scores is or is not restricted
- the test is a speed or a power test
- the test is or is not criterion-referenced
5 aspects of the nature of a test
Homogeneity vs heterogeneity of a test
Homogeneity of test items - all the items on a test are very similar in content, difficulty, and format, essentially measuring the same construct
Heterogeneity of test items - items vary significantly in content, difficulty, and format, potentially assessing multiple related but distinct aspects of a concept
Trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.
Dynamic characteristic
Trait, state, or ability presumed to be relatively unchanging, such as intelligence.
Static characteristic
When the time limit is long enough to allow testtakers to attempt all items, and some items are so difficult that no testtaker is able to obtain a perfect score.
Power test
Generally contains items of uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all testtakers should be able to complete all the test items correctly.
Speed test
A reliability estimate of a speed test should be based on performance from two independent testing periods, using one of the following:
Test-retest reliability
Alternate-forms reliability
Split-half reliability from two separately timed half tests
Designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective.
Criterion-referenced test
Any theoretical and statistical framework describing how respondents generate their answers to items on a scale or instrument and explaining associated sources of error.
PSYCHOMETRIC MODELS
3 major approaches among psychometric models:
Classical test theory
Generalizability theory
Item response theory
A framework of principles and assumptions about how to determine the reliability of a set of data.
GENERALIZABILITY THEORY
Include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration.
Facets
Examines how generalizable scores from a particular test are if the test is administered in different situations.
Generalizability study
Represents the influence of particular facets on the test score; similar to the reliability coefficient in the true score model.
Coefficients of generalizability
Involves the application of information from the generalizability study. Developers examine the usefulness of test scores in helping the test user make decisions.
Decision study
States that the probability that an item will be answered correctly is a function of an underlying trait or ability that is not directly observable, that is, a latent trait; also referred to as latent-trait theory.
ITEM RESPONSE THEORY
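A minimal sketch of the idea in Python, assuming a one-parameter (Rasch-type) logistic model in which theta is the latent trait and b is the item difficulty (standard IRT notation, not from this card):

```python
import math

def p_correct(theta: float, b: float) -> float:
    """Probability of a correct response under a one-parameter logistic model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# The further the latent trait exceeds the item difficulty, the higher the probability
print(p_correct(theta=1.0, b=0.0))   # ~0.73
print(p_correct(theta=-1.0, b=0.0))  # ~0.27
```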
Test items or questions that can be answered with only one of two alternative responses, such as true–false, yes–no, or correct–incorrect questions.
Dichotomous test items
Test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct.
Polytomous test items
Provides a measure of the precision of an observed test score.
Standard error of measurement
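As a formula (standard notation): SEM = σ * sqrt(1 - r_xx), where σ is the standard deviation of the test scores and r_xx is the reliability coefficient of the test.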
A range or band of test scores constructed around an observed score that is likely to contain the true score; can also be used to determine whether a score is significantly different from a criterion.
Confidence interval
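A worked example under standard assumptions: with an observed score of 50 and SEM = 2, a 95% confidence interval for the true score is 50 ± 1.96 × 2, or roughly 46 to 54.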
Statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant.
Standard error of the difference
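As a formula (standard notation): σ_diff = σ * sqrt(2 - r_1 - r_2), where r_1 and r_2 are the reliability coefficients of the two scores and σ is the standard deviation (assumed equal for both measures); equivalently, σ_diff = sqrt(SEM_1² + SEM_2²).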
4 sources of error variance
Test construction
Test administration
Test scoring and interpretation
Other sources of error
2 other sources of error variance
Sampling error - when the sample is not an actual representative of the population of data
Methodological error - a mistake or flaw in the design or execution of a research study