Chapter 5 Flashcards
- Is a synonym for dependability or consistency.
- Refers to consistency in measurement.
Reliability
Is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance.
Reliability coefficient
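The ratio can be sketched in a few lines of Python (hypothetical variance values, for illustration only):

```python
def reliability_coefficient(true_variance: float, total_variance: float) -> float:
    """Proportion of total observed score variance that is true score variance."""
    return true_variance / total_variance

# Hypothetical example: 40 of 50 variance units reflect true differences
print(reliability_coefficient(40.0, 50.0))  # 0.8
```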
Holds that a score on an ability test is presumed to reflect not only the testtaker's true score on the ability being measured but also error.
Classical test theory
- Variance from true differences.
True variance
- A statistic useful in describing sources of test score variability.
- This statistic is useful because it can be broken into components.
- The standard deviation squared.
Variance
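A quick check of the relationship between variance and standard deviation, using Python's standard library and hypothetical scores:

```python
from statistics import pstdev, pvariance

scores = [10, 12, 14, 16, 18]  # hypothetical test scores

sd = pstdev(scores)      # population standard deviation
var = pvariance(scores)  # population variance

# The variance equals the standard deviation squared
print(var, round(sd ** 2, 9))  # 8.0 8.0
```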
Refers, collectively, to all of the factors associated with the process of measuring some variable, other than the variable being measured.
Measurement error
Variance from irrelevant, random sources
Error variance
Is a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
- Sometimes referred to as “noise,” this source of error fluctuates from one testing situation to
another with no discernible pattern that would systematically raise or lower scores.
Random error
Refers to a source of error in measuring a
variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Systematic error
Sources of Error Variance:
Sources of error variance include test construction, administration, scoring, and/or
interpretation.
These terms refer to variation among items within a test as well as to variation among items between tests.
- Under test construction
Item sampling or content sampling
Sources of error variance that occur during test administration may influence the testtaker’s attention or motivation. The testtaker’s reactions to those influences are the source of one kind of error variance.
- Examples of untoward influences during
administration of a test include factors related to the: room temperature, level of lighting, and amount of ventilation and noise, for instance.
Test environment
- Other potential sources of error variance during test administration are: pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication can all be sources of error variance.
Test-taker variables
The examiner’s physical appearance and demeanor—even the presence or absence of an examiner—are some factors for consideration here.
Examiner-related variables
- In many tests, the advent of computer scoring and a growing reliance on objective, computer-scorable items have virtually eliminated error variance caused by scorer differences.
- However, not all tests can be scored from grids blackened by no. 2 pencils. Individually administered intelligence tests, some tests of personality, tests of creativity, various behavioral measures, essay tests, portfolio assessment, situational behavior tests, and countless other tools of assessment still require scoring by trained personnel.
Test scoring and interpretation
- Surveys and polls are two tools of assessment commonly used by researchers who study public opinion.
- Certain types of assessment situations lend themselves to particular varieties of systematic
and nonsystematic error.
Other sources of error
Reliability Estimates:
- Test-Retest Reliability Estimates
- Parallel-Forms and Alternate-Forms Reliability Estimates
- Split-Half Reliability Estimates
Is an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
- Is appropriate when evaluating the reliability of a test that purports to measure something
that is relatively stable over time, such as a personality trait.
Test-retest reliability
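A test-retest estimate is simply the Pearson correlation between the two administrations. A minimal sketch with hypothetical scores for five testtakers (no external libraries):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores on two administrations of the same test
time1 = [10, 12, 14, 16, 18]
time2 = [11, 13, 13, 17, 19]
print(round(pearson_r(time1, time2), 3))  # 0.962
```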
When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as
Coefficient of stability
The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, which is often termed the
Coefficient of equivalence
- Exist when, for each form of the test, the means and the variances of observed test scores are equal.
Parallel forms
Refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
Parallel forms reliability
Are simply different versions of a test that
have been constructed so as to be parallel.
Alternate forms
Refers to an estimate of the extent to which these different forms of the same test have been affected by item sampling error, or other error.
Alternate forms reliability
Deriving this type of estimate entails an evaluation of the internal consistency of the test items. Logically enough, it is referred to as an
Internal consistency estimate of reliability or as an estimate of inter-item consistency
There are different methods of obtaining internal consistency estimates of reliability. One such method is the
Split-half estimate
Is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
- It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test
twice (because of factors such as time or expense).
Split-half reliability
This method yields an estimate of split-half
reliability that is also referred to as
Odd-even reliability
Allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
Spearman–Brown formula
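A sketch of the whole split-half procedure, assuming hypothetical odd-item and even-item half scores: correlate the two halves, then step the half-test correlation up to full-test length with the Spearman–Brown formula, r_SB = 2r / (1 + r).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman_brown(r_half: float) -> float:
    """Estimate full-test reliability from a half-test correlation: 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

# Hypothetical odd-item and even-item half scores for five testtakers
odd_half  = [5, 7, 9, 11, 13]
even_half = [6, 7, 10, 10, 14]
r_half = pearson_r(odd_half, even_half)
print(round(spearman_brown(r_half), 3))  # 0.979
```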
Refers to the degree of correlation among all the
items on a scale.
Inter-item consistency
- Is the degree to which a test measures a single factor; in other words, the extent to which items in a scale are unifactorial.
- (derived from the Greek words homos, meaning “same,” and genos, meaning “kind”)
Homogeneity
- Describes the degree to which a test measures different factors.
- A heterogeneous test is composed of items that measure more than one trait.
Heterogeneity, heterogeneous
Dissatisfaction with existing split-half methods of estimating reliability compelled G. Frederic Kuder and M. W. Richardson (1937; Richardson & Kuder, 1939) to develop their own measures for estimating reliability.
- The 20th formula developed in a series.
Kuder–Richardson formula 20
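For dichotomously scored (0/1) items, KR-20 can be computed as (k / (k − 1)) · (1 − Σpq / σ²_total). A sketch with a hypothetical response matrix, assuming population variance:

```python
from statistics import pvariance

def kr20(responses):
    """responses: one list of 0/1 item scores per testtaker."""
    k = len(responses[0])
    n = len(responses)
    totals = [sum(person) for person in responses]
    pq = 0.0
    for j in range(k):
        p = sum(person[j] for person in responses) / n  # proportion passing item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / pvariance(totals))

# Hypothetical 5 testtakers x 4 dichotomous items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 3))  # 0.8
```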
A selected assortment of tests and assessment procedures—in the process of evaluation.
- typically composed of tests designed to measure different variables
Test battery
- Developed by Cronbach (1951) and subsequently elaborated on by others (such as Kaiser & Michael, 1975; Novick & Lewis, 1967)
- May be thought of as the mean of all possible split-half correlations, corrected by the Spearman–Brown formula.
Coefficient alpha
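Coefficient alpha can be sketched as (k / (k − 1)) · (1 − Σσ²_item / σ²_total); for dichotomous items it reduces to KR-20. Hypothetical data, population variances assumed:

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """responses: one list of item scores per testtaker."""
    k = len(responses[0])
    totals = [sum(person) for person in responses]
    item_vars = [pvariance([person[j] for person in responses]) for j in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / pvariance(totals))

# Hypothetical 5 testtakers x 4 items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 3))  # 0.8
```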
- A relatively new measure for evaluating the internal consistency of a test.
- Focuses on the degree of difference that exists between item scores.
Average proportional distance (APD) method
- Is the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
- often used when coding nonverbal behavior
Inter-scorer reliability
The simplest way of determining the degree of
consistency among scorers in the scoring of a test is to calculate a coefficient of correlation.
Coefficient of inter-scorer reliability
A source of error attributable to variations in the test-taker’s feelings, moods, or mental state over time.
Transient error
Recall that a test is said to be homogeneous
in items if it is functionally uniform throughout. Tests designed to measure one factor, such
as one ability or one trait, are expected to be homogeneous in items. For such tests, it is
reasonable to expect a high degree of internal consistency. By contrast, if the test is
heterogeneous in items, an estimate of internal consistency might be low relative to a more
appropriate estimate of test-retest reliability.
Homogeneity versus heterogeneity of test items
Is a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.
Dynamic characteristic
In using and interpreting a coefficient of reliability, the issue variously referred to as restriction of range or restriction of variance (or, conversely, inflation of range or inflation of variance) is important. If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower. If the variance of either variable in a correlational analysis is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher.
Restriction or inflation of range
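The effect can be demonstrated directly: correlate hypothetical paired scores over the full range of one variable, then over a restricted slice of that range.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

# Hypothetical paired scores for ten testtakers
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.5, 0.5, 3.5, 2.5, 5.5, 4.5, 7.5, 6.5, 9.5, 8.5]

r_full = pearson_r(x, y)

# Restrict the sample to the middle of the x range
pairs = [(a, b) for a, b in zip(x, y) if 4 <= a <= 7]
r_restricted = pearson_r([a for a, _ in pairs], [b for _, b in pairs])

print(round(r_full, 3), round(r_restricted, 3))  # restriction lowers r
```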
When a time limit is long enough to allow testtakers to attempt all items, and if some items are so difficult that no testtaker is able to obtain a perfect score, then the test is a
Power test
Generally contains items of uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all testtakers should be able to complete all the test items correctly.
Speed test
Is designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective.
- tend to contain material that has been mastered in hierarchical fashion
Criterion-referenced tests
Referred to as the true score (or classical) model of measurement.
Classical test theory (CTT)
A value that, according to classical test theory, genuinely reflects an individual's ability (or trait) level as measured by a particular test. Note that this value is very much test dependent.
True score
Seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.
Domain sampling theory
Is based on the idea that a person’s test scores vary from testing to testing because of
variables in the testing situation.
Generalizability theory
Cronbach encouraged test developers and researchers to describe the details of
the particular test situation or universe leading to a specific test score. This universe is
described in terms of its _____, which include things like the number of items in the test,
the amount of training the test scorers have had, and the purpose of the test administration.
Facets
According to generalizability theory, given the exact same conditions of all the facets in
the universe, the exact same test score should be obtained. This test score is the _____ ________.
Universe score
Examines how generalizable scores from a particular test are if the test is administered in different situations.
Generalizability study
The influence of particular facets on the test score is represented by
Coefficients of generalizability
Examines the usefulness of test scores in helping the test user make decisions.
Decision study
- Another alternative to the true score model
- A synonym for IRT in the academic literature is latent-trait theory
Item response theory (IRT)
In the context of IRT, it signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured.
Discrimination
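A sketch of discrimination in the two-parameter logistic (2PL) IRT model, where a is the discrimination parameter and b the difficulty (hypothetical values; the Rasch model is the special case a = 1):

```python
from math import exp

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL IRT model: probability of a correct response at ability theta."""
    return 1 / (1 + exp(-a * (theta - b)))

# A more discriminating item separates testtakers one unit below and one unit
# above the item's difficulty more sharply
low_a  = p_correct(1.0, a=0.5, b=0.0) - p_correct(-1.0, a=0.5, b=0.0)
high_a = p_correct(1.0, a=2.0, b=0.0) - p_correct(-1.0, a=2.0, b=0.0)
print(low_a < high_a)  # True
```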
Test items or questions that can be answered with only one of two alternative responses, such
as true–false, yes–no, or correct–incorrect questions
Dichotomous test items
Test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct
Polytomous test items
Refers to an IRT model with very specific assumptions about the underlying distribution.
Rasch model
- Provides a measure of the precision of an observed test score; provides an estimate of the
amount of error inherent in an observed score or measurement. - Is the tool used to estimate or infer the extent to which an observed score deviates from a true score.
- the relationship between the SEM and the reliability of a test is inverse; the higher the reliability of a test (or individual subtest within a test), the lower the SEM.
- denoted by the symbol σmeas, the standard error of measurement is an index of the extent to which one individual’s scores vary over tests presumed to be parallel. In accordance with the
true score model, an obtained test score represents one point in the theoretical distribution of scores the testtaker could have obtained.
Standard Error of Measurement
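Under the classical model, SEM = SD · √(1 − r_xx). A sketch with hypothetical values (SD = 15, reliability = .91):

```python
from math import sqrt

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r_xx); shrinks as reliability rises."""
    return sd * sqrt(1 - reliability)

print(round(standard_error_of_measurement(15.0, 0.91), 3))  # 4.5
```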
A range or band of test scores that is likely to contain the true score
Confidence interval
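A sketch of a confidence band built from the SEM: the observed score plus or minus z standard errors (hypothetical observed score of 100 and SEM of 4.5):

```python
def confidence_interval(observed: float, sem: float, z: float = 1.96):
    """Band of +/- z standard errors of measurement around the observed score."""
    return observed - z * sem, observed + z * sem

lo, hi = confidence_interval(100.0, 4.5)
print(round(lo, 2), round(hi, 2))  # 91.18 108.82
```

With z = 1.96, the band covers roughly 95% of the theoretical distribution of scores the testtaker could have obtained.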
A statistical measure that can aid a test user in determining how large a difference should be
before it is considered statistically significant
Standard error of the difference
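When the two scores share the same scale standard deviation, the standard error of the difference can be sketched as SD · √(2 − r₁ − r₂), which equals √(SEM₁² + SEM₂²). Hypothetical values for illustration:

```python
from math import sqrt

def se_difference(sd: float, r1: float, r2: float) -> float:
    """SE_diff = SD * sqrt(2 - r1 - r2), assuming both scores are on a scale
    with the same standard deviation (illustrative sketch)."""
    return sd * sqrt(2 - r1 - r2)

# Hypothetical: SD = 15, reliabilities .91 and .84
print(round(se_difference(15.0, 0.91, 0.84), 3))  # 7.5
```

A difference between two scores must exceed a multiple of this value before it is considered statistically significant.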