Chapter 5 Flashcards

1
Q
  • Is a synonym for dependability or consistency.
  • Refers to consistency in measurement.
A

Reliability

2
Q

Is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance.

A

Reliability coefficient
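Written as an equation (in standard classical test theory notation, with r_xx the reliability coefficient, σ²_tr the true variance, and σ² the total variance):

```latex
r_{xx} = \frac{\sigma^2_{tr}}{\sigma^2}
```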

3
Q

The assumption that a score on an ability test reflects not only the testtaker’s true score on the ability being measured but also error.

A

Classical test theory

4
Q
  • Variance from true differences.
A

True variance

5
Q
  • A statistic useful in describing sources of test score variability.
  • This statistic is useful because it can be broken into components.
  • The standard deviation squared.
A

Variance

6
Q

Refers, collectively, to all of the factors associated with the process of measuring some variable, other than the variable being measured.

A

Measurement error

7
Q

Variance from irrelevant, random sources

A

Error variance
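In the same classical test theory notation, the total variance of observed scores is the sum of these two components, true variance plus error variance:

```latex
\sigma^2 = \sigma^2_{tr} + \sigma^2_{e}
```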

8
Q

Is a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
- Sometimes referred to as “noise,” this source of error fluctuates from one testing situation to
another with no discernible pattern that would systematically raise or lower scores.

A

Random error

9
Q

Refers to a source of error in measuring a
variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.

A

Systematic error

10
Q

Sources of Error Variance:

A

Sources of error variance include test construction, administration, scoring, and/or
interpretation.

11
Q

Terms that refer to variation among items within a test as well as to variation among items between tests.
- Under test construction

A

Item sampling or content sampling

12
Q

Sources of error variance that occur during test administration may influence the testtaker’s attention or motivation. The testtaker’s reactions to those influences are the source of one kind of error variance.
- Examples of untoward influences during administration of a test include factors such as room temperature, level of lighting, and amount of ventilation and noise.

A

Test environment

13
Q
  • Other potential sources of error variance during test administration include pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication.
A

Test-taker variables

14
Q

The examiner’s physical appearance and demeanor—even the presence or absence of an examiner—are some factors for consideration here.

A

Examiner-related variables

15
Q
  • In many tests, the advent of computer scoring and a growing reliance on objective, computer-scorable items have virtually eliminated error variance caused by scorer differences.
  • However, not all tests can be scored from grids blackened by no. 2 pencils. Individually administered intelligence tests, some tests of personality, tests of creativity, various behavioral measures, essay tests, portfolio assessment, situational behavior tests, and countless other tools of assessment still require scoring by trained personnel.
A

Test scoring and interpretation

16
Q
  • Surveys and polls are two tools of assessment commonly used by researchers who study public opinion.
  • Certain types of assessment situations lend themselves to particular varieties of systematic
    and nonsystematic error.
A

Other sources of error

17
Q

Reliability Estimates:

A
  • Test-Retest Reliability Estimates
  • Parallel-Forms and Alternate-Forms Reliability Estimates
  • Split-Half Reliability Estimates
18
Q

Is an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
- Is appropriate when evaluating the reliability of a test that purports to measure something
that is relatively stable over time, such as a personality trait.

A

Test-retest reliability

19
Q

When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as

A

Coefficient of stability

20
Q

The degree of the relationship between various forms of a test can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability, which is often termed the

A

Coefficient of equivalence

21
Q
  • Forms of a test that exist when, for each form, the means and the variances of observed test scores are equal.
A

Parallel forms

22
Q

Refers to an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.

A

Parallel forms reliability

23
Q

Are simply different versions of a test that
have been constructed so as to be parallel.

A

Alternate forms

24
Q

Refers to an estimate of the extent to which these different forms of the same
test have been affected by item sampling error, or other error.

A

Alternate forms reliability

25
Q

Deriving this type of estimate entails an evaluation of the internal consistency of the test items. Logically enough, it is referred to as an

A

Internal consistency estimate of reliability or as an estimate of inter-item consistency

26
Q

There are different methods of obtaining internal consistency estimates of reliability. One such method is the

A

Split-half estimate

27
Q

Is obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once.
- It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test
twice (because of factors such as time or expense).

A

Split-half reliability

28
Q

Splitting a test by odd- and even-numbered items yields an estimate of split-half
reliability that is also referred to as

A

Odd-even reliability

29
Q

Allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.

A

Spearman–Brown formula
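As a sketch of how the correction works in practice: the correlation between two half-tests underestimates the reliability of the full-length test, and the Spearman–Brown formula, r_SB = 2r_hh / (1 + r_hh), adjusts it upward. The half-scores and function names below are hypothetical, for illustration only.

```python
# Split-half reliability with the Spearman-Brown correction.
# The half-scores below are hypothetical, for illustration only.
import statistics

def pearson(x, y):
    # Pearson correlation between two score lists.
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman_brown(r_half):
    # Steps the half-test correlation up to full-test length.
    return 2 * r_half / (1 + r_half)

# Scores of five testtakers on the odd-numbered and even-numbered halves.
odd = [10, 12, 9, 14, 11]
even = [11, 13, 9, 15, 10]
r_hh = pearson(odd, even)
print(round(r_hh, 3), round(spearman_brown(r_hh), 3))
```

For positive correlations the corrected coefficient is always at least as large as the half-test correlation, which is why split-half estimates are routinely reported after correction.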

30
Q

Refers to the degree of correlation among all the
items on a scale.

A

Inter-item consistency

31
Q
  • Is the degree to which a test measures a single factor; in other words, the extent to which items in a scale are unifactorial.
  • (derived from the Greek words homos, meaning “same,” and genos, meaning “kind”)
A

Homogeneity

32
Q
  • Describes the degree to which a test
    measures different factors.
  • A heterogeneous test is composed of items
    that measure more than one trait.
A

Heterogeneity, heterogeneous

33
Q

Dissatisfaction with existing split-half methods of estimating reliability compelled G. Frederic Kuder and M. W. Richardson (1937; Richardson & Kuder, 1939) to develop their own measures for estimating reliability.
- The 20th formula in a series they developed

A

Kuder–Richardson formula 20
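A minimal sketch of the KR-20 computation for dichotomously scored items, KR-20 = (k/(k−1))(1 − Σp_j·q_j / σ²). The response matrix and function name are hypothetical:

```python
# Kuder-Richardson formula 20 for dichotomous (0/1) items.
# rows = testtakers, columns = items; the data are hypothetical.
def kr20(matrix):
    k = len(matrix[0])                      # number of items
    n = len(matrix)                         # number of testtakers
    totals = [sum(row) for row in matrix]   # total score per person
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n  # population variance
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in matrix) / n  # proportion passing item j
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_total)

data = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(kr20(data), 3))
```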

34
Q

A selected assortment of tests and assessment procedures used in the process of evaluation.
- typically composed of tests designed to measure different variables

A

Test battery

35
Q
  • Developed by Cronbach (1951) and subsequently elaborated on by others (such as Kaiser & Michael, 1975; Novick & Lewis, 1967)
  • May be thought of as the mean of all possible split-half correlations, corrected by the Spearman–Brown formula.
A

Coefficient alpha
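Because alpha generalizes KR-20 to items that are not scored 0/1, it can be sketched directly from its defining formula, α = (k/(k−1))(1 − Σσ²_item / σ²_total). The ratings and function names below are hypothetical:

```python
# Cronbach's coefficient alpha for non-dichotomous items.
# rows = testtakers, columns = items (e.g., 1-5 ratings); data are hypothetical.
def pvar(xs):
    # Population variance of a list of numbers.
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(matrix):
    k = len(matrix[0])
    item_vars = sum(pvar([row[j] for row in matrix]) for j in range(k))
    total_var = pvar([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - item_vars / total_var)

ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
]
print(round(cronbach_alpha(ratings), 3))
```

Applying `kr20` to 0/1 data and `cronbach_alpha` to the same data would give identical results, which is the sense in which alpha is the more general formula.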

36
Q
  • A relatively new measure for evaluating the internal consistency of a test
  • Focuses on the degree of difference that exists between item scores
A

Average proportional distance (APD) method

37
Q
  • Is the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
  • often used when coding nonverbal behavior
A

Inter-scorer reliability

38
Q

The simplest way of determining the degree of
consistency among scorers in the scoring of a test is to calculate a coefficient of correlation.

A

Coefficient of inter-scorer reliability

39
Q

A source of error attributable to variations in the test-taker’s feelings, moods, or mental state over time.

A

Transient error

40
Q

Recall that a test is said to be homogeneous
in items if it is functionally uniform throughout. Tests designed to measure one factor, such
as one ability or one trait, are expected to be homogeneous in items. For such tests, it is
reasonable to expect a high degree of internal consistency. By contrast, if the test is
heterogeneous in items, an estimate of internal consistency might be low relative to a more
appropriate estimate of test-retest reliability.

A

Homogeneity versus heterogeneity of test items

41
Q

Is a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences.

A

Dynamic characteristics

42
Q

In using and interpreting a coefficient of reliability, the issue variously referred to as restriction of range or restriction of variance (or, conversely, inflation of range or inflation of variance) is important. If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower. If the variance of either variable in a correlational analysis is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher.

A

Restriction or inflation of range

43
Q

When a time limit is long enough to allow testtakers to attempt all items, and if some items are so difficult that no testtaker is able to obtain a perfect score, then the test is a

A

Power test

44
Q

Generally contains items of uniform level of difficulty (typically uniformly low) so that, when given generous time limits, all testtakers should be able to complete all the test items correctly.

A

Speed test

45
Q

Is designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective.
- tend to contain material that has been mastered in hierarchical fashion

A

Criterion-referenced tests

46
Q

Referred to as the true score (or classical) model of measurement.

A

Classical test theory (CTT)

47
Q

A value that, according to classical test theory, genuinely reflects an individual’s ability (or trait) level as measured by a particular test. Let’s emphasize here that this value is indeed very test dependent.

A

True score

48
Q

Seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.

A

Domain sampling theory

49
Q

Is based on the idea that a person’s test scores vary from testing to testing because of
variables in the testing situation.

A

Generalizability theory

50
Q

Cronbach encouraged test developers and researchers to describe the details of
the particular test situation or universe leading to a specific test score. This universe is
described in terms of its _____, which include things like the number of items in the test,
the amount of training the test scorers have had, and the purpose of the test administration.

A

Facets

51
Q

According to generalizability theory, given the exact same conditions of all the facets in
the universe, the exact same test score should be obtained. This test score is the _____ ________.

A

Universe score

52
Q

Examines how generalizable scores from a particular test are if the test is administered in different situations.

A

Generalizability study

53
Q

The influence of particular facets on the test score is represented by

A

Coefficients of generalizability

54
Q

Test developers examine the usefulness of test
scores in helping the test user make decisions

A

Decision study

55
Q
  • Another alternative to the true score model
  • A synonym for IRT in the academic literature is latent-trait theory
A

Item response theory (IRT)

56
Q

In the context of IRT, it signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured.

A

Discrimination

57
Q

Test items or questions that can be answered with only one of two alternative responses, such
as true–false, yes–no, or correct–incorrect questions

A

Dichotomous test items

58
Q

Test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct

A

Polytomous test items

59
Q

Is a reference to an IRT model with very specific assumptions about the underlying distribution.

A

Rasch model

60
Q
  • Provides a measure of the precision of an observed test score; provides an estimate of the
    amount of error inherent in an observed score or measurement.
  • Is the tool used to estimate or infer the extent to which an observed score deviates from a true score.
  • The relationship between the SEM and the reliability of a test is inverse: the higher the reliability of a test (or individual subtest within a test), the lower the SEM.
  • Denoted by the symbol σ_meas, the standard error of measurement is an index of the extent to which one individual’s scores vary over tests presumed to be parallel. In accordance with the
    true score model, an obtained test score represents one point in the theoretical distribution of scores the testtaker could have obtained.
A

Standard Error of Measurement
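The SEM can be computed from two quantities the card mentions: the test's standard deviation and its reliability coefficient, SEM = σ√(1 − r). The scale values below are hypothetical:

```python
# Standard error of measurement: sd * sqrt(1 - reliability).
# Values are hypothetical: an IQ-style scale with sd = 15, reliability = .91.
import math

def sem(sd, reliability):
    return sd * math.sqrt(1 - reliability)

s = sem(15, 0.91)
print(round(s, 2))
# Band of roughly 2 SEMs around an observed score of 100 (a 95% interval):
print(round(100 - 1.96 * s, 1), round(100 + 1.96 * s, 1))
```

Note the inverse relationship the card describes: as reliability rises toward 1.0, the SEM shrinks toward 0.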

61
Q

A range or band of test scores that is likely to contain the true score

A

Confidence interval

62
Q

A statistical measure that can aid a test user in determining how large a difference should be
before it is considered statistically significant

A

Standard error of the difference
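Under classical test theory, this statistic combines the standard errors of measurement of the two scores being compared: SE_diff = σ√(2 − r1 − r2). The numbers and function name below are hypothetical:

```python
# Standard error of the difference between two scores:
# sd * sqrt(2 - r1 - r2), where r1 and r2 are the reliabilities of the two
# tests and both scores are expressed on the same scale. Values hypothetical.
import math

def se_difference(sd, r1, r2):
    return sd * math.sqrt(2 - r1 - r2)

sed = se_difference(15, 0.90, 0.86)
print(round(sed, 2))
# A difference larger than about 1.96 * sed is significant at the .05 level:
print(round(1.96 * sed, 1))
```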