Chapter 5: Reliability Flashcards
Reliability
Consistency in measurement; the total variance in an observed distribution of test scores equals the sum of the true variance plus the error variance
Reliability Coefficient
Index of reliability; proportion that indicates the ratio between the true score variance on a test and the total variance
Concept of Reliability
X = T + E, where X = observed score, T = true score, and E = error
True Score Model
Also true that the magnitude of the presence of a certain psychological trait as measured by a test of that trait will be due to the true amount of that trait and other factors
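A minimal sketch in Python of the true score model, using simulated data (the standard deviations of 8 for true scores and 4 for error are invented) to show how observed-score variance decomposes into true variance plus error variance and how the reliability coefficient is the ratio of true variance to total variance:

```python
# Minimal sketch of the true score model X = T + E using simulated data.
# The variances are illustrative, not drawn from any real test.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_scores = rng.normal(loc=50, scale=8, size=n)   # T: true differences
error = rng.normal(loc=0, scale=4, size=n)          # E: random, irrelevant error
observed = true_scores + error                      # X = T + E

total_var = observed.var()
true_var = true_scores.var()
error_var = error.var()

# Reliability coefficient: proportion of total variance that is true variance
reliability = true_var / total_var
print(f"total ~ {total_var:.1f}, true ~ {true_var:.1f}, error ~ {error_var:.1f}")
print(f"estimated reliability ~ {reliability:.2f}")  # about 64 / 80 = 0.80
```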
Variance
Statistic useful in describing sources of a test score variability; useful because it can be broken down into components
True Variance
Variance from true differences
Error Variance
Variance from irrelevant, random sources
Reliability of a Test
The greater the proportion of the total variance attributed to true variance, the more reliable the test
Sources of Error Variance
Test Construction
Administration
Scoring
Interpretation
Item/Content Sampling
Terms that refer to variation among items within a test as well as to variation among items between tests
Challenge in Test Development
Maximize the proportion of the total variance that is true variance and to minimize the proportion of the total variance that is error variance
Factors related to the Test Environment
Room temperature
Level of Lighting
Amount of ventilation and noise
Instrument used to enter responses and even the writing surface on which responses are written
Factors related to Testtaker variables
Pressing emotional problems
Physical discomfort
Lack of sleep
Effects of drugs or medication
Factors related to Examiner-Related Variables
Examiner’s physical appearance and demeanor; presence or absence of an examiner
Scoring and Scoring systems
Technical glitches may contaminate data
Test-Retest Method
Using the same instrument to measure the same thing at two points in time
Test-Retest Reliability
Result of a reliability evaluation; estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
Test-Retest Measure
Appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time;
Coefficient of Stability
Estimate of test-retest reliability when the interval between testing is greater than six months
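A minimal sketch, with invented score pairs, of how a test-retest (or coefficient-of-stability) estimate is obtained by correlating scores from two administrations of the same test:

```python
# Minimal sketch of a test-retest reliability estimate: the Pearson correlation
# between scores from two administrations of the same test. Scores are invented.
import numpy as np

time1 = np.array([23, 31, 28, 35, 40, 27, 33, 29, 38, 25])
time2 = np.array([25, 30, 27, 36, 41, 26, 35, 28, 37, 26])

r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability ~ {r_test_retest:.2f}")
```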
Coefficient of Equivalence
Alternate-Forms or Parallel-Forms coefficient of reliability
Parallel Forms
Exist when, for each form of the test, the means and the variances of observed test scores are equal; means of scores obtained on parallel forms correlate equally with the true score; scores obtained on parallel tests correlate equally with other measures
Alternate Forms
Different versions of a test that have been constructed so as to be parallel; designed to be equivalent with respect to variables such as content and level of difficulty
Similarity between obtaining estimates of alternate forms reliability and parallel forms reliability and obtaining an estimate of test-retest reliability
Two test administrations with the same group are required
Test scores may be affected by factors such as motivation, fatigue, or intervening events such as practice, learning or therapy
Item Sampling
Inherent in the computation of an alternate- or parallel-forms reliability coefficient; testtakers may do better or worse on a specific form of the test not as a function of their true ability but simply because of the particular items that were selected for inclusion in the test
Internal Consistency Estimate of Reliability/Estimate of Inter-Item Consistency
Obtaining an estimate of the reliability of a test without developing an alternate form of the test and without having to administer the test twice to the same people
Split-Half Reliability
Obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once; useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice
Steps to compute a Coefficient of Split-Half Reliability
Divide the test into equivalent halves.
Calculate a Pearson r between scores on the two halves of the test
Adjust the half-test reliability using the Spearman-Brown formula
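A minimal sketch of the three steps above, assuming an invented matrix of dichotomously scored item responses and an odd-even split:

```python
# Minimal sketch of the three split-half steps using an odd-even split.
# The item-response matrix (rows = testtakers, columns = items) is invented.
import numpy as np

responses = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0, 0],
])

# Step 1: divide the test into equivalent halves (odd-even split).
odd_half = responses[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = responses[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

# Step 2: calculate a Pearson r between scores on the two halves.
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Step 3: adjust the half-test correlation with the Spearman-Brown formula.
r_full = (2 * r_half) / (1 + r_half)
print(f"half-test r ~ {r_half:.2f}, Spearman-Brown corrected ~ {r_full:.2f}")
```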
To Split a Test
Randomly assign items to one or the other half of the test; assign odd-numbered items to one half of the test and even-numbered items to the other half
Odd-Even Reliability
assign odd-numbered items to one half of the test and even-numbered items to the other half
Mini Parallel Forms
Each half equal to the other in format, style, statistical characteristics, and related aspects
Spearman-Brown Formula
Allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test; specific application is to estimate the reliability of a test that is lengthened or shortened by any number of items; used to determine the number of items needed to attain a desired level of reliability
In adding items to increase test reliability to a desired level
The rule is that new items must be equivalent in content and difficulty so that the longer test still measures what the original test measured
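A minimal sketch of the general Spearman-Brown formula, including its use to find the lengthening factor needed to reach a desired reliability; the starting reliability of .70, target of .90, and 20-item length are illustrative values only:

```python
# Minimal sketch of the general Spearman-Brown formula. "n" is the factor by
# which the test is lengthened (e.g. n = 2 doubles the number of items).
def spearman_brown(r_original: float, n: float) -> float:
    """Predicted reliability of a test lengthened (or shortened) n times."""
    return (n * r_original) / (1 + (n - 1) * r_original)

def lengthening_factor(r_original: float, r_desired: float) -> float:
    """Factor n by which the test must be lengthened to reach r_desired."""
    return (r_desired * (1 - r_original)) / (r_original * (1 - r_desired))

# Illustrative values: a 20-item test with reliability .70, target .90.
n = lengthening_factor(0.70, 0.90)
print(f"lengthen by a factor of ~ {n:.2f}  (~ {round(20 * n)} items in total)")
print(f"check: predicted reliability ~ {spearman_brown(0.70, n):.2f}")
```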
When Internal Consistency Estimates of Reliability are Inappropriate
When measuring the reliability of a heterogeneous test and speed test
Inter-item Consistency
Refers to the degree of correlation among all the items on a scale; calculated from a single administration of a single form of a test; useful in assessing the homogeneity of a test
Homogeneity
Degree to which a test measures a single factor; extent to which items in a scale are unifactorial
Heterogeneity
Degree to which a test measures different factors; composed of items that measure more than one trait
Nature of Homogeneous Test
The more homogeneous a test is, the more inter-item consistency it can be expected to have;
Desirable because it allows relatively straightforward test-score interpretation
Testtakers with the same score on a Homogeneous Test
Have similar abilities in the area tested
Testtakers with the same score on a Heterogeneous Test
May have different abilities
Homogeneous Test
Insufficient tool for measuring multifaceted psychological variables such as intelligence or personality
G. Frederic Kuder & M.W. Richardson
Developed their own measures for estimating reliability; Kuder-Richardson Formula 20 (KR-20)
Kuder Richardson Formula 20 (KR-20)
Most popular formula
Where test items are highly Homogeneous
KR-20 and split-half reliability estimates will be similar
Where test items are highly Heterogeneous
KR-20 will yield lower reliability estimates than the split-half method
Dichotomous Items
Items that can be scored right or wrong, such as multiple choice items
Test Battery
A selected assortment of tests and assessment procedures in the process of evaluation; typically composed of tests designed to measure different variables
r KR20
The Kuder-Richardson Formula 20 Reliability Coefficient
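A minimal sketch of the KR-20 computation on an invented matrix of dichotomously scored responses (rows = testtakers, columns = items):

```python
# Minimal sketch of KR-20 for dichotomously scored items (1 = right, 0 = wrong).
# The response matrix is invented for illustration.
import numpy as np

responses = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 0, 1, 1],
])

k = responses.shape[1]                   # number of items
p = responses.mean(axis=0)               # proportion passing each item
q = 1 - p                                # proportion failing each item
total_var = responses.sum(axis=1).var()  # variance of total scores

kr20 = (k / (k - 1)) * (1 - np.sum(p * q) / total_var)
print(f"KR-20 ~ {kr20:.2f}")
```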
KR-21
Used if there is reason to assume that all the test items have approximately the same degree of difficulty; Outdated in an era of calculators and computers
Coefficient Alpha
Variant of the KR-20 that has received the most acceptance and is in widest use today; mean of all possible split-half correlations, corrected by the Spearman-Brown formula; appropriate for use on tests containing nondichotomous items; preferred statistic for obtaining an estimate of internal consistency reliability; formula yields an estimate of the mean of all possible test-retest, split-half coefficients; widely used as a measure of reliability, in part because it requires only one administration of the test; gives information about the test scores and not the test itself
Coefficient Alpha Result
Coefficient alpha is calculated to help answer questions about how similar sets of data are; ranges in value from 0 to 1; a negative value of alpha is not meaningfully interpreted, so if a negative value is obtained it is reported as zero
Scale of Coefficient Alpha
0 = absolutely no similarity
1 = perfectly identical
Negative values = reported as zero
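A minimal sketch of coefficient alpha on an invented matrix of nondichotomous (rating-scale) responses; only one administration of the test is needed:

```python
# Minimal sketch of coefficient alpha for nondichotomous (rating-scale) items.
# The data matrix (rows = respondents, columns = items) is invented.
import numpy as np

ratings = np.array([
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
])

k = ratings.shape[1]
item_vars = ratings.var(axis=0)          # variance of each item
total_var = ratings.sum(axis=1).var()    # variance of total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"coefficient alpha ~ {alpha:.2f}")  # negative values are reported as 0
```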
Inter-Scorer Reliability
Degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure
Coefficient of Inter-scorer Reliability
A way to determine the degree of consistency among scorers
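A minimal sketch of a coefficient of inter-scorer reliability computed as the correlation between two raters' scores; the ratings are invented, and for categorical ratings a statistic such as Cohen's kappa is often preferred:

```python
# Minimal sketch of a coefficient of inter-scorer reliability: the Pearson
# correlation between two raters' scores on the same set of testtakers.
import numpy as np

rater_1 = np.array([7, 5, 9, 6, 8, 4, 7, 6])
rater_2 = np.array([6, 5, 9, 7, 8, 5, 6, 6])

r_interscorer = np.corrcoef(rater_1, rater_2)[0, 1]
print(f"inter-scorer reliability ~ {r_interscorer:.2f}")
```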
Approaches to the Estimation of Reliability
Test-Retest
Alternate or Parallel Forms
Internal or Inter-Item Consistency
How High a Coefficient of Reliability Should Be
On a continuum relative to the purpose and importance of the decisions to be made on the basis of scores on the test
Considerations of the Nature of The Testing Itself
Test items are homogeneous or heterogeneous in nature
The characteristic, ability, or trait being measured is presumed to be dynamic or static
The range of test scores is or is not restricted
Test is a speed or a power test
Test is or is not criterion-referenced
Sources of Variance in a Hypothetical Test
True variance: 67%
Error due to test construction: 18%
Administration error: 5%
Unidentified error: 5%
Scorer error: 5%
Homogeneity of Test Items
Homogeneous in items if it is functionally uniform throughout
Heterogeneity of Test Items
If a test is heterogeneous in items, an estimate of internal consistency might be low relative to a more appropriate estimate of test-retest reliability
Dynamic Characteristic
A trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences; because the obtained measurement would be expected to vary from one testing to the next, an estimate of internal consistency is more appropriate than the test-retest or alternate-forms method
Static Characteristic
Trait, state, or ability presumed to be relatively unchanging; the obtained measurement would not be expected to vary significantly as a function of time, so either the test-retest or the alternate-forms method would be appropriate
Restriction of Range/Variance
If the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower; if the variance of either variable in a correlational analysis is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher
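A minimal sketch of restriction of range, simulating bivariate data and showing that the correlation computed on only the high scorers is lower than the full-range correlation; all values are simulated:

```python
# Minimal sketch of restriction of range: the same bivariate data yield a lower
# correlation once the sample is restricted to high scorers.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=5_000)
y = 0.7 * x + rng.normal(scale=0.7, size=5_000)   # true correlation roughly .70

r_full = np.corrcoef(x, y)[0, 1]

restricted = x > 1.0                               # keep only the top of the range
r_restricted = np.corrcoef(x[restricted], y[restricted])[0, 1]

print(f"full range r ~ {r_full:.2f}, restricted range r ~ {r_restricted:.2f}")
```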
Power Test
When the time limit is long enough to allow testtakers to attempt all items and some items are so difficult that no testtaker is able to obtain a perfect score
Speed Test
Generally contains items of uniform level of difficulty so that, given generous time limits, all testtakers should be able to complete all test items correctly; based on performance speed; time limit is established so that few, if any, of the testtakers will be able to complete the entire test
Reliability Estimate of A Speed Test
Based on performance from two independent testing periods using one of the following:
Test-Retest Reliability
Alternate-Forms Reliability
Split-Half Reliability from two separately timed half tests
If Split Half Procedure is Used for a Speed Test
The obtained reliability coefficient is for a half test and should be adjusted using the Spearman-Brown formula
Speed Test Administered Once & Measure of Internal Consistency is Calculated
Result will be a spuriously high reliability coefficient; for example, if one testtaker completes 82 items of a speed test and another completes 61 items of the same test, their odd-even half scores (41 and 41 versus 31 and 30) will be nearly identical, so the correlation between the halves will be close to 1 but will say nothing about response consistency
Criterion-Referenced Test
Designed to provide an indication of where a testtaker stands with respect to some variable or criterion, such as an educational or a vocational objective; tend to contain material that has been mastered in hierarchical fashion; tend to be interpreted in pass-fail terms, and any scrutiny of performance on individual items tends to be for diagnostic and remedial purposes
Test-Retest Reliability Estimate
Based on the correlation between the total scores on two administrations of the same test
Alternate-Forms Reliability Estimate
Based on the correlation between the total scores on the two forms of the test; in split-half reliability, the estimate is based on the correlation between scores on two halves of the test and is then adjusted using the Spearman-Brown formula to obtain a reliability estimate of the whole test
Generalizability Theory/Domain Sampling Theory
Seek to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score; a test's reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample
Domain of Behavior
Universe of items that could conceivably measure that behavior; hypothetical construct: one that shares certain characteristics with (and is measured by) the sample of items that make up the test
Generalizability Theory
May be viewed as an extension of true score theory wherein the concept of a universe score replaces that of a true score; developed by Lee J. Cronbach; Given the same conditions of all the facets in the universe, the exact same test score should be obtained
Lee J. Cronbach
Encouraged test developers and researchers to describe the details of the particular test situation (universe) leading to a specific test score
Universe
Described in terms of its facets
Facets
Include things like the number of items in the test, the amount of training the test scorers have had, and the purpose of the test administration
Universe Score
The test score; analogous to a true score in the true score model
Generalizability Study
Examines how generalizable scores from a particular test are if the test is administered in different situations; examines how much of an impact different facets of the universe have on the test score
Coefficients of Generalizability
Influence of particular facets on the test score; similar to reliability coefficients in the true score model
Decision Study
Developers examine the usefulness of test scores in helping the test user make decisions; designed to tell the test user how test scores should be used and how dependable those scores are as a basis for decisions, depending on the context of their use
Item Response Theory
Provides a way to model the probability that a person with X ability will be able to perform at a level of Y; stated in terms of personality assessment, it models the probability that a person with X amount of a particular personality trait will exhibit Y amount of that trait on a personality test designed to measure it; not a term used to refer to a single theory or method
Latent
Physically unobservable
Latent-Trait Theory
Synonym for IRT; proposes models that describe how the latent trait influences performance on each test item; the latent trait theoretically can take on values from negative infinity to positive infinity
Characteristics of Items within an IRT Framework
Difficulty Level of an Item
Item’s Level of Discrimination
Difficulty
Refers to the attribute of not being easily accomplished, solved, or comprehended; May also refer to physical difficulty
Physical Difficulty
How hard or easy it is for a person to engage in a particular activity
Discrimination
Signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured
Dichotomous Test Items
Test items or questions that can be answered with only one of two alternate responses, such as true-false, yes-no, or correct-incorrect questions
Polytomous Test Items
Test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct
Georg Rasch
Developed a group of IRT models; each item on the test is assumed to have an equivalent relationship with the construct being measured by the test;
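A minimal sketch of a two-parameter logistic item response function; with every discrimination set to 1 it reduces to a Rasch-type model in which each item has an equivalent relationship with the construct. The difficulty and discrimination values are illustrative only:

```python
# Minimal sketch of a 2PL item response function; with discrimination = 1 for
# every item it reduces to a Rasch-type model. Parameter values are illustrative.
import math

def p_correct(theta: float, difficulty: float, discrimination: float = 1.0) -> float:
    """Probability that a person at trait level theta answers the item correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# An easy, weakly discriminating item vs. a hard, highly discriminating item.
for theta in (-2.0, 0.0, 2.0):
    easy = p_correct(theta, difficulty=-1.0, discrimination=0.5)
    hard = p_correct(theta, difficulty=1.5, discrimination=2.0)
    print(f"theta={theta:+.1f}: easy item p~{easy:.2f}, hard item p~{hard:.2f}")
```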
Reliability Coefficient
Helps the test developer build an adequate measuring instrument
Helps the test user select a suitable test
Its usefulness does not end with test construction and selection
Standard Error of Measurement
SEM; provides a measure of the precision of an observed test score; provides an estimate of the amount of error inherent in an observed score or measurement; inverse relationship between SEM and reliability of a test: the higher the reliability of a test (or individual subtest within a test), the lower the SEM; tool used to estimate or infer the extent to which an observed score deviates from a true score; standard deviation of a theoretically normal distribution of test scores obtained by one person on equivalent tests
Standard Error of a Score
Another term for standard error of measurement; index of the extent to which an individual's scores vary over tests presumed to be parallel
Confidence Interval
Range or band of test scores that is likely to contain the true score
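A minimal sketch of the standard error of measurement (SEM = SD x sqrt(1 - r)) and a confidence interval built around an observed score; the SD of 15, reliability of .90, and score of 106 are illustrative values:

```python
# Minimal sketch of the standard error of measurement and a confidence interval
# around an observed score. The SD, reliability, and score are illustrative.
import math

sd = 15.0            # standard deviation of the test scores
reliability = 0.90   # reliability coefficient of the test
observed = 106       # one testtaker's observed score

sem = sd * math.sqrt(1 - reliability)                 # SEM = SD * sqrt(1 - r)
lower, upper = observed - 1.96 * sem, observed + 1.96 * sem

print(f"SEM ~ {sem:.2f}")
print(f"95% confidence interval ~ {lower:.1f} to {upper:.1f}")
```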
Standard Error of the Difference
A statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant
Questions that Standard Error of the Difference Between Two Scores can Answer
How did this individual’s performance on test 1 compare with his or her performance on test 2?
How did this individual’s performance on test 1 compare with someone else’s performance on test 1?
How did thisindividual’s performance on test 1 compare with someone else’s performance on test 2?