W2 - Chapter 5 - Reliability (DN) Flashcards

Question

inflation of range/variance

Answer 1

- SAMPLING PROCEDURES may impact the variance of either variable in a correlation analysis OUTCOME - if variance of EITHER variable is INFLATED by sampling procedure then the resulting CC tends to be HIGHER (i.e., giving a false indicator of correlation (thought to self - is this also a validity issue e.g., false positive) - conversely referred to as RESTRICTION OF RANGE/VARIANCE - if variance of EITHER variable is RESTRICTED by sampling procedure used, then tends to be a LOWER CORRELATION COEFFICIENT (i.e., masking true correlation) (thought to self - is this also a validity issue e.g., failing to detect - a miss!!!) p.162

Answer 2

- an IRT TOOL - helps test users to determine the RANGE OVER THETA for which an item is most useful in DISCRIMINATING among groups of testtakers p. 171

Answer 3

- the CONSISTENCY or HOMOGENEITY of ALL items on a test - ESTIMATED by techniques such as the SPLIT-HALF RELIABILITY method - the DEGREE of CORRELATION among ALL ITEMS on a scale - p.154

Answer 4

an ESTIMATE of the RELIABILITY of a test | - obtained from a MEASURE of INTER-ITEM CONSISTENCY p.152

Answer 5

- An ESTIMATE of the DEGREE of agreement or CONSISTENCY between TWO or more SCORERS on a test. - also referred to as INTER-RATER reliability; OBSERVER reliability; JUDGE reliability; SCORER reliability. - p.159, 161

Answer 6

- graphic representation of the PROBABILISTIC RELATIONSHIP between a person's LEVEL of TRAIT (ability, characteristic) being measured and the PROBABILITY for responding to an item in a PREDICTED way; - also known as a CATEGORY RESPONSE CURVE, or, an ITEM TRACE LINE p. 177, 281

Answer 7

- another alternative to the true score model - a family of theories/methods (well over 100 varieties of IRT models) - each model is designed to HANDLE data with CERTAIN ASSUMPTIONS - a way of modelling (predicting?) the PROBABILITY that a person with X ability will be able to perform at a LEVEL OF Y. - also referred to as LATENT-TRAIT MODELp. p. 166, 168-173

Answer 8

- one source of VARIANCE in the measurement process is the VARIATION among items WITHIN a test, or BETWEEN tests i.e., the way in which a test is CONSTRUCTED is a source of ERROR VARIANCE - also CONTENT SAMPLING p. 147

Answer 9

a series of EQUATIONS developed by G. F Kuder & M. W. Richardson - designed to ESTIMATE the INTER-ITEM CONSISTENCY of tests - only appropriate for use on tests with DICHOTOMOUS ITEMS (true/false) p. 155-156, 163

Answer 10

- a synonym for IRT (Item Response Theory) in the academic literature - a system of ASSUMPTIONS about measurement - includes ASSUMPTION that a TRAIT being measured is UNIDIMENSIONAL - go back and check this pg 168 - the extent to which each test item measures the targeted trait - also referred to as LATENT-TRAIT MODELp. 168

Answer 11

all factors associated with the PROCESS of measuring some variable OTHER than the actual variable being measured p.146

Answer 12

- an ESTIMATE of the SPLIT-HALF RELIABILITY of a test | - Splitting a test by assigning odd-numbered items to one half & even-numbered items to the other half of the test p.153

Answer 13

when on each FORM of the test, the MEANS & VARIANCES of OBSERVED TEST SCORES are EQUAL .151

Answer 14

- an estimate of the consistency of two versions of a test across time - an ESTIMATE of the extent to which ITEM SAMPLING & OTHER ERRORS have affected test scores on versions of the SAME test, for which MEANS & VARIANCES of OBSERVED TEST SCORES are EQUAL. (contrast with alternate forms reliability & also coefficient of equivalence) p.151-152

Answer 15

a test item or question with THREE OR MORE ALTERNATIVE RESPONSES - where ONLY ONE is scored CORRECT or is CONSISTENT with a TARGETED TRAIT or other CONSTRUCT p. 169

Answer 16

- a test, usually of achievement or ability has 1) either NO TIME LIMIT or such a long time limit that ALL TESTAKERS can attempt ALL ITEMS 2) some items are SO DIFFICULT that NO TESTTAKER can obtain a PERFECT SCORE (so its isolating the 'power' or 'ability' variable) (contrast with speed test) p.163

Answer 17

a source of ERROR when measuring a target variable due to UNPREDICTABLE FLUCTUATIONS & INCONSISITENCIES of OTHER VARIABLES in the measurement process - sometimes referred to as "NOISE" - contrast with systematic error p.146

Answer 18

a reference to an IRT MODEL with VERY SPECIFIC ASSUMPTIONS about the UNDERLYING DISTRIBUTION p.169

Answer 19

the proportion of the total variance attributable to TRUE VARIANCE - the GREATER the proportion of TRUE VARIANCE = the GREATER the RELIABILITY of a test - p.157-158

Answer 20

- general term - an INDEX of RELIABILITY - or the RATIO of TRUE SCORE VARIANCE to TOTAL SCORE VARIANCE on a test p. 145

Answer 21

- SAMPLING PROCEDURES may impact the variance of either variable in a correlation analysis OUTCOME - if variance of EITHER variable is RESTRICTED by sampling procedure used, then tends to be a LOWER CORRELATION COEFFICIENT (i.e., masking true correlation) (thought to self - is this also a validity issue e.g., failing to detect - a miss!!!) - conversely referred to as INFLATION OF RANGE/VARIANCE - if variance of EITHER variable is INFLATED by sampling procedure then the resulting CC tends to be HIGHER (i.e., giving a false indicator of correlation (thought to self - is this also a validity issue e.g., false positive) p.162

Answer 22

allows a test developer/user to estimate the INTERNAL consistency reliability from a correlation of TWO HALVES of a test that has been LENGTHENED or SHORTENED. - inappropriate for use with HETEROGENEOUS tests or SPEED tests p. 153-154

Answer 23

- a test, usually of achievement or ability which has a TIME LIMIT - usually contains ITEMS of UNIFORM difficulty (usually uniformly low) - so that when given GENEROUS TIME ALL TESTTAKERS should be able to complete ALL ITEMS CORRECTLY (so its isolating the SPEED variable) (contrast with 'power test') p.163, 272

Answer 24

an ESTIMATE of the INTERNAL CONSISTENCY of a test - obtained by CORRELATING two PAIRS of SCORES taken from EQUIVALENT HALVES of a SINGLE TEST administered ONCE - p.152- 154

Answer 25

- in TRUE SCORE THEORY - a STATISTIC designed to ESTIMATE how far an OBSERVED SCORE DEVIATES from a TRUE SCORE (also called standard error of measurement (SEM) p.175

Answer 26

- in TRUE SCORE THEORY - a STATISTIC designed to ESTIMATE how far an OBSERVED SCORE DEVIATES from a TRUE SCORE (also called STANDARD ERROR OF A SCORE) p.132, 175-178

Answer 27

- a STATISTIC designed to aid in determining HOW LARGE a DIFFERENCE between two scores should be BEFORE it is considered STATISTICALLY SIGNIFICANT p. 132, 178

Answer 28

a TRAIT, STATE or ABILITY presumed to be relatively STATIC OVER TIME (contrast with dynamic characteristic) p.162

Answer 29

- a source of ERROR in the measurement process - typically CONSTANT or PROPORTIONATE to what is presumed to be the TRUE VALUE of the target variable being measured - once known, it is predictable & FIXABLE - relative standings remain unchanged - may not be VALID but is RELIABLE - p. 146

Answer 30

typically composed of TESTS designed to measure DIFFERENT VARIABLES. - quite often psychologists rely on a BATTERY of tests in the process of EVALUATION. p. 155n5, 502-504 see also specific batteries

Answer 31

an estimate of reliability obtained by CORRELATING pairs of scores from the SAME PEOPLE on TWO DIFFERENT administrations of the test - appropriate when EVALUATING the RELIABILITY of a test purporting to measure something relatively STABLE over TIME e.g., a personality trait p.150-151, 161

Answer 32

- a reference to the DEGREE of the underlying ability or trait that a TESTTAKER is presumed to BRING TO the test - also referred to as THETA p. 170

Answer 33

a source of error attributable to the testtaker's FEELINGS, MOODS, or MENTAL STATE OVER TIME p.160

Answer 34

- according to CLASSICAL TEST THEORY | - a value that GENUINELY reflects an individual's ABILITY or TRAIT level as measured by a particular test p.164

Answer 35

- in the TRUE SCORE MODEL - the COMPONENT of a score attributable to TRUE DIFFERENCES in the ability or trait being measured - can be in an OBSERVED SCORE or a DISTRIBUTION of SCORES p.146

Answer 36

- in GENERALIZABILITY THEORY - the TOTAL CONTEXT of a particular test situation - including ALL the FACTORS that lead to an individual testtakers score - p.167

Answer 37

- in GENERALIZABILITY THEORY - a test score corresponding to the PARTICULAR UNIVERSE being assessed or evaluated p. 167

Answer 38

a statistic useful in describing SOURCES of test score variability - equal to the MEAN of the SQUARES of the DIFFERENCES between SCORES in a distribution and THEIR MEAN - calculated by SQUARING & SUMMING all the DEVIATION SCORES then DIVIDING by the total number of scores p.95, 146

Answer 39

to MAXIMIZE the proportion of TOTAL VARIANCE that is TRUE VARIANCE and to MINIMIZE the proportion that is ERROR VARIANCE - p.147

Answer 40

1) TEST CONSTRUCTION - item sampling/content sampling 2) TEST ADMINISTRATION - test environment; testtaker variables; examiner related variable. 3) TEST SCORING and INTERPRETATION - scorers; scoring systems). 4) OTHER SOURCES OF ERROR - sampling error - methodological error - researchers not trained, ambiguous wording, item biases) p. 147 - 149

Answer 41

1) SPEARMAN-BROWN FORMULA p.153-4 2) KUDER-RICHARDSON FORMULAS p.155-6 3) COEFFICIENT ALPHA p.157 4) AVERAGE PROPORTIONAL DISTANCE (APD) p.157

Answer 42

1) Two test administrations with the SAME GROUP are required 2) Test scores between tests may be AFFECTED by factors such as MOTIVATION, FATIGUE, or INTERVENING EVENTS (practise, learning or therapy) - although not as much as if the EXACT SAME test had been administered twice

Answer 43

ITEM SAMPLING ERROR | p. 152

Answer 44

Step 1 - Divide the test into EQUIVALENT HALVES Step 2 - calculate a Pearson r between scores on the TWO HALVES of the test STEP 3 - adjust the HALF-TEST reliability using the SPEARMAN-BROWN FORMULA (p.152-153)

Answer 45

Ca - 0-1 Pr - -1 to +1 Ca - gauging how SIMILAR data sets are PR - dealing with SIMILARITY & DISSIMILARITY

Answer 46

APD - focus is on the DEGREE of DIFFERENCE between item scores SH & CA - focus is on SIMILARITIES between item scores p.157

Answer 47

1) test-retest 2) alternate or parallel forms 3) internal or inter-item consistency method chosen will depend on a number of factors - e.g., the PURPOSE, NATURE for obtaining the measure p. 160

Answer 48

- the method chosen will depend on a number of factors - e.g., the PURPOSE, NATURE for obtaining the measure NOTE: the various RELIABILITY COEFFICIENTS DO NOT all reflect the same SOURCES of ERROR VARIANCE see pg. 161 (impt to understand why each test is selected, also refer to Table 5-4)

Answer 49

1) Unidimensionality 2) Local Independence 3) Monotonicity p. 170

W2 - Chapter 5 - Reliability (DN) Flashcards

(73 cards)