Ch. 5 - Reliability Flashcards
reliability
consistency in measurement (not good or bad, right or wrong, just consistent); the proportion of the total variance attributed to true variance
reliability coefficient
a proportion that indicates the ratio between the true score variance on a test and the total variance
concept of reliability - equation
Observed Score = True Score + Error (X = T + E)
we use ____ to describe test score variability and reliability
variance
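To make the variance idea concrete, here is a minimal Python sketch (not part of the flashcards; all numbers are invented) that simulates X = T + E and recovers reliability as the ratio of true-score variance to total observed variance:

```python
import numpy as np

# Simulate the classical test theory model X = T + E with made-up values
rng = np.random.default_rng(0)
true_scores = rng.normal(loc=100, scale=15, size=10_000)  # hypothetical true scores
error = rng.normal(loc=0, scale=5, size=10_000)           # random measurement error
observed = true_scores + error                            # X = T + E

# Reliability = true-score variance / total observed variance
reliability = true_scores.var() / observed.var()
print(round(reliability, 3))  # approx 15**2 / (15**2 + 5**2) = 0.90
```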
the proportion of the total variance attributed to true variance is
reliability
the greater the reliability…
the more true variance you are capturing relative to "noise"
measurement error
all of the factors associated with the process of measuring some variable, other than the variable being measured
error variance
variance from irrelevant, random sources
sources of error variance
test construction (content sampled, the way items are worded); test administration (environment: lighting, temperature; testtaker variables: sick, bad mood; examiner-related variables: "giving away" answers with tone of voice)
more sources of error variance
computer glitches or errors in hand-scoring; testtakers may over- or under-report
sampling error - only contacting voters with landlines
test-retest reliability
a method of reliability. obtained by correlating pairs of scores from the same people on two different administrations of the same test. use when measuring something that’s stable over time (trait)
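As a hedged illustration (the scores below are made up), a test-retest estimate is just the Pearson correlation between the two administrations:

```python
import numpy as np

# Hypothetical scores from the same seven people, two administrations apart
time1 = np.array([12, 15, 9, 20, 17, 11, 14])
time2 = np.array([13, 14, 10, 19, 18, 12, 15])

# Test-retest reliability = Pearson r between the two sets of scores
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(round(r_test_retest, 3))
```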
as the time between test administrations increases, the correlation usually…
decreases
coefficient of stability
the estimate of test-retest reliability, when the interval between testing is greater than six months
coefficient of equivalence
the degree of the relationship between various forms of a test
parallel forms (reliability)
for each form of the test, the means and variances of observed test scores are equal
alternate forms (reliability)
these don't necessarily meet the requirements of parallel forms (same means and variances) but are equivalent in terms of content, level of difficulty, etc.
parallel or alternate forms reliability
the extent to which item sampling and other errors have affected test scores on versions of the same test
how do you obtain parallel or alternate forms reliability estimates?
administer the test twice to the same group (like test-retest, but no waiting period required)
same problems: scores affected by item sampling, testtaker variables, etc.
time-consuming and expensive
estimate of inter-item consistency
degree of correlation among all items on a scale
how do you do a split-half reliability estimate?
(1) divide test into equivalent halves
(2) find Pearson r between the scores on each half
(3) adjust the half-test reliability with Spearman-Brown formula
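A sketch of these three steps in Python, assuming a made-up 0/1 item-response matrix and an odd-even split (one of several defensible splits):

```python
import numpy as np

# Hypothetical responses: 6 testtakers x 8 dichotomous items
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 0, 1, 1, 1],
    [0, 1, 1, 0, 1, 1, 0, 0],
])

# (1) divide into equivalent halves (odd-even split by item position)
odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

# (2) Pearson r between scores on the two halves
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# (3) Spearman-Brown adjustment up to full test length
r_full = (2 * r_half) / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))
```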
what is a split-half reliability estimate?
a reliability estimate obtained by evaluating the internal consistency of a single test (no need for two forms or a time interval between administrations)
how should you split the test for a split-half reliability estimate?
not down the middle
randomly assign items
split odd-even
divide by content and difficulty
i.e. make mini parallel forms!
Spearman-Brown Adjustment
estimates the reliability of a whole test from the reliability of a shortened version (e.g., one half)
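The general ("prophecy") form of the formula predicts reliability when a test is lengthened or shortened by any factor; the numbers below are illustrative:

```python
def spearman_brown(r, n):
    """Predicted reliability of a test changed in length by factor n.

    r: reliability of the existing test; n: new length / old length.
    """
    return (n * r) / (1 + (n - 1) * r)

print(spearman_brown(0.70, 2))    # doubling a .70 test -> ~.82
print(spearman_brown(0.70, 0.5))  # halving it -> ~.54
```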
don’t use split-half reliability with what kind of test?
heterogeneous (measures more than one trait)
reliability usually increases as…
test length increases
alternatives to the Spearman-Brown reliability estimate (for split-half)
Kuder-Richardson (for tests with dichotomous items)
Average Proportional Distance
Cronbach’s alpha - “mean of all possible split-half correlations”
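As a hedged sketch, Cronbach's alpha can be computed directly from an item-score matrix; with dichotomous (0/1) items this same computation reduces to KR-20. The data below are simulated, not from the chapter:

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = testtakers, columns = item scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Simulated dichotomous items that all tap one underlying ability
rng = np.random.default_rng(1)
ability = rng.normal(size=100)
data = (ability[:, None] + rng.normal(size=(100, 8)) > 0).astype(int)
print(round(cronbach_alpha(data), 3))
```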
reliability coefficients range from
0 to 1; a negative value is possible but usually indicates a data-entry mistake
measures of reliability are subject to
error. they are estimates
a reliability coefficient may not be acceptable if
it was computed using the same test but a very different set of testtakers
what’s a good reliability?
like grades! .90 is an A, .80 is a B
if reliability is really high on a split-half estimate, what is likely the cause?
redundancy in test items
the more homogeneous a test is…
the more inter-item consistency it can be expected to have (duh)
split-half reliability, odd-even, Spearman-Brown formula, Kuder-Richardson (KR-20), alpha, and Average Proportional Distance are all methods of evaluating…
the internal consistency of a test
inter-scorer reliability
the degree of agreement or consistency between two or more scorers/judges/raters
if inter-scorer reliability is high,…
test scores can be derived in a systematic, consistent way by trained scorers
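The chapter doesn't name a specific statistic here, but one common index of inter-scorer agreement for categorical ratings is Cohen's kappa, sketched below with invented ratings:

```python
import numpy as np

def cohens_kappa(rater1, rater2):
    """Agreement between two raters, corrected for chance agreement."""
    rater1, rater2 = np.asarray(rater1), np.asarray(rater2)
    categories = np.union1d(rater1, rater2)
    p_o = np.mean(rater1 == rater2)  # observed agreement
    # chance agreement from each rater's marginal proportions
    p_e = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories)
    return (p_o - p_e) / (1 - p_e)

print(round(cohens_kappa([1, 2, 2, 3, 1], [1, 2, 3, 3, 1]), 3))  # ~0.71
```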
what are the three approaches for estimating reliability?
test-retest, alternate or parallel forms, internal or inter-item consistency
what about the nature of a test might influence reliability? (5)
homogeneous vs heterogeneous test; dynamic vs static characteristics; restriction or inflation of range; speed vs power test; criterion-referenced vs norm-referenced tests
heterogeneous vs homogeneous test
measures different factors; measures one factor/trait
traditional ways of estimating reliability are often not appropriate for what kind of test?
criterion-referenced
what kind of reliability estimate is best for a heterogeneous test?
test-retest (not inter-item consistency, because that will be low)
what kind of reliability estimate is best for a measurement of dynamic characteristics?
inter-item consistency (not test-retest)
power test
has a long time limit, but some items are so hard that no testtaker will get a perfect score
speed test
must be completed within a set time limit; items are easy, but it's tough to get them all done (e.g., a typing test)
classical test theory believes that…
everyone has a "true score" on a test; that true score is very test-dependent, though
what are alternatives to classical test theory?
domain sampling theory
generalizability theory
Item Response Theory (IRT)
domain sampling theory
a test's reliability is an objective measure of how precisely the test measures its "domain" (e.g., a domain of behavior); takes issue with the true score + error = observed score model
generalizability theory
a person's test scores vary from testing to testing because of variables in the testing situation; takes issue with the true score + error = observed score model
Item Response Theory (IRT)
hundreds of varieties; items vary in many different ways, including difficulty and discrimination
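For example, in the widely used two-parameter logistic (2PL) model, an item's difficulty and discrimination jointly determine the probability of a correct response at a given ability level. The parameter values below are invented for illustration:

```python
import math

def p_correct(theta, a, b):
    """2PL item characteristic curve.

    theta: testtaker ability; a: item discrimination; b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An average-ability testtaker (theta = 0) on a hard vs an easy item
print(round(p_correct(0.0, a=1.5, b=1.0), 3))   # hard item -> ~0.18
print(round(p_correct(0.0, a=1.5, b=-1.0), 3))  # easy item -> ~0.82
```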
what tells us how much error could be in single test score?
Standard Error of Measurement (SEM)
Standard Error of Measurement
estimates the extent to which an observed score deviates from a “true” score
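The standard formula computes the SEM from a test's standard deviation and its reliability estimate; the numbers here are hypothetical:

```python
import math

sd = 15             # standard deviation of the test's scores (hypothetical)
reliability = 0.91  # reliability estimate for the test (hypothetical)

# SEM = SD * sqrt(1 - reliability)
sem = sd * math.sqrt(1 - reliability)
print(round(sem, 2))  # 4.5
```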
the higher the reliability of a test, the ____ the SEM
lower
if a person were to take a bunch of equivalent tests, scores would be…
normally distributed with their true score at the mean
confidence interval
the range or band of scores that is likely to contain the true score
95% confidence interval - what does it mean?
we are 95% confident that the true score lies within ±2 standard errors of measurement of the observed score. 95% of this testtaker's scores would be expected to fall within this range on the distribution
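Continuing the hypothetical SEM of 4.5 from the sketch above, a 95% confidence band around an observed score looks like this (the 2 is the rounded z-value of 1.96):

```python
observed = 106  # hypothetical observed score
sem = 4.5       # hypothetical standard error of measurement

lower, upper = observed - 2 * sem, observed + 2 * sem
print(f"95% CI: {lower} to {upper}")  # 97.0 to 115.0
```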
true differences in a characteristic being measured might be from another source besides error or change from one testing to another. what might that be?
an actual difference; this might be exactly what you're looking for in psychotherapy outcome research
standard error of the difference helps you determine
whether a difference between two scores is statistically significant, rather than just measurement error
the standard error of the difference will always be ___ compared to the standard error of measurement for a score.
larger, because it reflects the measurement error contained in both scores.
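A sketch of the textbook formula, which assumes both scores come from tests on the same scale (same standard deviation); values are hypothetical:

```python
import math

sd = 15              # shared standard deviation of the two tests (hypothetical)
r1, r2 = 0.90, 0.85  # reliability estimates for the two tests (hypothetical)

# SED = SD * sqrt(2 - r1 - r2); always larger than either test's SEM
sed = sd * math.sqrt(2 - r1 - r2)
print(round(sed, 2))  # 7.5
```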