Chapter 5 Flashcards
refers to the consistency of a measure, that is, whether it produces similar results each time
reliability
a statistic that quantifies reliability and ranges from 0 to 1
reliability coefficient
what reliability coefficient indicates a measure that is not reliable at all
zero (0)
what reliability coefficient indicates a perfectly reliable measure
one (1)
the individual’s score on a measure if there were no measurement error
true score
a person’s standing on the theoretical variable independent of any particular measurement
construct score
______ : reliability :: _____ : validity
true score; construct score
formula for observed score
X = T + E
refers to the difference between the observed score and the true score
error
standard deviation squared
variance
total variance equals _____ plus _____
true variance; error variance
the proportion of the total variance attributed to true variance
reliability
percentage of true variance
67%
percentage of error due to test construction
18%
percentage of administration error
5%
percentage of unidentified error
5%
percentage of scorer error
5%
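A minimal sketch of the "proportion of total variance attributed to true variance" idea, treating the percentages on the cards above as one illustrative breakdown. The function name `reliability` and the dictionary keys are made up for this example.

```python
# Illustrative variance shares from the cards above (percent of total variance).
variance_shares = {
    "true": 67,
    "test construction error": 18,
    "administration error": 5,
    "unidentified error": 5,
    "scorer error": 5,
}


def reliability(true_variance, error_variance):
    """Proportion of total variance that is true variance."""
    total_variance = true_variance + error_variance
    return true_variance / total_variance


true_var = variance_shares["true"]
# Everything that is not true variance counts as error variance.
error_var = sum(v for k, v in variance_shares.items() if k != "true")

r_xx = reliability(true_var, error_var)
print(r_xx)  # 0.67
```

With these shares, error variance adds up to 33% of the total, so the reliability works out to 0.67.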
refers to the inherent uncertainty associated with any measurement, even after care has been taken to minimize preventable mistakes
measurement error
consists of unpredictable fluctuations and inconsistencies of other variables in the measurement process
random error
a source of error that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
systematic error
sources of error variance
- test construction
- test administration
- test scoring and interpretation
which variables fall under test administration as a source of error variance
- test taker variables
- examiner-related variables
sources of error variance: variation may exist within items in a test or between tests
test construction
sources of error variance: may stem from the test environment
test administration
sources of error variance: pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication
test taker variables
sources of error variance: physical appearance and demeanor may play a role
examiner-related variables
sources of error variance: computer testing reduces error in test scoring, but many tests still require expert interpretation
test scoring and interpretation
reliability estimates
- test-retest reliability
- split-half reliability
- inter-scorer reliability
an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
test-retest reliability
true or false: in test-retest reliability, as time passes, correlation between the scores obtained on each testing increases
false; decreases
this reliability is most appropriate for variables that should be stable over time, such as personality
test-retest reliability
with intervals greater than 6 months, the estimate of test-retest reliability is called _____
coefficient of stability
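A minimal sketch of a test-retest estimate: correlate the scores the same people earned on two administrations. The scores are hypothetical, and `pearson_r` is a hand-written helper for this example, not a library function.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five people on two administrations.
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 13, 14, 19, 21]

test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 3))
```

A value near 1 means people kept roughly the same rank order across the two administrations.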
measures the degree of the relationship between various forms of a test by means of alternate-forms or parallel-forms reliability
coefficient of equivalence
for each form of the test, the means and the variances of observed test scores are equal
parallel forms
typically designed to be equivalent with respect to variables such as content and level of difficulty
alternate forms
obtaining estimates of alternate-forms reliability and parallel-forms reliability is similar to obtaining an estimate of _____
test-retest reliability
obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
split-half reliability
step 1 of split-half reliability
divide the test into equivalent halves
step 2 of split-half reliability
calculate a Pearson r between scores on the two halves of the test
step 3 of split-half reliability
adjust the half-test reliability using the Spearman-Brown formula
allows a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test
Spearman-Brown formula
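The three split-half steps can be sketched as follows. The item scores are hypothetical, the odd-even split is just one way to form halves, and the helper names are made up for this example. The two-half form of the Spearman-Brown formula used here is r_full = 2r / (1 + r).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def spearman_brown(r_half):
    """Estimate full-length reliability from the correlation of two half-tests."""
    return 2 * r_half / (1 + r_half)


# Hypothetical item scores (1 = correct, 0 = incorrect), one row per test taker.
item_scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Step 1: divide the test into halves (items 1, 3, 5 versus items 2, 4, 6).
odd_half = [sum(row[0::2]) for row in item_scores]
even_half = [sum(row[1::2]) for row in item_scores]

# Step 2: correlate the two half-test scores.
r_half = pearson_r(odd_half, even_half)

# Step 3: adjust the half-test correlation to full test length.
r_full = spearman_brown(r_half)
print(round(r_half, 2), round(r_full, 2))
```

The adjusted value is higher than the half-test correlation because each half is only half as long as the full test.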
the degree of correlation among all the items on a scale
inter-item consistency
mean of all possible split-half correlations
coefficient alpha
typical range of coefficient alpha
0 to 1
the split-half correlations averaged in coefficient alpha are corrected by what formula
Spearman-Brown formula
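In practice coefficient alpha is usually computed from item variances rather than by averaging split halves. The sketch below assumes that common variance-based formula (Cronbach's alpha), which the cards do not state, and uses hypothetical ratings. The function name is made up for this example.

```python
from statistics import pvariance


def coefficient_alpha(item_scores):
    """Cronbach's alpha from rows of item scores (one row per test taker).

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(item_scores[0])
    item_variances = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: five test takers answering three items.
ratings = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 1, 2],
    [4, 4, 3],
]

alpha = coefficient_alpha(ratings)
print(round(alpha, 2))
```

Items that rise and fall together across test takers push alpha toward 1.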
the degree of agreement or consistency between two or more scorers with regard to a particular measure
inter-scorer reliability
what reliability is often used when coding nonverbal behavior
inter-scorer reliability
a correlation coefficient used to determine the degree of consistency among scorers
coefficient of inter-scorer reliability
the nature of tests
- homogeneity vs. heterogeneity of items
- dynamic vs. static characteristics
- restriction of range vs. inflation of range
- speed test vs. power test
estimates the portion of a test score that is attributable to error
true score theory or classical test theory
averaging all the observed scores obtained over a period of time, the result would be closest to the true score
true score theory or classical test theory
the greater the number of items, the higher the reliability
true score theory or classical test theory
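One way to put numbers on the "more items, higher reliability" card is the general Spearman-Brown formula, r_new = n * r / (1 + (n - 1) * r), where n is how many times longer the test becomes. Using that formula here is an assumption of this sketch (the card names no formula), and it presumes the added items are comparable to the existing ones. The starting reliability of 0.60 is hypothetical.

```python
def spearman_brown_prophecy(r_original, length_factor):
    """Predicted reliability after changing test length by length_factor."""
    return (length_factor * r_original) / (1 + (length_factor - 1) * r_original)


r_original = 0.60  # hypothetical reliability of the current test

for factor in (0.5, 1, 2, 3):
    predicted = spearman_brown_prophecy(r_original, factor)
    print(factor, round(predicted, 3))
```

Doubling the test raises the predicted reliability from 0.60 to 0.75, while halving it lowers the prediction to about 0.43.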
estimates the extent to which specific sources of variation under defined conditions are contributing to the test score
domain sampling theory
assumes that the items that have been selected for any one test are just a sample of items from an infinite domain of potential items
domain sampling theory
based on the idea that a person’s test scores vary from testing to testing because of variables in the testing situation
generalizability theory
it is described in terms of its facets
universe
provides a way to model the probability that a person with x ability will be able to perform at a level of y
item response theory
not a single theory or method but a family of theories and methods, with distinct names used to distinguish specific approaches
item response theory
incorporates considerations of an item’s level of difficulty and discrimination
item response theory
relates to an item not being easily accomplished, solved, or comprehended
difficulty
refers to the degree to which an item differentiates among people with higher or lower levels of the variable being measured
discrimination
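A minimal sketch of how difficulty and discrimination can enter an item response model. It assumes the two-parameter logistic form, which is one member of the IRT family and is not named on the cards. The function and parameter names are made up for this example.

```python
from math import exp


def probability_correct(ability, difficulty, discrimination=1.0):
    """Two-parameter logistic model: chance that a person answers the item correctly."""
    return 1.0 / (1.0 + exp(-discrimination * (ability - difficulty)))


# A person whose ability equals the item's difficulty has a 50% chance.
print(probability_correct(ability=0.0, difficulty=0.0))  # 0.5

# Same item difficulty, two hypothetical discrimination values.
for a in (1.0, 2.0):
    low = probability_correct(ability=-1.0, difficulty=0.0, discrimination=a)
    high = probability_correct(ability=1.0, difficulty=0.0, discrimination=a)
    print(a, round(low, 3), round(high, 3))
```

The more discriminating item separates the lower-ability and higher-ability person by a wider margin.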
the _____ the reliability of the test, the _____ the standard error
higher; lower
the higher the _____ of the test, the lower the _____
reliability; standard error
can be used to estimate the extent to which an observed score deviates from a true score
standard error
a range or band of test scores that is likely to contain the true score
confidence interval
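A minimal sketch of the standard error of measurement and a confidence interval around an observed score. It assumes the usual formula SEM = SD * sqrt(1 - reliability) and a 95% band of plus or minus 1.96 SEM. The test values (SD of 15, reliability of .91, observed score of 100) are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def confidence_interval(observed_score, sem, z=1.96):
    """Band of scores likely to contain the true score (z = 1.96 for about 95%)."""
    margin = z * sem
    return observed_score - margin, observed_score + margin


sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = confidence_interval(observed_score=100, sem=sem)
print(round(sem, 2), round(low, 2), round(high, 2))  # 4.5 91.18 108.82
```

Raising the reliability shrinks the SEM, which narrows the band around the observed score.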
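percent of scores within ±1 SD of the mean (normal curve)
68.3%
percent of scores within ±2 SD of the mean (normal curve)
95.4%
percent of scores within ±3 SD of the mean (normal curve)
99.7%
percent of scores between the mean and 1 SD on one side
34.1%
percent of scores between 1 SD and 2 SD from the mean on one side
13.6%
percent of scores between 2 SD and 3 SD from the mean on one side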
2.1%
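The percentages on these cards can be checked against the normal curve with the error function in Python's standard library. The helper name `area_within` is made up for this example.

```python
from math import erf, sqrt


def area_within(z):
    """Proportion of a normal distribution within z SD of the mean (both sides)."""
    return erf(z / sqrt(2))


# Percent of scores within 1, 2, and 3 SD of the mean.
for z in (1, 2, 3):
    print(z, round(100 * area_within(z), 1))  # 68.3, 95.4, 99.7

# Percent of scores in each one-sided band: mean to 1 SD, 1 to 2 SD, 2 to 3 SD.
bands = [
    (area_within(1) - area_within(0)) / 2,
    (area_within(2) - area_within(1)) / 2,
    (area_within(3) - area_within(2)) / 2,
]
print([round(100 * b, 1) for b in bands])  # [34.1, 13.6, 2.1]
```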
when a time limit is long enough to allow test takers to attempt all items, and if some items are so difficult that no test taker is able to obtain a perfect score
power test
generally contains items of uniform level of difficulty so that, when given generous time limits, all test takers should be able to complete all the test items correctly
speed test
designed to provide an indication of where a test taker stands with respect to some variable or criterion
criterion-referenced test
examines how generalizable scores from a particular test are if the test is administered in different situations
generalizability study
error that is unpredictable
random error
error that is consistent and predictable, so it can be anticipated and corrected for
systematic error
summary of test-retest reliability
test-retest = different administrations - coefficient of stability
summary of parallel/alternate forms
parallel/alternate forms = different forms - coefficient of equivalence
summary of split-half reliability
split-half reliability = different halves of the test - Pearson r (correlation) & Spearman-Brown (adjustment)
reliability coefficient generally considered acceptable
0.8 and above
all items measure only one construct
homogeneous items
the items measure more than one construct
heterogeneous items
which type of items yields high internal consistency
homogeneous items
what coefficient is used to check for inconsistency or bias in scoring
coefficient of inter-scorer reliability
analyzing the correlation within only a narrow range of scores; tends to lower the correlation coefficient
restriction of range
sampling an unusually wide range of scores; tends to raise the correlation coefficient
inflation of range
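A minimal simulation sketch of how the range of scores in a sample affects a correlation-based reliability estimate. All data are simulated, and the cutoff of 1.0, the error spread of 0.5, and the names `form_a` and `form_b` are arbitrary choices for this example.

```python
import random
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated true scores plus independent error on two forms of a test.
true_scores = [random.gauss(0, 1) for _ in range(2000)]
form_a = [t + random.gauss(0, 0.5) for t in true_scores]
form_b = [t + random.gauss(0, 0.5) for t in true_scores]

# Correlation across the full range of scores.
r_full = pearson_r(form_a, form_b)

# Restricted range: keep only people who scored above the cutoff on form A.
restricted = [(a, b) for a, b in zip(form_a, form_b) if a > 1.0]
r_restricted = pearson_r([a for a, _ in restricted], [b for _, b in restricted])

print(round(r_full, 2), round(r_restricted, 2))
```

In this simulated data the correlation computed on the restricted subsample comes out lower than the one computed on the full sample.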
what does a power test measure
ceiling and floor limits
the highest level of difficulty the test taker can handle
ceiling limits
the lowest level of difficulty the test taker can handle
floor limits