Reliability and Validity Flashcards
assumes that each person has a true score that would be obtained if there were no errors in measurement
classical test score theory
assumes that the items that have been selected for any one test are just a sample of items from an infinite domain of potential items
domain sampling theory
process of choosing test items that are appropriate to the content domain of the test
domain sampling
another central concept in classical test theory; it considers the problems created by using a limited number of items to represent a larger, more complicated construct
domain sampling model
using ____, the computer is used to focus on the range of item difficulty that helps assess an individual’s ability level
item response theory
degree to which scores from a test are stable and results are consistent
reliability
ratio of the variance of the true scores on a test to the variance of the observed scores
reliability coefficient
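The ratio above can be sketched in a few lines of Python. This is a minimal illustration of the classical-test-theory identity (observed variance = true variance + error variance); the variance values are made-up numbers, not real test data.

```python
# Minimal sketch: reliability as true-score variance over observed-score
# variance, per classical test theory. Inputs are fabricated for illustration.

def reliability_coefficient(true_var, error_var):
    """Observed variance is assumed to be true variance plus error variance."""
    observed_var = true_var + error_var
    return true_var / observed_var

# e.g., true-score variance 8, error variance 2 -> reliability of 0.8
print(reliability_coefficient(8.0, 2.0))
```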
test reliability is usually estimated in one of three ways:
- test-retest method
- parallel forms method
- internal consistency method
consistency of test results is evaluated when the test is administered on different occasions
test-retest method
consistency of scores across different forms of the test is evaluated
parallel forms method
performance of people on similar subsets of items selected from the same form of the measure is examined
internal consistency
occurs when the first testing session influences scores from the second session
carryover effect
compares two equivalent forms of a test that measure the same attribute
parallel forms / equivalent forms reliability
determined by dividing the total set of items relating to a construct of interest into halves and comparing the results obtained from the two subsets of items thus created
split-half reliability
measure of internal consistency; considered to be a measure of scale reliability
coefficient alpha or cronbach’s alpha
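Coefficient alpha is computable directly from its textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). A small sketch with fabricated ratings:

```python
# Minimal Cronbach's alpha sketch (fabricated data, population variances).
from statistics import pvariance

# rows = persons, columns = items (made-up ratings)
items = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 4, 3],
]
k = len(items[0])
item_vars = [pvariance([row[j] for row in items]) for j in range(k)]
total_var = pvariance([sum(row) for row in items])
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 3))
```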
used to estimate the reliability of binary measurements
kuder-richardson formula 20 (KR-20)
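KR-20 is the binary-item special case of coefficient alpha: KR-20 = k/(k-1) * (1 - sum(p*q) / variance of total scores), where p is the proportion answering an item correctly and q = 1 - p. A sketch with fabricated right/wrong data:

```python
# Minimal KR-20 sketch for dichotomous (1 = correct, 0 = incorrect) items.
# Response matrix is fabricated for illustration.
from statistics import pvariance

responses = [  # rows = persons, columns = items
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
k = len(responses[0])
n = len(responses)
p = [sum(row[j] for row in responses) / n for j in range(k)]  # item difficulty
pq_sum = sum(pi * (1 - pi) for pi in p)
total_var = pvariance([sum(row) for row in responses])
kr20 = (k / (k - 1)) * (1 - pq_sum / total_var)
print(round(kr20, 3))
```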
takes into account chance agreement
kappa statistic
allows you to estimate what the correlation between the two halves would have been if each half had been as long as the whole test
spearman-brown formula
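The Spearman-Brown correction for doubling test length is a one-liner: projected reliability = 2r / (1 + r), where r is the half-test correlation.

```python
# Spearman-Brown prophecy formula for a test doubled in length:
# estimate full-test reliability from the correlation between two halves.

def spearman_brown(r_half):
    return (2 * r_half) / (1 + r_half)

# a half-test correlation of 0.6 projects to roughly 0.75 for the full test
print(spearman_brown(0.6))
```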
best method for assessing the level of agreement among several observers
kappa statistic
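For two raters, Cohen's kappa corrects observed agreement for the agreement expected by chance: kappa = (observed - expected) / (1 - expected). A sketch with fabricated ratings:

```python
# Minimal Cohen's kappa sketch for two raters (fabricated labels).
from collections import Counter

rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]
n = len(rater_a)
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
counts_a = Counter(rater_a)
counts_b = Counter(rater_b)
# chance agreement from each rater's marginal category proportions
expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
kappa = (observed - expected) / (1 - expected)
print(round(kappa, 3))
```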
agreement between a test score or measure and the quality it is believed to measure
validity
3 types of evidence:
- construct-related
- criterion-related
- content-related
mere appearance that a measure has validity
face validity
logical rather than statistical
content validity
describes the failure to capture important components of a construct
construct underrepresentation
occurs when scores are influenced by factors irrelevant to the construct
construct-irrelevant variance
tells us just how well a test corresponds with a particular criterion
criterion validity evidence
standard against which the test is compared
criterion
forecasts future performance on a criterion; e.g., the SAT is the predictor and college GPA is the criterion
predictive validity evidence
correlation expressing the relationship between a test and a criterion
validity coefficient
established through a series of activities in which a researcher simultaneously defines some construct and develops the instrumentation to measure it
construct validity evidence
obtained when a measure correlates well with other tests that measure the same construct
convergent evidence for validity
standardized tests that are designed to compare and rank test takers in relation to one another
norm-referenced test
process of evaluating the learning of students against a set of pre-specified qualities or criteria, without reference to the achievement of others
criterion referenced test