Reliability and Validity Flashcards
Assumes that each person has a true score that would be obtained if there were no errors in measurement.
Classical Test Score Theory
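In symbols, the classical model assumes each observed score X decomposes as X = T + E (true score plus random measurement error), so reliability can be expressed as the proportion of observed-score variance due to true scores: r = σ²(T) / σ²(X).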
assumes that the items that have been selected for any one test are just a sample of items from an infinite domain of potential items
Domain Sampling Theory
the process of choosing test items that are appropriate to the content domain of the test
domain sampling
this model considers the problems created by using a limited number of items to represent a larger and more complicated construct
the domain sampling model
with this approach, the computer focuses on the range of item difficulty that best assesses an individual’s ability level.
item response theory
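As a sketch of the underlying idea: in the simplest IRT model (the one-parameter logistic, or Rasch, model), the probability that an examinee of ability θ answers an item of difficulty b correctly is P(correct) = 1 / (1 + e^−(θ − b)); an adaptive test repeatedly selects items whose difficulty b is close to the current estimate of θ, where the items are most informative.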
refers to the degree to which scores from a test are stable and results are consistent
reliability
test reliability is usually estimated in one of three ways
- test-retest method
- method of parallel forms
- method of internal consistency
in the ____, we consider the consistency of the test results when the test is administered on different occasions
test-retest method
using the ____, we evaluate the test across different forms of the test
method of parallel forms
we examine how people perform on similar subsets of items selected from the same form of the measure with the _____
method of internal consistency
this effect occurs when the first testing session influences scores from the second session
carryover effect
compares two equivalent forms of a test that measure the same attribute
parallel forms / equivalent forms reliability
- a test given and divided into halves that are scored separately
- the results of one half of the test are then compared with the results of the other
split half method
Source of Error and Method for:
same test given at two points in time
Source of Error: Time Sampling
Method: Test-Retest Method
Source of Error and Method for:
correlation between scores obtained on the two occasions
Source of Error: Time Sampling
Method: Test-Retest Method
different items used to assess the same attribute
item sampling
correlation between equivalent forms of the test that have different items
item sampling
determined by dividing the total set of items relating to a construct of interest into halves and comparing the results obtained from the two subsets of items thus created
split half reliability
also known as Cronbach’s alpha
coefficient alpha
- a measure of internal consistency, that is, how closely related a set of items are as a group
- it is considered to be a measure of scale reliability
coefficient alpha
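For a test of k items, coefficient alpha is computed as α = (k / (k − 1)) · (1 − Σ σᵢ² / σₓ²), where σᵢ² is the variance of item i and σₓ² is the variance of the total test score.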
used to estimate the reliability of binary measurements
KR20 (Kuder and Richardson Formula 20)
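KR-20 is the special case of coefficient alpha for items scored 0/1: KR20 = (k / (k − 1)) · (1 − Σ pᵢqᵢ / σₓ²), where pᵢ is the proportion of examinees passing item i, qᵢ = 1 − pᵢ, and σₓ² is the total-score variance.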
- takes into account chance agreement
- defined as (observed agreement - expected agreement)/(1-expected agreement)
Kappa Statistics
best method for assessing the level of agreement among several observers
kappa statistics
value of kappa when two measurements agree only at the chance level
0
value of kappa when two measurements agree perfectly
1.0
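An illustrative (made-up) calculation: if two raters agree on 85% of cases and chance agreement is 50%, then κ = (0.85 − 0.50) / (1 − 0.50) = 0.70.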
range in which reliability estimates are good enough for most purposes in basic research
.70 and .80
what to do about low reliability?
increase the number of items according to the domain sampling model
the ____ the sample, the more likely that the test will represent the true characteristic
larger
- can be applied to correct for half-length
- allows one to estimate what the correlation between the two halves would have been if each half had been the length of the whole test
spearman-brown formula
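In its half-length form the formula is corrected r = 2r / (1 + r), where r is the correlation between the two halves; the general “prophecy” form for a test lengthened n times is rₙ = n·r / (1 + (n − 1)·r). A minimal Python sketch of the split-half procedure with this correction, using invented 0/1 item responses (not data from any real test):

```python
import numpy as np

def split_half_reliability(scores):
    """Split-half reliability with the Spearman-Brown correction.

    scores: 2-D array-like, rows = examinees, columns = items.
    Splits the items into odd/even halves, correlates the two
    half scores, then corrects the half-length correlation up
    to the full test length.
    """
    scores = np.asarray(scores, dtype=float)
    half1 = scores[:, 0::2].sum(axis=1)   # odd-numbered items
    half2 = scores[:, 1::2].sum(axis=1)   # even-numbered items
    r_half = np.corrcoef(half1, half2)[0, 1]
    return 2 * r_half / (1 + r_half)      # Spearman-Brown formula

# illustrative (invented) responses: 5 examinees, 6 items
data = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(data), 2))
```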
can be defined as the agreement between a test score or measure and the quality it is believed to measure
validity
answers the question, “does the test measure what it is supposed to measure?”
validity
3 types of evidence for validity
- construct-related
- criterion-related
- content-related
is the mere appearance that a measure has validity
face validity
the only type of evidence besides face validity that is logical rather than statistical
content validity
describes the failure to capture important components of a construct
construct underrepresentation
occurs when scores are influenced by factors irrelevant to the construct
construct irrelevant variance
tells us just how well a test corresponds with a particular criterion
criterion validity evidence
standard against which the test is compared
criterion
the forecasting function of tests is usually a form of criterion validity evidence known as ______
predictive validity evidence
the relationship between a test and a criterion is usually expressed as a correlation called _________ __________
validity coefficient
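In practice this is usually a Pearson correlation, r = cov(X, Y) / (σₓσᵧ), between test scores X and criterion scores Y, e.g., correlating an aptitude test with later job-performance ratings.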
established through a series of activities in which a researcher simultaneously defines some construct and develops the instrumentation to measure it
construct validity evidence
involves assembling evidence about what a test means
construct validation
when a measure correlates well with other tests believed to measure the same construct, ___________ ___________ for validity is obtained
convergent evidence
- also called divergent validation
- demonstration of uniqueness
- to demonstrate validity, a test should have low correlations with measures of unrelated constructs; that is, evidence of what the test does not measure
discriminant evidence
refers to standardized tests that are designed to compare and rank test takers in relation to one another
norm-referenced test
the process of evaluating (or grading) the learning of students against a set of pre-specified qualities or criteria, without reference to the achievement of others
criterion-referenced test
indicates that the measure does not represent a construct other than the one for which it was derived.
discriminant evidence
simple guidelines for item writing
- define clearly what you want to measure
- generate an item pool
- avoid exceptionally long items
- keep the level of reading difficulty appropriate for those who will complete the scale
- avoid double-barreled items that convey two or more ideas at the same time
- consider mixing positively and negatively worded items