Unit 5 (from Quizlet) Flashcards
based on the idea that a person’s test scores vary from one testing to another because of variables in the testing situation. Given the exact same conditions across all the facets in the universe, the exact same test score should be obtained.
Generalizability theory
an estimate of reliability obtained by correlating pairs of scores from the sample people on two different administrations of the same test
Test-retest reliability
The degree of correlation among all items on a scale.
Calculated from a single administration of a single form of a test - useful in assessing homogeneity of the test.
Inter-item consistency
What are IRT models?
the Rasch model, models for dichotomous test items, and models for polytomous test items
a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences
Dynamic characteristic
The relationship between SEM and the reliability of the test is […];
inverse
the higher the reliability of the test, the lower the SEM
A study examining how generalizable scores from a particular test are if the test is administered in different situations. It examines how much of an impact different facets of the universe have on the test score.
Generalizability study
potential sources of error variance.
The examiner’s physical appearance and demeanor are some factors for consideration here. On an oral examination, some examiners may unwittingly provide clues by emphasizing key words as they pose questions.
Examiner-related variables
a value that according to CTT, genuinely reflects an individual’s ability (or trait) level as measured by a particular test
True score
Sources of error variance
test construction,
administration,
scoring, and/or interpretation
interviewers may not have been trained properly, the wording may have been ambiguous, or the items may have somehow been biased.
methodological error
pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medications. A test taker may make a mistake in entering a test response
Testtaker variables
What does error refer to?
The component of the observed test score that does not have to do with the testtaker’s ability.
all of the factors associated with the process of measuring some variable, other than the variable being measured
Measurement error
if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower
Restriction of range
Could also be used to determine the number of items needed to attain a desired level of reliability
Spearman-Brown formula
assign odd-numbered items to one half of the test and even-numbered items to the other half.
Odd-even reliability
a test containing items of uniform level of difficulty so that, when given generous time limits, all test takers should be able to complete all the test items correctly
Speed test
What problems with CTT?
- all items are presumed to be contributing equally to the score total.
- CTT favors the development of longer rather than shorter tests
It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice.
Split-half reliability
an index of reliability, a proportion that indicates the ratio between the true score variance and the total variance.
Reliability coefficient
What does the Spearman-Brown formula allow?
a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
a source of error measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Systematic error
formula for error
X=T+E
(x=observed score, T represents true score, E represents error)
A reliability estimate of a speed test should be based on performance from two independent testing periods using what?
1) test-retest reliability
2) alternate-forms reliability
3) split-half reliability from two separately timed half tests.
- if a split-half procedure is used, the obtained reliability coefficient is for a half test and should be adjusted using the Spearman-Brown formula
it provides a measure of the precision of an observed test score. It provides an estimate of the amount of error inherent in an observed score or measurement
Standard error of measurement (SEM)
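The SEM card above follows the standard formula SEM = SD · √(1 − r). A minimal Python sketch of that relationship (the function name and the SD-15 example values are illustrative, not from the cards):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

# e.g., a scale with SD = 15 and reliability .91:
print(sem(15, 0.91))  # ≈ 4.5
```

Note how raising the reliability shrinks the SEM, matching the inverse relationship stated earlier in these cards.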
variance from true differences
True variance
- if two scores each contain error such that in each case the true score could be higher or lower, then we would want the two scores to be further apart before we conclude that there is a significant difference between them.
Standard error of the difference
A study in which developers examine the usefulness of test scores in helping the test user make decisions
Decision study
a range or band of test scores that is likely to contain the true score
Confidence interval
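A common way to build such a band is observed score ± z · SEM. A hedged Python sketch (the observed score, SD, and reliability values are made up for illustration):

```python
import math

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score likely to contain the true score."""
    sem = sd * math.sqrt(1 - reliability)
    return observed - z * sem, observed + z * sem

# observed score 100 on a scale with SD = 15, reliability .91 (SEM = 4.5):
low, high = confidence_interval(100, 15, 0.91)
print(round(low, 2), round(high, 2))  # ≈ 91.18 108.82
```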
The probability of endorsing or selecting an item response indicative of higher levels of theta should increase as the underlying level of theta increases
Monotonicity
- typically designed to be equivalent with respect to variables such as content and level of difficulty
Alternate forms reliability
Why is CTT then most widely used model of measurement?
Because of simplicity, especially when one considers the complexity of other proposed models of measurement. Assumptions are rather easily met and therefore applicable to so many measurement situations
It signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured
Discrimination
an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
Parallel form reliability
the degree of agreement or consistency between two or more scorers with regard to a particular measure
Inter-scorer reliability
The proportion of the total variance attributed to true variance.
The greater the proportion of total variance attributed to true variance, the more reliable the test.
Reliability
What is a useful feature of IRT?
It enables test users to better understand the range over theta for which an item is most useful in discriminating among groups of test takers.
It is a specific application of a more general formula to estimate the reliability of a test that is lengthened or shortened by a number of items.
Spearman-Brown formula
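The general form is r_SB = n·r / (1 + (n − 1)·r), where n is the factor by which the test is lengthened (or shortened, for n < 1). A small sketch with illustrative values:

```python
def spearman_brown(r, n):
    """Predicted reliability when test length is changed by factor n:
    r_sb = n*r / (1 + (n - 1)*r)."""
    return n * r / (1 + (n - 1) * r)

print(spearman_brown(0.60, 2))    # doubling a half-test r of .60 → 0.75
print(spearman_brown(0.90, 0.5))  # halving a test with r of .90 → ≈ 0.82
```

The same formula, solved for n, gives the number of items needed to reach a target reliability, as the earlier card notes.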
a test designed to provide an indication of where a test taker stands with respect to some variable or criterion, such as an educational or vocational objective.
- should contain material that has been mastered in a hierarchical fashion.
Criterion referenced test
a test’s reliability is conceived of as an objective measure of how precisely the score assesses the domain from which the test draws a sample
Domain sampling theory
an obtained measurement of a stable trait such as intelligence would not be expected to vary significantly as a function of time, and either the test-retest or alternate-forms method would be appropriate
Status characteristic
What is the Kuder-Richardson formula 20?
the statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong.
- if the test is heterogeneous, KR-20 will yield lower reliability estimates than the split-half method
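KR-20 can be written as (k / (k − 1)) · (1 − Σpq / σ²), where p is the proportion passing each item, q = 1 − p, and σ² is the variance of the total scores. A sketch using a made-up 4-person, 3-item response matrix:

```python
def kr20(item_scores):
    """KR-20 for dichotomous (0/1) items; each row is one testtaker."""
    n = len(item_scores)
    k = len(item_scores[0])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    totals = [sum(row) for row in item_scores]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / var_total)

responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(responses))  # 0.75
```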
A reliability estimate is based on the correlation between scores on two halves of the test and is then adjusted using the Spearman-Brown formula
Split-half reliability
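Putting the two steps together: correlate the half-test totals, then apply the Spearman-Brown correction with n = 2. A sketch with illustrative toy data:

```python
import math

def pearson(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def split_half_reliability(odd_totals, even_totals):
    r_half = pearson(odd_totals, even_totals)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown adjustment, n = 2

odd = [1, 2, 3, 4]   # each testtaker's total on odd-numbered items
even = [1, 3, 2, 4]  # each testtaker's total on even-numbered items
print(split_half_reliability(odd, even))  # r_half = .80 → ≈ 0.89
```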
a test containing items that measure a single trait.
Homogeneous test
- appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time
Test-retest reliability
What are the three approaches to the estimation of reliability?
1) test-retest
2) alternate or parallel forms
3) internal or inter-item consistency
the degree of the relationship between various forms of a test; can be evaluated by means of an alternate-forms or parallel-forms procedure.
Coefficient of equivalence
an estimate of the extent to which these different forms of the same test have been affected by item sampling or other error.
Alternate forms reliability
the procedures of this theory provide a way to model the probability that a person with X ability will be able to perform at a level of Y.
Item response theory (IRT) (latent-trait theory)
A statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant.
Standard error of the difference
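When both scores come from scales with the same standard deviation, SED = √(SEM₁² + SEM₂²) = SD · √(2 − r₁ − r₂). A sketch (the SD-15 values are illustrative):

```python
import math

def standard_error_of_difference(sd, r1, r2):
    """SED = sqrt(SEM1^2 + SEM2^2), assuming both scores share the same SD."""
    return sd * math.sqrt(2 - r1 - r2)

# two scores on scales with SD = 15, each with reliability .91:
print(round(standard_error_of_difference(15, 0.91, 0.91), 2))  # ≈ 6.36
```

The SED is larger than either score's SEM alone, which is why two scores must be further apart before their difference is called significant.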
What is local independence?
means that (a) any systematic relationship among the test items is due solely to (b) the theta level of the testtaker; once theta is accounted for, responses to different items are independent of one another
the preferred statistic for obtaining an estimate of internal consistency reliability
Cronbach’s alpha
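Coefficient alpha can be computed as (k / (k − 1)) · (1 − Σσ²ᵢ / σ²_total), where σ²ᵢ are the individual item variances. A sketch using a made-up 4-person, 3-item matrix of polytomous scores:

```python
def cronbach_alpha(item_scores):
    """alpha = (k/(k-1)) * (1 - sum of item variances / total-score variance)."""
    k = len(item_scores[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var_sum = sum(var([row[j] for row in item_scores]) for j in range(k))
    totals = [sum(row) for row in item_scores]
    return (k / (k - 1)) * (1 - item_var_sum / var(totals))

ratings = [[2, 3, 3], [3, 3, 4], [4, 4, 4], [5, 4, 5]]
print(round(cronbach_alpha(ratings), 2))  # 0.9
```

For dichotomous (0/1) items this formula reduces to KR-20.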
the extent to which the population of voters in the study actually was representative of voters in the election.
Researchers may have gotten the factors right but did not include enough people in their sample to draw the conclusions that they did.
Sampling error
if the variance of either variable in a correlational analysis is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher
Inflation of range
correlating pairs of scores obtained from equivalent halves of a single test administered once.
Split-half reliability
What is the Rasch model?
a reference to an IRT model with very specific assumptions about the underlying distribution.
(each item is assumed to have an equivalent relationship with the construct being measured by the test)
a measure that focuses on the degree of difference that exists between item scores.
- the APD index is not connected to the number of items on a measure
Average proportional distance
the true score (or classical) model of measurement. It is the most widely used and accepted model in the psychometric literature today.
Classical test theory (CTT)
a test where some items are so difficult that no test taker is able to obtain a perfect score
Power test
It represents the influence of particular facets on the test score
Coefficient of generalizability
a test composed of items that measure more than one trait
Heterogeneous test
- a reliability estimate is based on the correlation between the two total scores on the two forms
Alternate forms reliability
the simplest way to determine the degree of consistency among scorers in the scoring of a test
Coefficient of inter-scorer reliability
variance from irrelevant, random sources
Error variance
the tool used to estimate or infer the extent to which an observed score deviates from a true score
standard error of measurement (SEM)
test items or questions that can be answered with only one of two alternative responses
Dichotomous test items
test items or questions with three or more alternative responses, where only one is scored correct or scores as being consistent with a targeted trait or other construct
Polytomous test items
What are the assumptions of using IRT?
1) unidimensionality - posits that the set of items measures a single continuous latent construct. This construct is referred to as theta. Theta is a reference to the degree of the underlying ability or trait the testtaker is presumed to bring to the test
2) local independence
3) monotonicity
What is the difference between CTT and domain sampling theory?
CTT - seek to estimate the portion of a test score that is attributable to error
Domain Sampling Theory - seek to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score
When the interval between testings is greater than six months, this is the term for the estimate of test-retest reliability
Coefficient of stability
a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process. ex: noise
Random error