Chapter 5 Flashcards
Potentially start earlier in chapter
check
Classical test theory/true score theory
A system of assumptions about measurement that includes the notion that a test score (and even a response to an individual item) is composed of a relatively stable component that is what the test or item is designed to measure, as well as a component that is error
True score
According to CTT, a value that genuinely reflects an individual’s ability level as measured by a particular test; it is test dependent
Advantages of CTT
Assumptions are simple and easy to meet; compatible with, and easy to use alongside, widely used statistical techniques
Problems with CTT
All items are presumed to contribute equally to the score, which is a questionable assumption; the assumptions also favor longer tests
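One way to see why CTT assumptions favor longer tests is the Spearman-Brown prophecy formula, which predicts reliability after a test is lengthened with comparable items. The formula is not named on these cards, so treat this as a supplementary sketch; the function and parameter names are illustrative only.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after changing test length.

    reliability   -- reliability of the current test (0 to 1)
    length_factor -- new length / old length (2.0 means twice as many items)
    Assumes the added items are comparable to the existing ones.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (
        1.0 + (length_factor - 1.0) * reliability
    )


if __name__ == "__main__":
    # Predicted reliability rises as the test gets longer.
    for factor in (0.5, 1, 2, 4):
        print(factor, round(spearman_brown(0.70, factor), 3))
```

With a starting reliability of .70, doubling the test predicts a reliability of about .82, and halving it predicts about .54.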
Domain sampling theory
A system of assumptions about measurement that includes the notion that a test score consists of a relatively stable component that actually is what the test (or item) is designed to measure as well as relatively unstable components that collectively can be accounted for as error
??
Generalizability theory
Based on the idea that a person’s test scores vary from testing to testing because of variables in the testing situation
Universe
Total context of a particular test situation, including all factors that lead to an individual test taker’s score
Facets
Variables of interest in the universe, e.g., the number of items in the test, the amount of training the test scorers have, and the purpose of the test administration
Universe score
Test score that should be obtained given the exact same conditions of all the facets in the universe
Generalizability study
Examines how generalizable scores from a particular test are if the test is administered in different situations; that is, how much of an impact different facets of the universe have on the test score
Coefficients of generalizability
Values that represent the influence of particular facets on the test score
Decision study
Examines the usefulness of test scores in helping the test user make decisions
Item response theory / latent-trait theory
A family of theories and methods that provide a way to model the probability that a person with X ability will be able to perform at a level of Y (or, for a trait, that a person with X amount of the trait will exhibit Y amount of it on a test designed to measure it); also a system of assumptions about measurement, including the assumption that the trait being measured is unidimensional, and about the extent to which each test item measures the trait
Discrimination
In the context of IRT, the degree to which an item differentiates among people with higher or lower levels of what is being measured
Dichotomous test items
Test items that can be answered with only one of two alternative responses
Polytomous test items
Test items with three or more alternative responses, where only one is scored correct or consistent with a targeted construct
Assumptions in latent trait models
Unlike CTT, which makes no assumptions about the frequency distribution of test scores, latent-trait models have such assumptions built in
Rasch model
IRT model with specific assumptions about underlying distribution
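The IRT cards above describe modeling the probability of a response from a person's trait level, with discrimination as an item property. A minimal sketch using the common logistic forms follows; the symbols (ability, difficulty, discrimination) and function names are assumptions for illustration and do not come from the cards. In the Rasch form every item shares the same discrimination, so only difficulty varies across items.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct (keyed) response under the Rasch model.

    Depends only on the gap between the person's ability and the
    item's difficulty, both on the same latent scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def two_pl_probability(
    ability: float, difficulty: float, discrimination: float
) -> float:
    """Two-parameter logistic model: adds an item discrimination term.

    Larger discrimination means the probability changes more sharply
    around the item's difficulty, so the item separates people with
    higher and lower trait levels more cleanly.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


if __name__ == "__main__":
    # A person whose ability equals the item difficulty has a 50% chance.
    print(rasch_probability(0.0, 0.0))
    # Same person and item difficulty, low vs. high discrimination.
    print(two_pl_probability(1.0, 0.0, 0.5), two_pl_probability(1.0, 0.0, 2.0))
```

These functions apply to dichotomous items; polytomous items need other models that are not sketched here.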
Standard error of measurement
Measure of the precision of an observed test score; provides an estimate of the amount of error inherent in an observed score or measurement, and is inversely related to the reliability of the test. Used to estimate the extent to which an observed score deviates from a true score. Also called the standard error of a score: an index of the extent to which one individual’s scores vary over tests presumed to be parallel
do I need to know this formula
Confidence interval
A range or band of test scores that is likely to contain the true score
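The standard error of measurement and confidence interval cards can be tied together with the usual textbook formula, SEM = SD * sqrt(1 - reliability), and a band of z * SEM around the observed score. This is a sketch under those assumptions; the function names and the default z of 1.96 (a 95% band) are illustrative choices, not taken from the cards.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability).

    sd          -- standard deviation of the test scores
    reliability -- reliability coefficient of the test (0 to 1)
    Higher reliability gives a smaller SEM (the inverse relationship).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(observed: float, sem: float, z: float = 1.96):
    """Band around an observed score that is likely to contain the true score.

    z = 1.96 corresponds to roughly 95% confidence; z = 1.0 to about 68%.
    """
    margin = z * sem
    return observed - margin, observed + margin


if __name__ == "__main__":
    sem = standard_error_of_measurement(sd=15.0, reliability=0.91)
    low, high = confidence_interval(observed=100.0, sem=sem)
    print(round(sem, 2), round(low, 2), round(high, 2))
```

For a test with SD 15 and reliability .91, the SEM is 4.5, so the 95% band around an observed score of 100 runs from about 91.2 to 108.8.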
Standard error of the difference
A statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant
formula?
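A common textbook form of the formula is SED = sqrt(SEM1^2 + SEM2^2), which for two scores on the same scale reduces to SD * sqrt(2 - r1 - r2). The sketch below assumes both scores share one standard deviation; the function names are illustrative only.

```python
import math


def standard_error_of_difference(sem_1: float, sem_2: float) -> float:
    """SED from the two standard errors of measurement."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


def sed_from_reliabilities(sd: float, r_1: float, r_2: float) -> float:
    """Equivalent form when both scores are on the same scale (same SD).

    r_1 and r_2 are the reliability coefficients of the two tests.
    """
    return sd * math.sqrt(2.0 - r_1 - r_2)


if __name__ == "__main__":
    # Two tests on an SD-15 scale with reliabilities .91 and .84.
    sem_1 = 15.0 * math.sqrt(1.0 - 0.91)
    sem_2 = 15.0 * math.sqrt(1.0 - 0.84)
    print(round(standard_error_of_difference(sem_1, sem_2), 2))
    print(round(sed_from_reliabilities(15.0, 0.91, 0.84), 2))
```

Because it combines the error from both scores, the SED is larger than either SEM alone, so a difference between two scores has to be bigger than the error in either score before it counts as significant.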