Impact on individual test score Flashcards
Sources of information to help evaluate test scores
- point estimate
- confidence interval
point estimates of true scores
- observed test scores
- adjusted true score estimate
- -> measurement error is considered
- -> regression to the mean
three factors influence size and direction of discrepancy
- reliability of test scores
- size of differences between observed test score and mean
- direction of differences between observed test score and mean
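A minimal sketch of the adjusted true score estimate, assuming the common regression-based formula (estimate = mean + reliability x (observed - mean)). The function name and the example numbers are hypothetical, chosen only to show how the three factors above shape the adjustment.

```python
def adjusted_true_score(observed, mean, reliability):
    """Pull an observed score toward the group mean in proportion to unreliability.

    Assumed formula: estimate = mean + reliability * (observed - mean).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return mean + reliability * (observed - mean)


# Hypothetical scores on a scale with mean 100.
for observed, reliability in [(130, 0.90), (70, 0.90), (130, 0.50), (105, 0.50)]:
    estimate = adjusted_true_score(observed, 100, reliability)
    print(observed, reliability, round(estimate, 2))
```

Lower reliability and a larger distance from the mean both produce a bigger adjustment. Scores above the mean are adjusted downward and scores below the mean are adjusted upward.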
confidence intervals
- reflect accuracy of point estimate
- standard error of measurement
- used to compute CI
- higher reliability = smaller SEM = narrower (more precise) CI
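A short sketch of how the standard error of measurement feeds a confidence interval, assuming the usual formulas SEM = SD x sqrt(1 - reliability) and CI = point estimate +/- z x SEM. The function names, the SD of 15 and the z value of 1.96 (for a 95% interval) are illustrative assumptions.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)


def confidence_interval(point_estimate, sd, reliability, z=1.96):
    """Return (lower, upper) bounds around a point estimate of the true score."""
    sem = standard_error_of_measurement(sd, reliability)
    return (point_estimate - z * sem, point_estimate + z * sem)


# Hypothetical test with SD = 15 and a point estimate of 110.
for reliability in (0.95, 0.80, 0.60):
    low, high = confidence_interval(110, 15, reliability)
    print(reliability, round(low, 1), round(high, 1), "width:", round(high - low, 1))
```

The interval widens as reliability drops, which is the sense in which the CI reflects the accuracy of the point estimate.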
several ways to quantify associations
- most common - correlation coefficient
- correlation between true scores
- reliabilities of measures
Implications of measurement error
- observed associations < true associations
- attenuation determined by measure reliability
- error constrains maximum association
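The attenuation points above can be sketched with the classical formula, assumed here as observed r = true r x sqrt(reliability of X x reliability of Y). Function names and input values are hypothetical.

```python
import math


def attenuated_correlation(true_r, rel_x, rel_y):
    """Expected observed correlation given the true-score correlation."""
    return true_r * math.sqrt(rel_x * rel_y)


def max_observed_correlation(rel_x, rel_y):
    """Upper bound on the observed correlation (true correlation of 1.0)."""
    return math.sqrt(rel_x * rel_y)


# Hypothetical true-score correlation of .50 measured with imperfect tests.
print(round(attenuated_correlation(0.50, 0.80, 0.80), 3))
print(round(attenuated_correlation(0.50, 0.60, 0.70), 3))
print(round(max_observed_correlation(0.60, 0.70), 3))
```

Even a perfect true-score association cannot show up as an observed correlation above sqrt(reliability of X x reliability of Y).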
Reliability, effect sizes and statistical significance
- interpret results in context of reliability
- behavioural research emphasises effect size and statistical significance
effect sizes
represent results as a matter of degree; affected directly by measurement error and reliability
better reliability =
larger observed effect size
statistical significance
- confidence in results
- affected strongly by observed effect
- larger observed effect increases likelihood of significance
measures with poor reliability =
underestimated true effect sizes = results less likely to reach significance
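A rough illustration of why poor reliability pushes results toward non-significance, assuming the classical attenuation formula and the standard t statistic for a correlation, t = r x sqrt(n - 2) / sqrt(1 - r^2). The true correlation of .30, the sample size of 50 and the function names are hypothetical.

```python
import math


def observed_r(true_r, rel_x, rel_y):
    """Observed correlation after attenuation by unreliability."""
    return true_r * math.sqrt(rel_x * rel_y)


def t_statistic(r, n):
    """t statistic for testing a correlation against zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


TRUE_R = 0.30  # hypothetical true-score correlation
N = 50         # hypothetical sample size

for rel in (1.0, 0.9, 0.7, 0.5):
    r = observed_r(TRUE_R, rel, rel)  # same reliability assumed for both measures
    print(rel, round(r, 3), round(t_statistic(r, N), 2))
```

With these assumed numbers, t drops from about 2.18 at perfect reliability to about 1.05 at reliability .50. The two-tailed critical value at alpha = .05 with 48 degrees of freedom is roughly 2.01, so the same true effect moves from significant to non-significant as reliability falls.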
Important characteristics considered throughout the process of test construction and refinement
- item means (item difficulty)
- item variances
- item discrimination
item variance
- may be related to item consistency
- absence of variability = absence of consistency
- implications for item-total correlation
item mean
- reflection of variability
- sometimes interpreted in terms of ‘difficulty’
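A small sketch of these item statistics for binary (0/1) items, where the item mean is the proportion passing, p, and the variance is p x (1 - p). The response matrix is made-up data, and the corrected item-total (item-rest) correlation is one assumed way to look at item consistency.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either variable has no variance."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def item_statistics(responses):
    """Per-item mean, variance and item-rest correlation for 0/1 responses."""
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]  # total score without item j
        p = mean(item)
        stats.append({
            "mean": p,                # proportion passing ("difficulty")
            "variance": p * (1 - p),  # holds for binary items only
            "item_rest_r": pearson(item, rest),
        })
    return stats


# Hypothetical data: 5 test takers (rows) by 4 items (columns).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
]
for number, item in enumerate(item_statistics(responses), start=1):
    print(number, item)
```

The first item is answered correctly by everyone, so it has zero variance and its item-rest correlation is undefined: an item with no variability cannot show consistency with the rest of the test.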
Test item types
- selected response items (objective or fixed response)
- forced choice
- constructed response
Selected response items (objective or fixed response)
- closed ended
- ability tests - multiple choice/true or false/ranking/matching
- personality tests - dichotomous or polytomous
forced choice
- most or least characteristic of oneself
- mainly used in multidimensional personality inventories
constructed response
- open-ended
- stipulations on time, response length, use of materials
Selected response items - positives
- popular and frequently used
- time efficient
- transformation (responses convert easily into numerical scores)
selected response items - negatives
- more susceptible to problems
- guessing - distorts scores (adds error)
- can diminish reliability and validity
- creation is time consuming
constructed response items - positives
- rich sample
- authentic behaviour sample
constructed response items - negatives
- scoring
- test length restriction
- response length
item validity
extent to which items accurately differentiate test takers
external
enhances score validity, multifaceted constructs
internal
homogeneity of test increases - unidimensional constructs
item validity statistics require information on -
- item performance
- criterion standing from samples producing item discrimination stats
- index of discrimination statistics
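One way to sketch an index of discrimination, assuming the upper-lower group approach: D = proportion passing in the top criterion group minus proportion passing in the bottom criterion group. The 27% group size is a common convention used here as an assumption, and the function name and data are hypothetical.

```python
def discrimination_index(item_scores, criterion_scores, fraction=0.27):
    """D = p(upper group) - p(lower group) for a 0/1 item.

    Groups are the top and bottom `fraction` of test takers ranked on the
    criterion (for example, total test score or an external measure).
    """
    n = len(item_scores)
    if n != len(criterion_scores):
        raise ValueError("item and criterion must have the same length")
    k = max(1, round(n * fraction))
    order = sorted(range(n), key=lambda i: criterion_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k
    p_lower = sum(item_scores[i] for i in lower) / k
    return p_upper - p_lower


# Hypothetical data: 10 test takers ranked by criterion standing.
criterion = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
good_item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]  # passed mostly by high scorers
flat_item = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]  # unrelated to the criterion
print(discrimination_index(good_item, criterion))
print(discrimination_index(flat_item, criterion))
```

Values near +1 mean the item separates high and low scorers well, values near 0 mean it does not differentiate, and negative values flag an item that low scorers pass more often than high scorers.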
Goals of item response theory
- generate items to provide maximum amount of information
- give examinees items tailored to abilities
- reduce number of items needed and reduce measurement error
shortcomings of CTT (classical test theory)
- indexes of item difficulty are group dependent
- scores are test dependent
- score reliability gauged by standard error of measurement
Item information function
contribution an item makes to trait/ability estimation along continuum
test item information
sum of item information functions (akin to CTT score reliability) used to obtain standard errors of estimation
Applications of IRT
- test development and administration
- psychometric properties of cognitive levels
- common in development of ability/achievement batteries
- development of computerised adaptive tests
Item fairness
- many ways items can be biased - can be investigated across population subgroups
- test fairness - more complex
- responsibility to implement fair testing practice
- awareness of universal design