Final Flashcards
face validity
does a test appear to measure what it was designed to measure; lay-person judgement
how do content & face validity differ?
content involves systematic and technical analysis
face is more superficial
criterion validity
the extent to which a measure agrees with a gold standard; whether it matches a measure of some attribute or outcome that is of primary interest (criterion)
types of studies: criterion validity
predictive vs concurrent
predictive studies
administer the test today and measure the criterion some time down the road
drawbacks to predictive validity studies
time, money, issues from time lag
concurrent studies
the test and the criterion are measured at the same time
when should you use predictive vs concurrent studies?
if goal is prediction -> predictive
if goal is to determine current status -> concurrent
criterion contamination
when the criterion measures more dimensions than those measured by the test
key check: do scores on the predictor influence criterion scores?
techniques for interpreting validity coefficients
(1) significance level: the relationship did not occur by chance (p value)
(2) coefficient of determination (R²): the proportion of variance in the criterion accounted for by the test
what if your validity coefficient is small?
if a test provides information that helps predict criterion performance better than any other existing predictor, the test may be useful even if the coefficient is relatively small
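A minimal sketch (hypothetical scores) of the two interpretation techniques above: test whether the validity coefficient is significant, then compute the coefficient of determination.

```python
# Hypothetical predictor (test) and criterion (e.g., GPA) scores.
import numpy as np
from scipy import stats

test_scores = np.array([52, 61, 70, 48, 77, 66, 59, 73, 81, 55])
criterion = np.array([2.1, 2.8, 3.0, 2.0, 3.4, 2.9, 2.5, 3.1, 3.6, 2.3])

r, p_value = stats.pearsonr(test_scores, criterion)  # validity coefficient + p value
r_squared = r ** 2                                   # coefficient of determination

print(f"r = {r:.2f}, p = {p_value:.4f}, R^2 = {r_squared:.2f}")
```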
linear regression
a mathematical procedure that allows us to predict values on one variable if we know values on the other
standard error of estimate
a statistic that reflects the average amount of error in our predictions and allows us to make confidence statements
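A minimal sketch (hypothetical data) tying the last two cards together: fit a simple linear regression, then compute the standard error of estimate, SEE = sqrt(SS_residual / (N − 2)).

```python
import numpy as np

x = np.array([52, 61, 70, 48, 77, 66, 59, 73, 81, 55], dtype=float)  # predictor
y = np.array([2.1, 2.8, 3.0, 2.0, 3.4, 2.9, 2.5, 3.1, 3.6, 2.3])    # criterion

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line y' = a + bx
y_pred = intercept + slope * x

residuals = y - y_pred
see = np.sqrt(np.sum(residuals ** 2) / (len(y) - 2))  # N - 2 df for simple regression

# e.g., a rough ~68% confidence statement: predicted value +/- 1 SEE
print(f"y' = {intercept:.2f} + {slope:.3f}x, SEE = {see:.2f}")
```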
decision theory models
when tests are used for making decisions, such as personnel selection, factors other than the correlation between test and criterion are important
decision theory models: selection ratio
proportion of applicants needed to fill position
decision theory models: base rate
proportion of applicants who can be successful candidates
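A worked example (hypothetical numbers) of both decision-theory quantities:

```python
positions = 10
applicants = 100
potentially_successful = 60  # applicants who would succeed if selected

selection_ratio = positions / applicants         # 10/100 = 0.10
base_rate = potentially_successful / applicants  # 60/100 = 0.60

print(f"selection ratio = {selection_ratio:.2f}, base rate = {base_rate:.2f}")
```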
model sensitivity
metric that evaluates the ability to correctly identify the true positives of each available category
A / (A + C)
A = true positives, C = false negatives
model specificity
metric that evaluates the ability to correctly identify the true negatives of each available category
D / (B + D)
B = false positives, D = true negatives
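A minimal sketch (hypothetical cell counts) computing both metrics from the A/B/C/D cells defined above:

```python
A = 40  # true positives
B = 10  # false positives
C = 15  # false negatives
D = 35  # true negatives

sensitivity = A / (A + C)  # proportion of actual positives correctly identified
specificity = D / (B + D)  # proportion of actual negatives correctly identified

print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```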
evaluating validity coefficients
- look for changes in the cause of relationships
- what does the criterion mean?
- review the subject population of the validation study
- be sure sample size was adequate
- never confuse criterion with predictor
- check for restricted range on both predictor and criterion
- review evidence for validity generalization
- consider differential prediction
construct validity
the extent to which evidence can be provided that the test measures a theoretical construct
Campbell & Fiske’s types of validity evidence
convergent and discriminant
types of convergent evidence
(1) does the test measure the same thing as other tests used for the same purpose?
(2) does the test correlate with specific variables that we would expect it to if it is doing its job?
validation study
two or more constructs measured in two or more ways
what can validation studies tell us?
convergent and discriminant validity
homogeneity and unidimensionality
evidence of validity based on response process
involves an analysis of the fit between the performance and actions test takers actually engage in and the construct being assessed
e.g., interview, behavioural indicators (RT, eye gaze)
evidence based on consequences of testing
were the intended benefits of testing achieved?
ways of getting evidence of validity
(1) test content
(2) relations to other variables (criterion)
(3) internal structure
(4) response processes
(5) consequences of testing
factor analysis
any of several statistical methods describing the interrelationships of a set of variables by statistically deriving new variables, called factors, that are fewer in number than the original set of variables
types of factor analysis
exploratory and confirmatory
if alpha is lower than expected, there might be ______ and you might want to do _____
heterogeneity
factor analysis
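A minimal sketch of how alpha is computed (simulated data; the item matrix here is deliberately heterogeneous, so alpha comes out low, motivating the factor analysis sketched after the next card):

```python
import numpy as np

rng = np.random.default_rng(1)
items = rng.normal(size=(100, 6))  # 100 respondents x 6 uncorrelated items -> low alpha

# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)
total_var = items.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"alpha = {alpha:.2f}")
```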
steps in factor analysis
(1) extraction (how many factors/groups?)
(2) rotation (the average correlation between the items and the factor itself)
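A minimal sketch of both steps using scikit-learn's FactorAnalysis (simulated item responses; choosing two factors and varimax rotation are assumptions for illustration):

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_respondents = 200

# Simulate two latent traits driving ten items (five items per trait).
traits = rng.normal(size=(n_respondents, 2))
loadings = np.zeros((2, 10))
loadings[0, :5] = 0.8  # items 1-5 load on factor 1
loadings[1, 5:] = 0.8  # items 6-10 load on factor 2
items = traits @ loadings + rng.normal(scale=0.5, size=(n_respondents, 10))

# Step 1: extraction (n_components = how many factors); step 2: varimax rotation.
fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(items)
print(np.round(fa.components_.T, 2))  # rotated loadings: items x factors
```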
purposes of assessment in education
- how well is a student learning?
- assess whether a class, grade, school, district, or region is learning the content
- method to detect learning problems
- method for identifying giftedness
- determine if child is ready to move to next level
- assess teacher effectiveness
- determine readiness/placement in college, grad school, professional school
- credential exams
achievement test
assess learned information; evaluate the effects of a KNOWN or controlled set of experiences
what type of validity procedures does achievement testing rely on?
heavily on content validation procedures
aptitude test
assess ability to learn something; evaluate the effects of UNKNOWN or uncontrolled experiences
what type of validity procedures does aptitude testing rely on?
heavily on predictive criterion validation procedures
goal of classroom testing
measure the extent to which students have learned the facts, concepts, procedures, and skills that have been taught
effective classroom tests
students who have learned more will obtain higher scores and students who have learned less will obtain lower scores; to be effective, a test must consist of effective items
types of classroom achievement tests
constructed-response and selected-response
Bloom’s taxonomy: levels of understanding
(1) knowledge
(2) comprehension
(3) application
(4) analysis
(5) synthesis (in the revised taxonomy this becomes 'create' and swaps places with evaluation)
(6) evaluation
item difficulty index
right v wrong questions: percentage or proportion of test takers who correctly answer the item
item difficulty index: too hard
.0–.2
item difficulty index: too easy
.9–1.0
item difficulty indices are:
sample dependent and after the fact
on constructed-response tests scored with two options (right or wrong), what is the optimal mean p value?
.50 (about half the class gets it right)
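A minimal sketch (hypothetical 0/1 responses): the difficulty index p is just the column mean of a scored response matrix.

```python
import numpy as np

# rows = test takers, columns = items; 1 = correct, 0 = wrong
responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
])

p_values = responses.mean(axis=0)  # difficulty index per item
print(p_values)  # items near 1.0 are too easy, near 0.0 too hard
```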
item discrimination: right and wrong Qs
Pt − Pb
Pt = proportion correct in the top-scoring group, Pb = proportion correct in the bottom-scoring group
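A minimal sketch of the discrimination index d = Pt − Pb, splitting the class into top- and bottom-scoring halves (hypothetical 0/1 responses):

```python
import numpy as np

responses = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
])

totals = responses.sum(axis=1)          # total score per test taker
order = np.argsort(totals)              # rank test takers by total score
half = len(responses) // 2
bottom, top = responses[order[:half]], responses[order[-half:]]

d = top.mean(axis=0) - bottom.mean(axis=0)  # Pt - Pb for each item
print(np.round(d, 2))  # higher d = item better separates strong from weak students
```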