Exam 2 Flashcards
cross-validation
process of administering a test to another sample of test takers representative of the target population; can also simply gather a large enough data set and randomly split it into 2 samples; used to evaluate whether a regression equation developed in one sample holds up in another, since results are influenced by the particular sample
calibration sample (aka Training Set)
sample on which the regression parameters (slope and intercept) are estimated
validation sample (aka Test Set)
sample on which the calibration-sample equation is used to predict criterion scores, to check how well the regression generalizes
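A minimal sketch of the split-sample approach, using simulated scores (all numbers made up) and SciPy's linregress and pearsonr:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(50, 10, 200)            # predictor (test scores)
y = 0.6 * x + rng.normal(0, 8, 200)    # criterion (simulated)

idx = rng.permutation(len(x))
calib, valid = idx[:100], idx[100:]    # calibration / validation halves

fit = stats.linregress(x[calib], y[calib])    # parameters set on calibration sample
y_pred = fit.intercept + fit.slope * x[valid]

# cross-validated validity: correlation of predicted with actual criterion
r_cv, _ = stats.pearsonr(y_pred, y[valid])
print(round(r_cv, 2))
```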
differential validity
when a test yields significantly different validity coefficients for subgroups
single-group validity
valid for one group, but not for another
measurement bias
when scores on a test taken by different subgroups in the population (ex. men & women) need to be interpreted differently because of some characteristic of the test that is not related to the construct being measured
differential prediction
an outcome in which there is a significant difference between regression equations for 2 groups as indicated by differences in slopes, intercepts, or both.
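A minimal sketch of how differential prediction might be checked, fitting separate regressions to simulated data for two hypothetical groups; a real analysis would formally test the slope and intercept differences (e.g., with moderated multiple regression):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x_a = rng.normal(50, 10, 100); y_a = 0.6 * x_a + rng.normal(0, 8, 100)
x_b = rng.normal(50, 10, 100); y_b = 0.4 * x_b + 12 + rng.normal(0, 8, 100)

fit_a = stats.linregress(x_a, y_a)   # regression for group A
fit_b = stats.linregress(x_b, y_b)   # regression for group B

# differential prediction is suggested when slopes and/or intercepts differ
print(round(fit_a.slope, 2), round(fit_a.intercept, 2))
print(round(fit_b.slope, 2), round(fit_b.intercept, 2))
```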
criterion-related validity
the extent to which scores on a test correlate with scores on a measure of PERFORMANCE or behavior; the extent to which test scores correlate with or predict independent behaviors, attitudes, or events
2 Methods for evidence of Criterion-Related Validity
1. predictive 2. concurrent
Predictive Method
used to show a relationship between test scores and a future behavior
validity coeff.= a statistic used to infer the strength of the evidence of validity that the test scores might demonstrate in predicting job performance
restriction of range= only applicants who score well on the predictor are hired, so the range of scores available for the validity study is truncated
Concurrent Method
test administration and criterion measurement happen at the same time. does NOT involve prediction; provides information about the present & status quo.
reliability/precision vs. validity
reliability/precision: the CONSISTENCY of test results that derives from 2 factors (internal consistency and test-retest reliability)
validity: depends on the INFERENCES that are going to be made from scores
objective criterion
observable and measurable; verifiable with facts, leaving little room for doubt
subjective criterion
based on a person's judgment (e.g., peer ratings); prone to more error and narrower in scope than well-defined objective criteria
criterion contamination
when the criterion measures MORE DIMENSIONS than those measured by the test; when unreliable/inappropriate criteria are used for validation, the true validity coefficient might be under- or overestimated
Tests of Significance
“how likely is it that the correlation between the test & the criterion resulted from chance or from sampling error?”
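A minimal sketch of this test, using simulated scores: pearsonr returns both the validity coefficient and its p-value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
test = rng.normal(size=60)
criterion = 0.5 * test + rng.normal(size=60)

r, p = stats.pearsonr(test, criterion)
print(f"r = {r:.2f}, p = {p:.4f}")   # small p -> unlikely to be chance/sampling error alone
```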
coefficient of determination
indicates the amount of variance that the test and criterion share; obtained by SQUARING the validity coefficient to get r sq.; e.g., r = .50 gives r sq. = .25, so the test and criterion share 25% of their variance
Linear Regression
uses one set of test scores (X) to predict one set of criterion scores (Y); in linear regression, we refer to the best-fitting line as the regression line; we calculate the slope or b weight of the regression line, which is the expected change in Y for every one-unit change in X
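A tiny worked example of the regression equation Y' = a + bX with made-up intercept (a) and slope (b) values, showing the one-unit-change idea:

```python
a, b = 10.0, 0.8
for x in (50, 51):
    print(x, a + b * x)   # Y rises by b (= 0.8) for each one-unit rise in X
```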
Range Restriction
the reduction in the range of scores that results when some people are dropped from a validity study, such as when low performers are not hired, causing the validity coefficient to be lower than it would be if all persons were included in the study; corrections for range restriction are available
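A sketch of the standard correction for direct range restriction (Thorndike's Case 2); the numbers are made up, with r the restricted correlation, s the restricted SD, and S the unrestricted SD of the predictor:

```python
import math

def correct_for_restriction(r, s, S):
    k = S / s
    return (r * k) / math.sqrt(1 - r**2 + (r**2) * (k**2))

print(round(correct_for_restriction(0.30, 5.0, 10.0), 2))  # corrected upward to ~.53
```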
construct validity
evidence that a test relates to other tests and behaviors as the underlying construct predicts; a construct = an underlying attribute inferred from behaviors and actions that are observable and measurable
Nomological network
method for defining a construct by illustrating its relation to as many other constructs and behaviors as possible
states vs. traits
states are a TEMPORARY condition, perhaps brought on by situational circumstances
traits are LONG-LASTING individual qualities that have become an enduring part of a person
Jingle-Jangle Fallacy- JINGLE
Jingle is 2 measures labeled w/ the SAME construct, but uncorrelated (they actually measure different things)
Jingle-Jangle Fallacy- JANGLE
Jangle is 2 measures labeled w/ DIFFERENT constructs, but CORRELATED (they actually measure the same thing)
Convergent Validity
test scores correlated with measures of the same or similar constructs
Discriminant Validity
test scores are NOT related to unrelated constructs
Heterotrait-heteromethod Correlations
(evidence of discriminant) different things measured differently should not be correlated
Heterotrait-Monomethod Correlations
(evidence of common method variance) different things measured the same way should NOT be correlated, but they typically are to some degree
Monotrait-heteromethod correlations
(evidence of convergent validity) the same thing measured in different ways should be correlated
Monotrait-Monomethod
(evidence of reliability) same thing measured in the same way correlated with itself should be HIGHLY correlated
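A made-up MTMM correlation matrix for two traits (T1, T2) each measured by two methods (M1, M2) illustrates the four kinds of correlations above:

```python
import numpy as np

labels = ["T1M1", "T2M1", "T1M2", "T2M2"]
r = np.array([
    [1.00, 0.30, 0.65, 0.10],
    [0.30, 1.00, 0.12, 0.60],
    [0.65, 0.12, 1.00, 0.28],
    [0.10, 0.60, 0.28, 1.00],
])
# monotrait-heteromethod (convergent):      r[0,2]=.65, r[1,3]=.60 -> high
# heterotrait-monomethod (method variance): r[0,1]=.30, r[2,3]=.28 -> modest
# heterotrait-heteromethod (discriminant):  r[0,3]=.10, r[1,2]=.12 -> low
```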
Exploratory Factor Analysis (EFA)
no formal hypothesis about factors, “how many underlying factors are there?”
Confirmatory Factor Analysis (CFA)
factor structure specified in advance based on theory; “how well does my data replicate the theoretical structure?”
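A minimal EFA sketch using scikit-learn's FactorAnalysis on simulated item responses (all values made up); CFA is typically run in dedicated SEM software instead:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
f = rng.normal(size=(300, 2))                    # two latent factors
load = rng.uniform(0.4, 0.9, size=(2, 8))        # loadings on 8 items
X = f @ load + rng.normal(0, 0.5, size=(300, 8)) # simulated responses

efa = FactorAnalysis(n_components=2).fit(X)
print(np.round(efa.components_, 2))              # estimated factor loadings
```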
validity
refers to whether there is evidence supporting the interpretation of the resulting test scores for their proposed use; a test must be reliable before it can be valid
test-retest method
test developer gives the same test to the same group on 2 different occasions; the 2 sets of scores are correlated; examines the stability of test scores over time and provides an estimate of the test's reliability/precision; a limitation is practice effects
alternate-forms method
2 forms of the same test, as much alike as possible, given to the same people; the 2 forms are the alternate/parallel forms; administered close in time, typically the same day, to guard against order effects
internal consistency method
how related the items on 1 test are to one another; measuring the same attribute
split-half method
divide the test into 2 halves and then compare the set of individual scores on the first half with those on the second; the halves must be equal in length and content; use random assignment of items to halves
homogeneous tests
measure 1 trait/characteristic
scorer reliability/ interscorer agreement
the amount of consistency among scorers' judgments
intrascorer reliability
whether each clinician was consistent in the way he or she assigned scores from test to test
Classical Test Theory
X = T + E (observed score = true score + error)
True Score (T)
cannot be truly known or determined; represents the score a person would obtain if they took a test an infinite number of times and then averaged the scores; that averaging cancels out random error
Random Error (E)
difference between actual score (obtained) and true score
Systematic Error
obscures the true score, when a single source of error always increases or decreases the score by the same amount
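A quick simulation of X = T + E with made-up numbers shows why averaging cancels random (but not systematic) error:

```python
import numpy as np

rng = np.random.default_rng(4)
true_score = 80.0
observed = true_score + rng.normal(0, 5, size=10_000)  # X = T + random E
print(observed.mean())   # close to 80; a systematic error would NOT cancel
```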
Spearman-Brown Formula
used to estimate the reliability of the full-length test from a split-half correlation (halving a test lowers reliability, so the estimate is corrected upward)
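The formula itself, sketched with a made-up split-half correlation: r_full = 2 * r_half / (1 + r_half).

```python
def spearman_brown(r_half):
    return 2 * r_half / (1 + r_half)

print(round(spearman_brown(0.70), 2))  # 0.82 estimated for the full-length test
```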
KR-20
used for internal consistency estimates when items are scored dichotomously (right vs. wrong), such as true/false or multiple-choice items
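A minimal KR-20 sketch over a simulated 0/1 score matrix (rows = test takers, columns = items):

```python
import numpy as np

def kr20(scores):
    k = scores.shape[1]
    p = scores.mean(axis=0)                     # proportion correct per item
    q = 1 - p
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

rng = np.random.default_rng(5)
ability = rng.normal(size=(50, 1))
demo = (ability + rng.normal(0, 1, (50, 10)) > 0).astype(int)  # correlated 0/1 items
print(round(kr20(demo), 2))
```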
Coefficient Alpha
used for tests whose items have a range of possible answers to choose from (e.g., Likert-type scales)
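A minimal coefficient alpha sketch over simulated multi-point item scores (all values made up):

```python
import numpy as np

def cronbach_alpha(scores):
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(7)
trait = rng.normal(size=(60, 1))
items = np.clip(np.rint(3 + trait + rng.normal(0, 1, (60, 5))), 1, 5)  # 1-5 scale
print(round(cronbach_alpha(items), 2))
```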
Pilot Testing
when developing a test, we cannot simply assume it will be as reliable as expected; pilot testing conditions should be representative of the intended use, and participants should be representative of the target population
quantitative item analysis (2)
2 main components:
item discrimination & item difficulty
item difficulty
percentage of people who answer the item correctly (p); can be compared with the probability of answering correctly by chance (e.g., .25 for a 4-option multiple-choice item)
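As a tiny example with made-up 0/1 item scores (1 = correct, 0 = incorrect):

```python
import numpy as np

item = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])
p = item.mean()
print(p)   # 0.7 -> a fairly easy item
```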
item discrimination
how well an item separates high and low performers (upper vs. lower scoring groups);
negative numbers mean people w/ LOW ability answered correctly more often than those w/ high ability;
low positive numbers mean poor discrimination;
high positive numbers mean good discrimination
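A minimal sketch of the discrimination index D = p(upper) - p(lower), using simulated scores and the common top/bottom-27% split:

```python
import numpy as np

rng = np.random.default_rng(6)
total = rng.normal(size=100)                            # total test scores
item = (total + rng.normal(0, 1, 100) > 0).astype(int)  # item tracks ability

order = np.argsort(total)
lower, upper = order[:27], order[-27:]      # bottom and top 27% of scorers
d = item[upper].mean() - item[lower].mean()
print(round(d, 2))   # positive -> high scorers pass the item more often
```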