Psychometric Properties Flashcards
sensitivity
-the proportion of people who have a disease who have a positive test result; how accurately the test identifies those who DO have a diagnosis
-decrease the threshold/cutoff to increase this
-e.g., assessments for suicidal ideation need to be very ____
specificity
-the proportion of people who do NOT have a disease who have a negative test result; how accurately the test identifies those who do NOT have a diagnosis
-increase the threshold/cutoff to increase this
-e.g., ASD dx needs to be more ____
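A minimal sketch of the cutoff trade-off described on the two cards above. The scores are made-up illustration data, and it assumes a higher score means the condition is more likely and that a score at or above the cutoff counts as a positive result.

```python
# Hypothetical screening scores (illustration only).
scores_with_condition = [4, 6, 7, 8, 9]      # people who truly have the diagnosis
scores_without_condition = [1, 2, 3, 5, 6]   # people who truly do not


def sensitivity_specificity(cutoff):
    """Return (sensitivity, specificity) when scores >= cutoff are called positive."""
    true_pos = sum(s >= cutoff for s in scores_with_condition)
    true_neg = sum(s < cutoff for s in scores_without_condition)
    sensitivity = true_pos / len(scores_with_condition)
    specificity = true_neg / len(scores_without_condition)
    return sensitivity, specificity


for cutoff in (3, 5, 7):
    sens, spec = sensitivity_specificity(cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With this toy data, lowering the cutoff from 7 to 3 raises sensitivity from 0.60 to 1.00 while specificity drops from 1.00 to 0.40.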
true positive
have the disease and have a positive test; hit rate
false positive
do NOT have the disease, but have a positive test; false alarm
false negative
have the disease, but have a negative test (the test does not pick it up); falls through the cracks
true negative
do not have the disease and have a negative test; correct rejection rate
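The four outcomes above are the cells from which sensitivity and specificity are computed. A small sketch, using hypothetical counts chosen only for illustration:

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people WITH the disease who test positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of people WITHOUT the disease who test negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 50 people with the disease, 100 without.
tp, fn = 45, 5     # with the disease: 45 positive tests, 5 missed
tn, fp = 90, 10    # without the disease: 90 negative tests, 10 false alarms

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

Here both values come out to 0.90: 45 of 50 cases are detected, and 90 of 100 non-cases are correctly rejected.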
criterion-referenced
test developer/publisher sets the cutoffs/thresholds and interpretation guidelines; an outside criterion influences the score thresholds
norm-referenced
scores interpreted by comparing an individual's score to a population norm, i.e., the norm group that originally took the test and against which an individual's score is determined
raw scores
no inherent meaning; sum of item responses (total points) when scored; needs interpretation guidelines from the test developer
standard scores
raw scores that have been converted to an interpretable scale based on the normal distribution and the norm group
-e.g., z-scores, T scores, etc.
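A short sketch of how a raw score becomes a z-score and a T score. The norm-group mean and standard deviation are hypothetical; the T score uses the conventional scale with mean 50 and standard deviation 10.

```python
def z_score(raw, norm_mean, norm_sd):
    """Distance of the raw score from the norm-group mean, in SD units."""
    return (raw - norm_mean) / norm_sd


def t_score(z):
    """Rescale a z-score to the T scale (mean 50, SD 10)."""
    return 50 + 10 * z


# Hypothetical norm group: mean raw score 24, SD 4.
z = z_score(raw=30, norm_mean=24, norm_sd=4)
print(f"z = {z:.2f}, T = {t_score(z):.0f}")
```

A raw score of 30 in this made-up norm group sits 1.5 standard deviations above the mean, which corresponds to a T score of 65.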
percentages
raw score that reflects the number of correct responses obtained out of the total possible number of correct responses on a test (no inherent meaning)
percentiles
scores that reflect the rank or position of an individual’s test performance in comparison to others who took the test
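To contrast the two cards above, here is a small sketch with made-up data. It assumes one common definition of percentile rank (the percentage of the comparison group scoring below the individual); other conventions, such as counting half of the tied scores, also exist.

```python
def percentage_correct(raw, max_score):
    """Share of the total possible points that was earned."""
    return 100 * raw / max_score


def percentile_rank(raw, norm_scores):
    """Percentage of the comparison group scoring strictly below the raw score."""
    below = sum(s < raw for s in norm_scores)
    return 100 * below / len(norm_scores)


# Hypothetical comparison group of 10 test takers on a 25-point test.
norm_scores = [10, 12, 15, 15, 18, 20, 22, 23, 24, 25]

print(percentage_correct(20, 25))        # percentage of items correct
print(percentile_rank(20, norm_scores))  # standing relative to the group
```

The same raw score of 20 is 80 percent correct but falls only at the 50th percentile of this group, which is why the two numbers should not be read interchangeably.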
reliability
consistency or stability of the scores/responses across time
inter-rater reliability
-consistency of scores across examiners (across coders)
-must operationalize the constructs measured
-if a test does not have this, the effects observed may be due to the individual who coded
-ideal to have higher ___ ____ (r = .90)
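Since the card expresses the target as a correlation (r = .90), a minimal sketch of Pearson's r between two raters follows. The ratings are hypothetical, and other agreement statistics (for example Cohen's kappa or an intraclass correlation) are often used in practice; they are not shown here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical ratings of the same six clients by two examiners.
rater_a = [3, 5, 2, 4, 5, 1]
rater_b = [3, 4, 2, 4, 5, 2]

print(f"r = {pearson_r(rater_a, rater_b):.2f}")
```

For these made-up ratings r is roughly .94, which would clear the .90 target on the card.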
internal reliability
-consistency of the structure (across items); homogeneity of the group of items in a response set or within subscales
-how strongly the items in the test associate with the construct measured
-e.g., in the BSI –> depression subscale has highest level of ___ ____
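The card does not name a statistic, so as an assumption this sketch uses Cronbach's alpha, one commonly reported index of internal consistency. The responses are made-up data, not BSI data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                       # number of items
    items = list(zip(*responses))               # regroup scores item by item
    sum_item_vars = sum(variance(item) for item in items)
    total_var = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)


# Hypothetical responses: 5 people answering a 3-item subscale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
]

print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Alpha rises when the items vary together across respondents, which is the "homogeneity of the group of items" idea on the card; for this toy data it is about .94.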
Reliability Cutoffs
-very high: ≥ .90
-high: .80-.89
-acceptable: .70-.79
-moderate/acceptable: .60-.69
-low/unacceptable: < .60
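The cutoffs above can be written as a small lookup. This is only a sketch: the function name is hypothetical, and it assumes a coefficient exactly on a boundary (for example .90 or .60) belongs to the higher band.

```python
def reliability_label(coefficient):
    """Map a reliability coefficient to the band names used on the card."""
    if coefficient >= 0.90:
        return "very high"
    if coefficient >= 0.80:
        return "high"
    if coefficient >= 0.70:
        return "acceptable"
    if coefficient >= 0.60:
        return "moderate/acceptable"
    return "low/unacceptable"


for value in (0.93, 0.85, 0.72, 0.61, 0.45):
    print(value, reliability_label(value))
```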
validity
the extent to which a test measures what it intends to measure
relationship between validity & reliability
reliability is necessary but not sufficient for validity
–> measure can be reliable and not valid (can have high reliability but low validity)
–> unreliable measure CANNOT be valid
–> e.g., BMI has high reliability but low validity
content validity
-representativeness of items
-evaluation component:
1) who came up with the questions?
2) how were the items selected?
3) how representative are the items of the domain?
criterion validity
-tests how well the assessment scores correlate with an outside construct
-tests if external ____ is related to the outcomes measured
concurrent validity
a type of criterion validity that assesses how well the instrument correlates with scores of an external criterion or those of a previously established measurement of the same construct
-e.g., does the instrument correlate with a DSM diagnosis
predictive validity
a type of criterion validity that assesses if the instrument at a given time point predicts future instrument scores or outcomes
-e.g., when a pre-employment test accurately predicts an applicant’s future job performance
construct validity
-the extent that the instrument is measuring what it is supposed to measure
-internal consistency is evidence to support ___ ____
convergent validity
a type of construct validity which measures how well the current instrument correlates with other instruments of the same construct
-e.g., Hamilton depression rating scale vs. BDI
discriminant validity
a type of construct validity which measures how well the current instrument is differentiated from (i.e., does not correlate strongly with) instruments of different constructs
-e.g., an assessment of self-esteem should not be correlated with an assessment of intelligence
Validity Cutoffs
-very high: ≥ .50
-high: .40-.49
-moderate/acceptable: .20-.39
-low/unacceptable: < .20
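As with the reliability bands, the validity cutoffs can be sketched as a lookup. The function name is hypothetical, and it assumes a coefficient exactly on a boundary (for example .40 or .20) is assigned to the higher band.

```python
def validity_label(coefficient):
    """Map a validity coefficient to the band names used on the card."""
    if coefficient >= 0.50:
        return "very high"
    if coefficient >= 0.40:
        return "high"
    if coefficient >= 0.20:
        return "moderate/acceptable"
    return "low/unacceptable"


for value in (0.55, 0.42, 0.30, 0.10):
    print(value, validity_label(value))
```

Note that the thresholds are much lower than the reliability ones: a correlation of .55 with an external criterion lands in the top validity band, while a reliability coefficient of .55 would fall in the lowest reliability band.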