Scales of Measurement & Design Flashcards
Validity of a Test
Is the extent to which it measures what it claims to measure
Can a test be Reliable and Valid?
To the extent that a test is unreliable it cannot be valid
e.g., weighing on a scale that consistently adds 10 lbs: it is reliable but not valid
A test is valid
To the extent that inferences made from it are appropriate, meaningful and useful.
Three categories of validity evidence
Content Validity
Criterion-Related Validity
Construct Validity
Content Validity
Is determined by the degree to which the questions, tasks, or items on a test are representative of the universe of behavior the test was designed to sample
If the sample (the specific items on the test) is representative of the population (all possible items), then the test possesses content validity
Face Validity
Is not really a form of validity at all
A test has Face Validity if it looks valid to test users, examiners, and especially examinees.
Should not be confused with Objective Validity
Objective Validity
Is determined by the relationship of test scores to other sources of information
Criterion-Related Validity
Is demonstrated when a test is shown to be effective in estimating an examinee’s performance on some outcome measure. The variable of primary interest is the outcome measure, called a CRITERION.
Two different approaches to Validity evidence
Concurrent Validity
Predictive Validity
Concurrent Validity
Criterion measures are obtained at approximately the same time as the test scores
Predictive Validity
The criterion measures are obtained in the future (usually months or years after the test scores are obtained).
E.g., college entrance exams
Characteristics of a good Criterion
- Must itself be reliable if it is to be a useful index of what the test measures
- A criterion measure must also be appropriate for the test under investigation
- All criterion measures should be described accurately, and the rationale for choosing them as relevant criteria should be made explicit
- A criterion must also be free of contamination from the test itself
Criterion
Is any outcome measure against which a test is validated.
It can be almost anything
Validity Coefficient
The validity coefficient is always less than or equal to the square root of the product of the test reliability and the criterion reliability.
In other words, to the extent that the reliability of either the test or the criterion (or both) is low, the validity coefficient is also diminished.
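The bound above can be sketched numerically; the function name and the example reliabilities here are illustrative, not from the text:

```python
import math

def max_validity(test_reliability: float, criterion_reliability: float) -> float:
    """Upper bound on the validity coefficient: r_xy <= sqrt(r_xx * r_yy)."""
    return math.sqrt(test_reliability * criterion_reliability)

# A test with reliability .81 validated against a criterion with
# reliability .64 can never show a validity coefficient above .72,
# no matter how well constructed the test is.
print(max_validity(0.81, 0.64))  # approximately 0.72
```

Lowering either reliability shrinks the ceiling, which is the attenuation point the card makes.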
Criterion Contamination
Potential source of error in test validation
The criterion is “contaminated” by its artificial commonality with the test
Also possible when the criterion consists of ratings from experts.
Concurrent Validity
- Test scores and criterion information are obtained simultaneously.
- Usually desirable for achievement tests, tests used for licensing or certification, and diagnostic clinical tests.
- Indicates the extent to which test scores accurately estimate an individual’s present position on the relevant criterion.
Predictive Validity
- Test scores are used to estimate outcome measures obtained at a later date.
- Relevant for entrance exams and employment tests; the common purpose of these tests is determining who is likely to succeed at a future endeavor.
Regression Equation
Describes the best-fitting straight line for estimating the criterion from the test
Validity Coefficient
The higher the validity coefficient, the more accurate the test is in predicting the criterion
Standard Error of Estimate
The margin of error to be expected in the predicted criterion score
Standard Error of Measurement (SEM)
Like the SEE, this index helps gauge margins of error.
The SEM indicates the margin of measurement error caused by unreliability of the test, whereas the SEE indicates the margin of prediction error caused by imperfect validity of the test
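Both indexes follow simple formulas (SEE = SD_Y * sqrt(1 - r_xy^2); SEM = SD * sqrt(1 - r_xx)); a minimal sketch with illustrative numbers:

```python
import math

def standard_error_of_estimate(sd_criterion: float, validity: float) -> float:
    """Margin of prediction error caused by imperfect validity."""
    return sd_criterion * math.sqrt(1 - validity ** 2)

def standard_error_of_measurement(sd_test: float, reliability: float) -> float:
    """Margin of measurement error caused by unreliability of the test."""
    return sd_test * math.sqrt(1 - reliability)

# An IQ-style scale (SD = 15) with reliability .91: SEM is about 4.5 points
print(standard_error_of_measurement(15, 0.91))
# A criterion with SD 10 predicted with validity .60: SEE is about 8.0
print(standard_error_of_estimate(10, 0.60))
```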
Decision Theory
Proponents of this theory stress that the purpose of psychological testing is not measurement per se, but measurement in the service of decision making
False positives
Some people predicted to succeed will in fact fail
False Negatives
Some people predicted to fail would, if given the chance, succeed
False positives and false negatives are collectively known as “misses”
Hit Rate
A “miss” means the test has made an inaccurate prediction.
The hit rate is the proportion of cases in which the test accurately predicts success or failure:
Hit Rate = hits / (hits + misses)
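The formula above, with hypothetical counts:

```python
def hit_rate(hits: int, false_positives: int, false_negatives: int) -> float:
    """Proportion of accurate predictions; false positives and
    false negatives together are the 'misses'."""
    misses = false_positives + false_negatives
    return hits / (hits + misses)

# 80 accurate predictions, 12 false positives, 8 false negatives
print(hit_rate(80, 12, 8))  # 0.8
```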
Decision Theory proponents make two fundamental assumptions about the use of selection tests:
- The values of various outcomes to the institution can be expressed in terms of a common utility scale (e.g., profit-loss)
- In institutional selection decisions, the most generally useful strategy is one that maximizes the average gain on the utility scale (or minimizes average loss) over many similar decisions
Construct
- A theoretical, intangible quality or trait on which individuals differ
- Constructs are theorized to have some form of independent existence and to exert broad but to some extent predictable influences on human behavior
Construct Validity
Pertains to psychological tests that claim to measure complex, multifaceted, and theory-bound psychological attributes such as psychopathy, intelligence, leadership ability, and the like
“No criterion or universe of content is accepted as entirely adequate to define the quality to be measured”
To evaluate this, we must amass a variety of evidence from numerous sources
Refers to the appropriateness of these inferences about the underlying construct
A test designed to measure this must estimate the existence of an inferred underlying characteristic (e.g., leadership ability) based on a limited sample of behavior.
All Psychological constructs possess two characteristics in common
- There is no single external referent sufficient to validate the existence of the construct; that is, the construct cannot be operationally defined
- A network of interlocking suppositions can be derived from existing theory about the construct
Most studies of Construct Validity fall into one of the following categories
- Analysis to determine whether the test items or sub tests are homogeneous and therefore measure a single construct
- Study of developmental changes to determine whether they are consistent with the theory of the construct
- Research to ascertain whether group differences on test scores are theory-consistent
- Analysis to determine whether intervention effects on test scores are theory-consistent
- Correlation of the test with other related and unrelated tests and measures
- Factor analysis of test scores in relation to other sources of information
- Analysis to determine whether test scores allow for the correct classification of examinees
Test Homogeneity
If a test measures a single construct, then its component items (or sub tests) likely will be homogeneous (also referred to as internally consistent).
Homogeneous Scale
Method to achieve this is to correlate each potential item with the total score and select items that show high correlations with the total score.
A related procedure is to correlate sub tests with the total score in the early phases of test development.
Homogeneity
Is an important first step in certifying the construct validity of a new test, but standing alone it is weak evidence
Social Interest Scale
Based on Alfred Adler’s concept; Crandall (1984) defined social interest as an “interest in and concern for others”
Convergent Validity
Demonstrated when a test correlates highly with other variables or tests with which it shares an overlap of constructs
Discriminant Validity
Demonstrated when a test does not correlate with variables or tests from which it should differ
Multitrait-Multimethod Matrix
Campbell and Fiske (1959)
This calls for the assessment of two or more traits by two or more methods
Table 4.2
Can be a rich source of data on reliability, convergent validity and discriminant validity
Full implementation of this procedure typically requires too monumental a commitment from researchers
Factor Analysis
The purpose is to identify the minimum number of determiners (factors) required to account for the intercorrelations among a battery of tests.
The goal is to find a smaller set of dimensions, called factors, that can account for the observed array of inter correlations among individual tests.
A typical approach is to administer a battery of tests to several hundred subjects and then calculate a correlation matrix from the scores on all possible pairs of tests.
Factor Loading
-Is actually a correlation between an individual test and a single factor.
-Can vary between -1.0 and +1.0
The final outcome of a Factor Analysis…
Is a table depicting the correlation of each test with each factor.
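As a simplified sketch (a single principal factor, not a full factor analysis), power iteration on a correlation matrix yields first-factor loadings; the matrix below is invented:

```python
def first_factor_loadings(corr, iters=500):
    """The dominant eigenvector of the correlation matrix, scaled by the
    square root of its eigenvalue, gives each test's loading on the first
    factor (a principal-component simplification of factor analysis)."""
    n = len(corr)
    v = [1.0] * n
    for _ in range(iters):
        # Multiply the matrix by the current vector, then renormalize
        w = [sum(corr[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return [vi * eigenvalue ** 0.5 for vi in v]

# Two tests correlating .80: each loads about .95 on their shared factor
corr = [[1.0, 0.8], [0.8, 1.0]]
print(first_factor_loadings(corr))
```

Like any factor loading, each value here is a correlation between a test and the factor, so it stays between -1.0 and +1.0.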
Two Psychometric features
Sensitivity
Specificity
Sensitivity
Has to do with accurate identification of patients who have a syndrome (e.g., dementia)
Specificity
Has to do with accurate identification of normal patients
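Both indexes reduce to simple proportions; the counts below are hypothetical:

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Proportion of patients with the syndrome the test correctly flags."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives: int, false_positives: int) -> float:
    """Proportion of normal patients the test correctly clears."""
    return true_negatives / (true_negatives + false_positives)

# 45 of 50 dementia patients flagged; 90 of 100 normal patients cleared
print(sensitivity(45, 5), specificity(90, 10))  # 0.9 0.9
```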