Validity Flashcards
Validity
Popular Definition: Does the test measure what it says it does?
Memorize and apply the definition from the Standards for Educational and Psychological Testing (2014)
It is an integrated, evaluative judgment of the degree to which empirical evidence and theoretical rationales support the interpretations and meaningfulness of the test scores.
It measures the truthfulness of the interpretations that we make. We have to judge whether we have enough evidence to make statements about what the scores mean.
Unified view of validity (Messick, 1989): construct validity is the whole of validity, and all other types of validity are sources of construct validity evidence.
Refer to the Zumbo (2011) article on validity.
Validity Sources
Test score meaning and inferences:
The whole point of giving the test is to tell the test taker what it means.
You are making an interpretation of the test scores.
When you write a report, you interpret the scores and make inferences from them.
You put the meaning into the score. You do this through integrative and careful judgement of what the score means.
Validity Sources
You take everything into consideration to unpack what the score means and to determine whether the score interpretation is valid. Look at the reliability first.
If the measure is not precise enough, forget about validity.
Reliability is a necessary but not sufficient condition for validity.
If a test is not reliable, then it cannot be valid.
Content evidence (the content of the test): do the items survey what they are supposed to? The content of the test items needs to reflect the construct the test claims to measure.
Validity Sources
Score structure evidence: what happens in the scoring of the test. How do you score it? Many things can go wrong there, in the scoring itself (e.g., adding items up). Factor analysis procedures examine how the test takers are responding to the items.
Known-groups evidence: take a depressed group and a non-depressed group. On a depression test, the depressed group's scores should be higher; if people aren't depressed, the test shouldn't show depression.
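The known-groups idea can be sketched numerically. This is a minimal illustration with invented depression-scale totals, not real data:

```python
import statistics

# Hypothetical depression-scale totals for two known groups
depressed     = [28, 31, 25, 30, 27, 33, 29]
non_depressed = [10, 8, 12, 9, 11, 7, 13]

# Known-groups evidence: the clinically depressed group should score
# meaningfully higher on a depression measure than the non-depressed group.
mean_dep = statistics.mean(depressed)
mean_non = statistics.mean(non_depressed)
print(mean_dep, mean_non)  # the depressed group's mean is clearly higher
```

In practice you would back this up with a significance test and an effect size, but the core logic is simply that the groups' score distributions should separate in the predicted direction.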
Spearman-Brown Prophecy Formula
You have a very expensive test that you have designed, but the reliability is really low. Should you throw out the test, or are there just not enough questions to make it reliable? This formula helps you estimate how much longer the test would need to be to raise reliability to a point where the test becomes useful.
Rxx = k·rxx / (1 + (k − 1)·rxx)
Rxx = projected reliability estimate for the lengthened test
rxx = odd–even (split-half) correlation of scores
k = ratio of the desired test length to the current length
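A minimal sketch of the prophecy formula in Python; the reliability value and lengthening factor are illustrative:

```python
def spearman_brown(r_half, k):
    """Spearman-Brown prophecy: projected reliability when a test
    is lengthened by a factor of k (k = desired length / current length)."""
    return (k * r_half) / (1 + (k - 1) * r_half)

# A test with split-half reliability .60, doubled in length (k = 2):
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```

Note the diminishing returns: each additional block of items raises reliability by less than the previous one, which is why a very unreliable test may need impractically many items to become useful.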
Construct Validity
Test items should represent the construct of interest appropriately
Construct underrepresentation occurs when the measure fails to include important dimensions of the construct
Construct-irrelevant variance means that variance due to other distinct constructs, variance due to the method used, and unreliable or error variance are also present
Validity Definition
Validity is ‘‘an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores or other modes of assessment’’ (Messick, 1989, p. 13).
Validity Framework
Validity is about the meaning of the test score, not the test itself, and about the inferences and interpretations from the test scores when applying them in a certain context.
It concerns how we use and interpret the scores.
The goal is to figure out the meaning and the interpretations of the test scores.
We take many aspects into consideration, such as the reliability of the test. If it doesn't measure with good precision and consistency, then the test score is meaningless.
Do the items on the test relate to the construct?
The test should also discriminate appropriately (e.g., from measures of different constructs).
Criterion-related evidence: the test score predicts, or relates to, a real-life outcome. Eating Habits Test: how many hamburgers do you eat in a day? The test score is ____. Then you want to see if it correlates with weight or blood pressure; that would be the real-life criterion.
Two types:
1. Predictive: criterion in the future (test now, blood pressure test five years from now).
2. Concurrent: criterion at the same time (do you have high blood pressure right now?).
Convergent and discriminant evidence: the test correlates with measures of similar constructs and not with measures of unrelated ones.
Convergent: you have your test score and another test measuring a similar construct, e.g., anxiety correlates with depression. You have the depression test and the anxiety test score; convergent evidence is the scores on your measure correlating with the anxiety scores. You have to have a previously stated hypothesis.
Convergent evidence is a correlation between two measures: compare the scores on two tests (e.g., MBTI and an extroversion scale).
Diagnosis based on a set of behaviours
Is the diagnosis criterion evidence or convergent evidence?
The criterion is the behaviour of being a psychopath.
The DSM is not a test.
Picture walking down the road of measurement and reaching a fork. A traveller along another path begins to walk on the same road as you, so you have converged.
E.g., to show psychopathology: violent behaviour and lack of empathy. You would expect lack of empathy to converge with psychopathology. The discriminant measure takes the other path.
If two measures converge, you would expect a large positive correlation, approaching +1 (in practice the maximum is around .99).
For discriminant evidence, if at the crossroads they go different ways, you would expect a correlation of zero: no correlation, a flat line. For both convergent and discriminant evidence, you look at the magnitude of the correlation.
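The convergent/discriminant pattern can be sketched with invented scores; `pearson_r` is a small helper written here, not a library function, and the three score lists are hypothetical:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally sized score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores: a new depression scale, an anxiety scale
# (related construct), and a vocabulary test (unrelated construct)
depression = [12, 18, 9, 22, 15, 7, 20]
anxiety    = [14, 20, 10, 25, 13, 8, 21]   # should converge: large r
vocabulary = [55, 48, 52, 50, 58, 47, 53]  # should diverge: r near zero

print(round(pearson_r(depression, anxiety), 2))     # large positive
print(round(pearson_r(depression, vocabulary), 2))  # small magnitude
```

The hypotheses (which correlations should be large, which near zero) must be stated before looking at the data, as the notes above emphasize.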
Unintended and intended social and personal consequences and side effects:
What is consequential validity? It concerns how the measure impacts the person taking the test and society: the impact of test scores and how we use them. Every test score has a consequence, for treatment and for what it means for the person, the community, or the public. The client should know what the test is for and what the results might mean, because they could have drastic consequences for someone.
Generalizability: can the test be generalized? This is external validity. Does it apply to other similar contexts? E.g., a test might apply to other similar adults but not to children. Invariance means something stays the same and doesn't change; this becomes an issue if you apply a test across cultures. Does the construct mean the same thing in the two cultures?
Validity Sources
Construct → Test/Measure Development → Test Validation and Explanation → Score Use and Reporting → Consequences and Side Effects
Validity Vs Reliability
Reliability is about the test itself.
Validity is about how you apply it.
Reliability is precision and consistency.
Validity is reliability (precision) plus accuracy.
Example: X is depression, Y is anxiety. The validity coefficient rxy is the correlation between the two measures. Reliability is rxx: the correlation of a measure with itself (e.g., over time).
If reliability is low, that is bad for validity. Measures need high reliability before you can talk about validity; reliability is a condition of validity. If there is no precision (no tight cluster on the bull's-eye), we can't talk about what the score means.
Reliability Impacts Validity
Reliability influences validity significantly and sets limits/bounds for the coefficient of validity
Low reliabilities for X and Y may lead to low validity coefficients that reflect the poor reliabilities, not necessarily a lack of correlation between X and Y.
Thus, there is a formula to control for low reliabilities: the dis-attenuation formula.
Reliability Impacts Validity
If you have low reliability, you need to revise the test.
However, reliability may be low for legitimate reasons; for example, moods change over time. Not ideal, but still OK: you can use a formula to correct for the low reliabilities and get a better estimate of the true validity coefficient.
This formula is called the correction for attenuation.
Correction for Attenuation
Or Dis-Attenuation formula
= a statistical procedure, due to Spearman (1904), to rid a correlation coefficient from the weakening effect of measurement error
Attenuation means reducing; attenuating a sound means decreasing it.
Why attenuation? Measurement errors attenuate the observed validity correlation, making it an underestimate of the real validity. We correct for this.
The formula: corrected rxy = rxy / √(rxx · ryy).
Know that it is possible to correct for low reliabilities.
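A minimal sketch of Spearman's correction; the observed correlation and reliabilities below are illustrative numbers only:

```python
import math

def disattenuate(r_xy, r_xx, r_yy):
    """Spearman's (1904) correction for attenuation: the estimated
    correlation between X and Y once measurement error is removed."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed validity coefficient .42, reliabilities .70 (X) and .80 (Y):
print(round(disattenuate(0.42, 0.70, 0.80), 2))  # 0.56
```

With perfectly reliable measures (rxx = ryy = 1) the correction does nothing; the lower the reliabilities, the more the observed correlation is inflated back toward its error-free value.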
Main Sources of Construct Validity
Content
Criterion: Predictive & Concurrent
External Relationships: Convergent & Discriminant Validity; Multi-trait multi-method matrix
Response Processes (substantive validity)
Score Structure (factor analysis)
Consequences of Testing (consequential validity)
Generalizability (external validity)/Invariance
Content Validity
The content of the test (items) reflects the intended construct
Ex: BDI wants to study depression
1) we have to define depression (in this case DSM)
2) we need qualified judges (psychologists and researchers) to evaluate whether the items match the test domain (e.g., there are different kinds of depression; do we want to capture all of them or just one or two?)
Ex: for a competency test we want to make sure that the item content covers the domain to be evaluated
Face Validity
Somewhat related to content validity but not a main, reliable source of validity evidence
Do the questions appear to measure what they say they measure? BDI example:
Do you feel sad every day?
Do you feel blue?
Do things seem hopeless?
Have you lost interest in things you used to like?
Sometimes transparently face-valid questions can be less reliable than more subtle questions
Criterion Validity
Concurrent: correlation with criterion at the same time with collecting test scores (scores on BDI correlate with self-reported and observed symptoms of depression at the time of diagnosis)
Predictive: correlation with criterion after a period of time (criterion is in the future; GRE predicts future grad school GPA)
Predictive Validity
Can the test actually predict what it says it can?
A test has predictive validity only to the extent that it can anticipate what is going to happen with a minimum amount of error
One measure might look similar to another, yet the two may not correlate
Criterion Validity as Validity Coefficient
Calculating Criterion Validity
- Compute the correlation coefficient between a predictor X (e.g., a test of decision-making skill) and a criterion Y (e.g., job performance)
- rxy is the validity coefficient, and rxy² (the validity coefficient squared) is the proportion of variability in Y predictable from X (also known as the coefficient of determination, or R²)
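The two bullets above can be sketched in Python; `pearson_r` is a small helper written here (not a library function), and the predictor and criterion scores are invented for illustration:

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equally sized score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores: decision-making test (X), job performance (Y)
decision_test   = [3, 7, 5, 9, 4, 8, 6]
job_performance = [50, 72, 60, 80, 55, 70, 66]

r_xy = pearson_r(decision_test, job_performance)  # validity coefficient
r_squared = r_xy ** 2  # proportion of Y variance predictable from X
print(round(r_xy, 2), round(r_squared, 2))
```

Squaring the validity coefficient is what turns "how strongly are they related" into "what share of the criterion's variability the test accounts for".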
Predictive Validity and Regression
Prediction: if you get a score on one variable, you can predict the score on another. Given a score on X, you can predict the score on Y.
X is the score on the test. For example, you can get a predicted depression score from the trauma score: it is a linear equation, so if you know the trauma score you can obtain an actual predicted score of depression.
You get this by using regression, which is the statistical tool for prediction.
We can predict scores on one variable from another variable.
It is important to be able to predict so that we know what to expect.
Violence and suicide cannot be predicted well because too many variables are involved. These behaviours are rare, isolated events; they are not frequent enough to put into a mathematical equation, and there is not enough variance to calculate with.
Many variables come into play when you predict.
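The trauma-predicts-depression example above can be sketched with ordinary least squares; the scores are hypothetical and `fit_line` is a small helper written here, not a library function:

```python
def fit_line(x, y):
    """Ordinary least-squares intercept a and slope b for Y-hat = a + b*X."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b

# Hypothetical trauma-scale (X) and depression-scale (Y) scores
trauma     = [2, 5, 3, 8, 6, 4, 7]
depression = [6, 12, 8, 18, 13, 10, 16]

a, b = fit_line(trauma, depression)
predicted = a + b * 5  # predicted depression score for a trauma score of 5
print(round(predicted, 2))
```

This is the "linear equation" the notes refer to: once a and b are estimated from a sample, any new trauma score can be plugged in to produce a predicted depression score.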
Standard Error of Measurement
The data points fall off the regression line, not exactly on it.
The vertical distances between the points and the regression line are the errors, because the points don't fall on the line.
If you square these distances, average them, and take the square root, you get a standard deviation.
The points' deviations from the line are called residuals.
The residuals are the parts that don't fit on the line, and the standard deviation of all of them is the standard error of estimate.
Standard Error of Measurement
All of the points are scattered around the line. The ones that are more distant are larger errors, and together they make up the residuals.
The standard error of estimate gives you an idea of how far, on average, the points are from the regression line.
Standard Error of Estimate (SEE)
SEE = the standard deviation of the residual distribution, i.e., of the prediction errors in linear regression
The standard error of estimate (SXY, or SEE) is a similar concept to the standard error of measurement, applied to predictive validity
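Computing the SEE as the standard deviation of the residuals can be sketched directly (some texts divide by n − 2 instead of n; here the SEE is taken literally as the SD of the residual distribution, per the definition above, and the scores are hypothetical):

```python
import math

# Hypothetical predictor (X) and criterion (Y) scores
x = [2, 5, 3, 8, 6, 4, 7]
y = [6, 12, 8, 18, 13, 10, 16]

# Fit the least-squares line Y-hat = a + b*X
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx

# Residuals: vertical distances from each point to the fitted line
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# SEE = standard deviation of the residuals (prediction errors)
see = math.sqrt(sum(e ** 2 for e in residuals) / n)
print(round(see, 3))  # small SEE means the points hug the regression line
```

The residuals always sum to zero for a least-squares fit; it is their spread, the SEE, that tells you how much prediction error to expect.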