Validity Flashcards

1
Q

Validity

A

Popular Definition: Does the test measure what it says it does?
Memorize and apply the definition from the Standards for Educational and Psychological Testing (2014):
It is an integrative evaluative judgment of the degree to which empirical evidence and theoretical rationales support the interpretations and meaningfulness of the test scores.
It concerns the truthfulness of the interpretations that we make: we have to judge whether we have enough evidence to make statements about the scores.
Unified view of validity (Messick, 1989): construct validity is the whole of validity, and all other types of validity are sources of construct validity.
Refer to the Zumbo (2011) article on validity.

2
Q

Validity Sources

A

Test score meaning and inferences:
The whole point of giving the test is to tell the test taker what the score means.
You are making an interpretation of the test scores.
When you write a report, you interpret the scores and make inferences about the scores and the test.
You put the meaning into the score, through integrative and careful judgment of what the score means.

3
Q

Validity Sources

A

You take everything into consideration to unpack what the score means and to determine whether the interpretation is valid. Start with reliability:
If the test is not precise enough, forget about validity.
Reliability is a necessary but not sufficient condition.
If a test is not reliable, then it cannot be valid.
Content evidence (the content of the test): do the items survey what they are supposed to? The content of the items needs to test the construct; the content of the test needs to reflect the construct.

4
Q

Validity Sources

A
Content evidence (the content of the test): do the items survey what they are supposed to? The content of the items needs to test the construct; the content of the test needs to reflect the construct.
Score structure evidence: what is in the scoring of the test and how you score it. Many things can go wrong there, even in the scoring itself (adding things up). Factor analysis procedures examine how test takers are responding to the items.
Known-groups evidence: compare a depressed group and a non-depressed group. The depressed group's scores on a depression measure should be higher; if people aren't depressed, the test should not show depression.
5
Q

Spearman-Brown Prophecy Formula

A

You have a very expensive test that you have designed, but the reliability level is really low. Should you throw out the test, or are there just not enough questions to make the test reliable? This formula helps you estimate how many more questions you would need to raise reliability to a point where the test becomes useful.
Rxx = k·rxx / (1 + (k − 1)·rxx)
Rxx = projected reliability estimate for the lengthened test
rxx = current reliability estimate (e.g., odd-even correlation of scores)
k = ratio of the desired test length to the current length
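As a minimal sketch (in Python, with hypothetical numbers), the formula can be applied like this:

```python
def spearman_brown(r_xx: float, k: float) -> float:
    """Spearman-Brown prophecy formula: project the reliability of a
    test lengthened by a factor of k.

    r_xx: current reliability estimate (e.g., a split-half correlation)
    k:    ratio of the new length to the current length (k=2 doubles it)
    """
    return (k * r_xx) / (1 + (k - 1) * r_xx)

# Hypothetical example: a test with reliability .60, doubled in length:
print(round(spearman_brown(0.60, 2), 3))  # 0.75
```

Doubling a test with reliability .60 is projected to raise it to .75, which is why a low coefficient alone is not a reason to throw the test out.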

6
Q

Construct Validity

A

Test items should represent the construct of interest appropriately

Construct underrepresentation occurs when the measure fails to include important dimensions of the construct

Construct-irrelevant variance means that variance due to other distinct constructs, variance due to the method used, and unreliable or error variance are also present

7
Q

Validity Definition

A

Validity is ‘‘an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores or other modes of assessment’’ (Messick, 1989, p. 13).

8
Q

Validity Sources

A

Validity Framework

Validity is about the meaning of the test score, not the test itself, and about the inferences and interpretations from the test scores when applying them in a certain context.
How we use and interpret the scores.
To figure out the meaning and the interpretations of the test scores.
We take a lot of aspects into consideration, including the reliability of the test. If it doesn't measure with good precision and consistency, then the meaning of the test score is lost.
Do the items on the test relate to the construct?
The test should discriminate between the groups or constructs it is meant to distinguish.
Criterion-related evidence: the test score is predictive of an outcome. Eating Habits Test: how many hamburgers do you eat in a day? The test score is ____. Then you want to see if it correlates with the person's weight or blood pressure. That would be the real-life criterion.
2 types:
1. Predictive: the criterion is in the future. Test now, but the blood pressure test is 5 years from now.
2. Concurrent: at the same time. Do you have high blood pressure right now?

Convergent and Discriminant Evidence: the test correlates with measures of similar constructs and does not correlate with measures of distinct ones.
Convergent: you have your test score and another test measuring a similar construct, e.g., anxiety correlates with depression. You have the depression test and the anxiety test score; convergent validity evidence would be your depression scores correlating with the anxiety scores. You have to have a previously made hypothesis.
Convergent evidence is a correlation between two measures: compare the scores on two tests, e.g., the MBTI and an extroversion measure.

Diagnosis based on a set of behaviours:
Is a diagnosis criterion evidence or convergent evidence?
The criterion is the behaviour, e.g., of being a psychopath.
The DSM is not a test.

Picture walking down the road of measurement and reaching a fork. A traveller along another path begins to walk on the same road as you, so you have converged.
E.g., to show psychopathology: violent behaviour and lack of empathy. You would expect lack of empathy to converge with psychopathology; the discriminant measure takes the other path.

If they converge, you would expect a large positive correlation, toward +1 (in practice the maximum is about .99).
For discriminant evidence, if at the crossroads they go different ways, you would expect a correlation of 0 (zero): no correlation, a flat line. In both cases (convergent and discriminant) you look at the magnitude.
Intended and unintended social and personal consequences and side effects:

What is consequential validity? It is how the measure impacts the person taking the test, or society: the impact of the test scores and how we use them. Every test score has a consequence, for treatment and for what it means for the person, the community, or the public. The client should also know what the test is for and what the results might mean, because they could have drastic consequences for someone.

Generalizability: can the test be generalized? This is external validity. Does it apply to other similar contexts? E.g., a test might apply to other similar adults but not to children. Invariance means something stays the same and doesn't change; this becomes an issue if you apply a test across different cultures. Does the construct mean the same thing in the two cultures?

9
Q

Validity Sources

A
Construct
Test Measure Development
Test Validation and Explanation
Score use and Reporting
Consequences and Side Effects
10
Q

Validity Vs Reliability

A

Reliability is about the test itself.
Validity is about how you apply and interpret it.

Reliability is reliability and precision.
Validity is reliability and precision and accuracy.

X is depression, Y is anxiety: the validity coefficient rxy is the correlation between the two measures. Reliability, by contrast, is rxx: the correlation of a measure with itself (e.g., over time).

If reliability is low, that is bad for validity. Measures need high reliability before you can talk about validity; reliability is a condition of validity. If there is no precision (no tight cluster on the bull's-eye), we can't talk about what the score means.

11
Q

Reliability Impacts Validity

A

Reliability influences validity significantly and sets limits/bounds on the validity coefficient
Low reliabilities for X and Y may lead to low validity coefficients that reflect the poor reliabilities, not necessarily a lack of correlation between X and Y
Thus, there is a formula to control for low reliabilities: the dis-attenuation formula

12
Q

Reliability Impacts Validity

A

If you have low reliability, you need to revise the test.

However, sometimes low reliability is expected; for example, moods change over time. Not ideal, but still OK: you can then use a formula to correct for the low reliabilities and get a better estimate of the real correlation coefficient.

This formula is called the Correction for Attenuation.

13
Q

Correction for Attenuation

A

Also called the Dis-Attenuation formula
= a statistical procedure, due to Spearman (1904), to rid a correlation coefficient of the weakening effect of measurement error
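A minimal Python sketch of the correction, dividing the observed correlation by the square root of the product of the two reliabilities (numbers are hypothetical):

```python
import math

def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Spearman's (1904) correction for attenuation: estimate the
    correlation between true scores on X and Y by dividing the
    observed correlation by the geometric mean of the reliabilities."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Observed r = .40 between tests with reliabilities .70 and .80:
print(round(disattenuate(0.40, 0.70, 0.80), 3))  # 0.535
```

Note that the corrected value is larger than the observed one: measurement error had attenuated (weakened) the correlation.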

14
Q

Attenuation means:

A

Reducing. Attenuating a sound means decreasing the sound.

Why attenuation? Low reliabilities attenuate (weaken) the validity correlation, so the observed correlation becomes an under-estimate of the real validity. We correct for this.

See formula.

Know that it is possible to correct for low reliabilities.

15
Q

Main Sources of Construct Validity

A

Content
Criterion: Predictive & Concurrent
External Relationships: Convergent & Discriminant Validity; Multi-trait multi-method matrix
Response Processes (substantive validity)
Score Structure (factor analysis)
Consequences of Testing (consequential validity)
Generalizability (external validity)/Invariance

16
Q

Content Validity

A

The content of the test (items) reflects the intended construct
Ex: BDI wants to study depression
1) we have to define depression (in this case DSM)
2) we need qualified judges (psychologists and researchers) to evaluate whether the items match the test domain (e.g., there are different kinds of depression; do we want to capture all of them or just one or two?)
Ex: for a competency test we want to make sure that the item content covers the domain to be evaluated

17
Q

Face Validity

A

Somewhat related to content validity but not a main, reliable source of validity evidence
Do the questions appear to measure what they say they measure? BDI example:
Do you feel sad every day?
Do you feel blue?
Do things seem hopeless?
Have you lost interest in things you used to like?
Sometimes face validity can be less reliable than more subtle questions

18
Q

Criterion Validity

A

Concurrent: correlation with criterion at the same time with collecting test scores (scores on BDI correlate with self-reported and observed symptoms of depression at the time of diagnosis)

Predictive: correlation with criterion after a period of time (criterion is in the future; GRE predicts future grad school GPA)

19
Q

Predictive Validity

A

Can the test actually predict what it says it can?
A test has predictive validity only to the extent that it can anticipate what is going to happen with a minimum amount of error

One thing might be similar to another but it may not correlate

20
Q

Criterion Validity as Validity Coefficient

A

Calculating Criterion Validity

  • Compute the correlation coefficient between a predictor X (e.g., a test of decision-making skill) and a criterion Y (e.g., job performance)
  • rxy is the validity coefficient, and rxy² (the validity coefficient squared) is the proportion of variability in Y predictable from X (also known as the coefficient of determination, or R squared)
21
Q

Predictive Validity and Regression

A

Prediction: if you have a score on one variable, you can predict the score on another. Given a score on X, you can predict the score on Y.

X is the score on the test. For example, you can get a depression score from a trauma score: it is a linear equation, so if you know the score on trauma you can compute a predicted score of depression.

You get this by using regression, which is the statistical tool for prediction.

We can predict a score on one variable from another variable.

It is important to be able to predict so that we know what to expect.

Violence and suicide cannot be predicted well because too many variables are involved. These behaviours are isolated events and not frequent enough to put into a mathematical equation; there is not enough variance to estimate it.

Many variables come into play when you predict.

22
Q

Standard Error of Measurement

A

Points fall off the regression line.

The distance of a point from the regression line (the predicted value) is its error, because the point doesn't fall on the line.

If you square these distances, add them up, and average them, you get the variance of the errors; its square root is a standard deviation.

The deviations of the points from the line are called residuals.

The residuals are the points that don't fit on the line, and their standard deviation is the standard error of estimate.

23
Q

Standard Error of Measurement

A

All of the points are scattered around the line. The more distant ones are larger errors, and together they make up the residuals.

The standard error of the estimate gives you an idea of how far, on average, the points are from the regression line.

24
Q

Standard Error of Estimate (SEE)

A

SEE = the standard deviation of the residual distribution (the prediction errors) in linear regression

The standard error of estimate (SXY or SEE) is a concept similar to the standard error of measurement, but for predictive validity
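A short Python sketch with hypothetical observed and predicted scores. Conventions differ on the divisor; the n − 2 regression degrees of freedom are assumed here:

```python
import math

def standard_error_of_estimate(y_actual, y_predicted):
    """SEE: the standard deviation of the residuals (prediction errors)
    around the regression line, using n - 2 degrees of freedom."""
    residuals = [ya - yp for ya, yp in zip(y_actual, y_predicted)]
    ss_res = sum(e ** 2 for e in residuals)
    return math.sqrt(ss_res / (len(residuals) - 2))

# Hypothetical observed scores vs. predictions from a regression line:
observed = [5, 9, 14, 16, 21]
predicted = [5, 9, 13, 17, 21]
print(round(standard_error_of_estimate(observed, predicted), 3))  # 0.816
```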

25
Q

Decision Theory and Criterion Validity

A

There is always a probability of a wrong decision or prediction; decision theory developed in order to deal with these errors.
26
Q

Sensitivity and Specificity

A

Sensitivity = true positives / (false negatives + true positives): the percentage of depressed people in the sample (based on the criterion) that the depression scale correctly identified as depressed.
Specificity = true negatives / (true negatives + false positives): the percentage of non-depressed people in the sample (according to the criterion) that the depression scale correctly identified as non-depressed.
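These formulas can be sketched directly in Python, with hypothetical confusion-matrix counts:

```python
def sensitivity(tp, fn):
    """Proportion of truly depressed people (per the criterion)
    that the scale flagged as depressed."""
    return tp / (fn + tp)

def specificity(tn, fp):
    """Proportion of truly non-depressed people that the scale
    correctly identified as non-depressed."""
    return tn / (tn + fp)

# Hypothetical counts: 40 true positives, 10 false negatives,
# 45 true negatives, 5 false positives
print(sensitivity(40, 10), specificity(45, 5))  # 0.8 0.9
```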
27
Q

Predictive Values

A

The positive predictive value = true positives / (false positives + true positives): the percentage of individuals who are truly depressed (according to the criterion) out of those whom the scale identified as depressed.
The negative predictive value = true negatives / (true negatives + false negatives): the percentage of people who are truly not depressed (according to the criterion) out of those whom the scale identified as non-depressed.
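A sketch with hypothetical counts (40 true positives, 5 false positives, 45 true negatives, 10 false negatives):

```python
def positive_predictive_value(tp, fp):
    """Of those the scale called depressed, the proportion who are
    depressed according to the criterion."""
    return tp / (fp + tp)

def negative_predictive_value(tn, fn):
    """Of those the scale called non-depressed, the proportion who
    are truly non-depressed per the criterion."""
    return tn / (tn + fn)

# Hypothetical counts: 40 TP, 5 FP, 45 TN, 10 FN
print(round(positive_predictive_value(40, 5), 3))   # 0.889
print(round(negative_predictive_value(45, 10), 3))  # 0.818
```

Unlike sensitivity and specificity, predictive values depend on the base rate of the condition in the sample.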
28
Q

Convergent and Discriminant Validity

A

Convergent measures may consist of measures of highly related constructs (e.g., depression and anxiety) or of the same construct (e.g., two depression measures); in the latter case, correlations of such scores are sometimes misidentified as criterion-related validity evidence.
Discriminant measures may consist of theoretically unrelated constructs (e.g., depression and intelligence) or constructs between which one wants to distinguish (e.g., depression from anxiety).
Multi-trait multi-method matrix (MTMM): Campbell and Fiske (1959).
29
Q

Table

A

You create this table to compare the correlations among different methods of collecting the data and different constructs. It puts a variety of information together and lets you see the correlation coefficients among different methods and different constructs.
You do this conceptually: establish the construct, then pick one construct that is convergent and one that is discriminant. Well-being is convergent (but negatively associated); intelligence is discriminant. You want to make sure that your scores are not measuring intelligence.
Three methods you can use: self-report, clinician-administered test, and observer report (a parent of the child, or a spouse). Then you compute the correlations between them all.
You want a high positive correlation between the ones that are convergent: high correlation between anxiety and depression, low between depression and intelligence. No matter which method you use, you should get the same construct. Understand the method effect.
30
Q

Three Rules for Defining Convergent and Discriminant Validity

A

1. Convergent validity coefficients (rxy) should be a lot greater than zero but not greater than the square root of the reliability coefficient.
2. Discriminant validity is evident when convergent validity coefficients are substantially greater than the coefficients for different traits under the same method.
3. Discriminant validity is also suggested when the convergent validity coefficients are higher than the coefficients for different traits under different methods.
31
Q

Consequential Validity

A

Hubley & Zumbo (2011) article; Messick's Validity Matrix.
32
Q

Consequential Validity

A

Hubley & Zumbo (2011): "Value implications and social consequences are inherent to score meaning and are not part of a new or separate 'consequential validity'."
The 'theory/theories' include the theory related to the construct, theories related to the sample and context, and psychometric theory and models.
The effect of values is pervasive throughout the framework and related to theory, the construct, the test/measure, and construct validity, as well as validation choices and decisions.