Lecture 9: Validity Flashcards

1
Q

Reliability

A

the degree to which differences in test scores reflect differences in the psychological attribute that affects test scores (does the test’s score reflect something with precision?)

2
Q

Validity

A

“the degree to which evidence and theory support the interpretations of test scores for proposed uses”

3
Q

3 Points about validity

A
  1. Validity is about the interpretation of test scores in terms of a specific psychological construct; it is not about the test itself
  2. Validity is a matter of degree; it is not all or none
  3. Validity is based on solid empirical evidence and theory (use information from research, not just personal choice)
4
Q

What is the importance of theory for validity

A

questionnaires and tests require a theory that defines what the construct is; otherwise it’s not possible to understand what the test scores mean (if you can’t define it, you don’t know if you’re measuring it)

5
Q

Why is it important to measure validity? (what could be detrimental if you don’t)

A

if the psychological meaning of test scores is misinterpreted, then:
  1. Psychological conclusions based on those scores may be mistaken
  2. Decisions made about people (based even partly on those scores) may be misguided and potentially unfair or dangerous

6
Q

What are the dimensions of construct validity?

A
  1. Test content (what are we measuring?)
  2. Internal structure (factor analysis; what are the dimensions and how do they relate?)
  3. Response processes (how someone thinks about an item, what they are thinking about, and how that relates to what we’re trying to ask)
  4. Consequences of use (the impact, whether wanted or not)
  5. Associations with other variables (how the construct has been thought about in the past; empirical evidence)
7
Q

What are the three types of validity? (from book reading)

A
  1. Content
  2. Criterion: the degree to which test scores can predict a specific criterion; de-emphasizes the conceptual meaning or interpretation of test scores
    a. Concurrent and predictive validity (the criterion is now thought to fall under the construct)
  3. Construct (now more focused)
    a. Test content: does the content of the test match what should be on the test? (expert ratings)
    i. Construct-irrelevant content and construct underrepresentation
    ii. Face validity is close to content validity (but relies on laypeople rather than experts)
    b. Internal structure: does the internal structure match what it should be? (factor analysis)
    c. Response processes: do the psychological processes that respondents use when completing a measure match what they should? (interviewing and eye tracking)
    d. Associations with other variables: do the associations with other variables match what they should be? (convergent, discriminant, concurrent, and predictive correlations)
    e. Consequences of use: do the actual consequences of using a measure match the consequences that should be seen? (evaluation of consequences, differential impact, and systematic changes [e.g., MCAT content reflected how premeds were taught])
8
Q

What are the three types of validity from Cronbach and Meehl?

A
  1. Criterion
    a. Predictive: criterion obtained after the test is given
    b. Concurrent: test score and criterion obtained at the same time
  2. Content: showing the test items are a sample of a universe in which the investigator is interested; deductive
    a. Acceptance of the universe of content as defining the variable to be measured is essential
  3. Construct: when a test is interpreted as a measure of some attribute or quality which is not “operationally defined”
9
Q

What is content validity?

A

“the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured”; the relationship between the content of a test and some well-defined domain of knowledge or behavior (the relationship between what is on the test and what you think should be on the test based on the theory); a good match between content and domain = high content validity; does the actual content of a test match the content that should be included on the test?

10
Q

Steps for content validity

A
  1. Reflect the full scope of the construct (vs. construct under-representation)
  2. Systematically exclude anything besides the construct (vs. construct-irrelevant variance); evidence via expert ratings
11
Q

Construct underrepresentation

A

The degree to which a test fails to capture important aspects of the given construct

12
Q

Irrelevant variance

A

the degree to which test scores are affected by processes extraneous to the intended construct

13
Q

Steps to assess content validity

A
  1. Describe the content domain (in terms of boundaries/limits and structure)
  2. Determine the areas of the content domain that are measured by each item (go item by item: what am I measuring?)
  3. Compare the structure of the test with the structure of the content domain (do you even want two domains?)
14
Q

Concerns with content validity

A

short forms of a test (e.g., a shortened depression screening tool with only two items may miss people or overdiagnose); content overlap (e.g., agitation or tension: if you’re measuring depression but include items like tension or agitation, that is more anxiety; overlap between two questionnaires for two different constructs)

15
Q

Final thoughts on content validity

A

content validation is a process rather than a statistical analysis (really a process where you go through each item and domain); content validity differs from internal consistency reliability (internal consistency asks whether the items hang together and mean the same thing; content validity is similar but concerns the content of the construct, not just whether the items measure the same thing)

16
Q

Face validity

A

whether a test looks like it measures its target construct (“how sad are you?” for depression); don’t confuse this with the empirical approach (face validity does not mean the test actually measures what you’re interested in); concern: malingering (all questions are subject to response bias, but high face validity makes a test more susceptible; low face validity might be better when malingering is a concern)

17
Q

Internal Structure (structural validity)

A

does the actual internal structure of a test match the structure that the test should possess?; dimensionality and factor analysis (if a test has different dimensions, how is it measuring these and how do they relate to one another?)

18
Q

Internal structure validity issues evaluated through factor analysis

A
  1. Number of factors
  2. Meaning of factors
  3. Associations between factors (if more than one factor)
  4. Order of factors (do they relate to the construct overall or are they separate? e.g., top level: depression; second level: 1. cognition 2. somatic; third level: what relates to those)
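
A minimal sketch of how these factor-analysis questions might be explored, assuming scikit-learn; the two-factor hypothesis and the data here are made up for illustration:

```python
# Hypothetical sketch: exploratory factor analysis on simulated item responses.
# In real use, X would hold respondents' item scores.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))        # 200 respondents x 10 items (placeholder data)

fa = FactorAnalysis(n_components=2)   # hypothesis: two factors (e.g., cognitive, somatic)
fa.fit(X)
print(fa.components_.round(2))        # loadings: which items "hang" on which factor
```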
19
Q

Response Process Validity

A

do the psychological processes that actually shape people’s responses to the test match the processes that should shape their responses? (understand what the question means to the individual taking the test; people have different processes; interviewing after a questionnaire helps get a general sense of an individual’s response process)

20
Q

Two types of evidence for response process validity

A

  1. Direct evidence: interviews with respondents, “think-alouds”
  2. Indirect evidence: eye tracking, response times, statistical analysis, and experimental studies of process (use these if you can’t conduct an interview or if you think respondents are lying)

21
Q

Consequences of Testing Validity

A

do the actual consequences of using a test match the consequences that should be seen?; Controversial as a facet of validity

22
Q

Evidence for consequences of testing validity

A

  1. Evidence of intended effects
  2. Evidence regarding unintended differential impact on groups (an unbiased test can still have biased results; e.g., women might score higher on emotional intelligence than men, so if the test is used to screen for grad school, you’ll choose more women)
  3. Evidence regarding unintended systemic effects (choosing more women for your program over time leads to more women professors and faculty members: long-term effects on the school and even the profession)

23
Q

Associations with other measures validity

A

do the actual associations with other measures match the associations the test should have with those measures? (are the scores obtained a good measure of the construct? the extent to which scores obtained on a measure are consistent with the construct and with measures related to that construct)

24
Q

Evidence for associations with other measures validity

A

  1. Convergent validity (includes concurrent/predictive validity; we expect to find significant correlations between our test and measures of behaviors related to our construct; different measures of the same construct should also be correlated)
  2. Discriminant (divergent) validity (behaviors not associated with our construct of interest should be uncorrelated with our test)

25
Q

How do you evaluate association with other measures correlations?

A

a given correlation (e.g., .45) could represent a low level of convergent validity OR of divergent validity; to determine which, you look at what you would expect (the construct’s nomological network)

26
Q

Associations with other measures, key questions: 1. How do you know which scales to measure or examine? 2. How do you know what pattern of convergent or discriminant associations to expect?

A

The construct’s nomological network

27
Q

Construct’s Nomological Network

A

network of associated constructs, behaviors, and properties

28
Q

Assessing criterion-related validity

A

(a criterion is an outcome that you want to be associated with your measure: what you hope your measure will predict) correlation of test scores with the outcomes of decisions that are made; 1. Predictive validation strategies (the “ideal” approach: 1. obtain test scores 2. obtain performance measures and correlate them with the test scores; this gives you a validity coefficient) 2. Concurrent validation strategies (a practical alternative: test scores and criterion scores from a preselected population are obtained at the same time)

29
Q

Predictive validation strategies

A

the “ideal” approach: 1. Obtain test scores 2. Obtain performance measures and correlate these with the test scores; this gives you a validity coefficient
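
As a rough illustration of these two steps, a minimal sketch with made-up numbers (none of these scores come from the lecture):

```python
# Hypothetical sketch of predictive validation: correlate test scores obtained
# at selection with performance measures obtained later.
import numpy as np

test_scores = np.array([12, 15, 9, 18, 11, 16, 14, 10])            # step 1
performance = np.array([3.1, 3.6, 2.4, 3.9, 2.9, 3.5, 3.2, 2.6])   # step 2 (later criterion)

validity_coefficient = np.corrcoef(test_scores, performance)[0, 1]  # Pearson r
print(round(validity_coefficient, 2))
```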

30
Q

Concurrent validation strategies

A

a practical alternative; test scores and criterion scores from a preselected population are obtained at the same time

31
Q

Advantages to concurrent validation

A
  1. Much more practical than predictive validation
  2. Easier than predictive validation
  3. Concurrent validity coefficients are often similar to those from predictive validation
32
Q

Statistical and conceptual problems with concurrent validation

A

quality of the predictor (often there are many factors that relate to the criterion); range restriction (reduces the correlation between predictor [test score] and criterion [outcome measure]; e.g., if you look at GRE scores and success in grad school, you miss those who didn’t get into grad school; restriction can be direct, through selection by the predictor, or indirect, through decisions based on the criterion); the goal is often to screen out failures (predictive validation allows for this but concurrent validation does not; predictive validation is much better for a screening tool)

33
Q

Incremental Validity

A

a test must demonstrate that it has better predictive ability than data from existing assessments, i.e., demonstrate validity beyond existing assessments; concern: who decides the gold standard? (researchers are supposed to compare against the gold standard, but some choose weaker comparisons just so they can demonstrate incremental validity)

34
Q

Interpreting validity coefficients

A

validity coefficients are correlations, so their magnitude ranges from 0 to 1; don’t expect huge validity coefficients (usually not larger than 0.5); squaring the validity coefficient tells you the amount of variance in the criterion that can be explained by the predictor (e.g., using a test of cognitive ability to predict job performance results in a validity coefficient of 0.5; we can say 25% of the variance in job performance is accounted for by cognitive ability)
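
The card’s own example, worked as a quick sketch:

```python
# Squaring the validity coefficient gives the proportion of criterion variance
# explained by the predictor.
r = 0.50                            # cognitive ability -> job performance (card's figure)
variance_explained = r ** 2
print(f"{variance_explained:.0%}")  # 25% of the variance in job performance
```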

35
Q

Methods of Construct Validation

A
  1. Correlational study (correlations between our test and measures of certain behaviors [that are either related or unrelated to our construct of interest]; meta-analysis: gathering many validity studies and combining them)
  2. Factor analysis (analyzing which groups of items “hang” together)
  3. Experimental manipulation (manipulate the construct of interest [i.e., induce fear] and see whether that relates to different scores on our test)
  4. Multitrait-multimethod matrix (MTMM)
36
Q

Multitrait-Multimethod Matrix (MTMM):

A

based on convergent and discriminant validity; three elements: 1. Multitrait: measures different traits 2. Multimethod: uses various methods to measure those traits 3. Matrix: a table composed of correlations between the trait-method measures
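
A minimal sketch of what an MTMM matrix looks like, assuming pandas and simulated data; the two traits (anxiety, depression) and two methods (self-report, clinician rating) are hypothetical examples:

```python
# Hypothetical sketch: build an MTMM-style correlation matrix from simulated data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 200
anx = rng.normal(size=n)              # latent trait 1
dep = 0.3 * anx + rng.normal(size=n)  # latent trait 2, modestly correlated

df = pd.DataFrame({
    "anx_self": anx + rng.normal(scale=0.5, size=n),
    "anx_clin": anx + rng.normal(scale=0.5, size=n),
    "dep_self": dep + rng.normal(scale=0.5, size=n),
    "dep_clin": dep + rng.normal(scale=0.5, size=n),
})
# Same-trait/different-method cells (convergent) should show the highest
# correlations; different-trait cells should be lower (discriminant).
print(df.corr().round(2))
```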

37
Q

Base rate

A

level of performance on the criterion in the general population; e.g., if 75% of the population is successful, the base rate is 0.75

38
Q

Selection ratio

A

ratio of positions to applicants; e.g., if 30 people apply for 3 jobs, the selection ratio is 10% or 0.10
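
The arithmetic behind this card and the previous one, as a quick sketch:

```python
# Base rate: proportion of the general population that succeeds on the criterion.
base_rate = 75 / 100        # 75% successful -> 0.75

# Selection ratio: positions divided by applicants.
selection_ratio = 3 / 30    # 3 jobs, 30 applicants -> 0.10
print(base_rate, selection_ratio)
```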

39
Q

True Positive (TP)

A

when a test predicts success and the person actually succeeds

40
Q

True Negative (TN)

A

when a test predicts failure and a person actually fails

41
Q

False Positive (FP):

A

When a test predicts success and the person actually fails

42
Q

False Negative (FN):

A

when a test predicts failure and a person actually succeeds

43
Q

Effect of base rate on decisions: when the base rate is large

A

almost everyone would be successful, but there is a limited # of positions, so there will be a high number of true positives and a high number of false negatives

44
Q

Effect of base rate on decisions: when base rate is small

A

hardly anyone would be successful, but a set # of positions must be filled, so there will be a high number of true negatives and a high number of false positives

45
Q

When are tests used as predictors most likely to have an impact on accurate decision making?

A

when the base rate is moderate (0.5)

46
Q

Effect of selection ratio on decisions

A

if the selection ratio is high (# of positions and # of applicants almost equal), it doesn’t really matter what method of prediction you use because you’re taking almost everyone who applies; validity has the biggest impact on correct decision making when the selection ratio is low

47
Q

sensitivity

A

the probability that a test correctly identifies individuals who have the disorder; sensitivity = true positives / (true positives + false negatives, i.e., everyone who actually has the disorder)

48
Q

specificity

A

the probability that a test correctly identifies individuals who do NOT have the disorder; specificity = true negatives / (true negatives + false positives)
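
A minimal sketch computing both quantities from the decision counts defined in the earlier cards; the counts here are made up:

```python
# Hypothetical decision counts (TP, TN, FP, FN as defined above).
tp, tn, fp, fn = 40, 35, 15, 10

sensitivity = tp / (tp + fn)   # of those WITH the disorder, fraction correctly flagged
specificity = tn / (tn + fp)   # of those WITHOUT the disorder, fraction correctly cleared
print(round(sensitivity, 2), round(specificity, 2))   # 0.8 0.7
```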

49
Q

Internal validity

A

the strength or soundness of the design and analysis of the study in which a scientific question is evaluated, including any study that validates a psychometric test or questionnaire

50
Q

History

A

Events, other than the experimental treatments or the true effect among the variables, that have influenced the results

51
Q

Maturation

A

During the study, some kind of psychological change occurs within the people completing the test that affects how test scores are related to scores on other tests and questionnaires

52
Q

Testing

A

Exposure to some known or unknown pretest or intervening assessment influences performance on your test

53
Q

Instrumentation

A

Testing instruments or the conditions in which people complete the tests are inconsistent, or the pre- and post-tests are not equivalent in some manner

54
Q

Statistical Regression

A

Scores of subjects that are very high or very low tend to regress toward the mean during retesting

55
Q

Selection

A

Systematic differences exist in subjects’ characteristics between treatment or comparison groups that affect the performance of the test

56
Q

Mortality

A

Attrition affects the representativeness of the group you are studying and can lead to an overestimate, an underestimate, or an unreliable result

57
Q

Diffusion of Treatments

A

People in one group are able to communicate with the other group and affect the manner in which the other group completes the questionnaire

58
Q

Effects of Pretesting/Order Effects

A

This is when one test in your battery of tests affects how people complete a subsequent test; completing one test may lead to an improvement or increase in skill

59
Q

Placebo Effects or Expectancy Effects

A

This is when you get an effect because the participant expects something to happen, even though there is no “active” ingredient or agent

60
Q

Demand Characteristics

A

This is when people completing the questionnaires respond in a manner that they believe is expected of them

61
Q

Experimenter Bias

A

The test developers’ ideas about how the construct should be defined and operationalized are in some manner “biased” and, as a result, out of line with how most people would define and measure it, leading to the creation of a test that produces very different results