Ch 5 - Validity Flashcards

1
Q

Define validity

A

Validity is the accuracy of the interpretations made from test scores.

* Reliability is necessary but insufficient for validity
* BECAUSE
    * If scores are not precise in the first place, they have no meaningful interpretation
    * Conversely, precise scores may still not measure what they should

Validity is usually established over many studies - no one single study/method can address all the issues of validity at once

Validity, like reliability, is NOT a property of a test itself (avoid using “test validity”)

Textbook definition: the evidence that we bring to support any inference that is to be made on the basis of test results
3 ideas in this definition:
• Validity is cumulative - validation is the process of gathering evidence for the validity of a test
• As the evidence accumulates, the validity may be increased or diminished
• The evidence for validity can be gathered through any type of systematic scientific research by any qualified test user, even if the test author did not foresee this application at first
validity: a matter of judgments that pertains to test scores as they are employed for a given purpose and in a given context

2
Q

Kane’s interpretation-use arguments (AKA IUA perspective)

A

• Involves:
1. Score interpretation (how the scores should be interpreted)
2. Context of use (what is the test needed for)
3. Type of evidence needed (how are we going to justify the use of this measure)
• 1 and 2 together will have an impact on 3
• If 1 or 2 changes (how the scores should be interpreted, or the goal/rationale for using the test), then a new type of evidence may be needed

3
Q

Types of evidence needed for concrete vs. abstract intended interpretations of test scores

A

If the intended interpretation is very limited (AKA concrete, does not lead to suppositions), then the evidence for that interpretation can also be limited (and that’s okay, we don’t need a lot)

If the intended interpretation is more abstract (AKA used to make inferences about the possible future behaviour of a person), then more evidence will be needed

For example, if you use observed behaviour to make predictions of behaviour in a more distant/different future, then you will need more validity evidence

4
Q

4 major categories of evidence in validity studies

A
1. Test content (does the test content match the target? The goal / intended use?)
2. Internal structure (more relevant for test batteries) (are the correlations among subtest scores consistent with theoretical expectations?)
    i. Ex: a cognitive ability test with 16 subtests - we might say that 4 of them measure motricity, another 4 measure vision, etc. (the pattern of subtest correlations should match these theoretical expectations)
    ii. Theoretical expectations - based on previous theories on the subject AND on previous studies done
3. Covariance - either with future scores or with other scores on the same test (for situations where we want to establish the predictive validity of test scores)
4. Response process (much rarer than the other 3) - (ex: when someone wants to know how people solve a problem - give them a problem and ask them to think out loud to see how they reason - this corresponds to observing the response process)
5
Q

Content validity evidence

A

concerns whether the items on a test are representative of the target domain (type, relative frequency, proper sampling from the domain, etc)

External standard / gold standard = expert opinion
We want the test to demonstrate the level of mastery that an expert judges adequate for that domain, no less

• Is especially important for mastery tests - ex: determining if someone has an acceptable degree of mastery in a certain area - there should be evidence that the test indeed assesses mastery of the domain area
• Good example: it's especially relevant to have adequate evidence of content validity for a driving test - we want it to represent reality, and have some challenge, to know if people are really ready to be drivers
	○ If we shorten the test to save time, is it still representative of the skills needed to drive in real life?
6
Q

Test specifications

A

Test specifications - before writing the first item, one should plan the test out (come up with a blueprint of the test) in terms of:
• Target population (whom is it intended for, and whom not)
• Context (goal of the test)
• Use (how are we going to use it? What for?)
○ These features of the test will indicate
§ What the input of the test will be (what the content will be)
§ Cognitive operations that should be performed with the basic input (ex: rote memorization)
§ Output of the operations (ex: a written document, a list of steps, etc. - corresponds to the form of the answers produced after the operations on the input)

7
Q

What is the main goal in the evaluation of content validity

A

The evaluation of content validity is far more rational than statistical - even though there are numbers involved
• Main goal - define and recruit the appropriate content experts in the area to determine the content of the test - experts might use numbers to support their suggestions, but the final recommendations are mostly qualitative

8
Q

Internal Structure Evidence

A

Example: we are constructing a test and we want the content to be representative
The experts tell us: the items should be representative of the domain, but that domain is ONE dimension only (unidimensional)
• Where does the alpha coefficient come into play?
• Internal consistency coefficients concern internal structure
• A requirement (or an assumption) of the alpha coefficient is that the item set it represents is unidimensional
• Therefore, an alpha coefficient would be relevant for our situation
• If it's assumed that the content domain is unidimensional and the coefficient assumes the same, it's absolutely relevant to our situation (a high value of the alpha coefficient would indicate that everything belongs to one domain, BUT wouldn't prove that it's the RIGHT domain)

This does not prove validity, but it’s a great start, since reliability is NECESSARY for validity
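A minimal sketch of how an alpha coefficient could be computed from a respondents-by-items score matrix (the function name and the tiny data set are hypothetical, just to illustrate the k/(k-1) × (1 − sum of item variances / total-score variance) formula):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (respondents x items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of 4 people to a 5-item unidimensional scale
scores = [[3, 4, 3, 4, 3],
          [2, 2, 3, 2, 2],
          [5, 4, 5, 5, 4],
          [1, 2, 1, 2, 2]]
print(round(cronbach_alpha(scores), 2))
```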

9
Q

How can we evaluate internal structure evidence for a test battery (multidimensional)?

A

The test as a whole is multidimensional
In this case, an alpha coefficient can be relevant for each INDIVIDUAL area
• A Pearson correlation between the scales can also be relevant
• Are the correlations among the sets of items (subtests) of a battery consistent with theoretical expectations concerning the number of domains that should be measured, and do the subtests correspond to the domains predicted by the test authors
○ Is the association between the domains and what they are supposed to measure consistent?

10
Q

Factor Analysis

A

Factor analysis - developed in 1904-05 by Spearman
• Original context: Spearman was interested in understanding the structure of human intelligence (what are its domains, how are they arranged, how can we measure them?)
• Input data: correlation matrix and st devs
• Goal: is the pattern of correlations consistent with theoretical predictions about what the measures should assess?
• Widely used in assessment research

Statistical technique to address questions similar to those in the KABC
• We have 8 subtests that are supposed to measure 2 domains (divided into groups of 3 and 5)
○ Do the data support those expectations?
○ The data are the 8x8 correlation matrix (rij) and the SDi
○ Latent variables = simultaneous and sequential processing - NOT directly measurable / observable except through the observation of tasks that are supposed to tap into those domains
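A minimal sketch of factor extraction from a correlation matrix (this uses a principal-component style eigendecomposition as a stand-in for a full EFA, and the 4x4 matrix is invented, not the KABC data):

```python
import numpy as np

# Invented correlation matrix for 4 subtests: two pairs that correlate
# strongly within each pair and weakly across pairs (suggesting 2 factors)
R = np.array([[1.0, 0.7, 0.2, 0.1],
              [0.7, 1.0, 0.1, 0.2],
              [0.2, 0.1, 1.0, 0.6],
              [0.1, 0.2, 0.6, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                                # largest factors first
loadings = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))

# Loadings of each subtest on the first two factors
print(loadings[:, :2].round(2))
```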

11
Q

Construct validity

A

usually means: do scores have any relevant interpretation for the theoretical domain that the test authors intended?

12
Q

2 methods of factor analysis

A

There are 2 methods of factor analysis
1. Exploratory (original from Spearman) (AKA EFA)
EFA - analyses unrestricted measurement models
All the observed measures are allowed to correlate with each factor

2. Confirmatory (developed after Spearman) (AKA CFA)
    1. Is part of a larger family called structural equation modelling
    2. CFA analyses restricted measurement models: the observed variables are allowed to be associated with only certain factors
13
Q

Convergent validity

A

The hypothesis that multiple measures supposed to tap the same domain will correlate with each other

If that is true, we should observe high intercorrelations between the test scores, at the subtest level as well as for the whole test
How high should those correlations be? There is no clear answer; it depends on:
• Level of measurement of the scores
• Scoring metric
• Etc.
In general, they should be closer to 1

14
Q

Discriminant validity

A

The hypothesis that measures supposed to assess different domains will not be correlated - if the 2 factors we are measuring are really distinct, their scores should not correlate

The correlation between the 2 latent variables (factor 1 and factor 2) is relevant for discriminant validity

15
Q

Factor loading

A

The correlation of a task/score with one of the factors

16
Q

What is the naming fallacy

A

Suppose that some factor model has been established to be consistent with the data
• It doesn’t mean that the names assigned to the factors by the researchers are adequate

Example: sequential processing in the KABC (name of one of the factors, underlying 3 tasks)
• All 3 of those tests involved immediate recall ONLY
Maybe then an appropriate name for the factor would be Short Term Memory

17
Q

Covariance evidence

A

Refers to external validity
Coefficient represented by rXY
Designates a correlation between scores on test X and scores on an external variable Y (Y is not just another test)
Y is something that the test SHOULD measure or predict; we expect scores on X to correlate with scores on Y

18
Q

External validity

A

How well do scores on test X relate to Y (a real-world variable of interest)?

19
Q

Ŷ

A

• Ŷ = predicted score (score on Y generated from test X)

20
Q

Regression

A

We have scores on test X and scores on an external variable Y that the test is supposed to predict
The computer fits the regression line (line of best fit) to these data
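A minimal sketch of fitting that line of best fit with least squares (the X and Y values are invented for illustration):

```python
import numpy as np

X = np.array([10, 12, 15, 18, 20, 25], dtype=float)  # test scores
Y = np.array([40, 44, 50, 55, 58, 70], dtype=float)  # criterion scores

slope, intercept = np.polyfit(X, Y, deg=1)  # least-squares line of best fit
Y_hat = intercept + slope * X               # predicted criterion scores (Y-hat)

print(round(slope, 2), round(intercept, 2))
```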

21
Q

Multiple regression

A

2 tests (X1 and X2) used BOTH to explain Y - multiple regression (2 predictors or more)

22
Q

Generalized regression

A

could have several Y variables that should be predicted by multiple X scores (generalized regression)

23
Q

Concurrent validity

A

If scores on X and Y are collected at the same time, that is called concurrent validity
Ex: developing a test with a group of employees while recording their job performance at the same time

24
Q

Predictive validation study

A

Get scores on X, then wait some time before collecting scores on Y

25
Q

2 components of the equation that generates a predicted score on Y

A

Slope of the regression line

Y intercept (the value of Ŷ when X = 0)
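In standard bivariate regression notation (not shown in the card itself), those two components combine as:

Ŷ = a + bX, where b = rXY × (SDY / SDX) and a = mean(Y) − b × mean(X)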

26
Q

2 components for the CI around Ŷ

A

SEest and the z score for the desired confidence level
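Putting the two components together (worked example with invented numbers):

CI = Ŷ ± z × SEest
For a 95% CI, z ≈ 1.96; with Ŷ = 50 and SEest = 4, the interval is 50 ± 1.96 × 4 ≈ 42.2 to 57.8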

27
Q

What does SEest (standard error of estimate) represent

A

Describes the variability of the actual scores on Y around the regression line
AKA the st dev of actual scores on Y around the regression line
As the Y scores get closer to the regression line, SEest will get smaller

28
Q

2 components needed to make SEest

A

SDY (= st dev of scores on Y)

rXY (= the validity coefficient)
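These two components combine in the usual formula (worked example with invented numbers):

SEest = SDY × √(1 − rXY²)
e.g., with SDY = 10 and rXY = .60: SEest = 10 × √(1 − .36) = 10 × .80 = 8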

29
Q

If rXY = 1, then SEest =?

A

then SEest would be 0, because ALL the points would fall on the regression line

30
Q

If rXY = 0, then SEest = ?

A

then SEest will be equal to SDY
AKA all the variation observed in Y is error
However, if the coefficient is 0, something is wrong (same if it's exactly 1)

31
Q

What does rXY represent BEFORE and AFTER being squared, when we calculate SEest?

A

BEFORE: rXY is the slope of the regression line for standardized scores (the predicted amount of change in Y, in st dev units, given a change in X of 1 full st dev)

AFTER: it's the proportion of variance in Y that is shared with X
(1 − squared rXY) is the proportion of variance not shared with X
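A quick worked example (numbers invented): if rXY = .50, then rXY² = .25, so 25% of the variance in Y is shared with X, and 1 − .25 = .75 (75%) is not.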

32
Q

How do we interpret a 95% CI around a value of Ŷ?

A

In the population, 95% of CIs constructed this way would include the person's actual score on Y (not the predicted Ŷ), BUT there is no guarantee for any single interval

33
Q

Explain why “Score reliability limits predictability”

A

• If X is used to predict some external variable, how well it does so is limited by its precision (reliability)
• If scores on X are imprecise, they will be unable to predict anything

34
Q

What limits the theoretical maximum absolute value of the correlation between X and Y (AKA rXY)?

A

the correlation between X and Y is limited by (cannot exceed) the square root of the product of their two respective score reliabilities (rXX and rYY)
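In symbols, with an invented worked example:

|rXY| ≤ √(rXX × rYY)
e.g., if rXX = .81 and rYY = .64, then the maximum possible |rXY| = √(.81 × .64) = √.5184 ≈ .72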

35
Q

Under what condition can a Pearson correlation actually range from -1 to 1?

A

• The fact that a Pearson correlation can range from -1 to 1 is ONLY possible if the scores on X and Y are perfectly reliable (extremely rare)

36
Q

What can we evaluate when there is no single variable for Y, but many different possible variables?

A

we can still evaluate convergent validity
• We can also evaluate discriminant validity
• We can't really evaluate "divergent validity" - it doesn't exist as a term

37
Q

What is the Jingle Jangle fallacy?

A

Jingle: false belief that if 2 things have the same name, they are the same thing
• Ex: two tests both named as measures of depression might not measure the exact same thing about depression, or may not even measure depression at all - their scores may show a very low correlation

Jangle: just because tests have different names does not mean that they measure different things
• Ex: one test measures self-esteem and another measures interest in gardening - the correlation between the scores of those tests is 0.9 - such a high correlation suggests that they measure the same thing despite their different names

38
Q

Common method variance
AKA method effects

What is the method of a test?

A

The method for a test is the method of deriving scores
• What is the source of information?
• How does that information get processed?

The method used to collect the data itself can have a systematic influence on the scores

(The multitrait-multimethod matrix, next card, is a way to separate the variation coming from the method of measurement from the variation that comes from the construct)

What are methods of measurement?
• Self-report (risk of distortion of answers)
• Observational
• Archival (AKA records - risk of issues in accuracy of the datasets)
Some methods induce some systematic effect - especially self-report, where people tend to subtly change their answers due to demand characteristics
Some tests have validity scales (like the MMPI) to detect any potential voluntary distortion of answers

39
Q

Multitrait Method Matrix

A

A way to separate the variation coming from the method of measurement from the variation that comes from variation in the construct

40
Q

What types of error can be introduced in methods?

A

• Source of information
○ Parents
○ Teachers
○ Ex: a child is referred to a psychologist because of behavioural problems at school
§ The parents and teacher will be asked to answer the questionnaire instead of the child - can induce perception errors
§ Teachers are keen reporters of conduct/behaviour problems - they notice internalization/anxiety types of problems less
§ Parents are better at reporting internalization than teachers
□ Thus the source of the info (who answers the questionnaire) will impact the scores
• Method of administration
○ On computer vs paper/pencil can have an impact

41
Q

Test X1 and X2, from different authors - administered on the same sample
Assumed to measure the same construct
Both tests are based on the same method of measurement - self-report
The correlation between the scores =0.6
• What is explaining that high correlation value?

A

○ Is it because the same construct is being measured?
○ Or because the method is the same?
• Having the same method to obtain scores can influence the correlation and make it seem like the scores are more correlated than they really are, therefore the meaning of that 0.6 correlation is unclear
• Is this correlation coefficient evidence for validity? We don’t know, and there is not really a way to know

42
Q

Test X1 and X2, from different authors - administered on the same sample
Assumed to measure the same construct
Both tests do not have the same method
The correlation between the scores =0.1

A

• The common method CANNOT inflate the correlation between them, therefore we have more assurance that the correlation reflects the constructs (however, since in this case the correlation is very low, the constructs do not seem to be related)

43
Q

Test X1 and X2, from different authors - administered on the same sample
Assumed to measure different constructs
Both tests are based on the same method of measurement - self-report
The correlation between the scores =0.6
• Does that result speak against discriminant validity, or is it due to the methods being the same?

A

We CAN'T know, since both methods are the same

44
Q

Test X1 and X2, from different authors - administered on the same sample
Assumed to measure different constructs
Both tests are based on different methods of measurement
The correlation between the scores =0.1

A

• Consistent with discriminant validity since the effect cannot be due to the methods being the same, and the correlation is low as expected for 2 tests that don’t measure the same construct

45
Q

If a correlation between 2 measures is high and is evidence for convergent validity, what inference should we AVOID making from that result?

A

That the construct that both measures seem to be assessing is the ACTUAL INTENDED construct we wanted to measure; convergent validity indicates that both tests are measuring A construct, but it does not tell us WHICH one

46
Q

What information can we get from MTMM matrix?

A

With this matrix, we can get information about:
• The reliability coefficients (bold coefficients)
• Convergent validity (between scores from 2 tests that are supposed to measure the same construct AND are based on different measurement methods) (coefficients of .18 and .60 - called monotrait, multimethod correlations)
• Discriminant validity (between scores from 2 tests that are NOT supposed to measure the same construct AND are based on different measurement methods) (coefficients of .17 and .09)
• Common method variance: the higher these coefficients are, the more systematic influence that measurement method is exerting on the scores (coefficients of .50 and .15 - called monomethod correlations)

GO CHECK the matrix in notes - know where the correlations are
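For orientation while checking the notes, here is one possible 2-trait x 2-method arrangement consistent with the coefficients quoted above (this layout is an assumption; reliabilities are shown as (rel) because their values are not given here):

```
                      Method 1              Method 2
                      Trait 1    Trait 2    Trait 1    Trait 2
Method 1   Trait 1    (rel)
           Trait 2     .50       (rel)
Method 2   Trait 1     .60        .17       (rel)
           Trait 2     .09        .18        .15       (rel)
```

In this arrangement, .60/.18 sit in the monotrait-multimethod (convergent) positions, .17/.09 in the heterotrait-heteromethod (discriminant) positions, and .50/.15 in the heterotrait-monomethod (common method) positions.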

47
Q

monotrait, multimethod correlations

A

scores from 2 tests that are supposed to measure the same construct AND are based on different measurement methods

48
Q

monomethod correlation

A

different constructs, same methods

49
Q

mono-method bias

A

Measuring many outcomes with only one method - the method will exert a systematic influence on the scores of all the outcomes

50
Q

Response process evidence

A

(ex: when someone wants to know how people solve a problem - give them a problem and ask them to think out loud to see how they reason - this corresponds to observing the response process)

52
Q

Classic definition of validity + problems that it entails

A

“the extent to which a test measures what it purports to measure”
What it measures and how well it does it

1. Validity is a property of tests rather than of test uses or test score interpretations
    ○ This holds only as long as the validation data support the purpose of the test and the test is used specifically for that purpose
2. In order to be valid, test scores should measure some purported construct directly
3. Score validity is, to some extent, a function of the test author's or developer's understanding of whatever construct she/he intends to measure

2 and 3 are:
• Tenable only for tests that measure behaviour linked to psych constructs (ex: memory, speed, etc.)
• Not tenable for:
    ○ Tests of multidimensional/complex theoretical constructs (ex: intelligence or self-concept)
    ○ Tests developed on the basis of strictly empirical relationships between scores and external criteria (ex: MMPI)
    ○ Techniques whose purpose is to reveal covert/unconscious aspects of personality (ex: projective tests)

Other issues:
• Tests may be reliable but not valid
• Test scores don’t always reflect what the author intended to measure
• Result: tests promise more than they can offer

53
Q

Publication of the Technical Recommendations for Psychological Tests and Diagnostic Techniques (1954)

A

• Classification of validity in 4 categories
○ Content
○ Predictive
○ Concurrent
○ Construct
• 1974: reduced to 3 aspects of validity (AKA tripartite view of validity)
○ Content
○ Criterion-related
○ Construct
• Definition also changed: appropriateness of inferences from test scores or other forms of assessment

54
Q

2 new/current definitions of the term construct

A

Integration of almost all forms of validity evidence as construct validity
• Re-definition of the term construct in 2 ways:
○ Designate the traits, processes, knowledge stores, characteristics whose presence and extent we wish to ascertain through the specific behaviour samples collected (AKA what the test author sets out to measure)
○ Designate the inferences that may be made on the basis of test scores (AKA an interpretation of test data based on preestablished theoretical/empirical relationships between the scores and other variables)

55
Q

Cronbach’s 2 types of validity

A

• Logical
• Empirical
Both together: nomological net (interrelationships between and among theoretical and observable elements that support a construct)

56
Q

Embretson’s 2 aspects of construct validation research

A

• Construct representation: identifying the theoretical mechanisms that underlie task performance (goal = task decomposition)
○ Examines the underlying processes that make up the construct (common to everyone)
○ AKA what we are measuring
• Nomothetic span: network of relationships of a test to other measures
○ Examines individual differences between test takers
AKA what inferences we can make after measuring that thing

57
Q

Messick’s and Cronbach’s fundamental views of validity, compared with the Testing Standards

A

• Messick: construct validity as the unitary concept that links all evidence for the meaning of test scores together
• Cronbach: validity of test scores needs to be demonstrated by a plausible argument backed by empirical/logical inferences

* Construct: the concept/characteristic that a test is designed to measure
* Types of evidence for the validation argument can be determined by developing a set of propositions/claims that support the proposed interpretation of the particular purpose for testing
58
Q

Test Content evidence

A

Tests designed to sample behaviour that can be linked more or less directly to the inferences we wish to make based on their scores
• AKA content- or criterion-referenced tests
• Used in educational/occupational settings, but also in clinical (when we need to know if person is able to complete X task)
• Items: either sample content domain or assess ability/competence in some skill

59
Q

Once the content domain has been set, content validation procedures involve reviewing the test content from 2 perspectives

A
1. Relevance of the content sampled to the domain
2. Representativeness of the content sampled with regard to the specifications about the domain that the test is designed to cover

Then, how the relevance/representativeness of the content was established should be documented:
• How experts were selected (their qualifications)
• Process used to obtain their judgements
• Extent of the agreement among them

60
Q

Name 3 methods to assess response process validity

A

• Protocol Analysis
○ Examinees are asked to describe their cognitive strategies as they complete a task

• Additional Methods
○ Data on the timing of item responses: during computerized administration, knowing the timing of responses can tell us a great deal about the degree of difficulty and help rule out guessing
○ Analysis of the criteria applied by scorers: to see whether scoring is accurate - i.e., whether test responses are evaluated on the basis of the processes that were assumed by the scoring rubrics (same idea as protocol analysis, but from a different point of view)
61
Q

Where can evidence about internal structure be found?

A

Evidence is found in
• Interrelation between item responses (for the same domain)
• Interrelation between subscales / subtests (for multiple domains)

Internal Consistency and Other Indexes of Score Reliability
• Score reliability in itself can be seen as preliminary evidence that a trustworthy measure of a sample of behaviour has been attained
○ Can be indirect evidence for validity
○ Especially true for unidimensional concepts
○ Indications of reliability:
§ Alpha coefficients
§ Inter-rater reliability
§ Test-retest score reliability

62
Q

Goals and types of Factor Analysis techniques

A
Factor Analytic (FA) Techniques
Goals of it:
	• Reduce the number of dimensions needed to describe data derived from a set of measured variables
	• Investigate the structure that accounts for the interrelationships between the variables, to better understand the information they provide

2 methods for FA
• Exploratory Factor Analysis (EFA): to discover which factors/dimensions underlie the measures under analysis
• Confirmatory Factor Analysis (CFA): to test hypotheses, or confirm theories, about factors that are already presumed to exist

63
Q

Exploratory factor analysis

A

A correlation matrix needs to be created first: table that displays the intercorrelations among the scores obtained by a sample of individuals on a wide variety of tests (or subtests, or items)
The interpretation of the results of an EFA depends on which scores are included in this matrix, and from which sample those scores were derived - often overlooked

The result of EFA is a factor matrix: table that lists the loadings of each one of the original variables on the factors extracted from the analyses
Factor loadings: correlations between the original measures in the correlation matrix and the factors that have been extracted

Once we have the Factor Matrix, we identify the Factors that account for the most variance on the dataset and try to use inductive logic to label them

64
Q

Correlation matrix (EFA)

A

table that displays the intercorrelations among the scores obtained by a sample of individuals on a wide variety of tests (or subtests, or items)

65
Q

Factor matrix (EFA)

A

table that lists the loadings of each one of the original variables on the factors extracted from the analyses

66
Q

Structural Equation Modeling (SEM) techniques + advantages

A

To test the plausibility of hypothesized interrelationships among constructs as well as the relationships between constructs and the measures to assess them
• Main idea: The relationships obtained with empirical data on variables that assess the various constructs are compared with those predicted by the models
○ The relation between the data and the models is evaluated
§ With goodness-of-fit statistics

• Advantages of SEM
	○ Is based on analyses of covariance structures (patterns of covariation among constructs) that can represent the direct and indirect influences of variables on one another
	○ Typically uses multiple indicators for both the dependent and independent variables in models and thus provides a way to account for measurement error in all the observed variables
67
Q

How is Confirmatory Factor Analysis (CFA) different from EFA?

A

Differs from EFA because it involves a priori hypotheses about one or more models of the relationships between test scores and the factors/constructs the test is designed to assess
• The direction/strength of the relationships estimated by the various models are tested against results obtained with actual data for goodness of fit
○ Using computer programs such as LISREL
• CFA results are abundant in psych research and in test manuals as validity evidence

68
Q

How do we establish evidence on relation to other variables?

A

Convergence and Differentiation
Correlations Between Measures
• Comparing the scores of a test with other instruments that measure the same construct to assess validity
• We can compare the whole test or simply correlate subscales together
• AKA intertest correlations
○ Done with the different versions of a test when re-norming

69
Q

Convergent evidence of validity for evidence based on relations to other variables

A

consistently high correlations between measures designed to assess a given construct

70
Q

Discriminant evidence of validity for Evidence Based on Relations to Other Variables

A

consistently low correlations between measures that are supposed to differ

71
Q

Multitrait-Multimethod Matrix

A

Validation strategy that requires the collection of data on 2 or + distinct traits by 2 or + methods (ex: self-report and behavioural observations)
• Calculating their intercorrelations, then represented in a matrix that displays
○ Reliability coefficients for each measure (parenthesis)
○ Correlations between scores on the same trait assessed by different methods (bold)
○ Correlations between scores on different traits measured by the same methods (italics)
○ Correlations between scores on different traits assessed by different methods (plain text)

Go check the matrix to understand where the correlations are

72
Q

Method variance

A

variability that is related to characteristics inherent in the methodologies of tests

73
Q

Age Differentiation as Evidence Based on Relations to Other Variables

A

When test results correspond to well-established developmental trends
Seen as a source of validity evidence, one of the oldest (ex: Binet scale)

74
Q

Experimental Results as Evidence Based on Relations to Other Variables

A

When experiments use psychological tests scores as the dependent variable to see the effects of experimental interventions
• Mostly through pre-post test differences
• Ex: increase in scores on a test of conceptual development in children after following an enrichment program
○ Demonstrates the validity of the scores as well as the efficacy of the program

75
Q

Test-Criterion Relationships as Evidence Based on Relations to Other Variables

A

When test scores are used to make decisions about people’s lives, more evidence is needed than reliability and the other kinds of validity evidence alone
• In that case, validity evidence needs to address the value that the scores have outside of their primary nature; it must consider the various other factors that will also influence the decision made from the scores

76
Q

Criterion and criterion measures

A

Criterion: what we really want to know
Criterion measures: indexes of the criteria that tests are designed to assess or predict and that are gathered independently of the test in question
• We are looking for a link between the predictors (AKA test scores) and the criteria

The criterion measures can be…
• Naturally dichotomous (graduating vs dropping out)
• Artificially dichotomized (success vs failure)
• Polytomous (diagnoses of anxiety vs mood disorders vs dissociative disorders)
• Continuous (grade point average)
The nature of the criteria depends on the decisions that will be made with the test scores

77
Q

Hit rates

A

if…
• The criterion measure is dichotomous or categorical (membership in groups), then the validity is examined in terms of hit rates
○ Hit rates: the % of correct decisions that were made with the test scores
• The criterion measure is continuous, then the validity can be examined in terms of correlation coefficients
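A worked example of a hit rate (all numbers invented):

Hit rate = (correct acceptances + correct rejections) / total decisions
e.g., if 38 true positives and 42 true negatives come out of 100 decisions, the hit rate = (38 + 42) / 100 = .80, i.e. 80% correct decisions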

78
Q

Criterion-Related Validation Procedures

2 types of criterion-related decisions

A

• Determining characteristics of a person at the time of testing (concurrent)
• Predicting future performance or behaviour (predictive)
Regardless, both types derive from scores obtained about current behaviour

79
Q

Concurrent and predictive validation evidence

A
  • Concurrent validation evidence: gathered when indexes of the criteria that test scores are meant to assess are available at the time the validation studies are conducted
    • Predictive validation evidence: require gathering data on the predictor variable (test scores) and waiting for criterion data to become available so that both sets can be correlated
80
Q

Specificity and sensitivity of a test

A

○ Sensitivity (probability that the test will correctly detect the condition)
○ Specificity (probability that the test will correctly detect the absence of condition)
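In formula form, with an invented worked example:

Sensitivity = true positives / (true positives + false negatives); e.g., 45 of 50 people who have the condition are flagged → 45/50 = .90
Specificity = true negatives / (true negatives + false positives); e.g., 40 of 50 people without the condition are cleared → 40/50 = .80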

81
Q

Ideal design for a predictive validation study involves:

A
• Testing an unselected group of applicants with an ability test / battery
• Admitting them without regard to their test scores
• Waiting until criterion measures of performance become available
• Then correlating the pre-admission test scores with the criterion measures
82
Q

Regression line + 2 important components

A

Regression line: the line that best fits the bivariate data, i.e., that minimizes the errors made when predicting Y from X
• Two important components
○ Y intercept
○ Slope of the line

83
Q

Square of rXY = 0.757 (AKA coefficient of determination)

How to interpret

A

• Means that 75.7% of the variance in Y is associated with the variance in X (the rest of the variance does not come from the relationship between the 2 variables)

rXY squared is the coefficient of determination
When it's not squared, it's the slope of the regression line for standardized scores

84
Q

SEest

A

Statistic that expresses, in the scale used for the criterion measure, the error in predictions that are based on imperfect correlations

85
Q

The interpretation of SEest assumes that:

A
  • The predicted criterion score (Ŷ) is the average value in a hypothetical normal distribution of all possible criterion scores for the applicant in question
    • The SEest is the st dev of that distribution
86
Q

Interpretation of a CI around Ŷ for the factory example

A

There is an X% chance that the applicant will produce between N and N widgets per hour on average

87
Q

Issues Concerning Criterion-Related Validation Studies: criterion contamination

A

• When the people who determine the criterion standing of individuals have access to scores on the test that is used as a predictor (ex: doctors who know the predictor test pointed toward dementia might classify the person in the dementia criterion group accordingly)

88
Q

Techniques (3) to deal with multiple predictors relationships

A

• Multiple regression techniques (many predictors combined in the regression equation, each having some weight on the line)
• Multiple regression equations (linear regression but with many predictors, each weight being determined by that predictor's correlation with the criterion)
○ A multiple correlation coefficient R can be computed to determine the best weighted combination of test scores for predicting the criterion
○ Cross-validation (replication of predictor-criterion relationships on separate samples to account for sample-specific error) is sometimes needed; it will most likely cause a reduction of R (shrinkage) due to the removal of sample-specific error
• Profile analysis: establishing a cut-off score for each predictor and rejecting all candidates whose scores fall below it (see the sketch after this list)
○ This method often fails to take into account the reliability of scores
○ Can unfairly lead to the rejection of disadvantaged candidates who could likely do well with training / more opportunities to succeed
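A minimal sketch of a two-predictor regression and the multiple correlation R it yields (all scores invented for illustration):

```python
import numpy as np

# Hypothetical scores on two predictors (X1, X2) and one criterion (Y)
X1 = np.array([10, 12, 15, 18, 20, 25], dtype=float)
X2 = np.array([ 3,  5,  4,  6,  8,  7], dtype=float)
Y  = np.array([40, 46, 50, 58, 63, 70], dtype=float)

# Design matrix with an intercept column; least squares finds the weight
# given to each predictor in the regression equation
A = np.column_stack([np.ones_like(X1), X1, X2])
coefs, *_ = np.linalg.lstsq(A, Y, rcond=None)
intercept, b1, b2 = coefs

Y_hat = A @ coefs
R = np.corrcoef(Y, Y_hat)[0, 1]   # multiple correlation coefficient R
print(round(b1, 2), round(b2, 2), round(R, 3))
```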

89
Q

What is the problem with using samples for which the criterion is already known (ex: people who are already employed and for whom we already know their productivity levels)

A

• AKA using concurrent methods for predictive studies
• Because the people have already been selected, the range of scores on the predictor variable will always be narrower (in a sample where not all applicants would be selected for the job, the range would be much larger)
○ Result: restricting the range of either variable reduces the correlation coefficient
§ A formula has been devised to adjust for that situation
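The flashcard does not name the formula; one commonly used version (Thorndike's Case 2 correction for range restriction on the predictor) is:

corrected r = [r × (S/s)] / √(1 − r² + r² × (S²/s²))
where r is the correlation in the restricted (already selected) sample, s is the predictor's SD in that sample, and S is its SD in the unrestricted applicant pool; the corrected r estimates what the correlation would be without the restriction.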

90
Q

Moderator variable

A

Moderator variable: any characteristic of a subgroup of people in a sample that influences the degree of correlation between 2 other variables
• ANY demographic, psychological variable can influence the correlation between the predictor and the criterion
Solution: divide the sample into smaller groups according to that variable and calculate the coefficients separately

91
Q

When the difference between subgroups is systematic across studies, it can have 2 consequences:

A

• Differential validity: differences in the size of the correlations obtained between predictors and criteria for members of different groups
○ Suggests that the test scores predict more accurately for the group with the larger coefficient
○ The slopes of their regression lines will also be different, thus why we also call this slope bias
• Differential prediction: occurs when test scores underpredict or overpredict the criterion performance of one group compared to the other
○ AKA intercept bias - because the Y intercept of the regression line will differ
○ Solution: use different regression lines/equations for different subgroups, use different cut-off scores, use different subgroup norms (highly contested)

92
Q

Magnitude of the predictive validity index depends on which factors?

A

• The composition of the validation samples (size/variability)
• Nature/complexity of the criterion
• Characteristics of the test
• Interactions among all these
Test users need to consider those aspects carefully before interpreting validation studies

93
Q

Meta analyses

A

Rely on a series of quantitative procedures that provide for the synthesis and integration of the results obtained from the research literature on a subject
• Can reduce the errors stemming from individual studies

94
Q

Selection decisions

A

those that require a choice between 2 alternatives (ex: accept or reject a candidature, presence or absence of a diagnosis, etc)

95
Q

Screening

A

preliminary step in a selection process (to separate those who are worth evaluating further from those who are not)

96
Q

Placement decisions

A

involve assigning individuals to separate categories or treatments based on scores coming from a single regression equation about a single criterion

97
Q

Classification decisions

A

nobody is rejected, but individuals must be differentially assigned to distinct categories based on multiple criteria - multiple predictors are thus required, and their relationships to the criteria must be established through different regressions

98
Q

What happens when the predictor predicts multiple criteria equally well? (in the context of making decisions using test scores)

A
• Discriminant function analyses: involve applying weighted combinations of scores on the predictors to determine how an individual's profile matches the profiles of individuals already placed in the groups - they do not allow for the prediction of levels of success
• Synthetic validation: relies on detailed job analyses that identify job components and their weights, as well as previously established regression coefficients for various tests, to create a new battery that will predict performance on that job