Introduction to Psychometrics & Validity (Week 2) Flashcards
state what is meant by the key term - ‘construct validity’
the degree to which evidence and theory support the interpretation of test scores
(American Educational Research Association)
construct validation is an ongoing process, meaning…
3 points
1) it takes many years
2) many studies must be done
3) new additions to the theory
how do we measure psychological constructs, and why do we do it that way?
- self-reported measures (e.g. - questionnaires)
- this is because we don’t have hard outcome measures like we do in other sciences
state what is meant by the key term - ‘construct’
a construct is a psychological phenomenon (e.g. - SE, neuroticism, SDT)
state 3 benefits of using questionnaires
1) cost effective
2) time effective
3) easy to administer
what is the issue with the use of questionnaires? (4 points)
- to what extent can we trust the data obtained?
- confusing word choices
- scaling issues (are more or fewer response options needed?)
- respondents may not read instructions if they are too long (too much effort)
what 3 questions do we need to ask ourselves to determine whether there is construct validity?
1) does the data obtained align with the theory/evidence?
2) what is the meaning of the test scores?
3) can we trust the data obtained?
give 2 examples to why we may not trust obtained data
1) did the respondent answer 100% accurately?
2) were you measuring the correct construct or another one by accident?
state, in order, the 6 aspects of construct validity
1) content validity
2) substantive validity
3) structural validity
4) external validity
5) consequential validity
6) generalisability validity
state what is meant by the key term - ‘validity’
Messick, 1989
an overall evaluative judgement of the degree to which empirical evidence and the theoretical rationales support test scores and other modes of assessment (Messick, 1989)
what 3 things are test scores a function of?
1) items/stimulus
2) person responding
3) assessment context
what did (Cronbach, 1971) say about what needs to be valid?
‘what needs to be valid is the interpretation of the test score’ (Cronbach, 1971)
state what is meant by the key term - ‘score’
Messick, 1995
any coding or summarising of observed consistencies or performance regularities on assessment devices (Messick, 1995)
what did (Cronbach & Meehl, 1955) say about the comprehensiveness of construct validity?
“construct validity is based on an integration of any evidence that bears on the interpretation or meaning of the test scores” (Cronbach & Meehl, 1955)
state what is meant by the key term - ‘construct representation’
(Embretson, 1983)
put simply, construct representation is concerned with identifying the theoretical mechanisms that underlie responses (Embretson, 1983)
the 6 aspects of validity apply to all psychological and educational measurement, including performance assessment…
taken together, they provide a way of addressing the multiple and interrelated validity questions that need to be answered to justify some interpretation and use (Messick, 1995)
what are the two threats to construct validity ?
1) construct underrepresentation
2) construct irrelevant variance (difficulty + easiness)
state what is meant by the key term - ‘construct underrepresentation’
the assessment is too narrow and fails to include important dimensions and facets of the construct (Messick, 1995)
state what is meant by the key term - ‘construct irrelevant variance’
the assessment is too broad, containing excess variance associated with other distinct constructs, as well as method variance (such as guessing) that affects responses in ways irrelevant to the interpreted construct (Messick, 1995)
state what is meant by the key term - ‘construct irrelevant difficulty’
aspects of the task that are extraneous to the focal construct make the task irrelevantly difficult for some respondents (e.g. - word choice)
state an issue with ‘construct irrelevant variance’
can cause bias (Holland & Wainer, 1993)
state what is meant by the key term - ‘construct irrelevant easiness’
occurs when extraneous clues in item or task formats permit some individuals to respond correctly or appropriately in ways irrelevant to the construct being assessed (Messick, 1995)
state and explain an issue with ‘construct irrelevant easiness’
it can occur when some test material is familiar to some respondents, leading to unduly high scores for those individuals (Messick, 1995)
what is the primary measurement concern with respect to adverse consequences ?
the primary measurement concern is that any negative impact on individuals or groups should not derive from any test invalidity, such as construct underrepresentation or construct irrelevant variance (Messick, 1989)
low scores should not occur because the assessment omits something relevant to the construct
low scores should not occur because the assessment contains something irrelevant to the construct
state the (Wiggins, 1993) and (Messick, 1994) statements on ‘construct irrelevant variance’
construct irrelevant variance is important in all educational and psychological measurements, including performance assessment (Wiggins, 1993)
however, what constitutes construct irrelevant variance is a tricky and continuing issue (Messick, 1994). this is especially true in sports performance, where many skills are higher order
state the (Cascio et al., 1991) and (Gottfredson, 1994) statements on ‘construct irrelevant variance’
another issue arises when construct irrelevant variance is capitalised upon to produce desired outcomes (Cascio et al., 1991) e.g. - score adjustments for minority groups
recognising that such adjustments distort the meaning of the construct can help psychologists ensure this does not happen (Gottfredson, 1994)
what does (Kane, 1992) say about scores and construct validity?
“almost any kind of information about a test can contribute to an understanding of score meaning, but the contribution becomes stronger if the degree of fit of the information with the theoretical rationale underlying the score interpretation is explicitly valued” (Kane, 1992)
potential and actual consequences of test use must be considered for 2 reasons (Gallikson et al., 1950), what are they?
1) anticipation of likely outcomes may guide one to look for side effects
2) such anticipation may alert one to take timely steps to capitalise on positive effects and to forestall negative effects
what did (Messick, 1995) say which contradicts (Gallikson et al., 1950) about potential and actual consequences?
unintended consequences are also strands in the construct’s nomological network that need to be taken into account in the construct’s theory, score interpretation, and test use (Messick, 1995)
what is content validity ? (2 points)
‘what should the questionnaire include?’
measures the relevance, representativeness, and the technical quality of the measure
state 3 items that negatively affect content validity
1) unclear, overly complicated, overly-colloquial questions
2) no variability in responses
3) double barrelled questions
what are the 3 steps in achieving content validity ?
1) review literature to determine:
- construct conceptualisation, dimension of construct, current measures
2) develop preliminary questionnaire based on review
- items, scaling, instructions, presentation, language
3) obtain feedback from experts in the field
- all aspects assessed?
- clear instructions?
what is a key issue with obtaining content validity?
the specification of the boundaries of the construct that need to be assessed (Brunswik, 1956)
what did (Brunswik, 1956) come up with to overcome the issue of not knowing what to assess for content validity? (3 points)
Ecological Sampling (Brunswik, 1956)
- an in depth analysis of the domain
- intent is to ensure that all aspects of the domain are covered
what is substantive validity? (2 points)
1) how do respondents interpret the questionnaire and its items?
2) is there an alignment between what’s supposed to be measured and the respondents’ interpretations?
- e.g. are members of your team on the same page?
how is substantive validity obtained?
1) test preliminary questionnaire with sample from target population
- what do you think this question is asking?
- did the scale allow you to answer properly?
- did all questions apply to you?
(Messick, 1995) stated that there are 2 important points to consider when talking about substantive validity. what are they?
1) the need for tasks providing appropriate sampling of domain processes in addition to traditional coverage of domain content
2) the need to move beyond traditional judgement of domain content to accrue empirical evidence from responders
what did (Loevinger, 1957) say about substantive validity?
“the substantive aspect adds empirical evidence from the domain to content validity” (Loevinger, 1957)
what bridges content and substantive validity ?
representativeness (Messick, 1995)
what is structural validity?
does the data fit/align with the theoretical propositions of that construct?
data = theory/evidence?
state and explain 3 points to how we achieve structural validity ?
1) larger sample of your participants complete your questionnaire
2) quantitative analysis of data
- e.g.) factor analysis, relationships between questions
- does the data ‘fit’ the theory?
3) does the measure need refinement?
- be careful: if an item was not removed at the content validity stage, then presumably it was important to include
- if we remove it, we may miss something theoretically relevant
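the quantitative step above can be sketched in code. this is a minimal illustration only, using made-up Likert responses and numpy: it computes the inter-item correlation matrix, then uses the leading eigenvalue as a rough stand-in for a full factor analysis (if one factor dominates, it should explain most of the item variance)

```python
import numpy as np

# Hypothetical responses: 6 participants x 4 items on a 1-5 Likert scale.
# (Illustrative numbers only; a real study would use a far larger sample.)
X = np.array([
    [5, 4, 5, 4],
    [2, 2, 1, 2],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
    [3, 3, 4, 3],
    [5, 5, 4, 5],
], dtype=float)

# Inter-item correlation matrix: items measuring one construct
# should correlate positively with each other.
R = np.corrcoef(X, rowvar=False)

# Rough one-factor check: share of total item variance carried
# by the leading eigenvalue of the correlation matrix.
eigvals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues, sorted descending
explained = eigvals[0] / eigvals.sum()

print(f"leading factor explains {explained:.0%} of item variance")
```

in practice you would run a proper factor analysis (e.g. with a dedicated statistics package) and then judge whether the factor structure fits the theory; this sketch only shows the shape of the check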
what did (Loevinger, 1957) say about structural validity?
structural validity should aid the selection of tasks, and also the rational development of construct-based scoring criteria (Loevinger, 1957)
what is external validity?
how does your data relate to the data from other related constructs?
data = theory/evidence?
how do we get external validity? (3 points)
1) participants complete your measure and other relevant measures
- e.g.) personality and self-efficacy
2) is the data from your measure related to the other measures as expected?
- e.g.) positive, negative, no relationship?
3) come back to the theory and/or previous research
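the expected correlation patterns above can be sketched with simulated data. the constructs and effect sizes below are invented purely for illustration: your measure should correlate with a related construct (convergent) and show little or no correlation with an unrelated one (discriminant)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200  # hypothetical sample size

# Simulated scores: 'self_efficacy' is the focal measure;
# 'confidence' is built to be related, 'shoe_size' is unrelated.
self_efficacy = rng.normal(size=n)
confidence = 0.8 * self_efficacy + 0.6 * rng.normal(size=n)
shoe_size = rng.normal(size=n)

# Convergent pattern: correlation with the related construct.
r_convergent = np.corrcoef(self_efficacy, confidence)[0, 1]
# Discriminant pattern: correlation with the unrelated construct.
r_discriminant = np.corrcoef(self_efficacy, shoe_size)[0, 1]

print(f"convergent r = {r_convergent:.2f}, discriminant r = {r_discriminant:.2f}")
```

step 3 then applies: whatever the observed pattern, it is interpreted against theory and previous research, not in isolation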
both convergent and discriminant correlation patterns are important for external validity. what are ‘convergent patterns’?
the convergent pattern indicates a correspondence between measures of the same construct (Campbell & Fiske, 1959)
both convergent and discriminant correlation patterns are important for external validity. what are ‘discriminant patterns’?
the discriminant evidence is particularly critical for discounting plausible rival alternatives to the focal construct interpretation (Messick, 1995)
what is generalisability validity ?
to what extent do the results generalise across populations, contexts, or tasks
how do we get generalisability validity? (3 points)
1) participants from samples of various populations complete your measure and other measures
2) does the data align with your original data? why? why not?
3) testing generalisability is an ongoing process
- you never simply ‘have’ it; a measure falls somewhere on a low-to-high continuum of generalisability
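one way to picture step 2 above (does data from a new population align with the original data?) is to compare a simple statistic across two samples. everything below is simulated and the statistic chosen (item-total correlations) is just one possible example

```python
import numpy as np

rng = np.random.default_rng(1)

def item_total_corr(X):
    """Correlation of each item with the total questionnaire score."""
    total = X.sum(axis=1)
    return np.array([np.corrcoef(X[:, j], total)[0, 1]
                     for j in range(X.shape[1])])

# Hypothetical responses to the same 4-item measure from two
# different populations (e.g. athletes vs students) - simulated only.
sample_a = rng.normal(3, 1, size=(150, 4))
sample_b = rng.normal(3, 1, size=(150, 4))

# If the measure generalises, the item-level statistics should be
# similar in both populations; large gaps suggest it does not.
diff = np.abs(item_total_corr(sample_a) - item_total_corr(sample_b))
print("max cross-population difference:", round(float(diff.max()), 2))
```

as the card says, this is never finished: each new population, context, or task is a fresh check on a low-to-high continuum, not a one-off pass/fail test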
what did (Feldt & Brennan, 1989) say about generalisability validity?
“the limit of score meaning affects generalisability across tasks. error in the sampling of tasks, occasions, and scorers underlies traditional reliability concerns” (Feldt & Brennan, 1989)
what does (Wiggins, 1993) say about generalisability validity?
“because of the time required to achieve generalisability, it may be depicted that a trade-off occurs between validity and power of interpretation. such a conflict signals a design problem that needs to be carefully negotiated in performance assessment” (Wiggins, 1993)
what is consequential validity?
what are the implications that may result from using a questionnaire?
1) can the measure be used for a ‘basis for action’?
2) are there any potential consequences of test use?
- follow-up appointments
- use data from other sources
what is Bull’s mental skills questionnaire used for? (2 points)
1) sport psychologists could use it to guide their treatment of an athlete
2) constructs on which the athlete scores worst can be the initial focus of an intervention