Psychometrics: validity Flashcards
What is validity?
refers to whether a test measures what it is intended to measure
aim of establishing validity
to be able to make accurate inferences from scores on a test and to give meaning to test scores
-indicates the usefulness of a test
relationship between validity and reliability
- if a test is not reliable, it cannot be valid, so there is no point testing validity without first establishing reliability
- a test can, however, be reliable without being valid
4 types of validity
- face validity
- content validity
- criterion validity
- construct validity
Face Validity
- when a test seems on the surface to measure what it is supposed to measure
- a test can have good face validity but still not actually be valid
how face validity is measured
- researchers simply look at the items and give their opinion on whether the items appear to measure what they are intended to measure
- least scientific
4 aspects of evaluating face validity
- readability
- layout and style
- clarity of wording
- feasibility
disadvantages of face validity
- many don't consider this a measure of validity at all
- refers to what a test appears to measure rather than what it actually measures
- determined through review and not statistical analysis
Content validity
- the degree to which a test measures an intended content area
- non-statistical
- Do the questions/items on a test make up a representative sample of the attribute the test is supposed to measure
how to reach content validity
- Specifying the content area covered by the phenomenon when developing the construct definition
- Writing questionnaire or scale items that are relevant to each of the content areas
- Developing a measure of the construct that includes the best (most representative) items from each content area
construct under-representation (aspect of content validity)
the test fails to capture important components of the construct
construct-irrelevant variance (aspect of content validity)
when test scores are influenced by things other than the construct the test is supposed to measure
How is content validity established?
- judgement by expert judges
- content validity = number of relevant items / total number of items
- can also use statistical methods like factor analysis
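The relevant/total formula above can be sketched in a few lines of Python. The judge ratings below are made up for illustration; an item is counted as relevant when a majority of judges rate it relevant:

```python
# Hypothetical sketch: a simple content validity index from expert
# judges' ratings (1 = relevant, 0 = not relevant). Data is invented.
ratings = {
    "item_1": [1, 1, 1],   # all three judges rate item 1 as relevant
    "item_2": [1, 0, 1],
    "item_3": [0, 0, 1],
}

# An item counts as relevant if a majority of judges rate it relevant.
relevant = [item for item, votes in ratings.items()
            if sum(votes) > len(votes) / 2]

# content validity = number of relevant items / total number of items
content_validity = len(relevant) / len(ratings)
print(content_validity)  # 2 relevant items out of 3
```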
Criterion validity
- how well a test score estimates/predicts a criterion behaviour or outcome, now or in the future
- eg. depression inventory
- easy for ability tests but hard for personality/attitude tests
Why would we be interested in using criteria to create a new measurement procedure?
- Create a shorter version of a well-established measure
- To account for a new context, location and/or culture
- To help test the theoretical relatedness of a well-established measurement procedure
concurrent validity
- the extent to which test scores can correctly identify the current state of individuals
- Measure concurrent criterion validity by correlating scores on our new test to scores on an already established test
predictive validity
- do scores on a test predict a future event successfully?
- the test is the predictor
- the future event is the criterion
How is Criterion Validity Evaluated?
- correlation coefficients
- coefficient of determination (the square of the validity coefficient)
- Standard error of estimate (SEE) - a high SEE means greater deviation of criterion scores from predicted criterion scores (bad)
- success ratio (SR) - the proportion of predicted successes on the criterion that turned out to actually be successful
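The first three statistics above can be computed directly. A minimal Python sketch with invented test and criterion scores (the validity coefficient is the Pearson correlation between them, the coefficient of determination is its square, and the SEE shrinks as the correlation grows):

```python
import math

# Hypothetical sketch with made-up data: 'test' holds predictor scores
# on the new test, 'criterion' holds the criterion outcome scores.
test      = [10, 12, 14, 16, 18, 20]
criterion = [22, 25, 27, 33, 35, 38]

n = len(test)
mx, my = sum(test) / n, sum(criterion) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(test, criterion))
sxx = sum((x - mx) ** 2 for x in test)
syy = sum((y - my) ** 2 for y in criterion)

r = sxy / math.sqrt(sxx * syy)     # validity coefficient
r2 = r ** 2                        # coefficient of determination
sd_y = math.sqrt(syy / n)
see = sd_y * math.sqrt(1 - r2)     # standard error of estimate

print(round(r, 3), round(r2, 3), round(see, 3))
```

A high r with a small SEE means predicted criterion scores stay close to actual criterion scores, which is what good criterion validity evidence looks like.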
Construct validity
- the degree to which a test measures the theoretical construct it is intended to measure
- a construct is something that we think exists, but is not directly observable or measurable
e.g., we can directly measure 10ml of water – water is directly observable and measurable
BUT we cannot directly measure 10ml of depression – depression is a construct, it is not directly observable and measurable
How do we measure constructs?
- we look at the relationship between the construct and other constructs
- What observable behaviours can we expect if a person has a high (or low) score on a test measuring this construct?
The relationships between one construct and others
- look for convergent validity evidence AND divergent/discriminant validity evidence
- for a test to have good validity, it needs to have both convergent AND discriminant validity evidence
Convergent validity
Scores on a test have high correlations with other tests that measure the similar constructs
e.g., depression tests should correlate highly with tests of sadness or anxiety
Discriminant validity (divergent)
Scores on a test have low correlations with other tests that measure different constructs
e.g., A questionnaire on racism should have little or no correlation with gender, for example
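As a rough Python sketch, convergent evidence shows up as a high correlation with a similar construct and discriminant evidence as a low correlation with an unrelated one. The depression, anxiety, and maths scores below are invented for illustration:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: anxiety tracks depression closely, maths does not.
depression = [4, 9, 6, 12, 15, 7]
anxiety    = [5, 10, 7, 11, 14, 8]
maths      = [55, 60, 80, 47, 66, 71]

print(round(pearson(depression, anxiety), 2))  # high -> convergent evidence
print(round(pearson(depression, maths), 2))    # near zero -> discriminant evidence
```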
Criterion-groups validity
- groups that are expected to differ should score differently on tests
- e.g., people with autism would be expected to score differently on an empathy scale than groups expected to be high in empathy (e.g. counsellors)
Validity for Criterion-Referenced Tests
- Criterion-referenced tests compare performance with some clearly defined criterion for learning
- Are often ‘high stakes’ tests – e.g., pass a test before you can practice in some discipline, like being an electrician etc.
- Measure proficiency in something – this ranges from no proficiency at all to perfect proficiency
Establishing Validity for Criterion-Referenced Tests
- Compare scores on the test before and after a program of instruction
- Compare scores on the test with scores on a test related to the criterion
factors affecting validity
- reliability (Can have reliability without validity BUT must demonstrate reliability before validity)
- social diversity (tests may not be equally valid for different social/cultural groups)
- variability (a restricted range of scores lowers validity coefficients)