Wk 5 - Validity Flashcards

1
Q

If the validity coefficient between a test and its criterion measure (where a high test score should predict a high criterion score) is -.97 (minus point nine seven) and is statistically significant, then this probably indicates… (x1)
Because… (x4)

A

The test could be reliable but is definitely not valid
The negative correlation indicates that the test has an inverse relationship with the criterion -
When it ought to have a positive correlation if it was valid.
However, the high correlation suggests the test is probably reliable (if it were unreliable, the correlation would be much closer to zero)

2
Q

Factors that may affect a predictive validity coefficient do NOT include… (x1)
Because… (x1)

A

The mean score on the test (assuming no ceiling or floor effects)
As calculating the correlation coefficient involves standardizing the variables anyway
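A minimal numpy sketch (with made-up data) of why the mean doesn't matter: Pearson's r standardizes both variables, so shifting every test score by a constant leaves the validity coefficient unchanged.

```python
import numpy as np

# Simulated test and criterion scores (assumed, illustrative numbers)
rng = np.random.default_rng(0)
test = rng.normal(50, 10, 200)
criterion = test + rng.normal(0, 5, 200)

r_original = np.corrcoef(test, criterion)[0, 1]
r_shifted = np.corrcoef(test + 30, criterion)[0, 1]  # raise the mean by 30

print(round(r_original, 6) == round(r_shifted, 6))  # True: the mean shift changes nothing
```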

3
Q

True or false, and why? (x2)

Construct irrelevant variance refers to the variance in a CONSTRUCT that does not covary with TEST scores

A

False
Construct irrelevant variance refers to the variance in the TEST that does not covary with the CONSTRUCT
(not the other way around, as stated in the question)

4
Q

True or false, and why? (x2)
If the variances of a test and the construct it is attempting to measure only overlap by a small degree then the test is likely to have low reliability.

A

False
If variances of test and the construct it’s measuring only overlap by a small degree then the test is likely to have low VALIDITY
(not RELIABILITY as it says in the question)

5
Q

True or false, and why? (x4)
Non-random attrition between two time points in a longitudinal validation study is one of the factors that could potentially compromise the evaluation of the CONCURRENT validity of a test (assuming the test is administered during the initial time point).

A

False
CONCURRENT validity involves administering both test and criterion measures AT SAME TIME
So if people drop out after the initial time point it won’t matter -
We already have all the data we needed
(though it may affect any evaluation of the test’s PREDICTIVE validity)

6
Q

True or false, and why? (x2)
In a validation study for a behavioural measure, you discover that self-selection biases in your sample are influencing the spread of scores for the measure. This could compromise the evaluation of the CONCURRENT validity of the test.

A

True
Anything that affects the spread of scores in a test may affect its correlations with other variables
(which is what we’re analysing when we evaluate the concurrent validity of a test)

7
Q

True or false, and why? (x1)
Evaluating a test by seeing if it does not correlate highly with a construct it is not supposed to be measuring is an example of deviating validity.

A

False.

It’s an example of discriminant or divergent validity

8
Q

True or false, and why? (x2)

A factor analysis involves mathematically grouping items according to the similarity of their content.

A

False
Factor analysis involves mathematically grouping items according to their inter-correlations
Not similarity of content (there’s no way the factor analysis can “know” what the content of the items is)

9
Q

True or false, and why? (x1)

If a test has poor face validity then this may have implications for the data that the test yields.

A

True

Poor face validity can lead to things like missing data

10
Q

True or false, and why? (x2)
Content validity is not important for a university examination as long as that examination is supported by empirically-based validity evidence

A

False.
Even if we could create a test that discriminated between good and poor students in the course (i.e. it had empirically-based validity),
It would still be a problem if it did not do this by measuring knowledge of course content directly.

11
Q

True or false, and why? (x1)
You can test the incremental validity of a test by seeing whether it can predict some relevant criterion measure in isolation from other measures.

A

False
Incremental validity is about whether a test contributes to predicting some outcome IN ADDITION TO the effect of other measures

12
Q

True or false, and why? (x2)
If we had an established intervention known to reduce state anxiety then we potentially could use this to test the validity of a new measure of state anxiety.

A

True
We can use the intervention as part of an experiment to see if it reduces scores on the new test in ways we'd predict if the test is valid
(compared with some placebo intervention)

13
Q

True or false, and why? (x2)

It is possible for a test to have excellent reliability but poor validity.

A

True
You don’t need validity to have good reliability
(your test can be consistent in the scores it produces without measuring what you want it to)

14
Q

True or false, and why? (x3)

It is possible for a test to have excellent validity but poor reliability.

A

False
You need good reliability for any chance of the test being valid
(because the level of reliability places a ceiling on how high your validity coefficient can be).
If your measure is producing wildly inconsistent scores then it’s probably not measuring anything

15
Q

True or false, and why? (x3)
If the reliability of both a test and a criterion measure are high then this means the correlation between them should also be high.

A

False
If reliability of both a test and a criterion measure is high then the correlation between them is not restricted –
However, this doesn’t mean it can’t be small
(the correlation can be high or low, depending on the validity of the test)

16
Q

True or false, and why? (x2)

Content validity is empirical data supporting the hypothesis that the content of a test is appropriate

A

False
Content validity involves opinions
And is not generally based on empirical data.

17
Q

When students complain that, in a course examination, a lecturer did not ask any questions from a particular lecture, they are effectively complaining about the examination having… (x1)

A

Potentially poor content validity

18
Q

Why is it not strictly accurate to talk about the validity of a test (hint: one test could be used in more than one context)? (x3)

A

It’s interpretations of test scores required by proposed uses that are evaluated, not the test itself.
When test scores are used or interpreted in more than one way, each intended interpretation must be validated
Because test can be used in different contexts – validity can change

19
Q

What are constructs? (x2)

Plus egs x4

A

Unobservable, underlying hypothetical traits or characteristics
That we can try and measure indirectly using tests
Intelligence, anxiety, plumbing skill, speeding propensity

20
Q

What is construct underrepresentation? (x1 plus e.g. x1)

A

The portion of variance in the construct that is not captured by our test
E.g. self-report assumes insight (into being a slow or fast driver, say), but the chances of perfect insight are very slim – so there are things you don't capture in the test

21
Q

What is construct irrelevant variance? (x1 plus e.g. x 1)

A

Stuff that’s captured by the measurement, but not part of the construct
Eg speed questionnaire – variance that's to do with interpreting the wording, or social desirability issues, is not a reflection of speed of driving

22
Q

What are the similarities (x2) and difference between content and face validity? (x2)

A

Both are opinion-based, not empirical, but…
Face is how valid the test appears to be, from the perspective of the test taker (usually), while
Content is a judgment (usually by experts) regarding how adequately a measure samples behaviour representative of the universe of behaviour it was designed to sample

23
Q

Can we say that ‘test is valid’? (x2)

A

No

Only that validity hypotheses are supported - not all or nothing

24
Q

What is ‘valid measurement’? (x2)

A

That aspect of the characteristic/ability/trait which IS captured by both the construct and the test
Variance that is due purely to the trait, not construct irrelevant variance or construct underrepresentation

25
Q

Variance in test measurement is made up of what two components?

A

Valid measurement

Construct irrelevant variance

26
Q

Variance in a construct is made up of which two components?

A

Construct under-representation

Valid measurement

27
Q

How can face validity be useful (x2), or not (x2)?

A

Good for PR - people can see the point
They accept test and take it seriously, less missing values
But may have terrible actual validity -
Just because it feels like driving a car, doesn’t mean it tells us about RL driving behaviour

28
Q

How could you go about evaluating content validity? (x2)

A

Getting experts to rate each item for its relevance to the construct - e.g. exam questions for course content
Getting those experts to make judgements on whether particular lectures were over- or under-represented.

29
Q

How could I create an exam that had great empirical validity but poor content validity? (x7)

A

Ask about known correlates of academic success:
hours spent studying, assignment marks, motivation, GPA, questions based on other prerequisite courses, number of lectures/tutorials attended

30
Q

What is the general process involved in testing empirical validity? (x2)

A

Create hypotheses regarding how your measure ought to perform if it is valid
Then design and run studies to test these hypotheses

31
Q

Give two examples of things that might restrict the range of scores in a test and indicate what influence this could have on the validity coefficient.

A

Non-random attrition: certain types of people dropping out of your longitudinal study - e.g. those too sick or not very sick dropping out of study
Self-selection: only certain people in the sample in the first place - we can’t test randomly in RL

32
Q

What is criterion validity? Give 5 examples.

A
Judgment regarding how adequately a score on a test can be used to infer an individual's most probable standing on some measure of interest (the criterion)
Method of contrasted groups
Concurrent
Predictive
Incremental
Convergent
33
Q

What is the method of contrasted groups? (x2 plus egs x 2)

A

Establish criterion validity by determining whether test scores of groups of people vary as expected
Eg groups we think will score high or low – does the test tell them apart?
Clinical group vs non-clinical controls (e.g. dental phobia patients vs controls for dental phobia questionnaire)
Experts vs novices for a skill or ability test
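A toy sketch of the contrasted-groups logic, using hypothetical (made-up) dental-phobia questionnaire scores: if the test is valid, the phobia group should score clearly higher than controls.

```python
import numpy as np

# Hypothetical questionnaire scores for the two groups (invented numbers)
phobia = np.array([38, 41, 35, 44, 39, 42, 37, 40])
controls = np.array([18, 22, 15, 20, 24, 17, 21, 19])

diff = phobia.mean() - controls.mean()
pooled_sd = np.sqrt((phobia.var(ddof=1) + controls.var(ddof=1)) / 2)
cohens_d = diff / pooled_sd  # a large effect size supports criterion validity

print(f"mean difference = {diff:.1f}, d = {cohens_d:.1f}")
```

In practice you would back this up with a significance test, but the core question is simply whether the test tells the groups apart.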

34
Q

Give two examples of criterion contamination, explaining what is contaminated in each case. (x2 and x3)

A

Validate schizophrenia test by finding it can tell apart those with/out
BUT then discover that people with schizophrenia were originally diagnosed using my test (validity test is circular)
Zuckerman sensation seeking scale: validated by comparing scores with a risk-taking behaviour scale (the criterion),
But virtually the same items appeared in both test and criterion
ie many questions directly ask if you like to take risks

35
Q

What are the effects of restriction of range of continuous variables? (x1)

A

It gives us a much smaller correlation than we would get from the full range
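A simulated sketch of this effect: restrict the test-score range (as self-selection or attrition might) and the validity coefficient shrinks, even though the true test–criterion relationship is unchanged. All numbers here are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
test = rng.normal(100, 15, 5000)
criterion = 0.6 * test + rng.normal(0, 12, 5000)  # true relationship built in

r_full = np.corrcoef(test, criterion)[0, 1]

keep = test > 100                     # only the top half of scorers remain
r_restricted = np.corrcoef(test[keep], criterion[keep])[0, 1]

print(r_restricted < r_full)  # True: the restricted sample yields a smaller r
```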

36
Q

How big does a validity coefficient have to be? (x1 plus egs x 2)

A

Depends entirely on the context
.2 is awesome for crash risk – most people only crash every ten years, and only a small portion of any sample will have had a crash in e.g. the past year
But, if correlating a new test of intelligence with an older one, might expect more like .80 (no excuse for it to be smaller if it’s measuring the same thing)

37
Q

What is concurrent validity? (x2)

A

Subset of criterion validity

Where test and criterion are measured for each person at the same time/in same session

38
Q

What is predictive validity? (x2 plus e.g. x1)

A

Subset of criterion validity
Where the test is trying to predict what the criterion will be at some future time
Speed test predicts crashes over next year

39
Q

What is incremental validity? (x3)

A

Type of criterion validity
Can be predictive or concurrent
How much each individual predictor ADDS to predicting the criterion in addition to the effect of other predictors

40
Q

Discuss two examples of incremental validity (x3 each)

A

Predict bungee jump:
Sensation-seeking, Past risk-taking behaviour, Susceptibility to peer pressure, Fear of heights
Alone might have poor criterion validity, using together (e.g. combined using Multiple Regression) improves it
Eg crash risk:
Speeding propensity; hazard perception skill; fatigue; km driven/wk; drunk driving; tailgating propensity; traffic violations; distraction; years of experience; age
(last two correlate highly, so using both doesn't add much value – doesn't change your validity coefficient)
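The idea of "adding to" prediction can be sketched with simulated data: compare the R-squared of a one-predictor regression with the R-squared after adding a second predictor. The variable names and effect sizes below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
speeding = rng.normal(0, 1, n)
hazard = rng.normal(0, 1, n)
# Simulated criterion: both predictors genuinely contribute
crashes = 0.5 * speeding - 0.4 * hazard + rng.normal(0, 1, n)

def r_squared(X, y):
    """Proportion of criterion variance explained by the predictors."""
    X = np.column_stack([np.ones(len(y)), X])        # add intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_one = r_squared(speeding.reshape(-1, 1), crashes)
r2_two = r_squared(np.column_stack([speeding, hazard]), crashes)

print(f"R2 gain from adding hazard perception: {r2_two - r2_one:.3f}")
```

A highly redundant predictor (like age alongside years of experience) would produce almost no R-squared gain in the same comparison.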

41
Q

What is convergent validity? (x2 plus e.g. x1)

A

Does it correlate with similar things?
Convergent and criterion are not mutually exclusive – many things might be equally described by either term
eg if anxiety measure is valid, we can expect it to correlate with established measures of anxiety

42
Q

What is discriminant/divergent validity? (x1 plus e.g. x2)

A

Does it not correlate with dissimilar things?
e.g. new measure of depression does NOT correlate highly with validated measures of anxiety
(shows it isn’t just a general measure of maladjustment and is specifically measuring depression, because depression and anxiety are different things)

43
Q

What is the criterion validity coefficient?

A

Correlation between the test scores and the criterion scores

44
Q

Why do both the test and its criterion have to have decent reliability? (x2 plus explain x4)

A

Reliability of each limits size of validity coefficient
So, noise/measurement error in the data makes the correlation smaller, due to scores being more spread out
Eg, imagine a perfect match of test onto criterion (correlation of 1)
• Then introduce uncertainty/inaccuracy/unreliability into the tests scores…
• Perfect correlation is spoiled by smearing the scatterplot
• Same would happen if the criterion scores were unreliable
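The "smearing" step above can be simulated directly: start from perfectly related scores, then add measurement error (unreliability) to each side and watch the observed validity coefficient shrink. The error magnitudes are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
true_score = rng.normal(0, 1, 5000)

# Perfect match of test onto criterion: correlation of 1
r_perfect = np.corrcoef(true_score, true_score)[0, 1]

# Now introduce unreliability into both test and criterion scores
noisy_test = true_score + rng.normal(0, 0.7, 5000)
noisy_criterion = true_score + rng.normal(0, 0.7, 5000)
r_observed = np.corrcoef(noisy_test, noisy_criterion)[0, 1]

print(r_observed < r_perfect)  # True: error attenuates the correlation
```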

45
Q

What questions does factor analysis answer? (x4)

A

Is the internal structure as expected? Does it map onto theory?
Multi-faceted (multiple subscales; heterogeneous)?
Or no subscales, homogeneous?

46
Q

How does factor analysis work? (x2)

A

Uses mathematical techniques to group the items into clusters/factors/components based on how well they correlate with each other
Picks out the distinct clusters of items in our data based on their inter-correlations - tells us to what extent the items tie together into constructs
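A toy sketch of the inter-correlation structure factor analysis works from: two hypothetical latent factors each drive three items, and the item correlation matrix alone reveals the clusters (no item "content" is involved). The factors and loadings here are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
anxiety = rng.normal(0, 1, n)      # hypothetical latent factor 1
depression = rng.normal(0, 1, n)   # hypothetical latent factor 2

# Six items: three driven by each factor, plus item-specific noise
items = np.column_stack(
    [anxiety + rng.normal(0, 0.6, n) for _ in range(3)]
    + [depression + rng.normal(0, 0.6, n) for _ in range(3)]
)

R = np.corrcoef(items, rowvar=False)   # 6 x 6 item inter-correlation matrix
within = R[0, 1]        # two items from the same cluster
between = R[0, 3]       # items from different clusters
print(within > 0.4 > abs(between))  # True: within-cluster r high, between ~0
```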

47
Q

What is construct validity? (x3)

A

Umbrella term covering whether we are measuring the right thing…
Argued to cover all validity methods
How well the scores on your test reflect the construct (i.e. the trait or characteristic) that your test is supposed to be measuring.

48
Q

Define criterion (x1)
Plus eg x1
And four contingencies for deciding on a criterion measure

A

The standard against which the test is evaluated (e.g. actual driving speed was one of the criteria used to validate the speed questionnaire).
Needs to be reliable, valid, relevant, and not subject to criterion contamination

49
Q

Give 6 egs of criterion validity checks

A

Surgery competence simulator test: correlation between test score and patient outcome rating
University admissions test, with GPA at end first year;
Depression inventory with clinician rating of severity;
Salesperson personality scale with amount of worthless crap sold to gullible people;
Clerical aptitude with supervisor’s job performance rating;
Creative thinking test with panel rating of product

50
Q

Are face and content validity necessary for actual validity? (x3)

A

No
Measure with terrible face validity could still be great predictor, e.g. risk-taking and driving behaviours
Could have psychometric validity sans content - e.g. GPA as predictor of course performance, rather than exam

51
Q

What size correlation is generally necessary to say that an item belongs in a particular cluster (in factor analysis)?

A

Greater than .4, with larger being better

52
Q

How do we use experimental effects to check validity? (x2 plus egs x2)

A

Are they as expected?
ie it’s been shown in a bunch of studies what the effect is supposed to be
Eg new test of state anxiety:
• Give them all the test, split into two groups (treatment/placebo)
• If valid, intervention group should differ from placebo
Eg hazard perception: subject 25 people to a validated training course and see if scores improve on the test compared with control group/ pre-training scores

53
Q

How might we use developmental changes to assess validity? (x1 Plus e.g. x1)

A

Diffs in test scores as you might predict if your measure was valid?
Eg everyone would agree that language skills improve with age

54
Q

What are the components of the WISC-IV Wechsler Intelligence Scale for Children

A
10 core subtests arranged into 4 groups:
•	Verbal comprehension index
•	Perceptual reasoning index
•	Working memory index
•	Processing speed index
Plus 5 supplementary subtests that can be used as additional or replacement tests
55
Q

Describe the properties of the WISC-IV (x6)

A

Large, representative normative sample across 11 age groups (6 - 16 yrs)
Mean = 100, SD = 15
Excellent internal consistency, test-retest and inter-rater reliability
Need a difference of 7.58 IQ points (2 × SEdiff, where SEdiff = 3.79) for confidence that a difference is not due to chance
Good content validity
And empirical: predicts academic achievement, diffs between normal/special groups, confirmatory factor analysis

56
Q

Factors that may affect a predictive validity coefficient do include… (x3)

A

Internal consistency of the criterion
Certain types of people dropping out of the sample between the original test and when the criterion is measured
Certain types of people agreeing to be in the sample.