Assessment L2 - Reliability & Validity Flashcards

1
Q

What are sources of assessment error?

A
  • Measurement Error
  • Imperfect test validity - every test is associated with some error.
  • Sampling Error
  • Scoring/Admin Errors
  • Patient Variables
  • Test score = syndrome + measurement error + premorbid ability + drugs + effort + practice
2
Q

What does measurement error refer to?

A

Assessment at a particular time is simply a snapshot in time - a pathology can change and progress over time.

Trait vs state - measurement can be affected by ‘state’, eg. cognitive abilities will be compromised during a depressed state.

3
Q

What does sampling error refer to?

A

Error caused by observing a sample instead of the whole population

4
Q

What does scoring or administration error refer to?

A
  • intra-rater reliability - degree of agreement among repeated administrations of a diagnostic test performed by a single rater.
  • inter-rater reliability - degree of agreement among different raters administering or scoring the same test.
5
Q

What do patient variables refer to?

A
  • task engagement/motivation to perform well
  • educational/occupational/cultural/language/age factors

6
Q

What is the goal of assessment?

A
  • to maximise TRUE SCORE VARIANCE and minimise ERROR.
  • an individual’s true score is conceptualised as the average score in a hypothetical distribution of scores that would be obtained if the individual took the same test an infinite number of times.
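This true-score idea can be sketched numerically. A minimal simulation (all numbers hypothetical) under the classical model, observed score = true score + random error:

```python
import random

# Classical test theory: observed score X = true score T + random error E.
# If the same person could take the same test many times, the mean of the
# observed scores converges on T. Illustrative values only.
random.seed(0)

true_score = 100          # hypothetical true ability
error_sd = 15             # spread of random measurement error

observed = [true_score + random.gauss(0, error_sd) for _ in range(100_000)]
mean_observed = sum(observed) / len(observed)

print(round(mean_observed, 1))  # close to 100
```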
7
Q

What are the sources of error in psych testing?

A
  1. the CONTEXT - environment, test administrator, scorer, and the reasons for which the test is being taken
  2. the TEST TAKER - genuine? motivated?
  3. the TEST ITSELF - unreliable?
8
Q

What is reliability?

A
  • Consistency in measuring a construct - A TEST IS ONLY AS VALID AS IT IS RELIABLE.
  • internal consistency
  • test-retest reliability
  • alternate forms
9
Q

What is internal consistency?

A

how consistent the items within the test are at measuring the overall construct - assessed with Cronbach’s alpha

10
Q

What are the ‘ground rules’ that are commonly assumed in assessment?

A
  • test administrators and scorers carefully select appropriate instruments and suitable test environments, establish good rapport with test takers, and administer and score the tests in accordance with standardised procedures.
  • test takers are also assumed to be properly prepared and well motivated to take the tests.
11
Q

What reliability considerations should you make when choosing a test/instrument?

A
  1. determine the potential SOURCES OF ERROR
  2. examine the RELIABILITY DATA available on the instruments, and the types of samples they were obtained from
  3. EVALUATE the data on reliability in light of all other attributes - the question at hand, normative and validity data, cost and time constraints
  4. Select the test that promises to produce the MOST reliable scores for the purpose and population at hand.
12
Q

What is Cronbach’s Alpha?

A
  • A measure of internal consistency, can be described as correlation of the test with itself.
  • The alpha coefficient is a function of 2 factors:
    1. NUMBER OF ITEMS in a test
    2. the RATIO OF THE SUM OF THE ITEM VARIANCES TO TOTAL TEST SCORE VARIANCE (i.e. how consistently test takers perform across all items of the test).
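These two factors combine in the alpha formula: α = k/(k−1) · (1 − Σ item variances / total score variance). A minimal sketch, using made-up item-response data:

```python
# Cronbach's alpha from raw item scores (hypothetical toy data):
#   alpha = k/(k-1) * (1 - sum(item variances) / variance of total scores)
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(rows):
    k = len(rows[0])                      # number of items
    items = list(zip(*rows))              # column-wise item scores
    totals = [sum(r) for r in rows]       # each respondent's total score
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

data = [  # rows = respondents, columns = items (illustrative only)
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(data), 3))  # high: these toy items hang together
```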
13
Q

What kind of test is Cronbach’s alpha usually used for?

A

More commonly used for questionnaires than for tests like the WAIS - because the WAIS subtests have a difficulty gradient, so performance on the hard items won’t be consistent with performance on the easy items.

14
Q

What is Cronbach’s alpha conceptually meant to represent?

A

It is conceptually meant to be an estimate of reliability equivalent to the average of all possible split-half coefficients that would result from all possible ways of splitting the test in half.
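This equivalence can be checked numerically. A small sketch, assuming hypothetical 4-item data and using the Flanagan-Rulon split-half formula for each split:

```python
from itertools import combinations

# Check: alpha equals the average of the split-half reliabilities
# (Flanagan-Rulon formula) over every way of splitting the test into
# two equal halves. Toy 4-item data, illustrative only.
def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

data = [[3, 4, 3, 4], [5, 5, 4, 5], [2, 3, 2, 2], [4, 4, 5, 4], [1, 2, 1, 2]]
k = len(data[0])
totals = [sum(r) for r in data]
alpha = k / (k - 1) * (1 - sum(var(col) for col in zip(*data)) / var(totals))

# Flanagan-Rulon: r = 1 - Var(half1 - half2) / Var(total)
coeffs = []
for half in combinations(range(k), k // 2):
    if 0 not in half:          # count each split only once
        continue
    h1 = [sum(row[i] for i in half) for row in data]
    h2 = [t - a for t, a in zip(totals, h1)]
    coeffs.append(1 - var([a - b for a, b in zip(h1, h2)]) / var(totals))

print(round(alpha, 6) == round(sum(coeffs) / len(coeffs), 6))  # True
```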

15
Q

What kinds of reliability methods are prone to practice effects?

A

Test-retest and alternate-forms reliability

not

split-half and coefficient alpha

16
Q

What does a cronbach alpha of less than 0.7 mean?

A

Low reliability - suggests that the scores one derives from a test may not be very trustworthy.

17
Q

What level of Cronbach’s alpha is said to be ACCEPTABLE?

A

Generally over 0.7 - or 0.8 to be safe.

18
Q

What is validity?

A

Simply put - the extent to which a test measures what it is supposed to measure.

Problems with this definition:

  1. It treats validity as a property of tests, when it is better understood as a property of their score interpretations.
  2. It implies that, to be valid, a test score should measure some construct directly.
  3. Score validity is, to some extent, a function of the test author’s or developer’s understanding of whatever construct they intend to measure.

eg. the Wechsler tests measure what Wechsler conceptualised intelligence to be.

19
Q

What are the 5 Types of validity?

A
  1. Content
  2. Concurrent
  3. Predictive
  4. Construct
  5. External
20
Q

what is face validity?

A
  • the superficial appearance of what the test measures, from the perspective of the test taker or any naive observer.
  • not considered a true form of validity.
  • evidence that a test measures the construct it purports to measure is usually gathered instead by establishing high correlations between it and other existing instruments that are meant to assess the same construct.
21
Q

what is an example of a test that is reliable but not valid?

A
  • to be valid, a test must measure what it was designed to measure.
  • eg. a test with items like ‘I like to push people around’ and ‘Some people find me threatening’ may yield consistent (reliable) scores, but it is not valid if it is meant to be measuring sociability.
22
Q

Describe content validity

A
  • it is the degree to which a test’s items adequately sample the content of what it was originally designed or intended to measure
  • eg. IQ tests - does the IQ test actually measure IQ rather than something else such as motivation to succeed in school ??
23
Q

What is concurrent validity?

A

Degree to which a test can serve as a substitute for another longer, or more costly test. (concerned with convenience)

eg. the block design test is used to detect brain damage, and generates results similar to those of neurological tests that are more costly and dangerous.

24
Q

what is predictive validity?

A

The degree to which a test accurately predicts what it was originally developed to predict. (concerned with predictive power)

  • eg. SAT scores are designed to predict grades and graduation from college.
25
Q

What is construct validity?

A

The degree to which a test is meaningfully correlated with tests that measure similar constructs, and is unrelated to conceptually dissimilar or irrelevant measures.

> does it test the theory that underpins the measure? usually constructs aren’t directly measurable, so we look for epiconstructs.

eg. Eysenck predicted how different personality trait scales should correlate. Because his tests fit the predicted circumplex pattern, they demonstrated construct validity. If a test continually confirms specific patterns or associations derived from a theory, it has HIGH construct validity.

26
Q

What is convergent validity

A

the notion that ‘good’ tests should correlate moderately with conceptually similar tests.

27
Q

What is discriminant validity

A

‘Good’ tests should not correlate with conceptually dissimilar tests.

28
Q

What is external validity

A

Do research findings in the lab generalise to real world settings?

Limitations:

  • Tests can be reliable without being valid - but they can never be valid without being reliable.
  • Tests have degrees of reliability and validity
  • reliability and validity are used to judge the acceptability of personality tests.
29
Q

What usually is a large effect size in clinical phenomena?

A

d = 0.8 - a ‘large’ effect by Cohen’s convention, yet there is still quite a bit of overlap between the two distributions.
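This overlap can be quantified. Assuming two normal distributions with equal SDs separated by Cohen’s d, the overlapping coefficient is 2Φ(−d/2), where Φ is the standard normal CDF:

```python
import math

# Overlap between two equal-SD normal distributions separated by Cohen's d:
#   OVL = 2 * Phi(-d / 2)
# Shows why even a "large" d = 0.8 leaves substantial overlap.
def phi(x):  # standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

d = 0.8
ovl = 2 * phi(-d / 2)
print(round(ovl, 2))  # roughly 0.69: ~69% of the distributions overlap
```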

30
Q

How do we distinguish between false positives and false negatives in clinical decisions?

A

When making a decision about diagnosis, make sure that it is both CLINICALLY and STATISTICALLY significant,

since the overlap between clinical and non-clinical populations is usually quite high.

31
Q

What are the consequences of setting a cut off level too high?

A
  • poor sensitivity
  • high specificity

(when high scores are good and low are bad)

32
Q

what are the consequences of setting a cut off level too low? eg. at the beginning of the overlap area

A
  • high sensitivity
  • low specificity

(when high scores are good and low are bad)

33
Q

When to use a test to aid a decision?

A
  • when, all other things being equal, the potential gains in accuracy of selection decisions are greatest - i.e. when base rates are close to 0.50
  • base rates close to the extremes (0.1 or 0.9) indicate that accurate selection is either VERY difficult or very EASY under the circumstances.

Using test scores as the basis for selection - even if they have high validity - may not increase the accuracy of decisions, and may even lower it.
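The base-rate point can be illustrated with Bayes’ rule. A sketch, assuming a hypothetical test with sensitivity and specificity both 0.80:

```python
# A test with fixed sensitivity and specificity yields very different
# positive predictive values (PPV) depending on the base rate - part of
# why base rates near 0.50 give a test the most room to help a decision.
# Illustrative numbers only.
def ppv(sensitivity, specificity, base_rate):
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

for base in (0.5, 0.1):
    print(base, round(ppv(0.80, 0.80, base), 2))
```

At a 0.50 base rate a positive result is right 80% of the time; at a 0.10 base rate the same test is right only about 31% of the time.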

34
Q

What is the relationship between sensitivity, specificity and cut-off?

A
  • Sensitivity and specificity for any given test are dependent on the CUT OFF SCORE.
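A sketch of this dependence, assuming (hypothetically) normally distributed scores for impaired and healthy groups, with low scores indicating impairment:

```python
import math

# Sensitivity/specificity as a function of the cut-off, for two overlapping
# normal score distributions (hypothetical means/SDs). Raising the cut-off
# trades specificity for sensitivity, and vice versa.
def phi(x):  # standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

healthy_mean, impaired_mean, sd = 100.0, 85.0, 15.0  # low scores = impaired

def sens_spec(cutoff):
    sensitivity = phi((cutoff - impaired_mean) / sd)     # impaired fall below
    specificity = 1 - phi((cutoff - healthy_mean) / sd)  # healthy fall above
    return sensitivity, specificity

for cutoff in (80, 92.5, 105):
    s, p = sens_spec(cutoff)
    print(cutoff, round(s, 2), round(p, 2))
```

A low cut-off (80) gives high specificity but poor sensitivity; a high cut-off (105) reverses the trade; the midpoint balances the two.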
35
Q

What are some problems faced in psychometric tests?

A
  • Social desirability - when faced with a psychometric test, many people feel they are being judged, and so alter their answers accordingly.

reasons for social desirability:

  1. self-deception - individuals may be overly optimistic in their perceptions of their own positive personality features, and play down their perceived negative aspects.
  2. impression management - trying to appear nice because of fear of social disapproval - or malingering, eg. faking good or faking bad.

other problems:

  • Mood - can affect personality tests: a good mood = answering differently to when in a bad mood.
  • Environmental effects - eg. noise, heat, light - can impact mood and cognitive abilities. HIGH TEMPERATURE HAS A NEGATIVE EFFECT ON VIGILANCE, ATTENTION, MEMORY & RT.
  • Cultural bias
36
Q

How does cultural bias affect psychometric tests?

A
  • Bias in tests against members of minority ethnic groups within the population - eg. newly arrived immigrants.
  • Most psychometric tests are based on western definitions. Difficult to have culture-free tests.
37
Q

What are some reasons why culture-free tests are so hard to make?

A
  • conceptions of intelligence vary widely from culture to culture
  • even if the content of a test can be made culture-free, culture itself will still affect the results, shaping attitudes towards tests, test-taking, competition etc
38
Q

What are examples of relatively culture-free tests?

A
  • Leiter International Performance Scale
  • Raven’s Progressive Matrices
  • both are untimed and non-verbal.