Week 3 - Reliability and Validity Flashcards
Classical Test Theory
Test scores are the result of
- Factors that contribute to consistency
- Factors that contribute to inconsistency (characteristics of the test taker, and factors unrelated to the attribute being measured, such as the situation or environment)
X = T + e
where X = obtained score, T = true score, and e = error of measurement
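A minimal simulation of the classical model (all numbers hypothetical): observed scores are generated as true score plus random error, and because the two are uncorrelated, the observed-score variance decomposes into true-score variance plus error variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: simulate X = T + e for 10,000 test takers.
T = rng.normal(loc=100, scale=15, size=10_000)  # true scores
e = rng.normal(loc=0, scale=5, size=10_000)     # random measurement error
X = T + e                                       # obtained scores

# var(X) is approximately var(T) + var(e), since T and e are uncorrelated.
print(X.var(), T.var() + e.var())

# Reliability is the ratio of true-score to observed-score variance,
# here approximately 15^2 / (15^2 + 5^2) = 0.9.
print(T.var() / X.var())
```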
Sources of Error
- Item selection
- Test administration
- Test scoring
- Systematic measurement error
Domain-sampling model
a way of thinking that sees the test as a representative sample of a large domain of possible items that could be included on the test
- considers the problem of using only a sample of items to represent a construct
- as the test gets longer, it should represent the construct better, increasing reliability (see the sketch below)
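One standard way to quantify this is the Spearman-Brown prophecy formula, which projects the reliability of a test lengthened by a factor of n. A sketch with hypothetical numbers; it assumes the added items are comparable to the existing ones:

```python
def spearman_brown(r: float, n: float) -> float:
    """Projected reliability of a test lengthened by a factor of n."""
    return (n * r) / (1 + (n - 1) * r)

# Hypothetical: a test with reliability .60 is doubled in length.
print(spearman_brown(0.60, 2))  # 0.75 - the longer test is more reliable
```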
Inter-rater reliability
the extent to which different raters agree in their assessments
Method variance
the variability among scores that arises from the form of the test (the method of administration) as distinct from its content
Reliability
the consistency of a test: the extent to which it gives the same result each time it is used to measure the same thing
Stability over time
the extent to which test scores remain stable when a test is administered on more than one occasion
Internal consistency
the extent to which the items of a psychological test are homogeneous (measure the same thing) or heterogeneous
Social desirability bias
a form of method variance that arises when people respond to questions in ways that place them in a favourable light
Test-Retest Stability
The same test is administered to the same group on two different occasions
- scores may not be identical due to practice effects, maturation, treatment effects, or changes in setting
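In practice, test-retest stability is usually reported as the Pearson correlation between the two administrations; a minimal sketch with hypothetical scores:

```python
import numpy as np

# Hypothetical scores for the same eight people on two occasions.
time1 = np.array([12, 15, 19, 22, 25, 28, 30, 33])
time2 = np.array([14, 14, 20, 21, 27, 27, 31, 35])

# Test-retest reliability = correlation between the two occasions.
r_tt = np.corrcoef(time1, time2)[0, 1]
print(round(r_tt, 2))
```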
Parallel or alternate forms of reliability
Two forms of the same test are developed, with different items selected according to the same rules
Parallel - same distribution of scores (mean and variance equal)
Alternate - different distribution of scores (mean and variance may not be equal)
Both matched for content and difficulty
Split half method
Test is divided into halves that are compared
- useful in overcoming the logistical difficulties of test-retest reliability (only one administration is needed)
- the correlation between halves underestimates full-test reliability, so it is stepped up with the Spearman-Brown formula (see the sketch below)
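A minimal sketch with hypothetical item responses: correlate odd- and even-item half scores, then apply the Spearman-Brown step-up.

```python
import numpy as np

# Hypothetical responses: rows = test takers, columns = 6 items.
items = np.array([
    [4, 3, 4, 5, 3, 4],
    [2, 2, 3, 2, 1, 2],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 2, 3, 3, 3],
    [1, 2, 1, 2, 2, 1],
])

# Split into odd- and even-numbered items and correlate the half scores.
odd = items[:, 0::2].sum(axis=1)
even = items[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]

# Each half is only half the test's length, so the half-test correlation
# understates full-test reliability; Spearman-Brown corrects for this.
r_full = (2 * r_half) / (1 + r_half)
print(round(r_half, 2), round(r_full, 2))
```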
Measuring Internal consistency
Cronbach’s Alpha
Cronbach’s alpha - a generalised reliability coefficient for scoring systems that are graded (e.g. agree-disagree scales)
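A minimal sketch of computing alpha from an item-response matrix (hypothetical data), using the standard formula: alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores).

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha; rows = test takers, columns = items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 5-point agree-disagree responses (rows = people).
data = np.array([
    [4, 5, 4, 4],
    [2, 1, 2, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [1, 2, 1, 1],
])
print(round(cronbach_alpha(data), 2))
```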
Acceptable levels of reliability
- .70-.80 acceptable or good
- greater than .91 may indicate redundancy
Standard error of measurement (SEM)
allows estimation of the precision of an individual test score
- the larger the SEM, the less certain we are that the test score represents the true score
Reliability coefficient (r) - an index of the ratio of true-score variance to total observed-score variance in a test
SEM = SD √(1 − r), where SD is the standard deviation of the test scores
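A minimal sketch with hypothetical numbers, including the usual ~95% confidence band of ±1.96 SEM around an obtained score:

```python
import math

sd = 15    # hypothetical standard deviation of the test's scores
r = 0.91   # hypothetical reliability coefficient

sem = sd * math.sqrt(1 - r)  # standard error of measurement = 4.5
print(round(sem, 1))

# ~95% confidence interval around an obtained score of 110.
x = 110
print(round(x - 1.96 * sem, 1), round(x + 1.96 * sem, 1))
```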
Validity
the extent to which a test measures the construct it is intended to measure
- inferences from test must be appropriate, meaningful, and useful
Face validity
does the test look like it measures the relevant construct
Content validity
the extent to which items on a test represent the universe of behaviour the test was designed to measure
- established by logical deduction rather than statistical analysis
Construct underrepresentation
failure to capture important components of a construct
Construct-irrelevant variance
measuring things other than the construct of interest
Criterion related validity
the extent to which a measure is related to an outcome
Good criteria are reliable and appropriate
The relationship between test and criterion is usually expressed as a correlation (a validity coefficient)
Predictive evidence
criterion related validity
how well the test predicts performance on a criterion measured at a later time
Concurrent evidence
criterion related validity
refers to a comparison between the measure in question and an outcome assessed at the same time
Incremental validity
the extent to which knowledge of a score adds to that obtained by a pre-existing test score or psychological characteristic
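A minimal sketch of checking incremental validity (all data hypothetical): fit the criterion on the pre-existing predictor alone, then add the new test score and see how much R² rises.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical data: an existing predictor, a new test, and a criterion.
existing = rng.normal(size=n)
new_test = rng.normal(size=n)
criterion = 0.5 * existing + 0.3 * new_test + rng.normal(size=n)

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """R^2 from an ordinary least-squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_base = r_squared(existing.reshape(-1, 1), criterion)
r2_full = r_squared(np.column_stack([existing, new_test]), criterion)
print(round(r2_full - r2_base, 3))  # gain in R^2 = incremental validity
```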
Construct validity
concerned with establishing how well a test measures a psychological construct
Convergent evidence (construct validity)
refers to the degree to which measures of constructs that should be related are in fact related
- identify relationships we would expect if the test is actually measuring the construct
Discriminant (divergent) evidence
construct validity
aims to demonstrate that the test is unique
- low correlations should be observed with constructs that are unrelated to what the test is trying to measure
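A minimal sketch of both kinds of evidence (all variable names and data are hypothetical): a new anxiety measure should correlate highly with an established anxiety measure (convergent) and weakly with an unrelated construct (discriminant).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

anxiety_true = rng.normal(size=n)
new_anxiety_test = anxiety_true + 0.5 * rng.normal(size=n)
established_anxiety = anxiety_true + 0.5 * rng.normal(size=n)
vocabulary = rng.normal(size=n)  # construct unrelated to anxiety

# Convergent evidence: high correlation with a related measure (~ .8).
print(np.corrcoef(new_anxiety_test, established_anxiety)[0, 1])
# Discriminant evidence: low correlation with an unrelated measure (~ 0).
print(np.corrcoef(new_anxiety_test, vocabulary)[0, 1])
```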
Factor analysis (construct validity)
- a statistical technique used to observe patterns of correlation among test items
- items may cluster together, which is attributed to the action of latent (unobserved) variables, or factors
- Exploratory factor analysis
- Confirmatory factor analysis
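A minimal exploratory sketch (hypothetical data; assumes scikit-learn is installed): six items are generated from two latent factors, and a two-factor model should recover the clustering in its loadings.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
n = 500

# Hypothetical: two latent factors each drive three observed items.
f1 = rng.normal(size=n)
f2 = rng.normal(size=n)

def noise():
    return 0.5 * rng.normal(size=n)

X = np.column_stack([f1 + noise(), f1 + noise(), f1 + noise(),
                     f2 + noise(), f2 + noise(), f2 + noise()])

# Exploratory factor analysis: the loadings should show two item clusters.
fa = FactorAnalysis(n_components=2).fit(X)
print(np.round(fa.components_, 2))
```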
Decision-theoretic approach to predictive validity
includes
- cutting point
- valid positive and negative decisions
- false positive and negative decisions
- base rate
- selection ratio
Cutting point
the test score, or point on the scale, used to split those being assessed into two groups: those predicted to show the behaviour of interest and those predicted not to show it
Valid positive and negative decisions
Positive - where the person is predicted to show the behaviour and shows it
Negative - where the person is not predicted to show the behaviour and does not show it
False positive and negative decisions
Positive - the prediction is that the person has the characteristic but does not
Negative - the prediction is that the person does not have the characteristic but does
Base rate
the proportion of individuals in the population who show the behaviour of interest
Selection ratio
the proportion of those assessed who are allocated to the category predicted to show the behaviour (i.e. who fall at or above the cutting point)
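A minimal sketch (hypothetical scores and outcomes) computing all of the decision-theoretic quantities above for a given cutting point:

```python
import numpy as np

# Hypothetical: test scores and whether each person actually showed
# the behaviour of interest (1 = showed it, 0 = did not).
scores = np.array([55, 62, 48, 71, 66, 59, 80, 45, 68, 52])
showed = np.array([ 0,  1,  0,  1,  1,  0,  1,  0,  0,  0])

cutting_point = 60                # scores at/above predict the behaviour
predicted = scores >= cutting_point

valid_pos = np.sum(predicted & (showed == 1))   # predicted and shown
valid_neg = np.sum(~predicted & (showed == 0))  # not predicted, not shown
false_pos = np.sum(predicted & (showed == 0))   # predicted, not shown
false_neg = np.sum(~predicted & (showed == 1))  # not predicted, but shown

base_rate = showed.mean()           # proportion who show the behaviour
selection_ratio = predicted.mean()  # proportion predicted to show it

print(valid_pos, valid_neg, false_pos, false_neg)
print(base_rate, selection_ratio)
```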