Week 4 - Test Construction Flashcards
Rational-empirical approach
relies both on reasoning from what is known about the psychological construct and on collecting and evaluating data about how the test and its items actually behave when administered
Empirical approach
relies on collecting and evaluating data about how each item from a pool of items discriminates between groups who are thought to show or not show the measured attribute
Steps (5)
- Test Conceptualisation
- Test Construction
- Test Tryout
- Item Analysis
- Test Revision
Steps (specific)
- Specify Attributes
- Check the literature for existing tests
- Choose measurement model
- Write and edit items
- Administer and analyse responses
- Select ‘best’ items for test
- Check reliability and validity
- Norm
- Prepare test manual
- Publish test
Specification of the attribute
Attribute, construct, latent trait, test specification
Attribute
a consistent set of behaviours, thoughts, or feelings that make up a characteristic
Construct
a specific idea or concept about a psychological process or underlying trait
Latent Trait
involves the strong assumption that there is only one dimension underlying the attribute
Test specification
a written statement of the attribute or construct that the test constructor is seeking to measure and the conditions under which the test will be used
Literature search
- See how others have approached the problem in the past
- Identify theories or other constructs that may be relevant
- Obtain a clear, theory-informed conceptualisation and definition of the target construct
Literature search questions
- Do psychological traits and states exist?
- Can they be measured?
- Is test behaviour predictive of non-test behaviour?
- What are a test's strengths, weaknesses, and sources of error?
- Is testing fair, and will it benefit society?
Types of measure
- Nominal - classification or categorisation only - category labels carry no quantitative meaning and there is no order between categories (includes yes/no questions)
- Ordinal - classification plus rank ordering - ranks have no units of measurement and are not evenly spaced
- Interval - equal intervals between adjacent numbers, but no true zero
- Ratio - as per interval, but has a true zero
Models of measurement
a formal statement mapping observations of objects onto numbers that represent relationships among the objects
Trace line
a graph of the probability of a given response to an item, plotted as a function of the strength of the underlying latent trait
Classical test theory, Item Response theory (CTT, IRT)
Classical test theory (CTT): an observed score is the sum of a true score and measurement error; item and test statistics depend on the sample tested. Item response theory (IRT): models the probability of a given item response as a function of the respondent's latent trait level and item parameters such as difficulty and discrimination (see the sketch below).
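As a minimal sketch of the IRT idea (assuming the standard two-parameter logistic model; the item parameters and trait levels below are invented for illustration), a trace line can be computed straight from the model equation:

```python
import numpy as np

def trace_line(theta, a, b):
    """Two-parameter logistic (2PL) item characteristic curve: the
    probability of a keyed response at trait level theta, given
    item discrimination a and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, slightly hard
theta = np.linspace(-3, 3, 7)           # trait levels in SD units
print(trace_line(theta, a=1.2, b=0.5))  # probabilities rise with theta
```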
Differential item functioning
possibility that a psychological test item will behave differently for different groups of respondents
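A heavily simplified sketch of how DIF can be screened for (real analyses use methods such as Mantel-Haenszel or IRT-based tests; the data below are invented): match respondents on total score, then compare each group's proportion correct on the item within each stratum.

```python
import numpy as np

def dif_check(item, total, group):
    """Crude DIF screen: within each total-score stratum, compare the
    proportion answering the item correctly in group 0 vs group 1.
    Large, consistent gaps suggest the item behaves differently."""
    for t in np.unique(total):
        stratum = total == t
        g0 = item[stratum & (group == 0)]
        g1 = item[stratum & (group == 1)]
        if len(g0) and len(g1):
            print(f"total={t}: group0 p={g0.mean():.2f}, "
                  f"group1 p={g1.mean():.2f}")

# Invented data: 0/1 scores on one item, total test scores, group labels
item  = np.array([1, 0, 1, 1, 0, 1, 0, 0])
total = np.array([5, 5, 7, 7, 5, 7, 5, 7])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
dif_check(item, total, group)
```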
Item writing and editing
- Items should be relevant and representative
- Plan for item writing - a plan of the number and type of items that are required for a test
- Over-inclusion of items is recommended at this point (write more items than the final test will need)
- Aim for a 5th to 7th grade reading level
Item writing guidelines
- Use straightforward language
- Avoid double-barrelled items (a single item that asks about two things at once)
- Avoid slang and colloquial expressions that can quickly become obsolete
- Consider whether mixing positively and negatively worded items is a good idea
- Write items that the majority of respondents can respond to appropriately
- Ask about sensitive issues using straightforward, non-judgemental language
- Ensure phrasing is consistent with the response options
Likert Scale
Typically provides the test-taker with 5 or 7 possible responses along a continuum
Pros - the degree of the trait can be measured, yields a lot of information, easy to use, works best with strong statements
Cons - choosing between an odd number of responses (the neutral midpoint invites fence-sitting) and an even number (forces respondents to one side)
Binary choice scale
two options, ie. True/false, yes/no
Pros - easy to construct and score, quick to administer, many items can be covered
Cons - allows guessing, only suits dichotomous content, content is not as rich
Paired comparisons
the test-taker chooses between two options on the basis of some rule, with each option assigned a value (0 or 1)
Comparative scaling
sorting or ranking stimuli according to a rule
Written/essay formats
Pros - assesses written communication, allows complex and imaginative responses, information must be generated rather than merely recognised
Cons - narrow content, bluffing possible, hiding behind good writing, scoring is time consuming, inter-rater reliability issues
Test try out
- Administer the test to a representative sample
- Use standardised instructions
- The data are then used to narrow down the number of items
Item Analysis
Properties to investigate:
Item difficulty/distribution, Dimensionality (i.e. factor analysis), Item reliability, Item validity, Item discrimination
Item Validity
the extent to which the score on an item correlates with an external criterion relevant to the attribute
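A minimal sketch of an item-validity check (the item scores and external criterion values below are invented; in practice the criterion is a separate, relevant measure of the attribute):

```python
import numpy as np

# Invented data: 0/1 scores on one item and an external criterion measure
item      = np.array([1, 0, 1, 1, 0, 1, 0, 1])
criterion = np.array([4.2, 2.1, 3.8, 4.9, 2.5, 3.3, 1.9, 4.1])

# Item validity: correlation between the item score and the criterion
r = np.corrcoef(item, criterion)[0, 1]
print(f"item validity r = {r:.2f}")
```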
Performance of items
Item Difficulty Index
Performance on each item should vary across examinees (an item that 100% of examinees answer correctly is uninformative)
Item difficulty index (p) = number of examinees who answered the item correctly / total number of examinees
- High index = low difficulty (p is really an 'easiness' index)
- The probability of guessing correctly is taken into account when deciding the optimal item-difficulty index (e.g. for a four-option multiple-choice item, the optimal average p lies midway between chance, 0.25, and 1.00, i.e. about 0.62) - see the sketch after the next card
Item distributions
- Consider removing items with skewed distributions - these are items that most people answer in the same way
- Keep items with high variance/distribution
- Keep items with a mean close to centre of range of possible scores
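A minimal sketch tying the last two cards together (the 0/1 response matrix is invented): the difficulty index is each item's proportion correct, and the item means and variances flag items that nearly everyone answers the same way.

```python
import numpy as np

# Invented 0/1 response matrix: rows = examinees, columns = items
X = np.array([[1, 1, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 0, 1]])

p = X.mean(axis=0)            # item difficulty index (proportion correct)
var = X.var(axis=0)           # near-zero variance = uninformative item
print("difficulty p:", p)     # high p = easy item
print("variance:    ", var)   # keep items with reasonable spread
```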
Dimensionality
Some items may not have a common underlying variable or they may have several underlying variables (go to factor analysis)
Factor analysis
- New scale development usually starts with exploratory factor analysis (EFA) to identify a manageable number of factors
- Confirmatory factor analysis (CFA) used when number of factors is known
- Determine the number of underlying latent variables or constructs
- Help condense information
- Define the content or meaning of the factors
- Helps identify items that are performing better or worse
Factor analysis decisions
- Number of factors to extract – Eigenvalues (> 1) – Scree Plot
- Rotation - oblique (factors are correlated) or orthogonal (factors are uncorrelated)
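A minimal sketch of the eigenvalue decisions (the response matrix is random stand-in data; a full EFA, with extraction and rotation, would normally be run with a dedicated package such as factor_analyzer):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))      # stand-in for a respondents x items matrix

R = np.corrcoef(X, rowvar=False)   # item x item correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # largest first

print("eigenvalues:", np.round(eigvals, 2))  # plotting these gives a scree plot
print("factors by Kaiser rule (eigenvalue > 1):", int((eigvals > 1).sum()))
```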
Item reliability
- A measure of internal consistency: are the items homogeneous?
- Checked via the correlation between each item score and the total scale score (item-total correlations) and via the inter-relatedness of the items (Cronbach's alpha) - see the sketch below
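A minimal sketch of Cronbach's alpha from its standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), using an invented Likert-style response matrix:

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for a respondents x items score matrix."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)      # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

X = np.array([[3, 4, 3, 4],   # invented 1-5 ratings,
              [2, 2, 3, 2],   # rows = respondents, cols = items
              [5, 4, 4, 5],
              [1, 2, 1, 2],
              [4, 5, 4, 4]])
print(f"alpha = {cronbach_alpha(X):.2f}")
```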
Item-discrimination index
- Does the item separate high scorers from low scorers?
- Classically assessed by comparing the top and bottom performers on the test as a whole
- Commonly calculated using a point-biserial correlation between the (dichotomous) item score and the total score - see the sketch below
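A minimal sketch of the point-biserial item-total correlation (invented data; the point-biserial is simply a Pearson correlation where one variable is dichotomous, so np.corrcoef suffices; a stricter version would correlate each item with the total excluding that item):

```python
import numpy as np

X = np.array([[1, 1, 0, 1],   # invented 0/1 responses,
              [0, 0, 0, 1],   # rows = examinees, cols = items
              [1, 1, 1, 1],
              [0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 0, 0, 0]])
total = X.sum(axis=1)

# Does each item separate high scorers from low scorers?
for j in range(X.shape[1]):
    r = np.corrcoef(X[:, j], total)[0, 1]
    print(f"item {j}: point-biserial r = {r:.2f}")
```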
Test revision
Once the test has been revised, it needs to be tried out and put through item analysis again
Existing tests can 'age' (interpretations, domains, and stimuli change; word meanings, test norms, and the theories behind a test shift) - such tests may need to be reviewed and revised
Cross-validation
Collection of additional criterion-related validity data
- Is the test applicable to this population?
Validity shrinkage - validity is often lower the second time around
- Inevitable
- Generally a slight difference
- Eliminates chance results
- Near enough is good enough!
Norming
- Establish norms using a representative sample of the intended population
- General population vs specific population
Move on to creating a test manual/instructions and publication