Week 4 - Test Construction Flashcards
Rational-empirical approach
relies both on reasoning from what is known about the psychological construct and on collecting and evaluating data about how the test and its items actually behave when administered
Empirical approach
relies on collecting and evaluating data about how each item from a pool of items discriminates between groups who are thought to show or not show the measured attribute
Steps (5)
- Test Conceptualisation
- Test Construction
- Test Tryout
- Item Analysis
- Test Revision
Steps (specific)
- Specify Attributes
- Check the literature for existing tests
- Choose measurement model
- Write and edit items
- Administer and analyse responses
- Select ‘best’ items for test
- Check reliability and validity
- Norm
- Prepare test manual
- Publish test
Specification of the attribute
Attribute, construct, latent trait, test specification
Attribute
a consistent set of behaviours, thoughts, or feelings that make up a characteristic
Construct
a specific idea or concept about a psychological process or underlying trait
Latent Trait
involves the strong assumption that there is only one dimension underlying the attribute
Test specification
a written statement of the attribute or construct that the test constructor is seeking to measure and the conditions under which the test will be used
Literature search
- See how others have approached the problem in the past
- Identify theories or other constructs that may be relevant
- Obtain a clear, theory-informed conceptualisation and definition of the target construct
Literature search questions
- Do psychological traits and states exist?
- Can they be measured?
- Is test behaviour predictive of non-test behaviour?
- What are a test's strengths, weaknesses, and sources of error?
- Is testing fair, and will it benefit society?
Types of measure
- Nominal - classification or categorisation only - category labels carry no quantitative meaning and there is no order between categories (includes yes/no questions)
- Ordinal - classification plus rank ordering - ranks have no units of measurement and are not evenly spaced
- Interval - equal intervals between adjacent numbers, but no true zero
- Ratio - as per interval, but has a true zero
Models of measurement
a formal statement mapping observations of objects onto numbers that represent relationships among the objects
Trace line
a graph of the probability of a given response to an item, plotted as a function of the strength of the underlying latent trait
Classical test theory, Item Response theory (CTT, IRT)
Classical test theory (CTT): an observed score is the sum of a true score and measurement error; item and test statistics depend on the sample tested. Item response theory (IRT): models the probability of a given item response as a function of the respondent's latent trait level and item parameters such as difficulty and discrimination (see the sketch below).
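As a minimal sketch of the IRT idea (assuming the standard two-parameter logistic model; the item parameters and trait levels below are invented for illustration), a trace line can be computed straight from the model equation:

```python
import numpy as np

def trace_line(theta, a, b):
    """Two-parameter logistic (2PL) item characteristic curve: the
    probability of a keyed response at trait level theta, given
    item discrimination a and item difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# Hypothetical item: moderate discrimination, slightly hard
theta = np.linspace(-3, 3, 7)           # trait levels in SD units
print(trace_line(theta, a=1.2, b=0.5))  # probabilities rise with theta
```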
Differential item functioning
possibility that a psychological test item will behave differently for different groups of respondents
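A heavily simplified sketch of how DIF can be screened for (real analyses use methods such as Mantel-Haenszel or IRT-based tests; the data below are invented): match respondents on total score, then compare each group's proportion correct on the item within each stratum.

```python
import numpy as np

def dif_check(item, total, group):
    """Crude DIF screen: within each total-score stratum, compare the
    proportion answering the item correctly in group 0 vs group 1.
    Large, consistent gaps suggest the item behaves differently."""
    for t in np.unique(total):
        stratum = total == t
        g0 = item[stratum & (group == 0)]
        g1 = item[stratum & (group == 1)]
        if len(g0) and len(g1):
            print(f"total={t}: group0 p={g0.mean():.2f}, "
                  f"group1 p={g1.mean():.2f}")

# Invented data: 0/1 scores on one item, total test scores, group labels
item  = np.array([1, 0, 1, 1, 0, 1, 0, 0])
total = np.array([5, 5, 7, 7, 5, 7, 5, 7])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
dif_check(item, total, group)
```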
Item writing and editing
- Items should be relevant and representative
- Plan for item writing - a plan of the number and type of items that are required for a test
- Over-inclusion of items is recommended at this point (write more items than the final test will need)
- Aim for a 5th to 7th grade reading level
Item writing guidelines
- Use straightforward language
- Avoid double-barrelled items (a single item that asks about two things at once)
- Avoid slang and colloquial expressions that can quickly become obsolete
- Consider whether mixing positively and negatively worded items is a good idea
- Write items that the majority of respondents can respond to appropriately
- Ask about sensitive issues using straightforward, non-judgemental language
- Ensure phrasing is consistent with the response options
Likert Scale
Typically provides the test-taker with 5 or 7 possible responses along a continuum
Pros - the degree of the trait can be measured, yields a lot of information, easy to use, works best with strong statements
Cons - choosing between an odd number of responses (the neutral midpoint invites fence-sitting) and an even number (forces respondents to one side)
Binary choice scale
two options, ie. True/false, yes/no
Pros - easy to construct and score, quick to administer, many items can be covered
Cons - allows guessing, only suits dichotomous content, content is not as rich
Paired comparisons
the test-taker chooses between two options on the basis of some rule, with each option assigned a value (0 or 1)
Comparative scaling
sorting or ranking stimuli according to a rule
Written/essay formats
Pros - assesses written communication, allows complex and imaginative responses, information must be generated rather than merely recognised
Cons - narrow content, bluffing possible, hiding behind good writing, scoring is time consuming, inter-rater reliability issues
Test try out
- Administer the test to a representative sample
- Use standardised instructions
- The data are then used to narrow down the number of items
Item Analysis
Properties to investigate:
Item difficulty/distribution, Dimensionality (i.e. factor analysis), Item reliability, Item validity, Item discrimination
Item Validity
the extent to which the score on an item correlates with an external criterion relevant to the attribute
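A minimal sketch of an item-validity check (the item scores and external criterion values below are invented; in practice the criterion is a separate, relevant measure of the attribute):

```python
import numpy as np

# Invented data: 0/1 scores on one item and an external criterion measure
item      = np.array([1, 0, 1, 1, 0, 1, 0, 1])
criterion = np.array([4.2, 2.1, 3.8, 4.9, 2.5, 3.3, 1.9, 4.1])

# Item validity: correlation between the item score and the criterion
r = np.corrcoef(item, criterion)[0, 1]
print(f"item validity r = {r:.2f}")
```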
Performance of items
Item Difficulty Index
Performance on each item should vary across examinees (an item that 100% of examinees answer correctly is uninformative)
Item difficulty index (p) = number of examinees who answered the item correctly / total number of examinees
- High index = low difficulty (p is really an 'easiness' index)
- The probability of guessing correctly is taken into account when deciding the optimal item-difficulty index (e.g. for a four-option multiple-choice item, the optimal average p lies midway between chance, 0.25, and 1.00, i.e. about 0.62) - see the sketch after the next card
Item distributions
- Consider removing items with skewed distributions - these are items that most people answer in the same way
- Keep items with high variance/distribution
- Keep items with a mean close to centre of range of possible scores
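A minimal sketch tying the last two cards together (the 0/1 response matrix is invented): the difficulty index is each item's proportion correct, and the item means and variances flag items that nearly everyone answers the same way.

```python
import numpy as np

# Invented 0/1 response matrix: rows = examinees, columns = items
X = np.array([[1, 1, 0, 1],
              [1, 0, 0, 1],
              [1, 1, 1, 1],
              [0, 1, 0, 1],
              [1, 1, 0, 1]])

p = X.mean(axis=0)            # item difficulty index (proportion correct)
var = X.var(axis=0)           # near-zero variance = uninformative item
print("difficulty p:", p)     # high p = easy item
print("variance:    ", var)   # keep items with reasonable spread
```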
Dimensionality
Some items may not have a common underlying variable or they may have several underlying variables (go to factor analysis)
Factor analysis
- New scale development usually starts with exploratory factor analysis (EFA) to identify a manageable number of factors
- Confirmatory factor analysis (CFA) used when number of factors is known
- Determine the number of underlying latent variables or constructs
- Help condense information
- Define the content or meaning of the factors
- Helps identify items that are performing better or worse
Factor analysis decisions
- Number of factors to extract – Eigenvalues (> 1) – Scree Plot
- Rotation - oblique (factors are correlated) or orthogonal (factors are uncorrelated)
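A minimal sketch of the eigenvalue decisions (the response matrix is random stand-in data; a full EFA, with extraction and rotation, would normally be run with a dedicated package such as factor_analyzer):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))      # stand-in for a respondents x items matrix

R = np.corrcoef(X, rowvar=False)   # item x item correlation matrix
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # largest first

print("eigenvalues:", np.round(eigvals, 2))  # plotting these gives a scree plot
print("factors by Kaiser rule (eigenvalue > 1):", int((eigvals > 1).sum()))
```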
Item reliability
- A measure of internal consistency: are the items homogeneous?
- Checked via the correlation between each item score and the total scale score (item-total correlations) and via the inter-relatedness of the items (Cronbach's alpha) - see the sketch below
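A minimal sketch of Cronbach's alpha from its standard formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), using an invented Likert-style response matrix:

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha for a respondents x items score matrix."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1)      # variance of each item
    total_var = X.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

X = np.array([[3, 4, 3, 4],   # invented 1-5 ratings,
              [2, 2, 3, 2],   # rows = respondents, cols = items
              [5, 4, 4, 5],
              [1, 2, 1, 2],
              [4, 5, 4, 4]])
print(f"alpha = {cronbach_alpha(X):.2f}")
```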
Item-discrimination index
- Does the item separate high scorers from low scorers?
- Classically assessed by comparing the top and bottom performers on the test as a whole
- Commonly calculated using a point-biserial correlation between the (dichotomous) item score and the total score - see the sketch below
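A minimal sketch of the point-biserial item-total correlation (invented data; the point-biserial is simply a Pearson correlation where one variable is dichotomous, so np.corrcoef suffices; a stricter version would correlate each item with the total excluding that item):

```python
import numpy as np

X = np.array([[1, 1, 0, 1],   # invented 0/1 responses,
              [0, 0, 0, 1],   # rows = examinees, cols = items
              [1, 1, 1, 1],
              [0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 0, 0, 0]])
total = X.sum(axis=1)

# Does each item separate high scorers from low scorers?
for j in range(X.shape[1]):
    r = np.corrcoef(X[:, j], total)[0, 1]
    print(f"item {j}: point-biserial r = {r:.2f}")
```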
Test revision
Once the test has been revised, it needs to be tried out and put through item analysis again
Existing tests can 'age' (interpretations, domains, and stimuli change; word meanings, test norms, and the theories behind a test shift) - such tests may need to be reviewed and revised
Cross-validation
Collection of additional criterion-related validity data
- Is the test applicable to this population?
Validity shrinkage - validity is often lower the second time around
- Inevitable
- Generally a slight difference
- Eliminates chance results
- Near enough is good enough!
Norming
- Establish norms using a representative sample of the intended population
- General population vs specific population
Move on to creating a test manual/instructions and publication