Week 10: Measurement Error Flashcards
Measurement error
The difference between the observed score and the true score
List some sources of measurement error
- the test, if it is poorly designed
- the test taker and how they are feeling on the day
- the testing environment
- cultural factors e.g. language understanding
Standard error of measurement
Estimates how much an individual's observed test score is likely to differ from their true score because of measurement error. Not to be confused with the standard error of the mean, which is how far a sample mean is likely to be from the true mean of the population.
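A commonly used formula is SEM = SD x sqrt(1 - reliability). The sketch below is only an illustration of that formula; the function name, the SD of 15 and the reliability of .91 are made-up values, not figures from these notes.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)

# Hypothetical test: SD = 15, reliability = .91
sem = standard_error_of_measurement(sd=15.0, reliability=0.91)

# Approximate 95% band around an observed score of 100
observed = 100.0
lower, upper = observed - 1.96 * sem, observed + 1.96 * sem
print(round(sem, 2), round(lower, 1), round(upper, 1))
```

The more reliable the test, the smaller the SEM and the narrower the band around the observed score.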
List in order the test development process
- conceptualisation
- construction
- tryout
- analysis
- revision
Conceptualisation
How do you define a test? What are you measuring? What is useful and what is outside the bounds of this?
Construction
Essentially developing the items, and considering why items are important for the test
Tryout
Collecting test data and qualitative feedback on the test
What is the best way to measure knowledge?
Yes/no, multiple choice, short answer
What is the best way to measure degree of symptoms/attitudes?
Likert scale
What is the best way to measure health?
Visual analogue scale
What is the best way to measure neurological function?
- visual items
- memory
- visuospatial tasks
What is the best way to measure personality?
Q-sort
List some important considerations when designing a test
- generate an item pool
- short and concise items
- age appropriate level
- no double barrelled items
- include negatively worded items
Dichotomous
Two answers; Yes/no, true/false
Polytomous
More than two alternatives, e.g. a multiple choice item
Likert
Typically 5 response options ranging from strongly disagree to strongly agree
Likert type format
“Never”, “rarely”, etc.
Q-sort
Cards with descriptions that need to be sorted into categories. Can be useful with individual clients but is impractical for large samples.
Describe semantic differential
The respondent marks on a line where they fall between two opposing poles (e.g. two opposite adjectives)
Describe visual analogue
Indicating where you fall on a line
Benefits of reverse scored items
They help control for response bias (e.g. agreeing with every item) and require extra concentration from participants
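Before totalling, a reverse worded item is usually rescored so that a high number means the same thing on every item. A minimal sketch, assuming a 1 to 5 scale (the function name and the default scale range are illustrative):

```python
def reverse_score(score: int, scale_min: int = 1, scale_max: int = 5) -> int:
    """Flip a rating so the lowest option becomes the highest and vice versa."""
    if not scale_min <= score <= scale_max:
        raise ValueError("score is outside the scale range")
    return scale_max + scale_min - score

print(reverse_score(4))  # 2
print(reverse_score(1))  # 5
```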
Category format
e.g. ‘On a scale from 1-10..’
Some characteristics of category format
- has situational effects
- need clearly defined endpoints
- increasing categories above 10 can decrease reliability
Classical test theory
- observed score = true score + error
- interested in total scores, of which one part reflects the true score and the other part is error
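In the usual classical test theory model each observed score is a true score plus error, and reliability is the share of observed score variance that is true score variance. The toy numbers below are invented so that error is uncorrelated with the true scores; in real testing the true scores are never known directly.

```python
from statistics import pvariance

# Invented values for five test takers (error has mean 0 and is
# uncorrelated with the true scores in this toy example)
true_scores = [10, 12, 14, 16, 18]
errors = [1, -1, 0, -1, 1]

# Observed score = true score + error
observed_scores = [t + e for t, e in zip(true_scores, errors)]

var_true = pvariance(true_scores)
var_error = pvariance(errors)
var_observed = pvariance(observed_scores)

# Reliability = true score variance / observed score variance
reliability = var_true / var_observed
print(observed_scores, round(reliability, 3))
```

Here the observed variance splits cleanly into a true part and an error part, which is the "two proportions" idea on this card.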
Item response theory
Focuses on individual items rather than total scores, and describes each item with an item characteristic curve.
The item characteristic curve
Describes the relationship between latent ability and performance on a test item
What will the item characteristic curve ideally show
Someone who has a high ability should have a higher chance of doing well than someone who has low ability
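One way to picture an item characteristic curve is a logistic function of ability. The sketch below assumes a two-parameter logistic model, which these notes do not specify; the parameter names a (discrimination) and b (difficulty) and the example values are illustrative.

```python
import math

def icc_2pl(theta: float, a: float = 1.0, b: float = 0.0) -> float:
    """Probability of a correct response at ability theta (assumed 2PL model).

    a: how sharply the item separates low from high ability
    b: ability level at which the probability of success is .5
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Probability of success rises with ability
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(icc_2pl(theta), 3))
```

A person whose ability equals the item's difficulty has a .5 chance of success, and the chance increases as ability increases.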
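Describe criterion referenced tests
Look at how well someone performs relative to a predefined standard or criterion (e.g. a pass mark), rather than relative to others. Tests that compare someone with other people are norm referenced, which is the type we use most commonly.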
List the various response item formats
- dichotomous
- polytomous
- multiple choice
- Likert
- Likert type
- Q-sort
- true false
- two choice
- forced choice
Describe item difficulty in terms of item analysis
The % of people who get an item correct gives the difficulty level. For example, if 69% of individuals get it right, the difficulty level for that item is .69. Higher values therefore mean an easier item.
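The calculation is just the proportion of correct responses. A minimal sketch, assuming each response is coded 1 for correct and 0 for incorrect (the function name and the data are illustrative):

```python
def item_difficulty(responses: list[int]) -> float:
    """Proportion of test takers who answered the item correctly (1 = correct)."""
    if not responses:
        raise ValueError("no responses given")
    return sum(responses) / len(responses)

# 69 of 100 hypothetical test takers answer correctly
responses = [1] * 69 + [0] * 31
print(item_difficulty(responses))  # 0.69
```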
A good test…
- discriminates at many levels
- has varying degree of difficulty
Extreme group method
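The top 33% of scorers on the whole test are compared with the bottom 33%. For each item, the proportion of the bottom group who got it correct is subtracted from the proportion of the top group who got it correct. A good item is answered correctly by more of the top group than the bottom group.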
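The comparison can be sketched as a discrimination index for one item. The function name, the rounding used to size the groups, and the data for nine test takers are all illustrative assumptions.

```python
def discrimination_index(item_correct: list[int],
                         total_scores: list[float],
                         fraction: float = 1 / 3) -> float:
    """Proportion correct in the top group minus proportion correct in the bottom group."""
    n = len(total_scores)
    if n == 0 or n != len(item_correct):
        raise ValueError("inputs must be non-empty and the same length")
    k = max(1, round(n * fraction))  # size of each extreme group
    # Rank test takers from lowest to highest total score
    order = sorted(range(n), key=lambda i: total_scores[i])
    bottom, top = order[:k], order[-k:]
    p_top = sum(item_correct[i] for i in top) / k
    p_bottom = sum(item_correct[i] for i in bottom) / k
    return p_top - p_bottom

# Hypothetical data: total test scores and 1/0 results on one item
total_scores = [2, 3, 4, 5, 6, 7, 8, 9, 10]
item_correct = [0, 0, 1, 0, 1, 1, 1, 1, 1]
d = discrimination_index(item_correct, total_scores)
print(round(d, 2))
```

A positive value means the top group did better on the item; a value near zero or below suggests the item does not discriminate well.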
Point biserial method
Correlation between performance on a particular item (scored correct/incorrect) and performance on the whole test
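Numerically, a point biserial correlation is a Pearson correlation in which one variable is coded 0/1. The sketch below computes it by hand under that assumption; the function name and the data are illustrative.

```python
import math

def point_biserial(item: list[int], totals: list[float]) -> float:
    """Pearson correlation between a 0/1 item and total test scores."""
    n = len(item)
    if n == 0 or n != len(totals):
        raise ValueError("inputs must be non-empty and the same length")
    mean_item = sum(item) / n
    mean_total = sum(totals) / n
    cov = sum((i - mean_item) * (t - mean_total) for i, t in zip(item, totals))
    ss_item = sum((i - mean_item) ** 2 for i in item)
    ss_total = sum((t - mean_total) ** 2 for t in totals)
    if ss_item == 0 or ss_total == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return cov / math.sqrt(ss_item * ss_total)

# Hypothetical data: the two highest scorers got the item right
r = point_biserial([0, 0, 1, 1], [1, 2, 3, 4])
print(round(r, 3))
```

A higher positive value means people who do well on the whole test also tend to get this item right.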