Psychometrics Flashcards
What determines choice of format in item writing? (2 marks)
The objectives and purposes of the test (e.g. do we want to measure the extent/amount of interaction, or the quality of interaction?)
Difference between objective and purpose of a study?
Purpose - the broad goal of the research
Objective - how we will practically achieve that goal
List 4 of the 9 item writing guidelines
- clearly define what you want to measure
- generate an item pool (best items are selected after analysis)
- Avoid long items
- Keep the reading difficulty appropriate
- use clear and concise wording (avoid double-barrelled items and double negatives)
- Use both positively and negatively worded items
- use culturally neutral items
- (for MCQs) - make all distractors plausible & vary the position of the correct answer
- (for true/false Qs) - include equal numbers of true and false statements, and make both types of statement the same length
List the 5 categories of item formats
- Dichotomous
- Polytomous
- The Likert format
- The Category format
- Checklists and Q-sorts
Advantages of the dichotomous format (3 marks)
- easy to administer
- quick to score
- requires absolute judgement
Disadvantages of the dichotomous format (3 marks)
- less reliable (50% chance of guessing the correct answer)
- encourages memorization instead of understanding
- often the truth is not black and white (true/false is an oversimplification)
Minimum number of options for a polytomous format?
3 (but 4 is commonly used, and considered more reliable)
3 guidelines for writing distractors in the polytomous format
- distractors must be clearly written
- distractors must be plausible as correct answers
- avoid “cute” distractors
Advantages of polytomous questions (4 marks)
- easy to administer
- easy to score
- requires absolute judgement
- more reliable than dichotomous (less chance of guessing correctly)
Formula for correcting for guessing
corrected score = R - W/(n - 1), where R = number of items answered correctly, W = number answered incorrectly, and n = number of options per item
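A minimal sketch of this correction in Python (function name and values are illustrative):

def corrected_score(right, wrong, n_options):
    # Correction for guessing: R - W/(n - 1)
    return right - wrong / (n_options - 1)

# e.g. 30 right and 10 wrong on 4-option items -> 30 - 10/3 ≈ 26.67
print(corrected_score(30, 10, 4))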
Fields in which Likert scales are predominantly used (2 marks)
Attitude and Personality questionnaires
How can one avoid the neutral response bias in Likert scales?
have an even number of options
How does one score negatively worded items from a Likert scale?
Reverse score them (e.g. on a 5-point scale, 1 becomes 5 and 5 becomes 1)
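A minimal sketch of reverse scoring in Python, assuming a 1-to-5 response scale (names are illustrative):

def reverse_score(response, scale_min=1, scale_max=5):
    # Maps 1 -> 5, 2 -> 4, ..., 5 -> 1
    return scale_min + scale_max - response

print([reverse_score(r) for r in [1, 2, 3, 4, 5]])  # [5, 4, 3, 2, 1]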
Suggested best no. of options in a category format question?
7
Disadvantages of the category format (2 marks)
- tendency to spread answers across all categories
- susceptible to context effects from the group of things being rated (an item may be rated lower simply because the other items in the group are very good - i.e. not objective)
When best to use category format questions? (2 marks)
- when people are highly involved in a subject (more motivated to make a finer discrimination)
- when you want to measure the amount of something (eg levels of road rage)
Two tips when using the category format
- make sure your endpoints are clearly defined
- use a visual analogue scale (ideal with kids, e.g. a smiley face at one end of the scale and a frowny face at the other to describe how they’re feeling)
Where are Checklists format questions commonly found?
Personality measures (e.g. a list of adjectives - tick those that describe you)
Describe the process of Q-sort format questions
Place statements into piles, piles indicate the degree to which you think a statement describes a person/yourself
In terms of Item analysis, describe item difficulty and give another name for it
The proportion of people who get a particular item correct (higher value = easier item)
AKA facility index
p = number of correct answers / number of test-takers
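A minimal sketch of computing p in Python from a 0/1-scored response matrix (names and data are illustrative):

import numpy as np

# rows = test-takers, columns = items; 1 = correct, 0 = incorrect
responses = np.array([[1, 0, 1],
                      [1, 1, 0],
                      [1, 1, 1],
                      [0, 1, 1]])
p = responses.mean(axis=0)  # proportion correct per item (the facility index)
print(p)  # [0.75 0.75 0.75]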
Ideal range for optimum difficulty level
0.3 - 0.7
How to calculate the ODL (optimum difficulty level) for an item
Halfway between 100% and the chance of guessing the answer correctly: ODL = (1 + chance)/2
E.g. for an item with 4 options, ODL = (1 + 0.25)/2 = 0.625
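A quick check of this calculation in Python (function name is illustrative):

def optimum_difficulty(n_options):
    # Halfway between chance (1/n) and 1.0
    return (1 + 1 / n_options) / 2

print(optimum_difficulty(4))  # 0.625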
How should difficulty levels range across items in a questionnaire
You want most items around the ODL and a few at the extremes. The distribution of p-values (difficulty levels) should be approximately normal
Why does one need a range of item difficulty levels?
To discriminate between ability of test-takers
List 3 exceptions to having optimum difficulty levels
- need for difficult items (e.g selection process)
- need easier items (e.g special education)
- need to consider other factors (e.g boost confidence/morale at start of test)
p (an item difficulty level) tells us nothing about…
…the intrinsic characteristics of an item. Its value is relative to a given sample
Item discriminability is good when…
people who did well on the test overall get the item correct (and vice versa)
Describe the extreme groups method when calculating item discriminability
calculated by looking at proportion of people in the upper quartile who got the item correct minus the proportion of people in the lower quartile who got the item correct
{in other words, the difference in item difficulty when comparing the top and bottom 25%}
Di = U/Nu - L/Nl (U = number in the upper group who got the item correct, Nu = size of the upper group; L and Nl likewise for the lower group)
*Should be a positive number if the item has good discriminability
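A minimal sketch of the extreme-groups index in Python (names and data are illustrative):

import numpy as np

def extreme_groups_d(item_scores, total_scores):
    # item_scores: 0/1 per person on one item; total_scores: total test score per person
    upper = total_scores >= np.percentile(total_scores, 75)
    lower = total_scores <= np.percentile(total_scores, 25)
    return item_scores[upper].mean() - item_scores[lower].mean()

totals = np.array([10, 12, 18, 25, 30, 35, 40, 45])
item = np.array([0, 0, 0, 1, 1, 1, 1, 1])
print(extreme_groups_d(item, totals))  # positive -> good discriminability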
A red flag in item discriminability?
A negative number
Describe the point biserial method when calculating item discriminability
Calculate an item-total correlation
(if test-takers who fail the item tend to do well on the overall test, the item-total correlation will be negative)
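A minimal sketch of an item-total correlation in Python (data are illustrative; in practice the item is often removed from the total before correlating):

import numpy as np

item = np.array([0, 0, 1, 0, 1, 1, 1, 1])            # 0/1 scores on one item
totals = np.array([10, 12, 18, 25, 30, 35, 40, 45])  # total test scores
# Pearson correlation of a dichotomous item with the total = point-biserial
r = np.corrcoef(item, totals)[0, 1]
print(round(r, 2))  # positive -> good discriminability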
Can item-total correlations be used for Likert-type scales and other formats such as category and polytomous formats?
yes
Results from item-total correlations can be used to decide….
which items to remove from the questionnaire
Item characteristic curves (ICCs) are visual depictions of…
the relationship between performance on an item and performance on the overall test
Give the x- and y-axes of an ICC
x-axis = total score on test
y-axis = proportion {of test takers who got the item} correct
3 steps to drawing ICCs
- Define categories of test performance (eg specific total scores/percentages)
- Determine what proportion of people w/in each category got the item correct
- Plot your ICC
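A minimal sketch of these three steps with matplotlib (binning scheme and data are illustrative):

import numpy as np
import matplotlib.pyplot as plt

totals = np.array([10, 12, 18, 25, 30, 35, 40, 45])
item = np.array([0, 0, 1, 0, 1, 1, 1, 1])

# Step 1: define categories of test performance (here, 3 bins of total score)
edges = np.linspace(totals.min(), totals.max(), 4)
cats = np.digitize(totals, edges[1:-1])  # category 0, 1, or 2

# Step 2: proportion within each category who got the item correct
props = [item[cats == c].mean() for c in range(3)]

# Step 3: plot the ICC
plt.plot(range(3), props, marker="o")
plt.xlabel("Total score category")
plt.ylabel("Proportion correct on item")
plt.show()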
Briefly explain Item Response Theory (IRT)
Test difficulty is tailored to the individual - wrong answer = decrease difficulty, right answer = increase difficulty. Test performance is defined by the level of difficulty of items answered correctly
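A toy staircase in Python to illustrate the adaptive idea (not a real IRT ability estimator; names and values are illustrative):

def adaptive_difficulty(answers, start=0.5, step=0.1):
    # Right answer -> present a harder item; wrong answer -> an easier one
    difficulty, history = start, []
    for correct in answers:
        history.append(round(difficulty, 2))
        difficulty += step if correct else -step
        difficulty = min(max(difficulty, 0.0), 1.0)
    return history

print(adaptive_difficulty([True, True, False, True]))  # [0.5, 0.6, 0.7, 0.6]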
Name the program through which Item Response Theory is often administered
Computerized adaptive testing (CAT)
Advantages of Item Response Theory (3 marks)
- increased morale
- quicker tests
- decreased chance of cheating
In terms of measurement precision, name the three types of tests
- Peaked conventional
- Rectangular conventional
- Adaptive
Describe Peaked Conventional tests (3 points)
- Test individuals at average ability.
- Doesn’t assess high or low levels well
- high precision for average ability levels, low precision at either end
Describe Rectangular Conventional tests (2 points)
- equal number of items assessing all ability levels
- relatively low precision across the board
Describe Adaptive tests (2 points)
- test focuses on the range that challenges each individual test-taker
- precision is high at every ability level
Describe criterion-referenced tests
The test is developed based on learning outcomes - compares performance with some objectively defined criterion (What should the test-taker be able to do?)
How does one evaluate items in criterion-referenced tests? And how should the score/frequency graph look
Two groups - one given the learning unit and one not. Plot frequency of scores: the graph should look like a V (the bottom of the V suggests the cutoff score)
List 3 limitations of criterion-referenced tests
- tell you you got something wrong, but not why
- Emphasis on ranking students rather than identifying gaps in knowledge
- Teaching to the test - not to education
What is referred to as the “test blueprint”
The test specifications
List 4 of the 7 things that test specifications should describe
1 Test (response) format
2 Item format
3 Total number of test items (test length)
4 Content areas of the construct(s) tested
5 Whether items or prompts will contain visual stimuli
6 How test scores will be interpreted
7 Time limits
In terms of response format, list 3 ways in which participants can demonstrate their skills
- Selected response (eg Likert scale/MCQ/dichotomous)
- Constructed response (eg essay/fill-in-the-blank)
- Performance response (eg block design task)
In terms of response format, give an example of objective vs subjective formats
Obj - MCQ or Likert
Subj - Essays, projective tests
List 5 types of item response format
- Open-ended - e.g. an open-ended essay question (no limitations on the test-taker)
- Forced-choice items - MCQs, true/false questions
- Ipsative forced choice (leads the test-taker in a certain direction, but is still somewhat open, e.g. “I find work from home….”)
- Sentence completion
- Performance based items
List the two determinants of test length
- Amount of administration time available
- Purpose of the measure (eg screening vs comprehensive)
When test length increases, compliance ….. because people get ….. and …..
decreases; fatigued and bored
How many more items should be in the initial version of the test than the final one?
50%
Having good ….. ensures that all domains of a construct are tested
Content areas
….. refers to the ways in which knowledge or symptoms are demonstrated (and these are therefore tested for)
manifestations
Reliability is the desired ….. or ….. of test scores and tells us about the amount of ….. in a measurement tool
consistency or reproducibility; error
Why is test-retest not always a good measure of reliability?
Participants learn skills from the first administration of the test (practice effects)
Normally we perform roughly around our true score, and so our scores are…..distributed
normally
……is something we can use to increase reliability
internal consistency