writing and evaluating test items Flashcards

1
Q

how do you choose format of items?

A

Choice of format comes from objectives and purpose of the test

2
Q

Item writing guidelines

A
  1. define clearly what you want to measure
  2. generate an item pool
  3. avoid long items (tedious to read)
  4. keep reading difficulty appropriate to the education level
  5. use clear and concise wording (avoid double-barrelled items and double negatives)
  6. mix positively and negatively worded items in the same test
  7. make items as culturally neutral as possible
  8. make content relevant to the purpose
3
Q

How to write MCQ items

A

vary position of correct answer

all distractors must be plausible

4
Q

True/false Qs

A

Both statements same length

Equal numbers of both

5
Q

5 types of item format

A
  1. Dichotomous format
  2. Polytomous Format
  3. Likert format
  4. category format
  5. checklists and q sorts
6
Q

Dichotomous format

A
  • 2 alternatives
  • True/False
  • Yes/No
7
Q

Dichotomous format advantages

A
  1. ease of administration
  2. quick scoring
  3. requires absolute judgement
8
Q

Dichotomous format disadvantages

A
  1. less reliable (50% chance of guessing an item correctly; narrower range of scores for analyses)
  2. encourages memorisation
  3. often truth is not black/white
9
Q

Polytomous format

A

more than 2 alternatives

e.g., multiple-choice questions (MCQs)

10
Q

Polytomous format- distractors

A
  • incorrect alternatives
  • ideal to have 3-4 distractors to retain psychometric properties
  • must be as plausible as the correct answer
  • no cute distractors
  • make the test more reliable
  • but difficult to find good distractors
11
Q

Polytomous format- advantages

A
  • easy to administer and score
  • requires absolute judgement
  • less likely to guess correctly than a dichotomous test
12
Q

Correction for guessing

A

Corrected score = R – W/(n – 1)
The number of right answers minus the number of wrong answers divided by (the number of choices per item minus 1)
R = number of right answers
W = number of wrong answers
n = number of alternatives per item
Omitted answers are excluded from this calculation
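A minimal sketch of the correction in Python (the item counts below are made up for illustration):

```python
def corrected_score(right, wrong, n_alternatives):
    """Correction for guessing: R - W/(n - 1).

    Omitted items are simply left out of both counts.
    """
    return right - wrong / (n_alternatives - 1)

# Hypothetical 40-item MCQ with 4 alternatives: 30 right, 8 wrong, 2 omitted
score = corrected_score(30, 8, 4)  # 30 - 8/3
```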

13
Q

Likert Format

A

Named after Rensis Likert, who first used it for attitude scales

  • indicates degree of agreement
  • a 6-point scale (or any even number of options) avoids the neutral response
  • reverse-score negatively worded items
  • use statements
  • popular for attitude and personality scales
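Reverse scoring has a simple arithmetic form; a sketch for a 6-point scale (the example item is made up):

```python
def reverse_score(response, scale_min=1, scale_max=6):
    """Reverse-score a negatively worded Likert item (1<->6, 2<->5, 3<->4)."""
    return scale_min + scale_max - response

# Hypothetical negatively worded item answered 2 ("disagree") on a 6-point scale
rescored = reverse_score(2)  # counts as 5 toward the total
```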
14
Q

Category Format

A

On a scale of 1 to 10…

Research suggests 7 categories work best

15
Q

Category Format- disadvantages

A
  1. Tendency to spread responses across all categories
  2. Susceptible to the groupings of things being rated (context)
  3. Element of randomness
16
Q

When is Category Format used?

A
  1. People are highly involved with a subject
    E.g., asking people in townships to rate service delivery
    More motivated to make a finer discrimination
  2. Want to measure the amount of something
    E.g., road rage experienced in a given situation
  3. Make sure your endpoints are clearly defined

Visual analogue scale

17
Q

Checklists

A
  • common in personality measures
  • a list of adjectives: tick the ones that describe you best

18
Q

Q-sorts

A
  • place statements into piles
  • piles indicate degree to which you think a statement describes a person/yourself
  • category format implicit here
19
Q

Item analysis

A

Item analysis is a general term used to describe a set of methods used to evaluate test items. Item difficulty and item discriminability are the most basic of these methods.

20
Q

Item Difficulty

A
  • the proportion of people who get a particular item correct
  • the higher the value, the easier the item
  • p = number answering the item correctly / number taking the measure
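The proportion is one line of code; a sketch with made-up response data (1 = correct):

```python
def item_difficulty(responses):
    """p = proportion of respondents answering the item correctly."""
    return sum(responses) / len(responses)

# Hypothetical: 8 test takers, 6 got the item right
p = item_difficulty([1, 1, 1, 0, 1, 1, 0, 1])  # 0.75
```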
21
Q

Optimum difficulty level (ODL)

A

-between 0.30 and 0.70

Example: MCQ test with 4 alternatives
-4 answer options, therefore chance = 0.25
-Halfway between 100% and chance: (1.00 - 0.25)/2 = 0.375
-Add chance: 0.375 + 0.25 = 0.625
(Add chance because we require a difficulty level of at least chance)
-ODL = 0.625
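The worked example generalises to any number of alternatives; a sketch:

```python
def optimum_difficulty(n_alternatives):
    """ODL: halfway between chance and 1.00, with chance added back."""
    chance = 1 / n_alternatives
    return chance + (1 - chance) / 2

odl_mcq = optimum_difficulty(4)  # 0.625, as in the worked example
odl_tf = optimum_difficulty(2)   # 0.75 for a true/false item
```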

22
Q

exceptions to optimum difficulty level

A
  1. At times we need more difficult items, e.g., in a selection process
  2. At times we need easier items, e.g., in special education
  3. At times we need to consider other factors, e.g., boosting morale
23
Q

Item discriminability

A

Have those who did well on particular items also done well on the overall test?

24
Q

Good item discriminability when:

A

People who do well on test overall get the item correct (and vice versa)

25
Q

Discrimination Index (di)

A

Higher values indicate better discriminability

26
Q

Item discriminability- extreme groups method

A

-Calculated as the proportion of people in the upper quartile who got the item correct minus the proportion of people in the lower quartile who got the item correct
-Essentially the difference in item difficulty between the top and bottom 25%
di = U/NU – L/NL
(U = number correct in the upper quartile, NU = size of the upper quartile; L and NL likewise for the lower quartile)
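The formula as a sketch, with hypothetical quartile counts:

```python
def discrimination_index(u_correct, n_upper, l_correct, n_lower):
    """di = U/N_U - L/N_L: item difficulty in the top 25% minus the bottom 25%."""
    return u_correct / n_upper - l_correct / n_lower

# Hypothetical: 20 of 25 top scorers got the item right, but only 5 of 25 bottom scorers
di = discrimination_index(20, 25, 5, 25)  # 0.80 - 0.20
```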

27
Q

Item discriminability: The point-biserial method

A

-Also known as item-total correlation
- Item correlations can also be used for Likert-type items, category format items, etc.
Again, good items should be those that have a positive item-total correlation

For example:
If an item on a questionnaire measuring schizophrenia symptoms has a high correlation with total scores on the overall questionnaire, then the item is good at measuring schizophrenia symptoms

Could use this correlation as an indicator of whether to include or exclude an item from the test/questionnaire in future:
include items with higher item-total correlations and exclude those with lower ones
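A sketch of an item-total correlation, assuming made-up data; with a 0/1 item the plain Pearson formula gives the point-biserial, and removing the item's own point from the total gives the corrected item-total correlation:

```python
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation; with a 0/1 item this is the point-biserial."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x)
    sy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(sx * sy)

# Hypothetical data: 1 = answered the item correctly, totals out of 20
item = [1, 1, 0, 1, 0, 0]
totals = [18, 20, 9, 16, 11, 8]
rest = [t - i for t, i in zip(totals, item)]  # remove the item's own point
r = pearson(item, rest)  # strongly positive: a good item
```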

28
Q

Item characteristic curves (ICCs)

A

The relationship between performance on an item and performance on the overall test tells us how well the item is tapping into what we want to measure.

-A graphical display of item functioning
Total test score plotted on X-axis
Proportion (i.e., 0.23, 0.50, etc.) getting the item correct plotted on Y-axis

-need discrete categories for scores.
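The points of an ICC can be computed by banding total scores into discrete categories; a sketch with hypothetical data (a good item shows a rising curve):

```python
def icc_points(item_correct, totals, bands):
    """Proportion getting the item right within each total-score band (lo, hi)."""
    points = []
    for lo, hi in bands:
        hits = [c for c, t in zip(item_correct, totals) if lo <= t <= hi]
        points.append(sum(hits) / len(hits) if hits else None)
    return points

# Hypothetical: 8 test takers, totals out of 20, three score bands
item = [0, 0, 1, 0, 1, 1, 1, 1]
totals = [3, 4, 5, 9, 10, 12, 16, 18]
curve = icc_points(item, totals, [(0, 6), (7, 13), (14, 20)])
# rising proportions across bands -> the item discriminates well
```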

29
Q

Item response theory (irt)

A

A different model of psychological testing
-Makes extensive use of item analysis
-Computer generates items
Each of these items has a particular difficulty level
-Computer gives you an item
-If you answer it correctly, the next item will be of increased difficulty, if incorrectly, the next item will be of decreased difficulty
-Looks at what you can do and only gives you what it thinks you can handle
-Essentially, the test is ‘tailored’ to the individual

Example:
This person can answer most items correctly at the 0.30 (or 0.45 or 0.70, etc.) level of difficulty…
Rather than: This person got 30% or 45% or 70% on this test.
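The tailoring loop can be sketched as a toy simulation (not a real IRT model; the step size and the simulated test taker are assumptions for illustration):

```python
def adaptive_test(answers_correctly, n_items=20, start=0.5, step=0.05):
    """Toy adaptive sketch: raise difficulty after a correct answer,
    lower it after an incorrect one; the final level estimates ability."""
    difficulty = start
    for _ in range(n_items):
        if answers_correctly(difficulty):
            difficulty = min(1.0, difficulty + step)
        else:
            difficulty = max(0.0, difficulty - step)
    return difficulty

# Hypothetical test taker who can handle items up to 0.70 difficulty
estimate = adaptive_test(lambda d: d <= 0.70)
```

The loop quickly homes in on the hardest level the person can still answer, which is the sense in which the test is 'tailored' to the individual.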

30
Q

Test performance in irt- advantages

A
  • Tests based on IRT can easily be adapted for computer administration
  • Quicker tests
  • Morale of test-taker is not broken down
  • Reduces chances of cheating
31
Q

Measurement precision: peaked conventional

A
  • tests individuals at average ability best
  • doesn’t assess high or low levels well
  • high precision for average ability levels, low precision at either end
32
Q

Measurement precision: rectangular conventional

A
  • equal number of items assessing all ability levels
  • relatively low precision across the board

33
Q

Measurement precision: adaptive

A
  • test focuses on the range that challenges each individual test taker
  • precision therefore high at every ability level
34
Q

Criterion-referenced tests

A

-Compares performance with some objectively defined criterion
E.g., the extent to which performance on the QLT predicts success at stats in psychology

Develop tests based on learning outcomes
What is it that the student should be able to do?
E.g., At the end of this lecture you should be able to:
Describe an ICC
Calculate item discriminability
Calculate a point-biserial correlation

35
Q

Evaluating items in Criterion-referenced tests

A
  • 2 groups: 1 given the learning ‘unit’. 1 not given the learning ‘unit’
  • collect scores; plot on graph and should form a V or U shape

-the bottom of the curve is the antimode

36
Q

Limitations of Criterion-referenced tests

A

Tells you that you got something wrong, but not why

Emphasis on ranking students rather than identifying gaps in knowledge

‘Teaching to the test’

37
Q

how does IRT differ from traditional testing methods?

A

Performance is defined by the difficulty level of the items answered correctly,
rather than by the total test score, as in traditional methods