Item analysis Flashcards by Mampho Ledimo

What is item analysis?

A general term used to describe a set of methods used to evaluate test items.
- Item analysis helps us to decide what items to include in our measure.
- The basic methods include item difficulty and item discriminability.

What are the different methods of conducting item analysis?

Item difficulty
Item discriminability
Item characteristic curves (ICCs)
Item Response Theory (IRT)
Criterion-referenced tests.

What is item difficulty?

The proportion of people who get a particular item correct.
The higher the item difficulty value, the easier the item.
The formula is p = (number of people who answered the item correctly) / (number of people taking the measure).
It is also referred to as the facility index.
Item difficulty ranges between zero and 1.
Ideally we want p values that fall within the 0.3 to 0.7 range. Higher than 0.7 is too easy and lower than 0.3 is too difficult.
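As a rough illustration of the formula and the 0.3 to 0.7 guideline above, the sketch below computes p for each item from a small made-up response matrix. The function names and the data are hypothetical, chosen only for this example.

```python
def item_difficulty(responses):
    """Return the p-value (proportion correct) for each item.

    `responses` holds one row per test taker; each row contains
    1 (correct) or 0 (incorrect) for every item.
    """
    n_people = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n_people for i in range(n_items)]


def classify(p, low=0.3, high=0.7):
    """Label an item using the 0.3 to 0.7 guideline from the card."""
    if p > high:
        return "too easy"
    if p < low:
        return "too difficult"
    return "acceptable"


# Hypothetical data: 5 test takers (rows) by 3 items (columns).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
p_values = item_difficulty(scores)          # 4/5, 3/5 and 1/5 correct
labels = [classify(p) for p in p_values]
print(list(zip(p_values, labels)))
```

Because p is computed from whoever happened to take the test, running the same items on a different sample can give different values.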

What is facility and the facility index?

An item with good item facility is one for which different respondents give different answers.
The facility index gives an indication of the extent to which respondents answer an item in the same way.

What affects item difficulty?

The format of the test
The number of test items.
Item difficulty is more applicable in settings where there is a clear correct and incorrect answer.

What is the optimum difficulty level (ODL)?

Between 0.30 and 0.70
Calculate optimum difficulty: (1 + chance)/2, where chance is the probability of getting the item correct by guessing. Essentially halfway between 100% getting the item correct and the level of success estimated by guessing.
The ODL for the dichotomous format does not fall within the 0.3-0.7 range.
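Taking the "halfway between 100% correct and the level of success from guessing" description literally gives (1 + chance)/2. The sketch below works this out for a four-option multiple-choice item and for a dichotomous item. It assumes chance is 1 divided by the number of answer options, and the function name is made up for this example.

```python
def optimum_difficulty(n_options):
    """Midpoint between chance-level success and 1.0.

    Assumes every option is equally likely to be picked when guessing,
    so chance = 1 / n_options.
    """
    chance = 1 / n_options
    return (1 + chance) / 2


odl_four = optimum_difficulty(4)  # four-option MCQ: chance = 0.25
odl_two = optimum_difficulty(2)   # dichotomous (e.g. true/false): chance = 0.5
print(odl_four, odl_two)
```

For the dichotomous item the result is 0.75, which is why its ODL sits above the 0.3 to 0.7 range.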

What are some key things to note about the ODL?

We want most of the items to be around the ODL and few at the extremes of this range.
The distribution of p-values should be approximately normal in MCQs.
We need a range to discriminate between stronger and struggling students.
The facility index of the item tells us nothing about its intrinsic characteristics. Its value is related to the sample. Different samples yield different results: item difficulty is sample dependent.

What are the exceptions for having items be within the ODL range?

At times we need more difficult items (e.g. selection process).
At times we need easier items (e.g. special education).
At times we need to consider other factors (e.g. boost morale).

What is item discriminability?

Assessment of item discriminability determines whether the people who have done well on particular items have also done well on the whole test.
It can be assessed using different methods: the extreme groups method and the point-biserial method.

What is the discrimination index?

Higher values indicate better discriminability.
Good item discriminability is when people who do well on the test overall get the item correct and vice versa.

What is the extreme groups method?

This method compares those who have done well with those who have done poorly on a test.
Calculated as the proportion of people in the upper quartile who got the item correct minus the proportion of people in the lower quartile who got the item correct; this difference is referred to as the discrimination index.
d(i) = U/N(u) - L/N(l), where U and L are the numbers of people in the upper and lower groups who got the item correct, and N(u) and N(l) are the sizes of those groups.
0.4 is the baseline for item discriminability. If an item's index is lower than 0.4, then it doesn't have good discriminability.
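The formula d(i) = U/N(u) - L/N(l) can be sketched as follows. This is an illustration only: it assumes the upper and lower groups are the top and bottom 25% of test takers by total score, that ties are broken by input order, and the data are invented.

```python
def discrimination_index(item_scores, total_scores, fraction=0.25):
    """Extreme groups index: proportion correct in the upper group
    minus proportion correct in the lower group.

    item_scores: 1/0 per test taker for one item.
    total_scores: total test score per test taker (same order).
    fraction: share of test takers in each extreme group (an assumption).
    """
    n = len(total_scores)
    k = max(1, int(n * fraction))             # size of each extreme group
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item_scores[i] for i in upper) / k   # U / N(u)
    p_lower = sum(item_scores[i] for i in lower) / k   # L / N(l)
    return p_upper - p_lower


# Hypothetical data: 8 test takers.
totals = [2, 3, 4, 5, 6, 7, 8, 9]
good_item = [0, 0, 1, 0, 1, 1, 1, 1]   # strong students pass, weak students fail
weak_item = [1, 0, 1, 0, 1, 0, 1, 0]   # unrelated to total score

d_good = discrimination_index(good_item, totals)
d_weak = discrimination_index(weak_item, totals)
print(d_good, d_weak)
```

With these numbers the first item clears the 0.4 baseline and the second does not.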

What is the point-biserial method?

It is also known as item-total correlation.
Good items are those for which students who pass the item do well on the overall test and, conversely, students who fail the item do badly on the overall test.
If students who fail the item tend to do well on the overall test, the item-total correlation will be negative.
The rule here is also 0.4.
Item-total correlations can also be used for Likert-type items.
The closer the number is to one, the better (same for extreme groups).

What are the steps for calculating the point-biserial correlation?

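Find the mean test score and SD for all test takers.
Find the mean test score for only those who got the item (e.g. item 1) correct.
Subtract the overall mean from this mean and divide by the SD.
Multiply the result by the square root of p/(1 - p), where p is the proportion of people who got the item correct.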
(Note from the author: not relevant.)
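For reference, the standard point-biserial formula is r = ((mean of those who passed the item - overall mean) / SD) x sqrt(p / (1 - p)), where p is the proportion who got the item correct. The sketch below follows those steps on invented data. It assumes the SD is the population SD (dividing by n) and it does not guard against items that everyone passes or everyone fails.

```python
import math


def point_biserial(item_scores, total_scores):
    """Point-biserial (item-total) correlation for one 1/0-scored item."""
    n = len(total_scores)
    # Step 1: mean and SD of total scores for all test takers.
    mean_all = sum(total_scores) / n
    sd_all = math.sqrt(sum((t - mean_all) ** 2 for t in total_scores) / n)
    # Step 2: mean total score of those who got the item correct.
    passed = [t for s, t in zip(item_scores, total_scores) if s == 1]
    mean_passed = sum(passed) / len(passed)
    # Step 3: standardised difference, scaled by sqrt(p / (1 - p)).
    p = len(passed) / n
    return (mean_passed - mean_all) / sd_all * math.sqrt(p / (1 - p))


# Hypothetical data: 8 test takers.
totals = [2, 3, 4, 5, 6, 7, 8, 9]
item = [0, 0, 1, 0, 1, 1, 1, 1]
r_item = point_biserial(item, totals)
print(round(r_item, 3))
```

The result equals the ordinary Pearson correlation between the 1/0 item scores and the total scores, which is why the method is also called the item-total correlation.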

What else do you need to know about the point-biserial correlation?

Item-total correlations can also be used for Likert-type (category format) test items. Good items here would be those that have a positive item-total correlation.
- E.g. if an item on a questionnaire measuring schizophrenia symptoms has a high correlation with total scores on the overall questionnaire, then the item is good at measuring schizophrenia symptoms.
This can be used as an indicator of whether or not to include an item in a test/questionnaire (include items with a higher correlation and exclude those with a lower one).
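A minimal sketch of screening Likert-type items by their item-total correlation is shown below. The responses are invented, the helper names are hypothetical, and the 0.4 cutoff is borrowed from the rule of thumb on the point-biserial card as an assumption; the appropriate cutoff depends on the questionnaire.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def item_total_correlations(responses):
    """Correlate each item with the total questionnaire score."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[i] for row in responses], totals) for i in range(n_items)]


# Hypothetical 1-5 Likert responses: 6 respondents (rows) by 3 items (columns).
likert = [
    [5, 4, 3],
    [4, 5, 2],
    [2, 2, 4],
    [1, 2, 4],
    [3, 3, 2],
    [4, 4, 3],
]
correlations = item_total_correlations(likert)
keep = [i for i, r in enumerate(correlations) if r >= 0.4]  # assumed cutoff
print([round(r, 2) for r in correlations], keep)
```

Here the third item correlates negatively with the total, so it would be a candidate for exclusion.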

What is an item characteristic curve (ICC)?

The relationship between performance on an item and performance on the overall test tells us how well the item is tapping into what we want to measure.
They are a graphical display of item functioning.
The total test score is plotted on the x-axis.
The proportion of people getting the item correct is plotted on the y-axis.

What are the steps for drawing ICCs?

Define categories of test performance. Do this in a similar way to how you would define categories for frequency tables. Could be specific total scores or percentages.
Determine what proportion of people within each category got the item correct: count how many people are in each category, then divide the number of people in that category who passed the item by the number of people in that category.
Plot your ICC.
Note: look at the PowerPoint for the different types of curves.
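The first two steps can be sketched as below; the resulting (category, proportion) pairs are what would be plotted, with the total-score category on the x-axis and the proportion correct on the y-axis. The score categories and data are invented for illustration, and plotting is left out to keep the example dependency-free.

```python
def icc_points(item_scores, total_scores, bins):
    """Proportion passing the item within each total-score category.

    bins: list of (low, high) inclusive total-score ranges (step 1).
    Returns a list of ((low, high), proportion) pairs (step 2);
    proportion is None for a category with nobody in it.
    """
    points = []
    for low, high in bins:
        in_bin = [s for s, t in zip(item_scores, total_scores) if low <= t <= high]
        proportion = sum(in_bin) / len(in_bin) if in_bin else None
        points.append(((low, high), proportion))
    return points


# Hypothetical data: 10 test takers with total scores 1 to 10.
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
categories = [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]

curve = icc_points(item, totals, categories)
for (low, high), proportion in curve:
    print(f"total {low}-{high}: {proportion}")
```

A curve that rises from left to right, as this one does, suggests that people with higher total scores are more likely to pass the item.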

What is item response theory (IRT)?

It is a different model of psychological testing which makes extensive use of item analysis.
A computer generates items, and each item has a particular difficulty level. The computer gives you an item: if you answer correctly, the next item will be more difficult; if you answer incorrectly, the next item will be easier.
This model looks at what you can do and only gives you what it thinks you can handle.
Essentially, the test is tailored to the individual.
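The "harder after a correct answer, easier after an incorrect one" rule can be illustrated with a toy simulation. This is only a sketch of that up/down rule: operational IRT-based adaptive tests choose items and estimate ability with a statistical model, which is not shown here. The difficulty levels (1 = easiest, 5 = hardest), the function names and the simulated test taker are all assumptions.

```python
def run_adaptive_test(answers_correctly, levels, start, n_items):
    """Present n_items, moving up a level after a correct answer
    and down a level after an incorrect one.

    answers_correctly(level) -> bool is a hypothetical stand-in for
    presenting an item at that difficulty and scoring the response.
    levels must be ordered from easiest to hardest.
    """
    index = start
    history = []
    for _ in range(n_items):
        level = levels[index]
        correct = answers_correctly(level)
        history.append((level, correct))
        if correct:
            index = min(index + 1, len(levels) - 1)  # harder next time
        else:
            index = max(index - 1, 0)                # easier next time
    return history


# Simulated test taker who can handle items up to difficulty level 3.
levels = [1, 2, 3, 4, 5]
history = run_adaptive_test(lambda level: level <= 3, levels, start=2, n_items=6)
print(history)
```

The presented items quickly settle around the boundary between what this test taker can and cannot answer, which is the sense in which the test is tailored to the individual.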

How is test performance defined in IRT?

It is defined by the level of difficulty of items answered correctly instead of total test score.
E.g. we say that the person can answer most items correctly at the 0.3 (or 0.45 or 0.70 etc.) level of difficulty, rather than saying that the person got 30% or 45% or 70% on the test.
This is done through adaptive computer-based testing.

What are the advantages of IRT?

Tests based on IRT can easily be adapted for computer administration.
The tests are quicker.
The morale of the test taker is not broken down.
Reduces chances of cheating as everyone receives different questions.

What is the measurement precision of peaked conventional tests?

It is best at testing individuals with average ability.
It does not assess high or low levels well.
It has a higher measurement precision for average ability levels; low precision at either end.

Measurement precision and rectangular conventional tests

Equal number of items assessing all ability levels.
But: jack of all trades, master of none
Relatively low precision across the board.

Measurement precision and adaptive tests

Test focuses on the range that challenges each test taker
Measurement precision is therefore high at every ability level.

What are criterion-referenced tests?

Compares performance with some objectively defined criterion. E.g. the extent to which performance on the QLT predicts success at stats in psychology.
Tests are developed based on learning outcomes. Items are then written based on these outcomes.

How do we evaluate items in criterion-referenced tests?

Take two groups: give one group the learning unit and do not give it to the other.
The collected scores are plotted on a graph.
The graph should be V or U shaped.
The bottom of the curve is called the antimode: the cutting score or decision point regarding who has met the criterion and who has not.
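One way to locate the antimode numerically is sketched below: pool the scores of both groups, count how often each score occurs, and take the least frequent score lying between the two groups' most common scores. It assumes integer scores and a clearly V- or U-shaped distribution; the function name and the data are invented.

```python
from collections import Counter


def antimode(untaught_scores, taught_scores):
    """Least frequent pooled score between the two groups' modes."""
    pooled = Counter(untaught_scores) + Counter(taught_scores)
    # Most common score in each group (the two peaks of the curve).
    low_mode = Counter(untaught_scores).most_common(1)[0][0]
    high_mode = Counter(taught_scores).most_common(1)[0][0]
    lo, hi = sorted((low_mode, high_mode))
    # Bottom of the V: the score with the lowest frequency between the peaks.
    return min(range(lo, hi + 1), key=lambda score: pooled[score])


# Hypothetical scores: one group without the learning unit, one with it.
untaught = [2, 3, 3, 3, 4, 4, 5, 6]
taught = [5, 7, 7, 8, 8, 8, 9, 9]
cutoff = antimode(untaught, taught)
print(cutoff)
```

With these data the pooled frequencies dip to their lowest point at a score of 6, which would serve as the cutting score.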

What are the limitations of criterion-referenced tests?

- Tells you that you got something wrong, but not why.
- Emphasis is on ranking students rather than identifying gaps in knowledge (this is true for item analysis in general).
- It can encourage teaching to the test.
