Lecture 6: Essential Test Item Consideration Flashcards

1
Q

Test items

A

the units that make up a test and the means through which samples of test taker behaviors are gathered

2
Q

Item Analysis

A

term that refers to all techniques used to assess the characteristics of test items and evaluate their quality during the process of test development and test construction

3
Q

Qualitative item analysis

A

rely on judgments of reviewers concerning the substantive and stylistic characteristics of items, as well as their accuracy and fairness

4
Q

Qualitative criteria (how do you evaluate items qualitatively?)

A
  • appropriateness of item content and format
  • clarity of expression
  • grammatical correctness
  • adherence to “some basic rules for writing items that have evolved over time”
5
Q

Quantitative item analysis

A

involves a variety of statistical procedures designed to ascertain the psychometric characteristics of items, based on the responses obtained from the samples used in the process of test development

6
Q

Bias

A

any systematic error that enters into scores and affects their meaning in relation to what scores are designed to measure or predict

7
Q

The context of item analysis

A
  • usage of simple statistical procedures
  • information on item behavior
  • practical features of interest
8
Q

When is the decision to create a test developed?

A

when the developer realizes either that no test exists for a particular purpose or that the existing tests for that purpose are not adequate for some reason

9
Q

Planning a test entails specifying:

A
  • the construct or knowledge domains that the test will assess
  • the type of population with which the test will be used
  • the objectives of the items to be developed
  • the concrete means through which the behavior samples will be gathered and scored
10
Q

Steps in the Test Development Process

A
  1. Item generation
  2. Qualitative item analysis
  3. Revision/replacement of items
  4. Pilot study
  5. Evaluation of pilot study results
  6. Potential modification of items
  7. Additional pilot studies
  8. Determination and fixation of test length
  9. Test norming
  10. Test publication
11
Q

Item generation

A

writing (or otherwise creating) the test items, along with the administration and scoring procedures

12
Q

Qualitative item analysis

A
  • submitting the item pool to experts
  • to identify items that may disadvantage, or be offensive to, any particular demographic group for which the test is intended
13
Q

Revision/Replacement of items

A

revising or replacing items that reviewers identify as inadequate or problematic from the point of view of subject matter, offensiveness, or bias

14
Q

Pilot study

A

trying out the items that have been gathered and reviewed on samples of test takers

15
Q

Evaluation of pilot study results

A

through quantitative item analysis and additional qualitative analysis

16
Q

Potential modification of items

A

adding, deleting or modifying test items as needed

17
Q

Additional pilot studies

A

cross-validation: checking whether item statistics remain stable across different groups, until a satisfactory set of items is obtained

18
Q

Determination and fixing of test length

A
  • and the sequencing of items
  • the administration and scoring procedures that will be standard in the final form of the test, on the basis of the foregoing analyses
19
Q

Test norming

A

administering the test to a new sample of individuals in order to develop normative data or performance criteria, etc.

20
Q

Test publication

A

publishing the final form, along with the administration and scoring manual (accompanying documentation of the uses for which the test is intended)

21
Q

computer adaptive testing (CAT)

A

relies on banked pools of items that have been carefully calibrated with respect to the information they convey
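
A minimal sketch of the item-selection idea behind this: assuming a Rasch (1PL) calibration in which each banked item is described only by a difficulty parameter b, the next item administered is the one that conveys the most information at the current ability estimate. The bank and the function names are illustrative, not from the lecture.

```python
import math

def prob_correct(theta, b):
    """Rasch (1PL) probability of a correct response at ability theta, item difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at theta: P * (1 - P)."""
    p = prob_correct(theta, b)
    return p * (1.0 - p)

def pick_next_item(theta, item_bank, administered):
    """Choose the unadministered item with maximum information at the current theta."""
    candidates = [(item_information(theta, b), item_id)
                  for item_id, b in item_bank.items()
                  if item_id not in administered]
    return max(candidates)[1]

# Hypothetical calibrated bank: item id -> difficulty (b) on the theta scale.
bank = {"A": -1.2, "B": -0.4, "C": 0.0, "D": 0.7, "E": 1.5}
print(pick_next_item(theta=0.3, item_bank=bank, administered={"C"}))  # -> "D", closest b to 0.3
```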

22
Q

Types of responses items require from test takers (two types)

A
  1. Selected-response items
  2. Constructed-response items

23
Q

Selected-response items

A
  • objective or fixed-response items
  • present a limited number of response alternatives (from which test takers must choose)
  • Pass/fail items (dichotomous), ranking scale items (polytomous)
24
Q

Forced choice

A

objective items that require test takers to choose which of two or more alternatives is most or least characteristic of themselves
  • mainly used in multidimensional personality inventories

25
Q

Ipsative scores

A

ordinal numbers that simply reflect the test taker's ranking of the constructs assessed by the scales within a forced-choice test

26
Q

Advantages of selected-response items

A
  • easy to administer and easy to score
  • high degree of objectivity
  • enhance test score reliability
  • allow group administration
27
Q

Disadvantages of selected-response items

A
  • restrict the potential responses to a selection chosen by the test developers
  • susceptible to guessing (pass/fail items)
  • response styles (tendency toward the middle or the extremes)
  • responses can be distorted by social desirability, self-monitoring, etc.
28
Q

Constructed-response items

A
  • "free" response format
  • open-ended (the variety of possible responses is limitless)
  • but usually some constraints on response behavior in the instructions (time limit, length of response, use of materials, etc.)
29
Q

In personality testing

A

the use of constructed responses is limited mainly to projective techniques, also known as performance-based measures of personality: test takers respond as freely as possible and thereby reveal aspects of their personality

30
Q

Advantages of constructed-response items

A
  • do not restrict response behavior to pre-selected options
  • may elicit greater acceptance of a test
  • provide richer samples of the behavior of examinees (unique characteristics can emerge)
31
Q

Disadvantages of constructed-response items

A
  • less objective scoring (responses might be evaluated differently by different scorers)
  • highly diverse responses, which calls their comparability into question
  • test length: responses require more time for administration and scoring
32
Q

What is the most important aspect in qualitative item analysis?

A
  • item validity: does a specific item carry its own weight within a test by eliciting information that advances the purpose of the test? (also called item discrimination)
33
Q

Classical methods of item analysis

A
  1. Item difficulty
  2. Item discrimination (item validity)
34
Q

Qualitative evaluation (inspect content regarding...)

A
  • appropriateness
  • difficulty level
  • possible bias or offensiveness toward any group
35
Q

Quantitative evaluation

A

of item difficulty is carried out through statistics that assess whether items perform the way they were intended to perform

36
Q

Item difficulty

A
  • the difficulty level of a test as a whole is a function of the difficulty levels of the individual items that make up the test (easy items = easy test)
  • item difficulty is sample dependent (it depends on the ability of the test takers)
37
Q

Proportion (or percentage) passing (p)

A
  • the higher the percentage passing, the easier the item
  • when the underlying trait is normally distributed, p values can be transformed into z-values
38
Q

Z-values

A

relative difficulty levels of items can be compared across various groups by administering a common set of items (anchor items); see the sketch below

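A minimal sketch of both indexes, assuming 0/1 item scores; the inverse-normal transform and its sign convention (easier items get negative z-values) are one common choice, assumed here rather than prescribed by the lecture:

```python
import numpy as np
from scipy.stats import norm

def p_value(item_responses):
    """Proportion passing: share of test takers who answered the item correctly (1 = pass)."""
    return np.asarray(item_responses).mean()

def z_difficulty(p):
    """Normal-deviate difficulty: inverse-normal of the proportion failing (1 - p).
    Easier items (high p) get negative z; harder items get positive z."""
    return norm.ppf(1.0 - p)

# Hypothetical 0/1 responses of ten test takers to one item.
responses = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]
p = p_value(responses)        # 0.9 -> an easy item
print(p, z_difficulty(p))     # z is approximately -1.28
```
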
39
Q

How can item difficulty be gauged?

A
  • for words, by the frequency with which they are used in the language
  • by quantitative indexes: the percentage of test takers who answer an item correctly (the p-value)
40
Q

Absolute scaling

A

allows the difficulty of items to be placed on a uniform numerical scale for samples of test takers at different ability levels

41
Q

Item difficulty levels, test difficulty levels, and test purpose

A
  • the average score on a test is the same as the average difficulty of its items
  • if the average percentage passing (p) for the items in a test is 80%, the average score on the test will be 80% as well (see the sketch below)
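
A short numerical check of the relationship above, with hypothetical p-values for a five-item test:

```python
# Illustrative p-values for a five-item test (hypothetical).
p_values = [0.9, 0.85, 0.8, 0.75, 0.7]

mean_p = sum(p_values) / len(p_values)                            # average item difficulty = 0.80
expected_mean_raw_score = sum(p_values)                           # 4.0 items correct on average
expected_mean_percent = expected_mean_raw_score / len(p_values)   # 0.80 = 80%

print(mean_p, expected_mean_raw_score, expected_mean_percent)
```
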
42
Q

Insufficient ceiling

A
  • when the test items are too easy for a certain group
  • the score distribution will be negatively skewed
43
Q

Inadequate floor

A
  • when the test items are too difficult for a certain group
  • the score distribution is positively skewed
44
Q

Distractors

A
  • have a great deal of influence on item difficulty
  • the number of distractors directly affects indexes of item difficulty, because the probability of guessing correctly is higher when the number of choices is smaller (see the sketch below)
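
A small sketch of the guessing effect, under the simplifying blind-guessing assumption that test takers who do not know the answer pick an option at random; the function names and numbers are illustrative:

```python
def chance_level(n_options):
    """Probability of guessing a multiple-choice item correctly by blind guessing."""
    return 1.0 / n_options

def expected_p(p_known, n_options):
    """Observed p under a blind-guessing assumption: knowers pass, the rest guess."""
    return p_known + (1.0 - p_known) * chance_level(n_options)

for k in (5, 4, 3, 2):   # fewer distractors -> fewer options -> easier to guess correctly
    print(k, round(chance_level(k), 2), round(expected_p(0.5, k), 2))
```
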
45
Q

Item validity (item discrimination)

A

the extent to which items elicit responses that accurately differentiate test takers in terms of the behavior, knowledge, or other characteristics that the test is designed to evaluate

46
Q

Item validation criteria

A
  • internal criteria: the total test score is used to validate items (the homogeneity of the test increases) -> the reliability indexes based on inter-item consistency are enhanced (see the sketch below)
  • external criteria: measures outside the test are used to validate items -> the validity of scores on the test as a whole is enhanced
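
A minimal sketch of validating an item against an internal criterion: the corrected item-total correlation, i.e., the correlation between an item and the total of the remaining items (the data matrix here is hypothetical):

```python
import numpy as np

def corrected_item_total(scores, item_index):
    """Correlate one item with the total of the remaining items (internal criterion)."""
    scores = np.asarray(scores, dtype=float)
    item = scores[:, item_index]
    rest_total = scores.sum(axis=1) - item   # total score with the item itself removed
    return np.corrcoef(item, rest_total)[0, 1]

# Hypothetical 0/1 responses: rows = test takers, columns = items.
X = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])
print([round(corrected_item_total(X, j), 2) for j in range(X.shape[1])])
```
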
47
Q

Unidimensional traits

A
  • the total score may be used to validate items
  • all test items should correlate highly with the total score and with each other
48
Q

Complex and multifaceted constructs

A
  • items are validated against external criteria that are also more global
  • items do not necessarily have to correlate highly with one another (the test is not necessarily homogeneous)
49
Q

Item validity statistics

A

all statistical procedures used to gauge the degree to which items discriminate in terms of a criterion require information about:
  1. item performance
  2. criterion standing for the individuals in the samples from which the item discrimination statistics are derived

50
Q

Index of discrimination (D)

A
  • the difference in the percentage or proportion of test takers in the upper and lower criterion groups who pass a given item or answer in the keyed direction
  • positive discrimination indexes: more individuals in the upper criterion group pass the item
51
Q

Computation of D

A

test takers must be classified into distinct criterion groups, based either on their total scores on the test or on some external indicator of their standing on the constructs assessed; once the groups are created, the percentage (p) of individuals within each group who pass the item is calculated (see the sketch below)

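A minimal sketch of computing D from total scores, assuming an upper/lower 27% split (a common convention, not specified in the lecture) and hypothetical data:

```python
import numpy as np

def discrimination_index(item_scores, total_scores, fraction=0.27):
    """D = proportion passing in the upper criterion group minus the lower group.
    Groups are the top and bottom `fraction` of test takers by total score."""
    item_scores = np.asarray(item_scores)
    total_scores = np.asarray(total_scores)
    k = max(1, int(round(len(total_scores) * fraction)))
    order = np.argsort(total_scores)           # ascending by total score
    lower, upper = order[:k], order[-k:]
    return item_scores[upper].mean() - item_scores[lower].mean()

# Hypothetical data: ten test takers, their 0/1 score on one item, and their total scores.
item = np.array([1, 1, 1, 0, 1, 0, 0, 1, 0, 0])
total = np.array([48, 45, 44, 40, 39, 33, 30, 28, 25, 22])
print(discrimination_index(item, total))       # positive D -> the item favors the upper group
```
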
52
Q

Correlation coefficients

A

a test theory method used for expressing the relationship between performance on an item and a criterion

53
Q

Point-biserial (rpb) correlation coefficient

A

used when item scores are dichotomous (pass/fail) and the criterion measure is continuous

54
Q

Phi coefficient

A
  • used when the item scores and the criterion measure are both dichotomous
  • both coefficients can range from -1 to +1 (see the sketch below)
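
A minimal sketch of both coefficients with hypothetical data, using scipy's point-biserial function and treating phi as the Pearson correlation between two 0/1 variables:

```python
import numpy as np
from scipy.stats import pointbiserialr

# Hypothetical data for one item and eight test takers.
item = np.array([1, 1, 0, 1, 0, 1, 0, 0])               # dichotomous item score
criterion = np.array([52, 47, 33, 45, 30, 49, 35, 28])  # continuous criterion measure

r_pb, _ = pointbiserialr(item, criterion)                # point-biserial correlation
print(round(r_pb, 2))

# Phi: Pearson correlation between two dichotomous variables (e.g., item vs. pass/fail criterion).
criterion_pass = (criterion >= 40).astype(int)           # illustrative cut score of 40
phi = np.corrcoef(item, criterion_pass)[0, 1]
print(round(phi, 2))
```
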
55
Q

Tests can be classified into three categories

A
  1. Pure speed tests
  2. Pure power tests
  3. A blend of speed and power
56
Q

Pure speed test

A
  • difficulty is manipulated mainly through timing; the time limits are so short that most test takers cannot complete all the items
  • when test takers do finish all the items, their actual capacity has not been determined
57
Q

Pure power test

A
  • has no time limits
  • difficulty is manipulated mainly by increasing or decreasing the complexity of the items
58
Q

Blend of speed and power

A

most ability tests fall between the extremes of the pure-speed versus pure-power continuum (time limits allow test takers to attempt all the items)

59
Q

Item-test regression

A

it is necessary to calculate the proportion of individuals at each total score who passed a given item (this combines information about item difficulty and item discrimination); see the sketch below

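A minimal sketch of an item-test regression table: the proportion passing the item at each total score, computed from hypothetical data (plotting these proportions against total score gives the regression curve):

```python
import numpy as np
from collections import defaultdict

def item_test_regression(item_scores, total_scores):
    """Proportion of test takers at each total score who passed the item."""
    passed = defaultdict(list)
    for item, total in zip(item_scores, total_scores):
        passed[total].append(item)
    return {total: float(np.mean(scores)) for total, scores in sorted(passed.items())}

# Hypothetical data: 0/1 item scores and total test scores for twelve test takers.
item = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
total = [2, 3, 3, 4, 4, 4, 5, 5, 6, 6, 7, 7]
print(item_test_regression(item, total))
# Rounded: {2: 0.0, 3: 0.5, 4: 0.67, 5: 0.5, 6: 1.0, 7: 1.0} - an upward trend indicates discrimination
```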