Lecture 6: Essential Test Item Consideration Flashcards

1
Q

Test items

A

the units that make up a test and the means through which samples of test taker behaviors are gathered

2
Q

Item Analysis

A

term that refers to all techniques used to assess the characteristics of test items and evaluate their quality during the process of test development and test construction

3
Q

Qualitative item analysis

A

relies on judgments of reviewers concerning the substantive and stylistic characteristics of items, as well as their accuracy and fairness

4
Q

Qualitative criteria (how are items evaluated qualitatively?)

A
  • appropriateness of item content and format
  • clarity of expression
  • grammatical correctness
  • adherence to “some basic rules for writing items that have evolved over time”
5
Q

Quantitative item analysis

A

involves a variety of statistical procedures designed to ascertain the psychometric characteristics of items based on the responses obtained from the samples used in the process of test development

6
Q

Bias

A

any systematic error that enters into scores and affects their meaning in relation to what scores are designed to measure or predict

7
Q

The context of item analysis

A
  • usage of simple statistical procedures
  • information on item behavior
  • practical features of interest
8
Q

When is the decision to develop a test made?

A

when a developer realizes that either no test exists for a particular purpose or that the existing tests for that purpose are inadequate for some reason

9
Q

Planning a test entails specifying:

A
  • the construct or knowledge domains that the test will assess
  • the type of population with which the test will be used
  • the objectives of the items to be developed
  • the concrete means through which the behavior samples will be gathered and scored
10
Q

Steps in the Test Development Process

A
  1. Item generation
  2. Qualitative item analysis
  3. Revision/replacement of items
  4. Pilot study
  5. Evaluation of pilot study results
  6. Potential modification of items
  7. Additional pilot studies
  8. Determination and fixation of test length
  9. Test norming
  10. Test publication
11
Q

Item generation

A

writing (or otherwise creating) the test items, as well as the administration and scoring procedures

12
Q

Qualitative item analysis

A
  • submitting the item pool to experts for review
  • to identify items that may disadvantage, or be offensive to, any particular demographic group for which the test is intended
13
Q

Revision/Replacement of items

A

revising or replacing items that reviewers identify as inadequate or problematic from the point of view of subject matter, offensiveness, or bias

14
Q

Pilot study

A

trying out the items that have been gathered and reviewed on samples of test takers

15
Q

Evaluation of pilot study results

A

through quantitative item analysis and additional qualitative analysis

16
Q

Potential modification of items

A

adding, deleting or modifying test items as needed

17
Q

Additional pilot studies

A

cross-validation: checking whether item statistics remain stable across different groups; repeated until a satisfactory set of items is obtained

18
Q

Determination and fixing of test length

A
  • test length and the sequencing of items
  • the administration and scoring procedures that will be standard in the final form of the test
  • both are determined on the basis of the foregoing analyses
19
Q

Test norming

A

administering the test to a new sample of individuals in order to develop normative data or performance criteria, etc.

20
Q

Test publication

A

Publishing the final form of the test, along with its administration and scoring manual
(and accompanying documentation of the uses for which the test is intended)

21
Q

computer adaptive testing (CAT)

A

relies on banks (pools) of items that have been carefully calibrated with respect to the information they convey

22
Q

Types of items, classified by the responses they require from test takers (two types)

A
  1. selected-response items
  2. constructed-response items

23
Q

Selected-response items

A
  • also called objective or fixed-response items
  • present a limited number of response alternatives (from which test takers must choose)
  • Pass/fail items (dichotomous), ranking scale items (polytomous)
24
Q

Forced choice

A

objective items that require test takers to choose which of two or more alternatives is most or least characteristic of themselves
- mainly used in multidimensional personality inventories

25
Q

Ipsative scores

A

ordinal numbers that simply reflect the test taker’s ranking of the constructs assessed by the scales within a forced-choice format test

26
Q

Advantages of Selected-Response items

A
  • easy to administer, easy to score
  • high degree of objectivity
  • enhance test score reliability
  • suitable for group administration
27
Q

Disadvantages of Selected-Response items

A
  • restrict the potential responses to a selection chosen by test developers
  • susceptible to guessing (pass/fail)
  • response styles (tendency toward the middle or the extremes)
  • responses can be distorted by social desirability, self-monitoring, etc.
28
Q

Constructed-response items

A
  • “free” response format
  • open-ended (variety is limitless)
  • but usually some constraints on response behavior in the instructions (time limit, length of response, usage of materials, etc.)
29
Q

In personality testing

A

the use of constructed responses is limited mainly to projective techniques, also known as performance-based measures of personality
- test takers respond as freely as possible and thereby reveal aspects of their personality

30
Q

Advantages of Constructed-response items

A
  • do not restrict response behavior to pre-selected options
  • may elicit greater acceptance of a test
  • provide richer samples of the behavior of examinees (unique characteristics can emerge)
31
Q

Disadvantages of Constructed-response items

A
  • less objective scoring (responses might be evaluated differently by different scorers)
  • highly diverse responses, which calls their comparability into question
  • test length: responses require more time for administration and scoring
32
Q

What is the most important aspect in qualitative item analysis?

A
  • item validity: does a specific item carry its own weight within a test by eliciting information that advances the purpose of the test? (also called item discrimination)
33
Q

Classical methods of item analysis

A
  1. Item difficulty
  2. Item discrimination (item validity)

34
Q

Qualitative evaluation (inspect content regarding…)

A
  • appropriateness
  • difficulty level
  • possible bias or offensiveness toward any group
35
Q

Quantitative evaluation

A

of item difficulty is carried out through statistics that assess whether items perform the way they were intended to perform

36
Q

Item difficulty

A
  • the difficulty level of a test as a whole is a function of the difficulty levels of the individual items that make up the test (easy items = easy test)
  • item difficulty is sample dependent (it depends on the ability level of the test takers)
37
Q

proportion (or percentage) passing (p)

A

the higher the percentage passing, the easier the item is

- assuming a normal distribution, p values can be transformed into z-values
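To make the p value and its normal-curve transformation concrete, here is a minimal Python sketch with hypothetical data. It assumes items are scored 1 (pass) / 0 (fail) and uses the convention that an item's z value is the normal deviate above which the proportion p falls, so easier items get lower z values. The function names are illustrative, not from the lecture; `NormalDist` requires Python 3.8 or later.

```python
from statistics import NormalDist

def proportion_passing(responses):
    """p = proportion of test takers who passed the item (1 = pass, 0 = fail)."""
    return sum(responses) / len(responses)

def z_difficulty(p):
    """Normal deviate above which a proportion p of the distribution falls.

    Easy items (high p) get negative z values and hard items (low p) get
    positive z values, so larger z means a more difficult item.
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1 - p)

item = [1, 1, 1, 0, 1, 1, 0, 1, 1, 1]    # hypothetical responses of 10 test takers
p = proportion_passing(item)
print(p)                          # 0.8
print(round(z_difficulty(p), 2))  # -0.84
```

Unlike p, the z scale has roughly equal intervals when the normality assumption holds, which is why it is used when item difficulties are compared or combined.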

38
Q

Z-values

A

relative difficulty levels of items can be compared across various groups by administering a common set of items (anchor items)

39
Q

How can item difficulty be gauged?

A
  • difficulty of words: by the frequency with which they are used in the language
  • quantitative indexes: percentage of test takers who answer an item correctly (p-value)
40
Q

Absolute scaling

A

allows the difficulty of items to be placed on a uniform numerical scale for samples of test takers at different ability levels
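One way the anchor-item idea behind absolute scaling can be implemented is a linear (mean/sigma) transformation: the z difficulties that the same anchor items obtain in two groups are matched on mean and standard deviation, and the resulting line re-expresses any item's z value on the other group's scale. The sketch below assumes that linear relation holds; the group labels, data, and `linear_link` name are hypothetical.

```python
from statistics import mean, pstdev

def linear_link(anchor_z_from, anchor_z_to):
    """Build a mapping from one group's z-difficulty scale onto another's.

    Assumes a linear (mean/sigma) relation between the z difficulties that the
    same anchor items obtain in the two groups.
    """
    slope = pstdev(anchor_z_to) / pstdev(anchor_z_from)
    intercept = mean(anchor_z_to) - slope * mean(anchor_z_from)
    return lambda z: slope * z + intercept

# Hypothetical z difficulties of three anchor items taken by both groups.
anchors_younger = [0.5, 1.0, 1.5]   # the items are harder for the younger group
anchors_older = [-0.5, 0.0, 0.5]

to_older_scale = linear_link(anchors_younger, anchors_older)
# An item given only to the younger group (z = 2.0) expressed on the older group's scale:
print(to_older_scale(2.0))  # 1.0
```

In practice the anchor z values in the two groups would only be approximately linear, so this mapping is a fitted approximation rather than an exact conversion.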

41
Q

Item difficulty levels, test difficulty levels, and test purpose

A
  • the average score on a test (expressed as a proportion or percentage of the items) is the same as the average difficulty (p) of its items
  • e.g., if the average percentage passing (p) for the items in a test is 80%, the average score on the test will be 80% as well
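The equality between average item p and average test score is just two ways of averaging the same table of 0/1 responses, once down the columns and once across the rows. A minimal sketch with a hypothetical response matrix:

```python
# Rows = test takers, columns = items; 1 = pass, 0 = fail (hypothetical data).
responses = [
    [1, 1, 0, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1],
]

def item_p_values(matrix):
    """Proportion of test takers passing each item (column averages)."""
    n_people = len(matrix)
    return [sum(column) / n_people for column in zip(*matrix)]

def mean_proportion_score(matrix):
    """Average test score, each score expressed as the proportion of items passed."""
    n_items = len(matrix[0])
    return sum(sum(row) / n_items for row in matrix) / len(matrix)

p_values = item_p_values(responses)
print(p_values)                                    # [0.75, 0.75, 0.75, 1.0, 0.75]
print(round(sum(p_values) / len(p_values), 3))     # 0.8
print(round(mean_proportion_score(responses), 3))  # 0.8
```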
42
Q

insufficient ceiling

A

when test items are too easy for a certain group

- the distribution will be negatively skewed

43
Q

inadequate floor

A

when the test items are too difficult for a certain group

- the score distribution is positively skewed
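The direction of skew described for an insufficient ceiling and an inadequate floor can be checked numerically. This sketch uses the moment coefficient of skewness (g1) on hypothetical total scores from a 10-item test; the score lists are invented for illustration.

```python
from statistics import mean, pstdev

def skewness(scores):
    """Moment coefficient of skewness (g1): negative = long tail toward low scores."""
    m, s = mean(scores), pstdev(scores)
    if s == 0:
        raise ValueError("skewness is undefined when all scores are equal")
    return sum((x - m) ** 3 for x in scores) / (len(scores) * s ** 3)

# Hypothetical total scores on a 10-item test.
too_easy = [10, 10, 10, 9, 9, 9, 8, 8, 6, 3]   # scores pile up near the maximum
too_hard = [0, 0, 0, 1, 1, 1, 2, 2, 4, 7]      # scores pile up near the minimum
print(skewness(too_easy) < 0)  # True: insufficient ceiling, negative skew
print(skewness(too_hard) > 0)  # True: inadequate floor, positive skew
```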

44
Q

Distractors

A
  • have a great deal of influence on item difficulty
  • the number of distractors directly affects indexes of item difficulty, because the probability of guessing correctly is higher when the number of choices is smaller
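The effect of the number of options on p can be illustrated with a simple knowledge-or-random-guessing model: test takers who know the answer pass, and the rest guess blindly among equally attractive options. That model and the `expected_p` name are assumptions for this sketch; real distractors are rarely equally attractive.

```python
def expected_p(prop_knowing, n_choices):
    """Expected proportion passing under a simple knowledge-or-random-guessing model.

    Test takers who know the answer pass; the rest guess blindly among
    n_choices equally attractive options and succeed with probability 1 / n_choices.
    """
    if n_choices < 2:
        raise ValueError("a selected-response item needs at least two options")
    return prop_knowing + (1 - prop_knowing) / n_choices

# Same hypothetical group (half know the answer), different numbers of options.
for k in (2, 4, 5):
    print(f"{k} choices: p = {expected_p(0.5, k):.3f}")
# 2 choices: p = 0.750
# 4 choices: p = 0.625
# 5 choices: p = 0.600
```

The same group looks much more successful on a two-option (pass/fail) item than on a five-option item, purely because of the chance success rate.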
45
Q

Item validity - Item discrimination

A

the extent to which items elicit responses that accurately differentiate test takers in terms of the behavior, knowledge, or other characteristics that a test is designed to evaluate

46
Q

Item validation criteria

A
  • Internal criteria - the total test score is used to validate items (homogeneity of the test increases)
  • > reliability indexes based on interitem consistency are enhanced
  • External criteria - criteria external to the test are used in validating items
  • > the validity of scores on the test as a whole is enhanced
47
Q

unidimensional traits

A

total score may be used to validate items

- all test items should correlate highly with the total score and each other
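For a unidimensional trait, validating items against the total score amounts to computing item-total correlations. This sketch uses a "corrected" version, correlating each item with the total of the other items so an item is not correlated partly with itself; that correction and the data are assumptions for illustration, not something the card specifies.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation using population SDs."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))

def corrected_item_total(matrix):
    """Correlate each item with the total of the remaining items (rows = people, columns = items)."""
    results = []
    for j in range(len(matrix[0])):
        item = [row[j] for row in matrix]
        rest = [sum(row) - row[j] for row in matrix]
        results.append(pearson(item, rest))
    return results

# Hypothetical 0/1 responses of six test takers to four items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print([round(r, 2) for r in corrected_item_total(responses)])
```

Items whose correlation with the rest of the test is low or negative are the candidates for revision or removal when the test is meant to be homogeneous.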

48
Q

Complex and multifaceted constructs

A
  • items are validated against external criteria that are also more global
  • items do not necessarily have to correlate highly with one another (the test need not be homogeneous)
49
Q

Item Validity statistics

A

all statistical procedures used to gauge the degree to which items discriminate in terms of a criterion require information about…

  1. item performance
  2. criterion standing for individuals in the samples from which the item discrimination statistics are extracted
50
Q

Index of discrimination (D)

A

the difference in the percentage or proportion of test takers in the upper and lower criterion groups who pass a given item or answer it in the keyed direction
- positive discrimination indexes: more individuals in the upper criterion group than in the lower criterion group pass the item

51
Q

Computation of D

A

test takers must be classified into distinct criterion groups based either on their total scores on the test or on some external indicator of their standing on the constructs assessed
- once the groups are created, the percentage (p) of individuals within each group who pass the item is calculated
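The computation of D can be sketched as follows. The function name, data, and the 27% extreme-group size are assumptions: 27% is a commonly used convention, but the card does not fix the group size, and any external criterion could replace the total score.

```python
def discrimination_index(item, criterion, fraction=0.27):
    """D = p(upper group) - p(lower group) for one item.

    item: 1 = pass (or keyed response), 0 = fail, one entry per test taker.
    criterion: total test score or an external criterion score, same order.
    fraction: share of test takers placed in each extreme group (27% is a
    common convention, assumed here). Ties on the criterion are broken by
    original order because Python's sort is stable.
    """
    n = len(item)
    k = max(1, round(n * fraction))
    ranked = sorted(range(n), key=lambda i: criterion[i], reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(item[i] for i in upper) / k
    p_lower = sum(item[i] for i in lower) / k
    return p_upper - p_lower

# Hypothetical data for 10 test takers (listed in descending order of total score).
item = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
total_scores = [95, 90, 88, 80, 75, 70, 65, 60, 55, 40]
print(round(discrimination_index(item, total_scores), 2))  # 0.67
```

With 10 test takers each extreme group has 3 people: all 3 in the upper group pass and 1 of 3 in the lower group passes, so D = 1.00 - 0.33, about 0.67.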

52
Q

Correlation coefficients

A
  • test theory method used for expressing the relationship between performance on an item and the criterion
53
Q

Point biserial (rpb) correlation coefficient

A

when item scores are dichotomous (pass/fail) and the criterion measure is continuous
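The point-biserial coefficient can be computed from the criterion means of those who pass and those who fail the item. With the population SD of the criterion, this formula gives the same value as an ordinary Pearson correlation computed on the 0/1 item scores. A minimal sketch with hypothetical data:

```python
from math import sqrt
from statistics import mean, pstdev

def point_biserial(item, criterion):
    """Point-biserial correlation between a 0/1 item and a continuous criterion.

    r_pb = (M_pass - M_fail) / s * sqrt(p * q), where s is the population SD
    of the criterion, p the proportion passing, and q = 1 - p.
    """
    passed = [c for i, c in zip(item, criterion) if i == 1]
    failed = [c for i, c in zip(item, criterion) if i == 0]
    if not passed or not failed:
        raise ValueError("r_pb is undefined when everyone passes or everyone fails")
    p = len(passed) / len(item)
    q = 1 - p
    return (mean(passed) - mean(failed)) / pstdev(criterion) * sqrt(p * q)

# Hypothetical item responses and criterion scores for five test takers.
print(round(point_biserial([1, 1, 1, 0, 0], [10, 8, 6, 4, 2]), 3))  # 0.866
```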

54
Q

phi coefficient

A
  • when item scores and the criterion measure are both dichotomous
  • both rpb and phi can range from -1 to +1
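The phi coefficient can be computed from the four cell counts of a 2 x 2 table crossing item (pass/fail) with criterion (e.g., meets/does not meet a standard): a = both 1, b = item 1 and criterion 0, c = item 0 and criterion 1, d = both 0. The data and names below are hypothetical.

```python
from math import sqrt

def phi_coefficient(item, criterion):
    """phi = (ad - bc) / sqrt((a + b)(c + d)(a + c)(b + d)) for two 0/1 variables."""
    pairs = list(zip(item, criterion))
    a = sum(1 for x, y in pairs if x == 1 and y == 1)
    b = sum(1 for x, y in pairs if x == 1 and y == 0)
    c = sum(1 for x, y in pairs if x == 0 and y == 1)
    d = sum(1 for x, y in pairs if x == 0 and y == 0)
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    if denom == 0:
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / denom

# Hypothetical: item passed (1/0) and criterion met (1/0) for 10 test takers.
item = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0]
criterion = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
print(phi_coefficient(item, criterion))  # 0.6
```

Here a = 4, b = 1, c = 1, d = 4, so phi = (16 - 1) / 25 = 0.6.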
55
Q

Tests can be classified into three categories

A
  1. Pure speed tests
  2. Pure power tests
  3. Blend of speed and power
56
Q

Pure speed test

A

difficulty is manipulated mainly through timing: limits are so short that most test takers cannot complete all the items
- if test takers finish all the items, their actual capacity has not been determined

57
Q

Pure power test

A
  • have no time limits

- difficulty is mainly manipulated by increasing or decreasing the complexity of items

58
Q

Blend of speed and power

A

most ability tests fall between the extremes of the pure-speed/pure-power continuum (time limits allow test takers to attempt all items)

59
Q

Item-test regression

A

requires calculating the proportion of individuals at each total score who passed a given item (combines information about item difficulty and item discrimination)
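The item-test regression table can be sketched as a grouping of test takers by total score, followed by the proportion passing the item within each group. Read informally, the score level at which the proportion reaches about .50 reflects the item's difficulty, and how sharply the proportion rises across score levels reflects its discrimination. The data and function name are hypothetical.

```python
from collections import defaultdict

def item_test_regression(item, totals):
    """Proportion passing the item at each total-score level (1 = pass, 0 = fail)."""
    counts = defaultdict(lambda: [0, 0])   # total score -> [number passing, number of people]
    for passed, total in zip(item, totals):
        counts[total][0] += passed
        counts[total][1] += 1
    return {score: n_pass / n for score, (n_pass, n) in sorted(counts.items())}

# Hypothetical data for 10 test takers.
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
totals = [2, 2, 3, 3, 3, 4, 4, 5, 5, 5]
print(item_test_regression(item, totals))
# {2: 0.0, 3: 0.3333333333333333, 4: 1.0, 5: 1.0}
```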