Psychological Measurement Exam 4 Flashcards

1
Q

a statistic indicating how many test takers responded correctly to an item.

A

item-difficulty index

2
Q

If 80% of test takers got an item correct, then the item-difficulty index is ____.

A

.8

3
Q

The larger the item-difficulty index, the _____ the item.

A

easier

4
Q

You want an item's difficulty to be halfway between the chance probability of guessing correctly and 1.00. Ex. for a 5-option multiple-choice item, the probability of guessing correctly is .20, so the optimal item difficulty is ____.

A

.60
.20+1.00=1.20
1.20/2=.60

5
Q

provides an indication of the internal consistency of a test

A

item-reliability index

6
Q

the higher the _____, the greater the test's internal consistency.

A

item-reliability index

7
Q

What are the 3 test construction approaches?

A
  1. Rational approach
  2. Empirical approach
  3. Rational with empirical refinement approach
8
Q

Making up statements about a personality trait to tap every aspect of it.

Ex. I am depressed once a month
I am depressed a couple times a month
I am depressed once a week
I am depressed a couple times a week
I am depressed everyday

This approach is easy to construct but also easy to fake.

A

Rational approach to test construction

9
Q

Uses 2 criterion groups: 1 normal and 1 that exhibits the trait you want to tap into. Come up with an item pool of questions and give it to both groups. Determine which questions the two groups answer in statistically significantly different ways.

Hard to fake because the questions seem random and test takers don't know what they're being tested for.

Limitation - p

A

Empirical approach within test construction

10
Q

Use rationality to come up with questions, then run tests with 2 criterion groups and keep the items that distinguish between them.

A

Rational with empirical refinement approach within test construction

11
Q

What are the 5 steps in test development?

A
  1. Test conceptualization
  2. Test construction
  3. Test tryout
  4. Item analysis
  5. Test revision
12
Q

Coming up with an idea that a test ought to be designed to measure [fill in the blank] in [such and such] way.

A

Test conceptualization

13
Q

Preliminary research surrounding the creation of a prototype of the test.

A

Pilot work

14
Q

Process of setting rules for assigning numbers in measurement.

A

Scaling

15
Q

Grouping of words, statements, or symbols on which judgements of the strength of a particular trait, attitude, or emotion are indicated by the test taker.

A

Rating scale

16
Q

Summative rating scale.
Presents the test taker with five alternative responses.
Ex. never, rarely, sometimes, usually, always
(each response assigned a value indicating degree of agreement)

A

Likert scale

17
Q

Test takers are presented with pairs of stimuli (2 photos, 2 objects, 2 statements) which they are asked to compare. They must select one stimulus according to some rule (the one they agree with more, etc.).

A

Method of paired comparisons

18
Q

Printed cards, drawings, photos, objects, etc. are presented for evaluation.

A

Sorting tasks

19
Q

Compare a stimulus with other stimuli (ex. ranking).

A

Comparative scaling

20
Q

Stimuli are placed in 1 of 2 or more categories.

A

Categorical scaling

21
Q

Yields ordinal level measures.
Ex.
1. All people should have the right to decide whether they wish to end their lives.
2. People who are terminally ill and in pain should have the option to have a doctor assist them in ending their lives.
3. People should have the option to sign away the use of artificial life support equipment before they become seriously ill.
4. People have the right to a comfortable life.
All who agree with 1 also agree with 2, 3, and 4, etc.

A

Guttman scale
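The cumulative property above can be sketched as a quick consistency check in Python (the function name and 0/1 coding are illustrative, not from the source):

```python
# Sketch: in a Guttman (cumulative) scale, endorsing a stronger statement
# implies endorsing every weaker one. With items ordered from strongest
# (item 1) to weakest (item 4), a consistent 0/1 response pattern is some
# run of zeros followed by some run of ones.
def is_guttman_consistent(responses):
    """Return True if the ordered 0/1 endorsements form a cumulative pattern."""
    endorsed = False
    for r in responses:
        if r == 1:
            endorsed = True
        elif endorsed:
            # A rejection after an endorsement breaks the cumulative pattern.
            return False
    return True

print(is_guttman_consistent([0, 0, 1, 1]))  # True: agrees with items 3 and 4 only
print(is_guttman_consistent([1, 0, 1, 1]))  # False: agrees with 1 but not 2
```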

22
Q

Reservoir or well from which items will or will not be drawn for the final version of the test.

A

Item pool

23
Q

Form, plan, structure, arrangement, and layout of individual test items.

A

Item format

24
Q

Requires test takers to select a response from a set of alternative responses.

A

Selected-response format

25
Q

Requires test takers to supply or to create the correct answer, not select it.

A

Constructed response format

26
Q

The test taker is presented with two columns, premises on the left and responses on the right, and must determine which response is best associated with which premise.

A

Matching Item

27
Q

A multiple-choice item that contains only 2 possible responses. Ex: True/False

A

Binary Choice Item

28
Q

Requires the examinee to provide a word or phrase that completes a sentence.

A

Completion Item

29
Q

Requires a response in the form of a word, term, sentence, or paragraph.

A

Short Answer Item

30
Q

test item that requires the test taker to respond to a question by writing a composition, typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation.

A

Essay Item

31
Q

large and easily accessible collection of test questions often used by teachers.

A

Item Bank

32
Q

interactive, computer-administered test-taking process wherein the items presented to the test taker are based in part on the test taker's performance on previous items.

A

Computerized Adaptive Testing (CAT)

33
Q

diminished utility of an assessment tool for distinguishing test takers at the low end of the ability, trait, or other attribute being measured.

A

Floor Effect

34
Q

diminished utility of an assessment tool for distinguishing test takers at the high end of the ability, trait, or other attribute being measured.

A

Ceiling Effect

35
Q

ability of the computer to tailor the content and order of presentation of test items on the basis of responses to previous items.

A

Item Branching

36
Q

Most commonly cumulative.
Cumulative: the higher the score, the higher the test taker is on the ability, trait, or other characteristic that the test purports to measure.

A

Scoring Items

37
Q

Test takers' responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is presumably similar in some way.

A

Class/Category Scoring

38
Q

comparing a test taker's score on one scale within a test to another scale within that same test.

A

Ipsative Scoring

39
Q

The test is tried out on people who are similar in critical respects to the people for whom the test was designed.

A

Test Tryout

40
Q

the different types of statistical scrutiny that the test data can potentially undergo after test tryout.

A

Item Analysis

41
Q

statistic indicating how many test takers responded correctly to an item.

Example:
If 80% got the item correct, then the item-difficulty index is .8.
The larger the item-difficulty index, the easier the item.
For a true/false item, the optimal item difficulty is .75:
.50 + 1.00 = 1.50
1.50 / 2 = .75

For a 5-option multiple-choice item, the optimal item difficulty is .60:
.20 + 1.00 = 1.20
1.20 / 2 = .60

A

Item-Difficulty Index
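A minimal Python sketch of these calculations, using a made-up 0/1 response matrix (all numbers illustrative):

```python
# Each row is one test taker; 1 = correct, 0 = incorrect (made-up data).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
]
n_takers = len(responses)
n_items = len(responses[0])

# Item-difficulty index: proportion of test takers answering each item correctly.
difficulty = [sum(row[j] for row in responses) / n_takers for j in range(n_items)]
print(difficulty)  # [0.8, 0.6, 0.2, 0.8] -- the third item (p = .2) is hardest

# Optimal difficulty: halfway between the chance success rate and 1.00.
def optimal_difficulty(chance):
    return (chance + 1.00) / 2

print(optimal_difficulty(0.50))  # 0.75 for a true/false item
print(optimal_difficulty(0.20))  # 0.6 for a 5-option multiple-choice item
```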

42
Q

provides an indication of the internal consistency of a test; the higher the index, the greater the test's internal consistency.

Item-reliability index = s(r_xt), where s is the item-score standard deviation and r_xt is the correlation between the item score and the total test score.

A

Item-Reliability Index
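A small Python sketch of the s(r_xt) computation; the item scores, total scores, and helper function are made up for illustration:

```python
import math

# Made-up data: 0/1 scores on one item and total test scores for six test takers.
item_scores = [1, 0, 1, 1, 0, 1]
total_scores = [9, 4, 8, 7, 5, 9]

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# s: standard deviation of the item scores (population form).
n = len(item_scores)
mean_item = sum(item_scores) / n
s = math.sqrt(sum((a - mean_item) ** 2 for a in item_scores) / n)

# r_xt: correlation between item score and total test score.
r_xt = pearson(item_scores, total_scores)

item_reliability = s * r_xt  # the item-reliability index for this toy item
print(round(item_reliability, 3))
```

The same pattern gives the item-validity index by swapping the total test scores for criterion scores (r_xc in place of r_xt).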

43
Q

statistic designed to provide an indication of the degree to which a test measures what it purports to measure.
The higher the item-validity index, the greater the test's criterion-related validity.
Item-validity index = s(r_xc), where s is the item-score standard deviation and r_xc is the correlation between the item score and the criterion score.

A

Item-Validity Index

44
Q

a measure of item discrimination, symbolized by a lowercase italic d. Compares performance on a particular item by test takers in the upper region of a distribution of continuous test scores with performance by those in the lower region.
A measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering it correctly.
Ranges from -1 to +1.
The higher the d, the better the item discriminates.

A

Item-Discrimination Index

45
Q

Optimal boundaries of the lower and upper groups for the _________ are the top 27% and bottom 27% of the distribution of scores, if the distribution is normal.

A

Item-Discrimination Index

46
Q

Formula:

d = (passing scores in the top group − passing scores in the bottom group) / n

where n is the number of test takers in each group.

A

Item-Discrimination Index
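A quick Python sketch of the formula above; the group size and pass counts are made up:

```python
# 100 test takers; with a roughly normal distribution the upper and lower
# groups are the top 27 and bottom 27 scorers (the 27% rule).
n_per_group = 27
passed_upper = 24  # upper-group members answering the item correctly
passed_lower = 9   # lower-group members answering the item correctly

# d = (passing scores in top group - passing scores in bottom group) / n
d = (passed_upper - passed_lower) / n_per_group
print(round(d, 2))  # 0.56: a positively discriminating item
```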

47
Q

a graphic representation of item difficulty and discrimination. The extent to which an item discriminates high-scoring from low-scoring examinees is apparent from the slope of the curve: the steeper the slope, the greater the item discrimination.

A

Item Characteristic Curves

48
Q

Throw out items that are too easy or too hard and that don't discriminate. Keep those that discriminate low from high scorers and demonstrate reliability. If more items are needed, go through the items that were tossed out, pick out those that are repairable, and revise and rewrite them. Then redo the process, revising the standardized conditions as well.

A

Test Revision

49
Q

the revalidation of a test on a sample of test takers other than those on whom test performance was originally found to be a valid predictor of some criterion.

A

Cross validation

50
Q

the decrease in item validity that inevitably occurs after cross validation of findings (most likely due to chance).

A

Validity Shrinkage

51
Q

as soon as a test is taken out of its original context, its validity goes down.

A

Generalizability