Methods and Statistics used in Research Studies and Test Construction Flashcards

1
Q

an umbrella term for all that goes into the process of creating a test

A

Test Development

2
Q

brainstorming of ideas about what kind of test a developer wants to publish
- stage wherein the following are determined: construct, goal, user, taker, administration, format, response, benefits, costs, interpretation
- determines whether the test would be norm-referenced or criterion-referenced

A

I. Test Conceptualization

3
Q

preliminary research surrounding the creation of a prototype of the test

A

Pilot Work/Pilot Study/Pilot Research

4
Q

stage in the process that entails writing test items, revising, formatting, and setting scoring rules
- it is not good to create an item that contains numerous ideas

A

II. Test Construction

5
Q

reservoir or well from which the items will or will not be drawn for the final version of the test

A

Item Pool

6
Q

relatively large and easily accessible collection of test questions

A

Item Banks

7
Q

refers to an interactive, computer-administered test-taking process wherein the items presented to the testtaker are based in part on the testtaker’s performance on previous items

A

Computerized Adaptive Testing

8
Q

occurs when there is some lower limit on a survey or questionnaire and a large percentage of respondents score near this lower limit (testtakers have low scores)

A

Floor Effects

9
Q

occurs when there is some upper limit on a survey or questionnaire and a large percentage of respondents score near this upper limit (testtakers have high scores)

A

Ceiling Effects

10
Q

ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items

A

Item Branching
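Item branching, as used in computerized adaptive testing, can be sketched as a simple update loop. This is a minimal illustration, not any specific test's algorithm; the step size and simulated responses are hypothetical:

```python
# Minimal sketch of item branching: tailor the difficulty of the next item
# to the testtaker's response on the previous one (hypothetical step size).

def next_difficulty(current: float, correct: bool, step: float = 0.5) -> float:
    """Move up after a correct response, down after an incorrect one."""
    return current + step if correct else current - step

# Simulated run over five responses (True = correct).
difficulty = 0.0
for response in [True, True, False, True, False]:
    difficulty = next_difficulty(difficulty, response)

print(difficulty)  # ends at 0.5 after the branching above
```

Real adaptive tests estimate trait level from an item response theory model rather than a fixed step, but the branching idea is the same.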

11
Q

form, plan, structure, arrangement, and layout of individual test items

A

Item Format

12
Q

offers two alternatives for each item

A

Dichotomous Format

13
Q

each item has more than two alternatives

A

Polychotomous Format

14
Q

a format where respondents are asked to rate a construct

A

Category Format

15
Q

subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself

A

Checklist

16
Q

items are arranged from weaker to stronger expressions of attitude, belief, or feelings

A

Guttman Scale

17
Q

requires testtakers to select a response from a set of alternative responses

A

Selected-Response Format

18
Q

Has three elements: a stem (question), a correct option, and several incorrect alternatives (distractors or foils). Should have only one correct answer, grammatically parallel alternatives of similar length that fit grammatically with the stem; avoid ridiculous distractors, excessively long items, and “all of the above”/“none of the above” options (25%)

A

Multiple Choice

19
Q

a distractor chosen equally by high- and low-performing groups; enhances the consistency of test results

A

Effective Distractors

20
Q

may hurt the reliability of the test because they are time-consuming to read and can limit the number of good items

A

Ineffective Distractors

21
Q

less likely to be chosen; may affect the reliability of the test because testtakers may guess from among the remaining options

A

Cute Distractors

22
Q

The testtaker is presented with two columns: premises and responses

A

Matching Item

23
Q

Usually takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact (50%)

A

Binary Choice

24
Q

requires testtakers to supply or create the correct answer rather than merely select it

A

Constructed-Response Format

25
Q

Requires the examinee to provide a word or phrase that completes a sentence

A

Completion Item

26
Q

Should be written clearly enough that the testtaker can respond succinctly, with a short answer

A

Short-Answer

27
Q

allows creative integration and expression of the material

A

Essay

28
Q

process of setting rules for assigning numbers in measurement

A

Scaling

29
Q
  • involve classification or categorization based on one or more distinguishing characteristics
  • Label and categorize observations but do not make any quantitative distinctions between observations
  • mode
A

Nominal

30
Q

rank ordering on some characteristics is also permissible

-median

A

Ordinal

31
Q

contains equal intervals, has no absolute zero point (even negative values have interpretation to it)

A

Interval

32
Q
  • has true zero point (if the score is zero, it means none/null)
  • easiest to manipulate
A

Ratio

33
Q
  • produces ordinal data by presenting respondents with pairs of stimuli which they are asked to compare
  • respondent is presented with two objects at a time and asked to select one object according to some criterion
A

Paired Comparison

34
Q

respondents are presented with several items simultaneously and asked to rank them in order of priority

A

Rank Order

35
Q

respondents are asked to allocate a constant sum of units, such as points, among a set of stimulus objects with respect to some criterion

A

Constant Sum

36
Q

sort objects based on similarity with respect to some criterion

A

Q-Sort Technique

37
Q

rate the objects by placing a mark at the appropriate position on a continuous line that runs from one extreme of the criterion variable to the other
- e.g., Rating Guardians of the Galaxy as the best Marvel Movie of Phase 4

A

Continuous Rating

38
Q

having numbers or brief descriptions associated with each category
- e.g., 1 if you like the item the most, 2 if so-so, 3 if you hate it

A

Itemized Rating

39
Q

indicate their own attitudes by checking how strongly they agree or disagree with carefully worded statements that range from very positive to very negative toward the attitudinal object
- principle of measuring attitudes by asking people to respond to a series of statements about a topic, in terms of the extent to which they agree with them

A

Likert Scale

40
Q

a 100-mm line that allows subjects to express the magnitude of an experience or belief

A

Visual Analogue Scale

41
Q

derive the respondent’s attitude toward the given object by asking him to select an appropriate position on a scale between two bipolar opposites

A

Semantic Differential Scale

42
Q

developed to measure the direction and intensity of an attitude simultaneously

A

Stapel Scale

43
Q

final score is obtained by summing the ratings across all the items

A

Summative Scale
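The summation rule above can be sketched in code. This is a minimal sketch with hypothetical item responses; the 5-point scale and the reverse-scoring of negatively worded items are common conventions, not part of the definition itself:

```python
# Summative (Likert-type) scoring sketch: the final score is the sum of item
# ratings, with negatively worded items reverse-scored first (hypothetical data).

def reverse_score(rating: int, max_point: int = 5) -> int:
    """Reverse-score a negatively worded item on a 1..max_point scale."""
    return max_point + 1 - rating

responses = [4, 5, 2, 3]                         # ratings on a 1-5 scale
negatively_worded = [False, False, True, False]  # which items to reverse

total = sum(reverse_score(r) if neg else r
            for r, neg in zip(responses, negatively_worded))
print(total)  # 4 + 5 + (6 - 2) + 3 = 16
```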

44
Q

involves the collection of a variety of different statements about a phenomenon which are ranked by an expert panel in order to develop the questionnaire
- allows multiple answers

A

Thurstone Scale

45
Q

the respondent must choose between two or more equally socially acceptable options

A

Ipsative Scale

46
Q

the test should be tried out on people who are similar in critical respects to the people for whom the test was designed
- An informal rule of thumb: there should be no fewer than 5 subjects, and preferably as many as 10, for each item (the more, the better)

A

III. Test Tryout

47
Q

Risk of using few subjects = ______

A

phantom factors emerge

48
Q

A good test item is one that is answered _________ by high scorers as a whole

A

correctly

49
Q

administering a large pool of test items to a sample of individuals who are known to differ on the construct being measured

A

Empirical Criterion Keying

50
Q

statistical procedure used to analyze and evaluate test items

A

Item Analysis

51
Q

employed to examine correlation between each item and the total score of the test

A

Discriminability Analysis
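Discriminability analysis, as defined above, amounts to computing an item-total correlation for each item. A minimal sketch with hypothetical 0/1 item scores:

```python
# Discriminability analysis sketch: correlate each item (scored 0/1) with the
# total test score; items that correlate poorly are candidates for revision.

def pearson(x, y):
    """Pearson correlation coefficient, computed from first principles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

# rows = testtakers, columns = items (1 = correct, 0 = incorrect); data hypothetical
scores = [[1, 1, 0],
          [1, 0, 0],
          [1, 1, 1],
          [0, 0, 1]]
totals = [sum(row) for row in scores]
for i in range(3):
    item = [row[i] for row in scores]
    print(f"item {i}: r = {pearson(item, totals):.2f}")
```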

52
Q

suggests a sample of an individual’s behavior

A

Item

53
Q

a blueprint of the test in terms of number of items per difficulty, topic importance, or taxonomy

A

Table of Specification

54
Q

Define clearly what to measure, generate an item pool, avoid long items, keep the level of reading difficulty appropriate for those who will complete the test, avoid double-barreled items, and consider mixing positively and negatively worded items

A

Guidelines for Item writing

55
Q

items that convey more than one idea at the same time

A

Double-Barreled Items

56
Q

defined by the number of people who get a particular item correct

A

Item Difficulty

57
Q

calculated as the proportion of the total number of testtakers who answered the item correctly; the larger the proportion, the easier the item

A

Item-Difficulty Index
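The index described above is simple to compute: the proportion of testtakers who answered the item correctly. A minimal sketch with hypothetical responses:

```python
# Item-difficulty index sketch: proportion answering the item correctly.
# The larger the proportion, the easier the item (data hypothetical).

def item_difficulty(responses):
    """responses: 0/1 scores on a single item across testtakers."""
    return sum(responses) / len(responses)

p = item_difficulty([1, 1, 1, 0, 1, 0, 1, 1])  # 6 of 8 correct
print(p)  # 0.75
```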

58
Q

__________ for personality testing; percentage of individuals who endorsed an item in a personality test

A

Item-Endorsement Index

59
Q

The optimal average item difficulty is approx. _________, with items on the test ranging in difficulty from about 30% to 80%

A

50%

60
Q

Omnibus Spiral Format

A

items in an ability test are arranged in order of increasing difficulty

61
Q

provides an indication of the internal consistency of a test

A

Item-Reliability Index
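The item-reliability index is commonly computed as the product of the item-score standard deviation and the item-total correlation — a formulation drawn from standard psychometrics texts, not stated in the card itself. A minimal sketch with hypothetical data:

```python
# Item-reliability index sketch: s_item * r(item, total); data hypothetical.
from statistics import pstdev

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) *
           sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

item = [1, 0, 1, 1]      # 0/1 scores on one item
totals = [8, 3, 7, 6]    # total test scores
index = pstdev(item) * pearson(item, totals)
print(round(index, 3))
```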

62
Q

The higher the ___________, the greater the test’s internal consistency

A

Item-Reliability index

63
Q

designed to provide an indication of the degree to which a test measures what it purports to measure

A

Item-Validity Index

64
Q

The higher Item-Validity index, the greater the test’s _________

A

criterion-related validity

65
Q

measure of item discrimination; measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly

A

Item-Discrimination Index
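The difference of proportions described above is directly computable. A minimal sketch with hypothetical upper- and lower-group responses:

```python
# Item-discrimination index (extreme group method) sketch:
# d = p(high group correct) - p(low group correct); data hypothetical.

def discrimination_index(high_group, low_group):
    p_high = sum(high_group) / len(high_group)
    p_low = sum(low_group) / len(low_group)
    return p_high - p_low

d = discrimination_index(high_group=[1, 1, 1, 1, 0],   # 80% correct
                         low_group=[1, 0, 0, 0, 0])    # 20% correct
print(round(d, 2))  # 0.6
```

A large positive d means the item separates high scorers from low scorers well; a negative d flags an item that low scorers answer correctly more often than high scorers.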

66
Q

compares people who have done well with those who have done poorly

A

Extreme Group Method

67
Q

difference between these proportions

A

Discrimination Index

68
Q

correlation between a dichotomous variable and a continuous variable

A

Point-Biserial Method
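The point-biserial correlation has a closed form: the mean-score difference between the two item groups, scaled by the standard deviation of the continuous variable and the group proportions. A minimal sketch with hypothetical data:

```python
# Point-biserial correlation sketch: dichotomous item score (0/1) vs.
# continuous total score; uses the population-SD formulation (data hypothetical).
from statistics import mean, pstdev

def point_biserial(item, totals):
    ones = [t for x, t in zip(item, totals) if x == 1]
    zeros = [t for x, t in zip(item, totals) if x == 0]
    p = len(ones) / len(item)  # proportion answering correctly
    return (mean(ones) - mean(zeros)) / pstdev(totals) * (p * (1 - p)) ** 0.5

r = point_biserial([1, 1, 0, 0], [9.0, 7.0, 5.0, 3.0])
print(round(r, 3))  # 0.894
```

This equals the ordinary Pearson correlation between the 0/1 item vector and the totals.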

69
Q

graphic representation of item difficulty and discrimination

A

Item-Characteristic Curve

70
Q

one that has eluded any universally accepted solution

A

Guessing

71
Q

the testtaker obtains a measure of the level of the trait; thus, high scores may suggest a high level of the trait being measured

A

Cumulative Model

72
Q

testtaker responses earn credit toward placement in a particular class or category with other testtakers whose patterns of responses are similar in some way

A

Class Scoring/Category Scoring

73
Q

compares a testtaker’s score on one scale within a test to a score on another scale within that same test (two unrelated constructs)

A

Ipsative Scoring

74
Q

characterize each item according to its strengths and weaknesses
- As revision proceeds, the advantage of writing a large item pool becomes more apparent, because some items are removed and must be replaced by items from the item pool

A

IV. Test Revision

75
Q

revalidation of a test on a sample of testtakers other than those on whom test performance was originally found to be a valid predictor of some criterion; often results in validity shrinkage

A

Cross-Validation

76
Q

decrease in item validities that inevitably occurs after cross-validation

A

Validity Shrinkage

77
Q

conducted on two or more tests using the same sample of testtakers

A

Co-validation

78
Q

creation of norms or revision of existing norms conducted on two or more tests using the same sample of testtakers

A

Co-norming

79
Q

a test protocol scored by a highly authoritative scorer, designed as a model for scoring and a mechanism for resolving scoring discrepancies

A

Anchor Protocol

80
Q

discrepancy between the scoring of an anchor protocol and the scoring of another protocol

A

Scoring Drift

81
Q

an item functions differently in one group of testtakers relative to another group known to have the same level of the underlying trait

A

Differential Item Functioning

82
Q

test developers scrutinize item-response curves group by group, looking for DIF items

A

DIF Analysis

83
Q

items that respondents from different groups, at the same level of the underlying trait, have different probabilities of endorsing as a function of their group membership

A

DIF Items

85
Q

The test administered may be different for each testtaker, depending on the testtaker’s performance on the items presented

Reduces floor and ceiling effects

A

Computerized Adaptive Testing

89
Q

subtest used to direct or route the testtaker to a suitable level of items

A

Routing Test

90
Q

method of setting cut scores that entails a histographic representation of items and expert judgments regarding item effectiveness

A

Item-Mapping Method

91
Q

the level at which a minimum criterion number of correct responses is obtained

A

Basal Level

92
Q

standardized test administration is assured for testtakers and variation is kept to a minimum

Test content and length are tailored according to the taker’s ability

A

Computer Assisted Psychological Assessment