Test Construction and Consideration of Bias Flashcards

1
Q

What is involved in the standardization and ancillary research step of test construction?

A

norming, detecting and accounting for bias, reliability studies, equating programs (alternate forms, multiple levels [scaling], equating to a previous edition)

2
Q

What are the steps in test norming?

A
  1. Define the target population
  2. Select the sample
3
Q

What are the two types of overarching sampling methods?

A
  1. Probability sampling (each member of the population has a known, nonzero chance of being included in the sample; the chances are equal in simple random sampling but need not be equal in every design)
  2. Non-probability sampling (non-random; some members of the target population have an unknown or zero chance of being selected)
4
Q

What are the types of probability sampling?

A
  1. Random sampling [everyone has an equal/known chance; works well with small, homogeneous samples]
  2. Systematic sampling [every nth person is chosen, e.g., every 3rd or 100th]
  3. Stratified sampling [split subjects into mutually exclusive groups and randomly sample from each group, e.g., split a class into rows and randomly sample each row]
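
The three probability sampling methods above can be sketched in Python as follows. This is only an illustrative sketch: the function names, the 30-student roster, the three rows, and the fixed seed are assumptions made for the example and are not part of the card.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, step, rng):
    """Pick a random starting point, then take every `step`-th member."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, n_per_stratum, rng):
    """Randomly sample within each mutually exclusive group."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(0)            # fixed seed so the sketch is repeatable
roster = list(range(1, 31))       # hypothetical class of 30 students
rows = {f"row{r + 1}": roster[r * 10:(r + 1) * 10] for r in range(3)}

srs = simple_random_sample(roster, 6, rng)
sys_sample = systematic_sample(roster, 3, rng)
strat = stratified_sample(rows, 2, rng)
```

In the stratified case every row is guaranteed to be represented, which a simple random sample of the whole class does not guarantee.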
5
Q

What are the types of non-probability sampling?

A
  1. Convenience sampling [pick whoever is available; open to bias]
  2. Judgement sampling [researcher selects the sample]
  3. Quota sampling [similar to stratified sampling: people are split into different groups, and then a convenience sampling method is used to get people from each group]
  4. Snowball sampling [used when the desired characteristic is rare in the population: ask people who have the characteristic to participate and share study info, and try to have them recruit friends/acquaintances to participate]
6
Q

What causes bias in samples?

A

a good sample is representative (each sample point represents the attributes of a known number of population elements); bias often occurs when the survey sample does not accurately represent the population. The bias that results from an unrepresentative sample is called selection bias

7
Q

What is selection bias? What is an example of it that we discussed in class?

A

selection bias is the bias that results from an unrepresentative sample; one form is undercoverage, which occurs when some members of the population are inadequately represented in the sample; a classic example of undercoverage is the Literary Digest voter survey that predicted Landon would win over Roosevelt in the 1936 presidential election (undercoverage of low-income voters, who tended to be Democrats); this occurred because the survey relied on a convenience sample drawn from telephone directories and car registration lists (in 1936, people who owned cars and telephones tended to be more affluent); undercoverage is often a problem with convenience samples

8
Q

What is non-response bias? What is an example of this?

A

sometimes, individuals chosen for the sample are unwilling or unable to participate in the survey; non-response bias is the bias that results when respondents differ in meaningful ways from non-respondents; the Literary Digest experience illustrates a common problem with mail-in surveys (the response rate is low, making mail-in surveys vulnerable to non-response bias)

9
Q

What is voluntary response bias? What is an example of this?

A

voluntary response bias occurs when sample members are self-selected volunteers, as in voluntary samples; an example would be call-in radio shows that solicit audience participation in surveys on controversial topics (abortion, affirmative action, gun control, etc.): the results tend to overrepresent individuals who have strong opinions

10
Q

Describe what sampling error is

A

the variability among statistics from different samples is called sampling error;
a survey produces a sample statistic, which is used to estimate a population parameter: if you repeated a survey many times, using different samples each time, you would get a different sample statistic with each replication AND each of the different statistics would be an estimate for the same population parameter; if the statistic is unbiased, the average of all the statistics from all possible samples will equal the true population parameter even though an individual statistic may differ from the population parameter
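
The idea of repeating a survey many times can be illustrated with a small simulation. This is a sketch under assumptions: the simulated population of test scores (mean 100, SD 15), the sample size of 25, and the 2,000 replications are all made up for the example.

```python
import random
import statistics

rng = random.Random(42)
# Hypothetical population of 10,000 test scores.
population = [rng.gauss(100, 15) for _ in range(10_000)]
true_mean = statistics.fmean(population)  # the population parameter

def sample_means(n, replications=2000):
    """Mean of each of many independent random samples of size n."""
    return [statistics.fmean(rng.sample(population, n))
            for _ in range(replications)]

means = sample_means(25)

# Variability of the statistic across samples: the sampling error.
spread = statistics.stdev(means)

# For an unbiased statistic, the average of the sample means sits
# very close to the population parameter.
center_gap = abs(statistics.fmean(means) - true_mean)
```

Individual sample means differ from one another and from `true_mean`, but their average lands close to it, which is what "unbiased" means here.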

11
Q

What is the relationship between sample size and sampling error?

A

increasing the sample size tends to reduce sampling error; that is, it makes the sample statistic less variable; however, increasing sample size does not affect survey bias (a large sample cannot correct for methodological problems [e.g., undercoverage, non-response bias] that produce survey bias); the Literary Digest survey illustrates this: the sample was very large (over 2 million surveys were completed), but its size did not overcome the problems with the sample (undercoverage and non-response bias)
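
A small simulation can make the contrast concrete. It is a sketch with invented numbers: a population in which half the people are outside the sampling frame, with 40% support for a candidate inside the frame and 60% outside it. None of these figures come from the Literary Digest survey.

```python
import random
import statistics

rng = random.Random(1)
# 1 = supports the candidate, 0 = does not (hypothetical population).
covered = [1 if rng.random() < 0.40 else 0 for _ in range(50_000)]
uncovered = [1 if rng.random() < 0.60 else 0 for _ in range(50_000)]
true_support = statistics.fmean(covered + uncovered)

def survey(n):
    """Undercoverage: the sample is drawn only from the covered group."""
    return statistics.fmean(rng.sample(covered, n))

def spread_and_bias(n, replications=200):
    """Variability of the estimate and its average distance from the truth."""
    estimates = [survey(n) for _ in range(replications)]
    spread = statistics.stdev(estimates)
    bias = statistics.fmean(estimates) - true_support
    return spread, bias

small_spread, small_bias = spread_and_bias(100)
large_spread, large_bias = spread_and_bias(10_000)
```

The larger sample gives a much smaller spread, but the bias stays at roughly the same size because the sampling frame is unchanged.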

12
Q

What is response bias?

A

response biases are cognitive biases in which the respondent feels compelled to respond in a certain way rather than reflect their true belief; can affect both the reliability and validity of the measurement

13
Q

What is Acquiescence bias?

A

when an individual agrees with statements without regard for the statement’s meaning (yea-saying); the respondent has a tendency to agree with a statement when they are in doubt

14
Q

What are the implications of acquiescence for behavioral science?

A

if some people engage in acquiescent responding while others do not, then test users might not be able to use test scores to identify which people truly have a high level of the construct being assessed

15
Q

What is the occurrence of acquiescence bias?

A

there is some debate about whether it occurs often; it is perhaps most likely when the respondent does not understand the test items. One approach to dealing with acquiescent responding on surveys and questionnaires is to employ a balance of positively and negatively worded items reflecting the intended content
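
The effect of a balanced scale can be shown with a short scoring sketch. The four items, the 1-5 agreement scale, and the choice of which items are negatively worded are hypothetical.

```python
def score_balanced_scale(responses, reverse_keyed, scale_min=1, scale_max=5):
    """Sum item responses after reverse-scoring negatively worded items."""
    total = 0
    for item, value in responses.items():
        if item in reverse_keyed:
            # On a 1-5 scale: 5 -> 1, 4 -> 2, 3 -> 3, 2 -> 4, 1 -> 5
            value = scale_max + scale_min - value
        total += value
    return total

reverse_keyed = {"q2", "q4"}  # hypothetical negatively worded items

yea_sayer = {"q1": 5, "q2": 5, "q3": 5, "q4": 5}   # agrees with everything
high_trait = {"q1": 5, "q2": 1, "q3": 5, "q4": 1}  # consistently high on the construct

yea_score = score_balanced_scale(yea_sayer, reverse_keyed)
high_score = score_balanced_scale(high_trait, reverse_keyed)
```

With half the items reverse-keyed, a respondent who agrees with everything ends up at the scale midpoint rather than at the top, so yea-saying no longer looks like a high level of the construct.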

16
Q

What are extreme and moderate reporting?

A

both refer to differences in the tendency to use or avoid extreme response options; Extreme reporting: overuse of extreme options; Moderate reporting: avoidance of extreme options; both are traits that tend to be stable over time

17
Q

What are the implications of extreme reporting?

A

implications similar to acquiescence bias: creates ambiguity about who truly has high [vs. low] levels of the construct being measured; occurrence: there is evidence that some people do tend to overuse extreme options while others do not

18
Q

What is social desirability response bias?

A

the tendency for a person to respond in a way that seems socially appealing, regardless of their true characteristics: overreporting of good behavior or underreporting of negative behavior; when social desirability bias is likely in research, the researcher will often include a scale that measures socially desirable responding (e.g., the Marlowe-Crowne Social Desirability Scale, MCSDS); a participant who answers in a socially desirable manner on that scale will likely do so in the rest of the study, and researchers can account for that in their analysis

19
Q

What are the implications of social desirability response bias?

A

can’t tell who truly has a high level of the construct and who is “faking”; can compromise decisions in applied behavioral science; can artificially inflate (or deflate) effects in behavioral research
Social Desirability Debates: is it really a bias? Does it really affect research conclusions as illustrated above?

20
Q

What is Malingering response bias?

A

when an individual “fakes bad,” generally for some secondary gain; most likely to occur when being perceived as having problems would benefit the respondent

21
Q

What is the occurrence of malingering?

A

7.3-27% of general psychological evaluations; as much as 31-45% of forensic evaluations; attorneys may coach clients to “beat” methods of detecting malingering

22
Q

What is random/careless response bias?

A

answering questions in random, semi-random, or simply careless fashion; occurrences: 1-10% of respondents, though varies by context and scope of randomness; most likely when not motivated to be thoughtful/careful (anonymity, feelings of coercion)

23
Q

What is guessing response bias?

A

occurs on tests with “correct” answers, when a respondent who does not know an answer is motivated to select the correct one and guesses

24
Q

Auspices Response Bias

A

response dictated by the image or opinion of the sponsor rather than the actual question

25
Q

Mental Set Response Bias

A

cognitions or perceptions based on previous items influence response to later ones (the “priming” problem)

26
Q

Order Response Bias

A

the sequence in which a series is listed affects the responses to other items

27
Q

Prestige Response Bias

A

response intended to enhance the image of the respondent in the eyes of others

28
Q

Hostility Response Bias

A

response arising from feelings of anger or resentment engendered by the response task

29
Q

Threat Response Bias

A

response influenced by anxiety or fear instilled by the nature of the question

30
Q

What are your goals when coping with response bias?

A
  1. Prevent or minimize existence of bias
  2. Minimize the effects of bias
  3. Detect bias and intervene
31
Q

How can you cope with response bias?

A

strategies for achieving these goals: manage the testing context, manage test content or scoring, use specialized tests; manage test content or scoring to prevent/minimize the existence of bias (balanced scales; corrections for guessing); manage test content or scoring to detect bias and intervene (embedded validity scales); use specialized tests to detect bias and intervene (tests designed to detect socially desirable responding, acquiescence, and so on)
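
The card does not say which correction for guessing is meant. As an assumption, the sketch below uses the classic formula for multiple-choice tests, corrected score = R - W / (k - 1), where R is the number right, W the number wrong (omitted items are not counted), and k the number of response options per item. The function name and the example numbers are hypothetical.

```python
def corrected_score(num_right, num_wrong, options_per_item):
    """Classic correction for guessing: right minus wrong / (k - 1)."""
    if options_per_item < 2:
        raise ValueError("items need at least two response options")
    return num_right - num_wrong / (options_per_item - 1)

# A respondent with 30 right and 8 wrong on 5-option items.
score = corrected_score(num_right=30, num_wrong=8, options_per_item=5)

# Blind guessing on 100 four-option items is expected to yield about
# 25 right and 75 wrong, which the formula corrects to zero.
blind_guess = corrected_score(num_right=25, num_wrong=75, options_per_item=4)
```

The penalty is sized so that purely random guessing earns an expected score of zero.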

32
Q

What is test bias? Why is it important?

A

Test bias systematically obscures the true differences between groups (and between people from different groups). Testing is widespread and has important consequences, so tests should differentiate among people based on real psychological differences rather than on group membership; test bias can compromise decisions about individuals and researchers’ study of group differences

33
Q

What are the two types of test bias?

A
  1. Construct Bias (group difference in score meaning; scores reflect different constructs in different groups [or same construct with differing precision])
  2. Predictive Bias (group difference in implications of score use; scores are associated with an important criterion to differing degrees in different groups [test use])
34
Q

What is construct test bias?

A

scores might have different meanings in different groups; differences in test scores (between groups) might not reflect true group differences in a psychological construct (group differences in test scores might occur, even if there are no true group differences in the relevant construct)

35
Q

How can you detect construct bias?

A

usually detected by examining responses to items on a test; note: the existence of group differences in test responses, on its own, does not necessarily imply bias; bias exists when there are group differences in test responses and when those differences do not reflect true psychological differences

  1. Differential reliability
  2. Differential rank order of items difficulties
  3. Differential item discrimination index
  4. Differential dimensionality
  5. Differential item functioning (*not getting covered)
36
Q

What is differential reliability?

A

group differences in reliability (test scores reflect the relevant construct with better precision in one group than in another; statistical techniques for formally testing group differences in reliability)
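
As a first descriptive step, a reliability coefficient can be computed separately in each group and compared. The sketch below uses Cronbach's alpha as the reliability estimate, which is an assumption (the card does not name a coefficient), and the two small data sets are invented. It does not perform a formal significance test of the difference.

```python
import statistics

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])
    item_vars = [statistics.variance([r[i] for r in rows]) for i in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 3-item responses (1-5 scale) for two groups of five people.
group_a = [[1, 1, 1], [2, 2, 3], [3, 3, 3], [4, 5, 4], [5, 5, 5]]
group_b = [[1, 2, 3], [2, 4, 1], [3, 1, 4], [4, 3, 2], [5, 5, 5]]

alpha_a = cronbach_alpha(group_a)  # items move together: high alpha
alpha_b = cronbach_alpha(group_b)  # items move together less: lower alpha
```

A noticeably lower coefficient in one group suggests that the scores reflect the construct less precisely there, which is the pattern the card describes.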

37
Q

What is differential rank order of difficulties?

A

the items that are most difficult (as compared to other items) in one group are not the most difficult in another group (may indicate construct bias); for example, people in one group answer certain items incorrectly more often, while people in other groups do not miss those same items as often.
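
The comparison can be sketched by computing each item's difficulty (proportion correct) within each group and ordering the items from hardest to easiest. The function names and the tiny 0/1 response matrices are hypothetical.

```python
def item_difficulties(responses):
    """Proportion answering each item correctly (1 = correct, 0 = incorrect)."""
    n = len(responses)
    n_items = len(responses[0])
    return [sum(r[i] for r in responses) / n for i in range(n_items)]

def rank_hardest_first(difficulties):
    """Item indices ordered from hardest (lowest proportion correct) to easiest."""
    return sorted(range(len(difficulties)), key=lambda i: difficulties[i])

# Rows are respondents, columns are items 0, 1, 2.
group_1 = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
group_2 = [[0, 1, 1], [0, 1, 1], [1, 0, 1], [0, 1, 1]]

order_1 = rank_hardest_first(item_difficulties(group_1))
order_2 = rank_hardest_first(item_difficulties(group_2))
same_order = order_1 == order_2
```

Here the hardest item in one group is the easiest in the other, so the rank orders disagree, which would prompt a closer look at those items.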

38
Q

What is differential item discrimination?

A

compute an item discrimination index (for an item) separately for each group; are those indices different across groups? If so, may indicate construct bias in that item; particularly problematic if this occurs for many items
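
One common discrimination index is the upper-lower index D: the proportion answering the item correctly among the highest total scorers minus the same proportion among the lowest scorers. Using D here is an assumption, since the card does not name a specific index, and the data and the choice of top and bottom halves are invented for the example.

```python
def discrimination_index(responses, item, fraction=0.27):
    """D = proportion correct in the top group minus the bottom group."""
    ranked = sorted(responses, key=sum, reverse=True)  # highest totals first
    n = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(r[item] for r in upper) / n
    p_lower = sum(r[item] for r in lower) / n
    return p_upper - p_lower

# Rows are respondents, columns are items scored 1 (correct) or 0.
group_a = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
group_b = [[0, 1, 1], [1, 1, 1], [1, 0, 0], [0, 0, 0]]

# Compare item 0 across groups, using top and bottom halves.
d_a = discrimination_index(group_a, item=0, fraction=0.5)
d_b = discrimination_index(group_b, item=0, fraction=0.5)
```

In this made-up data the item separates high and low scorers in one group but not in the other, the kind of difference the card flags as a possible sign of construct bias.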

39
Q

What is differential dimensionality?

A

group differences in factor structure: the number of dimensions/factors, the connections between items and factors, and the correlations among factors; factor analyze the test separately in each group; is the factor structure different across groups? If so, may indicate construct bias; best done via confirmatory factor analysis (CFA)

40
Q

What is predictive test bias?

A

if a test is used to inform decisions/predictions about people, then it should do so equally well for all groups of people; if a test is more predictive (of an important outcome) for some group than for others, then it suffers from predictive bias

41
Q

How can you detect predictive bias?

A

to what degree are scores more predictive of a certain criterion in one group than in another?

  1. Administer the test to people in each group
  2. Measure the criterion for all those people
  3. Conduct statistical analyses to detect whether test scores are associated with the criterion differently in the groups (usually via linear or multiple regression)
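
A minimal descriptive version of step 3 fits a separate regression line in each group and compares the slopes and intercepts. The scores and criterion values below are invented, and the helper name is hypothetical. A formal analysis would test the group differences statistically, for example with a single multiple regression that includes group and a group-by-score interaction term.

```python
import statistics

def fit_line(x, y):
    """Ordinary least squares slope and intercept for one predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical test scores and criterion values for two groups.
scores_a, criterion_a = [10, 20, 30, 40], [2.0, 3.0, 4.0, 5.0]
scores_b, criterion_b = [10, 20, 30, 40], [3.0, 3.2, 3.4, 3.6]

slope_a, intercept_a = fit_line(scores_a, criterion_a)
slope_b, intercept_b = fit_line(scores_b, criterion_b)

# A large gap means the same score implies different predicted outcomes.
slope_gap = slope_a - slope_b
```

Here a 10-point score difference predicts a 1.0-point criterion difference in one group but only 0.2 in the other, so the test would be more predictive for the first group.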
42
Q

Questions to ask yourself when considering test bias?

A
  1. Is the question applicable to all respondents? If not, reword it and/or use a detour around it for those to whom it does not apply
  2. Is the question as free from threat to respondents as possible? If not, change it to reduce the threat; consider depersonalizing it and/or asking about hypothetical situations
  3. Is the question “loaded” with a reason for responding in a particular way? If so, the reason must be deleted.
43
Q

What are the final materials and publications for test construction?

A

Technical manual; score reports; supplementary materials

44
Q

What is important to consider when constructing a test?

A
  1. The original conceptualization is more important than the technical/statistical work
  2. You need to spend substantial time studying the area before starting to write items
  3. In the original design stage, you need to think about the final score reports
  4. When preparing items, aim for simplicity
  5. Be sure to try out enough items: generally, two or three times the number needed for the final test
  6. Do a simple, informal tryout before the major tryout
  7. From a statistical viewpoint, the standardization group need not be very large, if properly selected