Stats - Test Construction Flashcards

1
Q

For a test to be good, what two things does it need?

A

Reliability and validity

2
Q

Does validity or reliability involve consistency, repeatability and dependability in scores?

A

Reliability

3
Q

What’s the formula of the True Score Model, or Classical Test Theory (i.e., total variability = ______ + ______)?

A

total variability = true score variability + error variability (a combo of truth & error)

4
Q

If true score variability is 80% then what percentage is error variability?

A

20%
total variability is always 100%

5
Q

If true score variability is 80% then what is the reliability score?

A

0.80 (always expressed as a coefficient)
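
Note: a minimal Python sketch (with hypothetical variance numbers) tying cards 3–5 together: reliability is the share of total variability that is true score variability.

```python
# Hypothetical variance components for the true score model:
# total variability = true score variability + error variability
true_var = 80.0   # assumed true score variability
error_var = 20.0  # assumed error variability

total_var = true_var + error_var    # always 100% of the variability
reliability = true_var / total_var  # reliability coefficient

print(reliability)            # 0.8
print(error_var / total_var)  # 0.2, i.e., the 20% error variability
```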

6
Q

What’s the minimal acceptable reliability?

A

0.80

7
Q

What are 3 common sources of error in tests, and what does each involve?

A
  1. content sampling: items that do or don’t tap into a test-taker’s knowledge
  2. time sampling: scores are different at two points in time due to factors related to the passage of time
  3. test heterogeneity: when a test has heterogeneous items tapping more than one domain
8
Q

How is reliability increased in relation to the number of items?

A

Reliability increases when the number of items increases

9
Q

How is reliability increased in relation to the homogeneity of items?

A

the more homogeneous the items are, the greater the reliability

10
Q

How is reliability increased in relation to the range of scores?

A

An unrestricted range of scores increases the reliability

11
Q

Does the ability to guess increase or decrease reliability?

A

It decreases reliability

12
Q

What is test-retest reliability?

A

When you administer the identical test to the same sample of people at two points in time

13
Q

Coefficient of stability is another name for what?

A

Test-retest reliability

14
Q

What’s parallel forms reliability?

A

It’s calculated from the scores obtained by the same group of people on two roughly equivalent, but not identical, forms of the same test administered at two different points in time

15
Q

Coefficient of equivalence is another name for what?

A

Parallel forms reliability

16
Q

What type of reliability looks at the consistency of the scores within the test itself, so the test is administered only once to one group of people?

A

Internal consistency reliability

17
Q

What are the two subtypes of internal consistency?

A
  1. Split-half reliability
  2. Kuder-Richardson (KR-20/21) or Coefficient Alpha
18
Q

Which of the 4 reliability estimates is associated with the Spearman-Brown prophecy formula?

A

Internal consistency reliability, specifically the split-half subtype
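
Note: a small Python sketch of the Spearman-Brown prophecy formula (the r values are hypothetical). With k = 2 it projects a half-test correlation up to full-test reliability, which is the split-half use named above.

```python
def spearman_brown(r: float, k: float) -> float:
    """Projected reliability when test length is multiplied by k."""
    return (k * r) / (1 + (k - 1) * r)

r_half = 0.70                     # hypothetical correlation between the two halves
print(spearman_brown(r_half, 2))  # ~0.82, the estimated full-test reliability

# Also illustrates card 8: lengthening the test raises reliability further.
print(spearman_brown(r_half, 3))  # ~0.88
```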

19
Q

What is the subtype of internal consistency reliability that finds the average of every possible way of splitting the test in half?

A

Kuder-Richardson or Coefficient Alpha
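
Note: a minimal Python sketch of coefficient alpha (the score matrix is made up), the statistic this card describes as the average of every possible split-half.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a (respondents x items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical scores: 5 respondents x 4 homogeneous items
scores = np.array([[3, 4, 3, 4],
                   [2, 2, 3, 2],
                   [4, 5, 5, 4],
                   [1, 2, 1, 2],
                   [3, 3, 4, 3]])
print(round(cronbach_alpha(scores), 2))  # ~0.98
```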

20
Q

What reliability estimate looks at the degree of agreement between two or more scorers when a test is subjectively scored?

A

Interrater (interscorer) reliability

21
Q

What reliability estimate connects to percent agreement, Pearson r, kappa statistics, or Yule’s Y?

A

Interrater reliability
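
Note: a short Python sketch (made-up ratings) of two of the statistics named on this card: percent agreement and Cohen’s kappa, which corrects agreement for chance.

```python
from collections import Counter

# Hypothetical pass/fail ratings from two scorers on ten tests
rater_a = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass", "pass", "fail"]
rater_b = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass", "pass", "fail"]

n = len(rater_a)
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # percent agreement

# Chance agreement: probability both raters independently pick the same category
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
p_chance = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)

kappa = (p_observed - p_chance) / (1 - p_chance)
print(p_observed)       # 0.8
print(round(kappa, 2))  # ~0.57
```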

22
Q

If the test were perfectly reliable, what would the standard error of measurement be?

A

0.0

23
Q

What is the highest possible value of the standard error of measurement?

A

It would be equal to the standard deviation of the test
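
Note: cards 22 and 23 both follow from the standard formula SEM = SD x sqrt(1 - reliability); a quick Python check (the SD value is hypothetical).

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement."""
    return sd * math.sqrt(1 - reliability)

sd = 15.0            # hypothetical standard deviation of the test
print(sem(sd, 1.0))  # 0.0  -> perfectly reliable test (card 22)
print(sem(sd, 0.0))  # 15.0 -> the maximum: the test's own SD (card 23)
print(sem(sd, 0.8))  # ~6.7 -> a typical in-between value
```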

24
Q

What are the 3 confidence intervals?

A

-1 & +1 = 68%
-2 & +2 = 95%
-3 & +3 = 99%

25
Q

What connects to the meaningfulness, usefulness, or accuracy of a measure in terms of prediction (i.e., is it a good predictor)?

A

Validity

26
Q

Are the items on my test really measuring the right knowledge or skills to predict if someone’s good for a job?
Is this content, criterion or construct validity?

A

Content validity

27
Q

Is the test valid or accurate in terms of prediction? Can we use our test to predict something (e.g., the sales potential of salesmen on my team)?
Is this content, criterion, or construct validity?

A

Criterion validity

28
Q

Is my instrument measuring the trait that I think it’s measuring? (I make a scale that measures assertiveness, but is it actually measuring assertiveness, or is it measuring aggression?)
Is this content, criterion, or construct validity?

A

Construct validity

29
Q

Between content and construct validity, which measures skills/knowledge and which measures a trait?

A

Content= skills and knowledge
Construct = always a trait

30
Q

What are the two subtypes of criterion-related validity?

A
  1. concurrent (measuring the predictor & outcome at the same time)
  2. predictive (delay between the measurement of predictor and the criterion)
31
Q

The criterion-related validity coefficient is Rxy; what does x represent, and what does y represent?

A

x = test/predictor scores
y = outcome/criterion scores

32
Q

How do you find out how much of the outcome (criterion variability) can be explained by the predictor if you have the criterion-related validity coefficient (e.g., 0.50)?

A

You square it (0.50 squared = 0.25 = 25%)

33
Q

What’s the maximum value of the standard error of the estimate?

A

It’s the same as the standard deviation of the criterion

34
Q

What are 3 applications of the criterion-related validity coefficient?

A
  1. Expectancy tables
  2. Taylor-Russell tables
  3. Decision-making theory
35
Q

Which of the 3 applications lists the probability that a person’s criterion score will fall in a specific range based on the range in which that person’s predictor score fell?
1. Expectancy Tables
2. Taylor-Russell Tables
3. Decision-making theory

A
  1. Expectancy Tables
36
Q

Which of the 3 applications numerically shows how much more accurate selection decisions are when using a particular predictor test as opposed to not using one?
1. Expectancy Tables
2. Taylor-Russell Tables
3. Decision-making theory

A
  2. Taylor-Russell Tables
37
Q

What’s the definition of a base rate? (Taylor-Russell Table)

A

the rate of selecting successful employees without using a predictor test

38
Q

What’s the definition of a selection ratio? (Taylor-Russell Table)

A

the proportion of available openings to available applicants (e.g., 3 openings with 10 applicants = 0.30)

39
Q

What’s the definition of incremental validity? (Taylor-Russell Table)

A

it’s the amount of improvement in the success rate that results from using a predictor test
e.g.,
base rate = 0.40 (40%); success rate with the predictor test = 0.65 (65%)
incremental validity = 65% - 40% = 25%

40
Q

When is incremental validity optimized? (in connection to base rate & selection ratio)

A
  1. Moderate base rate (around 0.5)
  2. Low selection ratio (around 0.1)
41
Q

Which of the 3 applications takes performance predictions based on predictor tests and compares them with the actual criterion outcome?
1. Expectancy Tables
2. Taylor-Russell Tables
3. Decision-making theory

A
  3. Decision-making theory
42
Q

Draw out a decision-making theory table: where does criterion/outcome go, where does predictor/test go, & where do false negative, true negative, false positive & true positive go?

A

Criterion/outcome = Y (vertical axis)
Predictor/test = X (horizontal axis)
false negative = top left
true negative = bottom left
false positive = bottom right
true positive = top right
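
Note: a minimal Python sketch of the four cells (the cutoff values are hypothetical). Each person is placed by whether they cleared the predictor cutoff and whether they cleared the criterion cutoff.

```python
# Hypothetical cutoffs on the predictor (test) and the criterion (outcome)
PREDICTOR_CUTOFF = 50
CRITERION_CUTOFF = 70

def classify(predictor_score: float, criterion_score: float) -> str:
    predicted_success = predictor_score >= PREDICTOR_CUTOFF
    actual_success = criterion_score >= CRITERION_CUTOFF
    if predicted_success and actual_success:
        return "true positive"   # top right: predicted to succeed, did succeed
    if predicted_success:
        return "false positive"  # bottom right: predicted to succeed, failed
    if actual_success:
        return "false negative"  # top left: predicted to fail, succeeded
    return "true negative"       # bottom left: predicted to fail, did fail

print(classify(60, 80))  # true positive
print(classify(60, 50))  # false positive
print(classify(40, 80))  # false negative
```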

43
Q

If a company wanted to reduce false positives of a pregnancy test, what do you do to the predictor & criterion cut offs?
1. Increase both
2. Decrease both
3. Increase predictor & decrease criterion
4. Decrease predictor & increase criterion

A
  3. Increase predictor & decrease criterion
    i.e., make the false positive box smaller in the decision-making theory table
44
Q

What does cross-validation do to the criterion-related validity coefficient?

A

Results in shrinkage of the criterion-related validity coefficient

45
Q

What does correction for attenuation calculate concerning validity and the reliability of the predictor (X) & criterion (Y)?

A

How much higher validity would be if the predictor (X) and criterion (Y) were both perfectly reliable
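
Note: a one-function Python sketch of the standard correction-for-attenuation formula, r_xy / sqrt(r_xx * r_yy) (the input values are hypothetical).

```python
import math

def correct_for_attenuation(r_xy: float, r_xx: float, r_yy: float) -> float:
    """Validity if the predictor (X) and criterion (Y) were perfectly reliable."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical observed validity of .40 with reliabilities of .80 (X) and .70 (Y)
print(round(correct_for_attenuation(0.40, 0.80, 0.70), 2))  # ~0.53
```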

46
Q

What’s it called when the criterion is subjectively scored and the rater knows how people did on the predictor before assigning them to the criterion? (e.g., if a teacher knows students’ IQ scores before grading an assignment and then gives students with higher IQs higher grades on the assignment)

A

Criterion Contamination

47
Q

What does criterion contamination result in? (in connection to criterion-related validity coefficient scores)

A

inflated/spuriously high criterion-related validity coefficient

48
Q

What type of validity looks at how adequately a new test measures a trait?

A

Construct validity

49
Q

What are the two subtypes of construct validity?

A
  1. Convergent validity - come together
  2. Divergent (discriminant) validity - go apart
50
Q

With convergent vs divergent validity, which measures the same trait and which measures different traits?

A

Convergent = comparing measures of the same trait
Divergent = comparing measures of different traits

51
Q

With convergent vs divergent validity, which one wants high correlations, and which one wants low correlations?

A

Convergent = high correlations
Divergent = low correlations

52
Q

With convergent vs divergent validity, which relates to high correlations of monotrait-heteromethod (the same trait is measured by different methods) and which relates to low correlations of heterotrait-monomethod (different traits are measured by the same method)?

A
  1. Convergent = high correlations of monotrait-heteromethod
  2. Divergent = low correlations of heterotrait-monomethod
53
Q

With the standard error of measurement, where do you plot the person’s score in relation to a bell-shaped distribution, and how do you figure out what the other numbers are in relation to +1, +2, +3, -1, -2, -3?

A

The person’s score goes in the middle (0) of the distribution & then add or subtract the standard error of measurement to the person’s score (e.g. person’s score = 70 & standard error of measurement = 5, then -1 would be 65 and +1 would be 75)
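
Note: a short Python sketch of this card’s worked example, combining it with the confidence intervals from card 24 (the score and SEM values come from the card itself).

```python
score, sem = 70, 5  # the card's example values

# +/-1 SEM = 68%, +/-2 = 95%, +/-3 = 99% (card 24)
for k, pct in [(1, 68), (2, 95), (3, 99)]:
    print(f"{pct}% confidence: {score - k * sem} to {score + k * sem}")

# 68% confidence: 65 to 75
# 95% confidence: 60 to 80
# 99% confidence: 55 to 85
```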

54
Q

Cronbach’s Alpha is a measure of…

A

Internal Consistency Reliability

55
Q

Reliability is always highest when test items are (heterogeneous/homogeneous), subjects are (heterogeneous/homogeneous), and the number of items is (high or low)

A

test items = homogeneous
subjects = heterogeneous
# of items = high