Test Construction Flashcards

1
Q

reliability

A

amount of consistency, repeatability, and dependability in scores obtained on a given test

2
Q

classical test theory

A

any obtained score is a combination of truth and error
total variability = true score variability + error variability
reliability is the proportion of total variability that is true score variability
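A worked sketch with assumed numbers: if true score variability = 80 and error variability = 20, then total variability = 80 + 20 = 100 and reliability = 80 / 100 = .80, i.e. 80% of the observed variability is true score variability.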

3
Q

reliability coefficient

A

symbolized rxx or rtt; commonly derived by correlating the scores obtained on a test at one point in time (x or t) with the scores obtained at a second point in time (x or t)
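A minimal computational sketch of this correlation, assuming a small set of made-up scores from two administrations (the numbers and the use of NumPy are illustrative only):

```python
import numpy as np

# hypothetical scores for five examinees on two administrations of the same test
time_1 = np.array([85, 92, 78, 88, 95])
time_2 = np.array([83, 94, 75, 90, 96])

# the Pearson correlation between the two administrations serves as the reliability estimate (rxx)
r_xx = np.corrcoef(time_1, time_2)[0, 1]
print(round(r_xx, 2))
```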

4
Q

common sources of error in tests (3)

A

content sampling, time sampling, test heterogeneity

5
Q

content sampling error

A

occurs when a test, by chance, has items that tap into a test-taker’s knowledge base or items that don’t tap into a test-taker’s knowledge base

6
Q

time sampling error

A

occurs when a test is given at two different points in time and the scores on each administration are different because of factors related to the passage of time (e.g. forgetting over time)

7
Q

test heterogeneity error

A

error due to test heterogeneity occurs when a test has heterogeneous items tapping more than one domain

8
Q

factors affecting reliability

A

number of items - reliability INCREASES when the number of items is increased (see the worked example below)
homogeneity of items - refers to items tapping into similar content; reliability INCREASES with increased homogeneity
range of scores - unrestricted range maximizes reliability; related to heterogeneity of subjects (range of scores INCREASES with increased subject heterogeneity)
ability to guess - true/false tests are easier to guess; reliability DECREASES as ability to guess increases
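A worked sketch of the number-of-items effect using the Spearman-Brown prophecy formula (the .60 starting value is assumed for illustration): lengthening a test by a factor of n gives new reliability = (n × old reliability) / (1 + (n − 1) × old reliability), so doubling a test whose reliability is .60 yields (2 × .60) / (1 + .60) = 1.20 / 1.60 = .75.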

9
Q

four estimates of reliability

A

test-retest reliability
parallel forms reliability
internal consistency reliability - split-half reliability, Kuder-Richardson (KR-20 & KR-21), Cronbach’s Alpha
interrater reliability

10
Q

test-retest reliability

A

expressed as a coefficient of stability
involves correlating pairs of scores from the same sample of people who are administered the identical test at two points in time
major source of error = time sampling (the correlation decreases as the time interval between administrations increases)

11
Q

parallel forms reliability

A

expressed as a coefficient of equivalence
involves correlating the scores obtained by the same group of people on two roughly equivalent but not identical forms of the same test administered at two different points in time
major sources of error = time sampling and content sampling (subjects may be more or less familiar with items on one version of the test)

12
Q

internal consistency reliability

A

looks at consistency of scores within the test
test administered only once to one group of people
estimated with split-half reliability, Kuder-Richardson (KR-20 & KR-21), or Cronbach’s coefficient alpha

13
Q

split-half reliability

A

calculated by splitting the test in half and then correlating the scores obtained on each half by each person
Spearman-Brown formula typically used to correct the half-test correlation up to full-test length (worked example below)
major source of error = item or content sampling (someone might, by chance, know more items on one half)
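A worked sketch with an assumed half-test correlation: if the two halves correlate .70, the Spearman-Brown corrected full-test reliability is (2 × .70) / (1 + .70) = 1.40 / 1.70 ≈ .82.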

14
Q

Kuder-Richardson (KR-20 & KR-21) & Cronbach’s Coefficient Alpha

A

sophisticated forms of internal consistency reliability
involve analysis of the correlation of each item with every other item on the test
reliability calculated by taking the mean of the correlation coefficients for every possible split-half
KR-20 & KR-21: when items are scored dichotomously (correct or incorrect)
Cronbach’s Coefficient Alpha: when items are scored non-dichotomously and there is a range of possible scores for each item or category (e.g. Likert scale)
major sources of error: content sampling and test heterogeneity
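A minimal computational sketch of coefficient alpha from a people-by-items score matrix (the data and the function name cronbach_alpha are illustrative assumptions; with 0/1 item scores the same calculation corresponds to KR-20):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a (people x items) matrix of item scores."""
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# hypothetical 0/1 (incorrect/correct) scores for five people on four items
scores = np.array([[1, 1, 1, 0],
                   [1, 0, 1, 0],
                   [1, 1, 1, 1],
                   [0, 0, 1, 0],
                   [1, 1, 0, 1]])
print(round(cronbach_alpha(scores), 2))
```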

15
Q

interrater reliability

A

looks at degree of agreement between two or more scorers when test subjectively scored

16
Q

standard error of measurement

A

theoretical distribution: one person’s scores if he/she were tested hundreds of times with alternate or equivalent forms of the test
standard deviation of a theoretically normal distribution of test scores obtained by one individual on equivalent tests
ranges from 0.0 to the SD of the test; when the test is perfectly reliable, the standard error of measurement is 0.0

95% probability that a person’s true score lies within two standard errors of measurement of the obtained score
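A worked sketch using the standard formula SEM = SDx × √(1 − rxx) (formula and numbers supplied here for illustration): with SD = 15 and reliability = .84, SEM = 15 × √.16 = 15 × .40 = 6, so there is a 95% probability that the true score lies within ±12 points (two SEMs) of the obtained score.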

17
Q

content validity

A

addresses how adequately a test samples a particular content area
quantified by asking a panel of experts whether each item is essential, useful but not essential, or not necessary
no numerical validity coefficient is derived

18
Q

criterion-related validity

A

looks at how adequately a test score can be used to infer, predict, or estimate a criterion outcome (e.g. how well SAT scores predict college GPA)
coefficient (rxy) ranges from -1.0 to +1.0; validities as low as 0.20 considered acceptable
two subtypes: concurrent validity and predictive validity

19
Q

concurrent validity

A

predictor and criterion are measured and correlated at about the same time

20
Q

predictive validity

A

delay between the measurement of the predictor and criterion

21
Q

standard error of estimate

A

theoretical distribution: one person’s criterion scores if he/she were measured hundreds of times on the criterion; the spread of this distribution is the average amount of error in estimating the criterion from the predictor
standard deviation of a theoretically normal distribution of criterion scores obtained by one person measured repeatedly
ranges from a minimum value of 0.0 to a maximum value of the SD of the criterion (SDy); when the test is a perfect predictor, the standard error of estimate is 0.0
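A worked sketch using the standard formula SEE = SDy × √(1 − rxy²) (formula and numbers supplied for illustration): with SDy = 10 and rxy = .60, SEE = 10 × √(1 − .36) = 10 × .80 = 8.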

22
Q

expectancy tables

A

list the probability that a person’s criterion score will fall in a specified range based on the range in which that person’s predictor score fell
probabilities expressed in terms of percentages or proportions

23
Q

Taylor-Russell tables

A

show how much more accurate selection decisions are when using a particular predictor test as opposed to using no predictor test
involve the base rate, the selection ratio, and incremental validity

incremental validity optimized when base rate is moderate (about .5) and selection ratio is low (close to .1)

24
Q

base rate

A

rate of selecting successful employees without using a predictor test

25
Q

selection ratio

A

proportion of available openings to available applicants

26
Q

incremental validity

A

amount of improvement in success rate that results from using a predictor test
optimized when base rate is moderate and selection ratio is low
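A worked sketch with assumed numbers: with 10 openings and 100 applicants the selection ratio is 10 / 100 = .10; if the base rate (success rate without the test) is .50 and hiring with the predictor test raises the success rate to .70, the incremental validity is .70 − .50 = .20.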

27
Q

decision making theory

A

compares predictions of performance based on the predictor test with the actual criterion outcomes; each person falls into one of four quadrants defined by the predictor cutoff and the criterion cutoff

                              Predictor Cut Off
                     Negatives         |  Positives
Criterion            False Negatives   |  True Positives
---------------------------------------+------------------ Criterion Cut Off
                     True Negatives    |  False Positives
                              Predictor
28
Q

item response theory

A

used to evaluate how a specific item on a test relates to the underlying construct (latent trait); each item’s characteristic curve shows the probability of a correct response as a function of the test-taker’s level on the construct
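A minimal sketch of one common IRT model, the two-parameter logistic (2PL); the parameter values are assumptions for illustration:

```python
import math

def two_pl(theta: float, a: float, b: float) -> float:
    """Probability of a correct response under the 2PL model:
    theta = trait (construct) level, a = item discrimination, b = item difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# for an item with discrimination 1.5 and difficulty 0.0, a test-taker
# at trait level 1.0 has roughly an 82% chance of answering correctly
print(round(two_pl(theta=1.0, a=1.5, b=0.0), 2))
```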

29
Q

factors affecting criterion-related validity

A

range of scores (validity is maximized by an unrestricted range of scores on both the predictor and the criterion)
reliability of the predictor and criterion (low reliability of either lowers the validity coefficient)
criterion contamination (occurs when raters know examinees’ predictor scores before assigning their criterion ratings)

30
Q

correction for attenuation

A

calculates how much higher validity would be if the predictor and criterion were both perfectly reliable
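The standard correction (formula and numbers supplied for illustration) is corrected rxy = rxy / √(rxx × ryy); with rxy = .40, rxx = .80, and ryy = .50, the corrected validity is .40 / √.40 ≈ .40 / .63 ≈ .63.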

31
Q

construct validity

A

looks at how adequately a new test measures a construct or trait
a construct is a hypothetical concept that typically cannot be measured directly (e.g. motivation, fear, aggression)
evidence of construct validity is most commonly ascertained using factor analysis or the multi-trait, multi-method matrix

32
Q

multi-trait, multi-method matrix

A

table with information about convergent and divergent validity (both necessary for construct validity)

33
Q

convergent validity

A

correlation of scores on the new test with other available measures of the same trait (high correlations provide evidence of convergent validity)

34
Q

divergent (discriminant) validity

A

correlation of scores on the new test with scores on another test that measures a different trait or construct (low correlations provide evidence of divergent validity)

35
Q

Degrees of Freedom for T-Test

Single Sample

Matched/Correlated Sample

Independent Samples

A

Single Sample: N-1

Matched/Correlated Sample: #pairs-1

Independent Samples: N-2 (where N is the total number of subjects across both groups)
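A worked sketch with assumed sample sizes: a single sample of 20 scores has df = 19; 15 matched pairs have df = 14; two independent groups of 12 and 14 (N = 26) have df = 24.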

36
Q

Degrees of Freedom for Chi Square

Single Sample

Multiple Sample

A

Single Sample: #rows -1

Multiple Sample: (#rows-1)(#columns-1)
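A worked sketch with assumed table sizes: a single-sample test with 5 rows (categories) has df = 4; a 3 × 4 contingency table has df = (3 − 1)(4 − 1) = 6.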

37
Q

Degrees of Freedom for One-Way ANOVA

df total

df between

df within

A

df total: N-1

df between: # groups - 1

df within: df total - df between
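A worked sketch with assumed numbers: 30 subjects divided into 3 groups gives df total = 29, df between = 2, and df within = 29 − 2 = 27.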

38
Q

How to calculate shared/explained variability

A

square the correlation

39
Q

How to calculate correlation

A

take the square root of the shared/explained variability
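A worked sketch with assumed numbers: a correlation of .70 means .70² = .49, or 49% shared variability; going the other way, 25% shared variability corresponds to a correlation of √.25 = .50.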

40
Q

True score variability

A

the reliability coefficient is interpreted directly as the proportion of true score variability (e.g. if reliability is .64, true score variability is 64%)

41
Q

Pearson r range

A

-1.0 to +1.0

42
Q

Range of reliability coefficient

A

0.0 to +1.0

43
Q

Range of validity coefficient

A

-1.0 to +1.0

44
Q

Range of standard error of measurement

A

0.0 to SDx

45
Q

range of the standard error of the estimate

A

0.0 to SDy

46
Q

how to calculate 95% confidence interval

A

multiply the standard error of measurement by 1.96 (or 2), then add and subtract the result from the examinee’s obtained score
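A worked sketch with assumed numbers: for an obtained score of 110 and an SEM of 5, the 95% confidence interval is 110 ± 1.96 × 5, or about 100.2 to 119.8 (roughly 100 to 120 using two SEMs).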