Test Construction Flashcards

1
Q

true score variability

A

variability due to real differences in ability or knowledge in the test-takers

2
Q

error variability

A

variability caused by chance or random factors

3
Q

classical test theory

A

test scores = true score variability + error variability
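The identity above can be checked numerically. Below is a minimal sketch, assuming hypothetical normally distributed true scores (SD 15) and independent random errors (SD 5); all values are illustrative only:

```python
import random
import statistics

random.seed(0)

# Simulate classical test theory: observed = true score + random error.
# The distributions below are hypothetical, chosen only for illustration.
true_scores = [random.gauss(100, 15) for _ in range(100_000)]
errors = [random.gauss(0, 5) for _ in range(100_000)]
observed = [t + e for t, e in zip(true_scores, errors)]

var_true = statistics.pvariance(true_scores)
var_err = statistics.pvariance(errors)
var_obs = statistics.pvariance(observed)

# With independent errors, observed variance ≈ true + error variance.
print(var_obs, var_true + var_err)
```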

4
Q

reliability

A

the amount of consistency, repeatability, and dependability in scores obtained on a test

5
Q

reliability coefficient

A
  • represented as ‘r’
  • ranges from 0.00-1.00
  • minimum acceptable value is generally 0.80
  • two factors that affect the size are: range of test scores and the homogeneity of test content
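In classical test theory terms, r is the proportion of observed-score variance that is true-score variance. A tiny worked example with hypothetical variance components:

```python
# Hypothetical variance components (e.g. true-score SD 15, error SD 5).
var_true, var_error = 225.0, 25.0

# reliability = true-score variance / observed-score variance
reliability = var_true / (var_true + var_error)
print(round(reliability, 2))  # → 0.9
```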
6
Q

sources of errors in tests

A
  • content sampling
  • time sampling (ex - forgetting over time)
  • test heterogeneity
7
Q

factors that affect reliability

A
  • number of items (the more the better)
  • homogeneity (the more similar the items are, the better)
  • range of scores (the greater the range, the better)
  • ability to guess (true/false = the least reliable)
8
Q

test-retest reliability
(or coefficient of stability)

A
  • correlating pairs of scores from the same sample of people who are administered the identical test at two points in time
9
Q

parallel forms reliability
(or coefficient of equivalence)

A
  • correlating the scores obtained by the same group of people on two roughly equivalent but not identical forms of the same test administered at two different points in time
10
Q

internal consistency reliability

A
  • looks at the consistency of the scores within the test
  • 2 ways: Kuder-Richardson or Cronbach’s coefficient alpha
11
Q

split half reliability

A
  • splitting the test in half (ex - odd vs even numbered questions) and then correlating the scores based on half the number of items
12
Q

Spearman-Brown prophecy formula

A
  • used to correct the split-half reliability estimate
  • tells us how much more reliable the test would be if it were longer

*inappropriate for speeded tests
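The formula itself is simple. This sketch generalizes to a test n times longer (n = 2 recovers the split-half correction); the correlations used are hypothetical:

```python
def spearman_brown(r, n=2.0):
    """Projected reliability if the test were made n times longer."""
    return n * r / (1 + (n - 1) * r)

# e.g. a half-test correlation of 0.60 projects to a full-length
# reliability of 0.75 (hypothetical value, for illustration)
print(round(spearman_brown(0.60), 2))  # → 0.75
```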

13
Q

Kuder-Richardson and Cronbach’s coefficient alpha

A
  • involve analysis of the correlation of each item with every other item in the test (reliability/internal consistency)
  • KR-20 is used when items are scored dichotomously (ex - right or wrong)
  • Cronbach’s is used when items are scored non-dichotomously (ex - Likert scale)
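Cronbach’s alpha can be computed directly from its item-variance definition. A minimal sketch with made-up Likert-style data (all numbers illustrative):

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one list of item scores per person (hypothetical data)."""
    k = len(scores[0])
    item_vars = [pvariance([person[i] for person in scores]) for i in range(k)]
    total_var = pvariance([sum(person) for person in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# 4 people x 3 Likert items (illustrative numbers only)
data = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [3, 3, 4]]
print(round(cronbach_alpha(data), 2))  # → 0.97
```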
14
Q

interrater reliability

A
  • the degree of agreement between two or more scorers when a test is subjectively scored
  • best way to improve = provide opportunity for group discussion, practice exercises, and feedback
15
Q

validity

A
  • the meaningfulness, usefulness, or accuracy of a measure
16
Q

3 basic types of validity

A
  • content
  • criterion
  • construct
17
Q

face validity

A
  • the degree to which a test subjectively appears to measure what it says it measures
18
Q

content validity

A
  • how adequately a test samples a particular content area
19
Q

true positive

A
  • test takers who are accurately identified as possessing what is being measured

*correct prediction

20
Q

false positive

A
  • test takers who are inaccurately identified as possessing what is being measured

*incorrect prediction

21
Q

true negative

A
  • test takers who are accurately identified as not possessing what is being measured

*correct prediction

22
Q

false negatives

A
  • test takers who are inaccurately identified as not possessing what is being measured

*incorrect prediction

23
Q

item difficulty

A
  • represented by ‘p’
  • can range in value from 0 to 1 (0 = very difficult, 1 = very easy)
  • difficulty level = the proportion of test-takers who answered the item correctly
  • items should have an average difficulty level of 0.50, and a range of about 0.30 to 0.80

example - if ‘p’ is 0.10, that means only 10% of people got the item right (therefore it was difficult)
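Computing p is just a proportion. A minimal sketch with hypothetical 0/1-scored responses:

```python
def item_difficulty(responses):
    """p = proportion of test-takers who answered the item correctly."""
    return sum(responses) / len(responses)

# 1 = correct, 0 = incorrect; only 1 of 10 people got this item right,
# so p = 0.10 and the item is difficult (hypothetical responses)
print(item_difficulty([1, 0, 0, 0, 0, 0, 0, 0, 0, 0]))  # → 0.1
```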

24
Q

diagnostic validity

A
  • focuses on how accurately a test identifies who DOES and DOES NOT have the disorder
  • an ideal situation would result in high sensitivity, specificity, hit rate, and predictive values with few false positives and few false negatives
25
Q

convergent validity

A

the degree to which scores have high correlations with scores on another measure that assesses the same thing

26
Q

standard error of measurement

A

used to construct a confidence interval around an obtained test score

27
Q

confidence intervals

A

68% = 1 standard error of measurement
95% = 2 standard error of measurement
99% = 3 standard error of measurement
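Since SEM = SD × √(1 − r), the intervals above can be sketched as follows (the SD of 15 and reliability of 0.91 are hypothetical values):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement."""
    return sd * math.sqrt(1 - reliability)

s = sem(15, 0.91)  # ≈ 4.5
score = 100
# 95% confidence interval = obtained score ± 2 SEM
print(round(score - 2 * s, 1), round(score + 2 * s, 1))  # → 91.0 109.0
```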

28
Q

content validity

A

Does the content accurately measure all of the topics/items it’s intending to measure?

Example: In a stats exam, content validity would be very low if it only asked you how to calculate the mean. Content validity would increase if it asked additional questions about other things within the stats world as well.

29
Q

criterion validity

A

Does the content accurately reflect a set of abilities in a current or future setting?

2 kinds:
Concurrent criterion = does this test accurately assess my student’s current level of ability?
Predictive criterion = does this test accurately assess how my students will do in the future?

30
Q

construct validity

A

does your test measure the construct it claims to measure (ex - aggression)

construct = a group of interrelated variables that you care about

31
Q

Increase item difficulty

A

0 = very difficult
1 = very easy
So to increase difficulty, you want to add items that have p values as close to 0 as possible

32
Q

Sensitivity vs specificity

A

Sensitivity: senses people WITH the diagnosis = true positives / (TP + FN)

Specificity: specifies people who DON’T have the diagnosis = true negatives / (TN + FP)

(Distinct from the predictive values: PPV = TP / (TP + FP); NPV = TN / (TN + FN))
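The two ratios can be sketched as functions of confusion-matrix counts (the counts below are hypothetical):

```python
def sensitivity(tp, fn):
    """Proportion of people WITH the condition that the test flags."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of people WITHOUT the condition that the test clears."""
    return tn / (tn + fp)

# e.g. 80 true positives, 20 false negatives; 90 true negatives, 10 false positives
print(sensitivity(80, 20), specificity(90, 10))  # → 0.8 0.9
```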

33
Q

Factor matrix & communality

A

Tool that helps us see how different things (variables) are connected or share common traits.

Communality: the proportion of a variable’s variance that is explained by the common factors

We rotate the initial factor matrix to obtain one that is easier to interpret

34
Q

Size of Standard error of mean increases as

A

Population SD INCREASES and sample size DECREASES
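This follows from SE = SD / √n; a one-line sketch with hypothetical numbers:

```python
import math

def standard_error_of_mean(sd, n):
    """Standard error of the mean: grows with SD, shrinks with sample size."""
    return sd / math.sqrt(n)

print(standard_error_of_mean(10, 25))   # → 2.0
print(standard_error_of_mean(20, 25))   # larger SD -> larger SE
print(standard_error_of_mean(10, 100))  # larger n -> smaller SE
```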

35
Q

Raising a test’s cut-off score will have which effects?

A

Raising the cut-off score will result in fewer applicants being hired.

DECREASE: number of false positives
INCREASE: number of true negatives

36
Q

When test scores represent an interval or ratio scale and the distribution is skewed, the best measure of central tendency is what?

A

Median

37
Q

The point at which an item characteristic curve intercepts the Y (vertical) axis provides information about which of the following?

A

Probability of answering item correctly by guessing

38
Q

Banding (statistical banding)

A

Put test scores into groups based on a range of possible errors. People with scores in the same group are considered equally good at something (assuming small score differences don’t matter for job performance)

39
Q

Attenuation formula

A

Used to estimate what the maximum criterion-related validity coefficient would be if the predictor and/or criterion had a reliability coefficient of 1.0
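The standard correction for attenuation divides the observed validity by the square root of the reliabilities; a sketch with hypothetical coefficients:

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy=1.0):
    """Estimated validity if the measures were perfectly reliable."""
    return r_xy / math.sqrt(r_xx * r_yy)

# observed validity 0.30, predictor reliability 0.64 (hypothetical values)
print(round(correct_for_attenuation(0.30, 0.64), 3))  # → 0.375
```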

40
Q

When the prevalence of a disorder increases, how are a test’s positive and negative predictive values affected?

A

When prevalence INCREASES the POSITIVE predictive value INCREASES and NEGATIVE predictive value DECREASES
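This can be verified with Bayes’ rule: holding sensitivity and specificity fixed (0.90 each, hypothetical), PPV rises and NPV falls as prevalence rises:

```python
def ppv(sens, spec, prev):
    """Positive predictive value via Bayes' rule."""
    return sens * prev / (sens * prev + (1 - spec) * (1 - prev))

def npv(sens, spec, prev):
    """Negative predictive value via Bayes' rule."""
    return spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)

# same test at low vs high prevalence (hypothetical rates)
print(round(ppv(0.9, 0.9, 0.01), 2), round(ppv(0.9, 0.9, 0.30), 2))  # → 0.08 0.79
print(round(npv(0.9, 0.9, 0.01), 2), round(npv(0.9, 0.9, 0.30), 2))  # → 1.0 0.95
```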

41
Q

How to calculate standard error of estimate

A

68% = predicted criterion score ± 1 standard error of estimate
95% = predicted score ± 2 standard errors of estimate
99% = predicted score ± 3 standard errors of estimate
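The standard error of estimate itself is SEE = SD_y × √(1 − r²); a sketch with a hypothetical criterion SD of 10 and validity of 0.60:

```python
import math

def standard_error_of_estimate(sd_y, r):
    """SEE for predicted criterion scores."""
    return sd_y * math.sqrt(1 - r ** 2)

s = standard_error_of_estimate(10, 0.60)  # ≈ 8.0
predicted = 50
# 68% confidence interval = predicted score ± 1 SEE
print(round(predicted - s, 1), round(predicted + s, 1))  # → 42.0 58.0
```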