Ch. 6 Reliability (Part 2) Flashcards

1
Q

Based on IRT, what is the individual’s expected performance on a particular test question?

A

It is a function of both the item's level of difficulty and the individual's level of ability.

2
Q

What are the assumptions of IRT?

A

Items in a test measure a single trait (unidimensionality).
Items form a unidimensional scale of measurement.
Local independence: given a person's ability, responses to different items are statistically independent.

3
Q

What is an ICC? (Hint: Two or Three parameter model)

A

An item characteristic curve describes the relationship between a test taker's ability and his or her probability of answering an item correctly: given one's level of ability, it gives the probability of a correct response.

  1. discrimination parameter (a)
  2. difficulty parameter (b)
  3. guessing parameter (c)

If these parameters are known, performance is easy to predict. In practice, however, they must be estimated, which introduces error.

4
Q

Discuss the variables in a three parameter model.

A
  1. discrimination parameter (a) = the steeper the slope, the greater the discrimination between individuals of different ability. (A low slope means very little change in the probability of a correct response as a function of differences in ability.)
  2. difficulty parameter (b) = links level of ability to the probability of a correct response. (If the difficulty parameter for Item 1 is -2, then a person whose ability is 2 s.d. below the mean still has a 60% chance of answering correctly. The 60% comes from factoring in guessing (c): at the point where ability equals difficulty, the logistic term is 0.5, so P = c + (1 - c)/2 = 0.60 when c = 0.2.)
  3. guessing parameter (c) = the lower asymptote due to guessing; if an item has 5 multiple-choice options, the chance of guessing correctly is p = 0.2. (See the sketch below.)
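
As a sketch of how the three parameters fit together, here is the standard three-parameter logistic (3PL) ICC in Python. The value a = 1.0 below is an assumption for illustration, since the card does not give Item 1's discrimination.

```python
import math

def icc_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))
    a: discrimination, b: difficulty, c: guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# The card's example: b = -2, c = 0.2. At theta = b the logistic term
# is 0.5, so P = c + (1 - c) / 2 = 0.60 -- the 60% chance quoted above.
print(round(icc_3pl(theta=-2.0, a=1.0, b=-2.0, c=0.2), 2))  # 0.6
```

Setting c = 0 recovers the no-guessing model described in the next card; fixing a to a common value across items as well gives the Rasch form.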
5
Q

What is the Rasch Model?

A

It is the model that drops the guessing parameter (c = 0): individuals of low ability are assumed to have no real chance of guessing correctly. (Strictly, the Rasch model is the one-parameter form, using only difficulty (b) and assuming equal discrimination across items.)

6
Q

How are ability scores (theta) estimated?

A

Fit the observed item responses to an IRT model (Hambleton & Swaminathan, 1985):
1. Collect a set of observed item responses (large sample).
2. Fit the models to the data (pick the best-fitting IRT model).
3. Assign parameter estimates to items and ability scores to individuals (sketched below).
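
Once item parameters are fixed, step 3 can be carried out by maximum likelihood. Below is a minimal grid-search sketch under a 2PL model; the item parameters and response pattern are invented for illustration, not taken from the text.

```python
import math

def icc_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern at a given ability."""
    ll = 0.0
    for x, (a, b) in zip(responses, items):
        p = icc_2pl(theta, a, b)
        ll += x * math.log(p) + (1 - x) * math.log(1.0 - p)
    return ll

def estimate_theta(responses, items):
    """Maximum-likelihood theta via a simple grid search over [-4, 4]."""
    grid = [g / 100.0 for g in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical (a, b) pairs for three items, and one response pattern.
items = [(1.2, -1.0), (0.8, 0.0), (1.5, 1.0)]
print(estimate_theta([1, 1, 0], items))
```

Note that all-correct or all-wrong patterns push the estimate to the edge of the grid; operational programs handle such patterns separately.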

7
Q

How are error variances treated in IRT?

A

Unlike classical approaches, which treat error variance as homogeneous across individuals (estimates of reliability and generalizability are based on group, not individual, performance, so measurement error is assumed to be the same at all levels of ability), IRT lets measurement error vary across ability levels: the standard error of an ability estimate is smallest where the test provides the most information (see TIF, next card).

8
Q

What is TIF?

A

The Test Information Function is the sum of the item information functions.
It is a measure of how much information a test provides at different ability levels; its inverse square root gives the standard error of the ability estimate at each level.
The TIF is generalizable across different samples of test takers (it does not depend on the particular sample of individuals on which it is based).

An individual's observed score on a given test is interpreted in relation to an estimate of what his or her average score would be if we were able to obtain a large number of measures of that ability.
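
A minimal sketch of the TIF under a 2PL model, where each item's information is I(theta) = a^2 * P * (1 - P); the item parameters are invented for illustration.

```python
import math

def icc_2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def test_information(theta, items):
    """TIF = sum of item informations; for the 2PL, I_i = a^2 * P * (1 - P)."""
    total = 0.0
    for a, b in items:
        p = icc_2pl(theta, a, b)
        total += a * a * p * (1.0 - p)
    return total

# Hypothetical (a, b) pairs for three items.
items = [(1.2, -1.0), (0.8, 0.0), (1.5, 1.0)]
for theta in (-2.0, 0.0, 2.0):
    info = test_information(theta, items)
    se = 1.0 / math.sqrt(info)  # measurement error varies with ability
    print(f"theta={theta:+.1f}  TIF={info:.3f}  SE={se:.3f}")
```

The printed standard errors differ across the three ability levels, which is exactly the point of the previous card: in IRT, precision is not constant across individuals.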

9
Q

How do we determine the dependability of domain score estimates?

A

We need to make sure the sample of items is representative of the domain in question (internal consistency, equivalent forms). If the sampled items are consistent and form equivalent sets, then the observed scores should be dependable estimates of the domain score.

Domain score dependability index (commonly Brennan & Kane's Φ) = σ²(persons) / [σ²(persons) + σ²(absolute error)]

10
Q

Discuss classification errors. (False positive, False negative, Loss ratio)

A

False positive - domain score is below the cut-off score, but the examinee is classified as a master (passes).
Implications - the student is placed in a higher-level class, can't follow or learn well, and loses motivation.

False negative - domain score is above the cut-off score, but the examinee is classified as a non-master (fails).
Implications - perceived to be more detrimental to a person's life: the social stigma of failing, lost opportunities.

Loss ratio: the ratio of the seriousness of one type of classification error to the other. E.g., if a false negative is twice as serious as a false positive, the loss ratio is 2 (a loss ratio of 1 means the two errors are equally serious).

11
Q

How do we estimate the dependability of mastery/non-mastery classifications (cut-off scores)?

A

Threshold loss (used if all classification errors are equally serious):

  1. Give the test twice.
  2. Note the individuals who are consistently classified as masters or as non-masters on both administrations.
  3. (N of consistent masters / total N) + (N of consistent non-masters / total N) = the proportion of individuals consistently classified; the remainder are those whose classification flips between administrations (see the sketch below).
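
A minimal sketch of step 3's arithmetic; the classification data are invented for illustration.

```python
def agreement_index(test1_master, test2_master):
    """Proportion consistently classified across two administrations.

    test1_master, test2_master: lists of booleans (True = master).
    p0 = (consistent masters + consistent non-masters) / total N
    """
    n = len(test1_master)
    consistent = sum(1 for m1, m2 in zip(test1_master, test2_master) if m1 == m2)
    return consistent / n

# 10 examinees, classified as master (True) or non-master (False)
# on each of two test administrations.
t1 = [True, True, True, False, False, False, True, False, True, False]
t2 = [True, True, False, False, False, True, True, False, True, False]
print(agreement_index(t1, t2))  # 0.8 -> 80% consistently classified
```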

Squared error loss (used if classification errors vary in seriousness depending on how far the misclassification falls from the cut-off score).
The result is interpreted as the probability of a correct decision, given the amount of measurement error and the cut-off.

If the probability of a correct decision is low, move the cut-off further from the mean.

If false positives are more serious, raise the cut-off; if false negatives are more serious, lower it.

Example: with a cut-off score of 80, the CR agreement index = 0.60 (NR reliability = 0.60), i.e., a 0.60 probability that mastery decisions are correct.
To minimize false positives, raise the cut-off to 90: CR agreement index = 0.80.
To minimize false negatives, lower the cut-off to 70: CR agreement index = 0.80.

12
Q

What are the factors that affect reliability estimates?

A
  1. Length of test … longer tests are more reliable, up to a point (see the Spearman-Brown sketch below).
  2. Difficulty of test and test score variance … a test that is too easy or too difficult reduces variance and restricts the range of scores. Norm-referenced (NR) tests need variance for reliability and discrimination; CR reliability is unaffected by this.
  3. Cut-off score … the greater the difference between the cut-off score and the mean score, the greater the CR reliability. More decision errors are made near the mean, so CR reliability is lower when the cut score is close to the mean.
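
The "longer is more reliable, up to a point" claim in item 1 is usually quantified with the Spearman-Brown prophecy formula; a quick sketch (the reliability value 0.60 is an arbitrary example):

```python
def spearman_brown(r, k):
    """Projected reliability when test length is multiplied by factor k.

    r: reliability of the current test; k: length multiplier.
    Formula: r_new = (k * r) / (1 + (k - 1) * r)
    """
    return (k * r) / (1.0 + (k - 1.0) * r)

# Doubling a test with reliability .60 raises it to .75;
# quadrupling yields only .86 -- gains shrink as length grows.
print(round(spearman_brown(0.60, 2), 2))  # 0.75
print(round(spearman_brown(0.60, 4), 2))  # 0.86
```

The diminishing returns in the output are the "to a point" part: each doubling buys less additional reliability than the last.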