Ch. 6 Reliability (Part 1) Flashcards

1
Q

What is the difference between G-Theory and Classical Test Theory (CTT)?

A

In G theory, sources of variation are referred to as facets. Facets are similar to the “factors” used in analysis of variance, and may include persons, raters, items/forms, time, and settings among other possibilities. These facets are potential sources of error. The purpose of G theory is to quantify the amount of error caused by each facet and interaction of facets. The usefulness of data gained from a G study is crucially dependent on the design of the study. Therefore, the researcher must carefully consider the ways in which he/she hopes to generalize any specific results. Is it important to generalize from one setting to a larger number of settings? From one rater to a larger number of raters? From one set of items to a larger set of items? The answers to these questions will vary from one researcher to the next, and will drive the design of a G study in different ways.

In addition to deciding which facets the researcher generally wishes to examine, it is necessary to determine which facet will serve as the object of measurement (i.e., the systematic source of variance) for the purpose of analysis. The remaining facets of interest are then considered to be sources of measurement error. In most cases, the object of measurement will be the person to whom a number/score is assigned. In other cases it may be a group of performers such as a team or classroom. Ideally, nearly all of the measured variance will be attributed to the object of measurement (i.e., individual differences), with only a negligible amount of variance attributed to the remaining facets (e.g., rater, time, setting).

By employing simulated D studies, it is therefore possible to examine how the generalizability coefficients (similar to reliability coefficients in Classical test theory) would change under different circumstances, and consequently determine the ideal conditions under which our measurements would be the most reliable.

The focus of classical test theory (CTT) is on determining error of the measurement. Perhaps the most famous model of CTT is the equation X = T + e, where X is the observed score, T is the true score, and e is the error involved in measurement. Although e could represent many different types of error, such as rater or instrument error, CTT only allows us to estimate one type of error at a time. Essentially it throws all sources of error into one error term. This may be suitable in the context of highly controlled laboratory conditions, but variance is a part of everyday life. In field research, for example, it is unrealistic to expect that the conditions of measurement will remain constant. Generalizability theory acknowledges and allows for variability in assessment conditions that may affect measurements. The advantage of G theory lies in the fact that researchers can estimate what proportion of the total variance in the results is due to the individual factors that often vary in assessment, such as setting, time, items, and raters.

Another important difference between CTT and G theory is that the latter approach takes into account how the consistency of outcomes may change if a measure is used to make absolute versus relative decisions. An example of an absolute, or criterion-referenced, decision would be when an individual’s test score is compared to a cut-off score to determine eligibility or diagnosis (e.g., a child’s score on an achievement test is used to determine eligibility for a gifted program). In contrast, an example of a relative, or norm-referenced, decision would be when the individual’s test score is used to either (a) determine relative standing as compared to his/her peers (e.g., a child’s score on a reading subtest is used to determine which reading group he/she is placed in), or (b) make intra-individual comparisons (e.g., comparing previous versus current performance within the same individual). The type of decision that the researcher is interested in will determine which formula should be used to calculate the generalizability coefficient (similar to a reliability coefficient in CTT).
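The simulated D-study idea above can be sketched numerically. All variance components below are made up for illustration: averaging over more raters shrinks the rater error term, raising the generalizability coefficient.

```python
# Hypothetical D-study: how the generalizability coefficient changes
# as the number of raters is increased (all numbers are made up).
var_person = 4.0        # universe score variance (object of measurement)
var_rater_error = 2.0   # error variance attributable to a single rater

for n_raters in (1, 2, 4, 8):
    # Averaging scores over n_raters raters divides the rater error term.
    g = var_person / (var_person + var_rater_error / n_raters)
    print(n_raters, round(g, 3))
```

With one rater the coefficient is 0.667; with eight raters it rises to about 0.941, which is how a D study identifies the conditions under which measurement is most reliable.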

2
Q

What is the universe score (Xp) in G-Theory?

A

It is the average of a person’s scores over all possible measurement conditions (MCQ, oral interview, etc.), and it is the best indicator of that person’s ability.
[Similar to CTT’s true score]

Universe score variance (s^2_p)
variance of a group of persons’ universe scores across all measures
(the portion of observed score variance that reflects stable differences between individuals and remains constant across measurement facets/conditions)

Unlike the CTT true score, an individual’s universe score varies for different universes of measures (test occasion, form, etc.).

Generalizability coefficient = proportion of observed score variance that is universe score variance: s^2_p / s^2_x.
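A minimal numeric sketch of this definition, using made-up variance components:

```python
# Hypothetical variance components from a G study (numbers are made up).
var_p = 8.0                 # universe score variance, s^2_p
var_error = 2.0             # all remaining (error) variance combined
var_x = var_p + var_error   # observed score variance, s^2_x

# Generalizability coefficient: the share of observed score variance
# that is universe score variance.
g_coef = var_p / var_x
print(g_coef)  # 0.8
```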

3
Q

How does G-Theory separate error variance due to different test facets?

A

Error variance due to testing technique = s^2_test / s^2_x

Error variance due to passage = s^2_passage / s^2_x
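The same idea as a sketch, with hypothetical component values (the facet names and numbers are illustrative only):

```python
# Made-up variance components for a test scored from passages using
# different testing techniques.
components = {"person": 6.0, "technique": 1.5, "passage": 0.5, "residual": 2.0}
var_x = sum(components.values())  # observed score variance, s^2_x

# Proportion of observed score variance attributable to each error facet.
error_share = {k: v / var_x for k, v in components.items() if k != "person"}
print(error_share)
```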

4
Q

How do we determine the relative effect of different sources of variance on observed score?

A

Must obtain multiple measures for each person under different conditions for each facet.
Observed score variance decomposes into person, form, and rater components:
s^2_x = s^2_p + s^2_f + s^2_r

5
Q

How does ANOVA capture the interaction between variances of a measure?

A

Person by facet: person x form (s^2_pf), person x rater (s^2_pr)
Two facets: form x rater (s^2_fr)
Three-way: person x form x rater (s^2_pfr)
Residual variance (s^2_e): variance not accounted for by the sources in the design
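Under this ANOVA framing, the variance components can be estimated from mean squares. Below is a sketch for a simpler crossed person x rater design (one facet, so the form and three-way terms are omitted); the score matrix is made up:

```python
import numpy as np

# Hypothetical person x rater score matrix (5 persons, 3 raters).
scores = np.array([
    [7.0, 8.0, 7.0],
    [5.0, 6.0, 5.0],
    [9.0, 9.0, 8.0],
    [4.0, 5.0, 4.0],
    [6.0, 7.0, 6.0],
])
n_p, n_r = scores.shape
grand = scores.mean()
person_means = scores.mean(axis=1)
rater_means = scores.mean(axis=0)

# Two-way ANOVA mean squares (one observation per cell, no replication).
ms_person = n_r * ((person_means - grand) ** 2).sum() / (n_p - 1)
ms_rater = n_p * ((rater_means - grand) ** 2).sum() / (n_r - 1)
resid = scores - person_means[:, None] - rater_means[None, :] + grand
ms_resid = (resid ** 2).sum() / ((n_p - 1) * (n_r - 1))

# Expected-mean-square equations give the variance component estimates.
var_resid = ms_resid
var_person = (ms_person - ms_resid) / n_r   # s^2_p
var_rater = (ms_rater - ms_resid) / n_p     # s^2_r
print(round(var_person, 3), round(var_rater, 3), round(var_resid, 3))
```

Here most variance is attributable to persons (the object of measurement), as a G study would hope to find.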

6
Q

If there were high variance for the person x form interaction, what would it indicate?

A

The forms are biased for or against persons of either high or low ability (forms do not function consistently across ability levels).

7
Q

If we found a significant variance for person x rater, what would this indicate?

A

There is rater bias: raters score particular persons inconsistently.

8
Q

Significant variance for forms x rater?

A

Raters scored some forms more highly than others, which may indicate bias.

9
Q

Importance of IRT over CTT?

A

CTT doesn’t predict how a person will perform on a given item. It makes no assumptions about how an individual’s level of ability affects the way he or she performs on the test.

IRT gives a difficulty estimate for each item as well as the likelihood that a person of a certain ability (on the bell curve) would answer it correctly.
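A sketch of this idea using the one-parameter logistic (Rasch) item response function; the ability and difficulty values are illustrative:

```python
import math

def p_correct(theta, difficulty):
    """Rasch (1PL) item response function: the probability that a person
    with ability theta answers an item of the given difficulty correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

# When ability equals difficulty, the probability is exactly 0.5;
# higher-ability persons are more likely to answer correctly.
print(p_correct(0.0, 0.0))                # 0.5
print(round(p_correct(2.0, 0.0), 3))      # well above 0.5
```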

10
Q

What are the factors that affect language test scores?

A
  1. Test method facets
  2. Personal attributes
  3. Random factors

These lead to a test score. From this test score we assume a certain level of communicative/language ability.

11
Q

How is reliability estimated in CTT?

A

They use correlations from parallel tests.
The correlation between two parallel tests is used to infer the relationship between true and observed scores; differences between the tests are attributed to error variance.
1. Internal consistency (within test) - split-half with the Spearman-Brown correction; Guttman’s split-half, which does not use a correlation and so does not need to assume equivalence of the halves; item-variance estimates such as KR-20 (non-equivalent halves would lead to underestimates, non-independent items to overestimates) and Cronbach’s alpha.
(This is determined FIRST, before considering the others.)
2. Stability (over time) - test-retest at different times; the correlation is interpreted as the stability of test scores over time. However, practice effects and genuine changes in ability both influence scores, and it is unclear which one is responsible for a change.
3. Equivalence (alternate, parallel forms) - the correlation is interpreted as the equivalence of scores obtained from the two forms; the ordering-effect problem is solved with a counterbalanced design.
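Two of the internal-consistency estimates above can be sketched directly (the Spearman-Brown stepped-up split-half and Cronbach’s alpha); the data are made up:

```python
import numpy as np

def spearman_brown(r_half):
    """Step a split-half correlation up to full-test-length reliability."""
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(scores):
    """Cronbach's alpha for a persons x items score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()      # summed item variances
    total_var = scores.sum(axis=1).var(ddof=1)       # total score variance
    return k / (k - 1) * (1 - item_var / total_var)

print(round(spearman_brown(0.5), 3))             # 0.667
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # identical items -> 1.0
```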

12
Q

Which internal consistency reliability estimates are based on correlation, which are based on proportion of variance?

A

Based on correlation: Spearman-Brown split-half

Based on proportion of variance: Guttman, KR-20, Cronbach’s alpha
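A sketch of one of the proportion-of-variance estimates, KR-20, for dichotomously scored (0/1) items; this uses the population form of the total-score variance, and the answer matrix is made up:

```python
import numpy as np

def kr20(scores):
    """KR-20 for a persons x items matrix of dichotomous (0/1) scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    p = scores.mean(axis=0)               # proportion correct per item
    pq_sum = (p * (1 - p)).sum()          # summed item variances (p * q)
    total_var = scores.sum(axis=1).var()  # total score variance (population form)
    return k / (k - 1) * (1 - pq_sum / total_var)

answers = [[1, 1, 1],
           [1, 1, 0],
           [1, 0, 0],
           [0, 0, 0]]
print(kr20(answers))  # 0.75
```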
