Unit 5 (from Quizlet) Flashcards

1
Q

based on the idea that a person’s test scores vary from testing to testing because of variables in the testing situation. Given the exact conditions of all the facets in the universe, the exact same test score should be obtained.

A

Generalizability theory

2
Q

an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test

A

Test-retest reliability

3
Q

The degree of correlation among all items on a scale.

Calculated from a single administration of a single form of a test - useful in assessing homogeneity of the test.

A

Inter-item consistency

4
Q

What are IRT models?

A

The Rasch model, and IRT models for dichotomous and polytomous test items

5
Q

a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences

A

Dynamic characteristic

6
Q

The relationship between SEM and the reliability of the test is […]

A

inverse

the higher the reliability of the test, the lower the SEM

7
Q

A study examining how generalizable scores from a particular test are if the test is administered in different situations. It examines how much of an impact different facets of the universe have on the test score.

A

Generalizability study

8
Q

potential sources of error variance.

The examiner’s physical appearance and demeanor are some factors for consideration here. On an oral examination, some examiners may unwittingly provide clues by emphasizing key words as they pose questions.

A

Examiner-related variables

9
Q

a value that according to CTT, genuinely reflects an individual’s ability (or trait) level as measured by a particular test

A

True score

10
Q

Sources of error variance

A

test construction,
administration,
scoring, and/or interpretation

11
Q

interviewers may not have been trained properly, the wording may have been ambiguous, or the items may have somehow been biased.

A

methodological error

12
Q

pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medications. A test taker may make a mistake in entering a test response

A

Testtaker variables

13
Q

What does error refer to?

A

The component of the observed test score that does not have to do with the testtaker’s ability.

14
Q

all of the factors associated with the process of measuring some variable, other than the variable being measured

A

Measurement error

15
Q

if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower

A

Restriction of range

16
Q

Could also be used to determine the number of items needed to attain a desired level of reliability

A

Spearman-Brown formula

17
Q

assign odd-numbered items to one half of the test and even-numbered items to the other half.

A

Odd-even reliability

18
Q

a test containing items of uniform level of difficulty so that, when given generous time limits, all test takers should be able to complete all the test items correctly

A

Speed test

19
Q

What are the problems with CTT?

A
  • all items are presumed to be contributing equally to the score total.
  • CTT favors the development of longer rather than shorter tests
20
Q

It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice.

A

Split-half reliability

21
Q

an index of reliability, a proportion that indicates the ratio between the true score variance and the total variance.

A

Reliability coefficient
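
As a quick numeric sketch of this ratio (the variance figures below are invented for illustration, not from the text):

```python
# Hypothetical variance decomposition under CTT: total = true + error.
true_variance = 80.0
error_variance = 20.0
total_variance = true_variance + error_variance

# Reliability coefficient = ratio of true-score variance to total variance.
reliability = true_variance / total_variance
print(reliability)  # 0.8
```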

22
Q

What does the Spearman-Brown formula allow?

A

a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.

23
Q

a source of error measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.

A

Systematic error

24
Q

formula for error

A

X = T + E

(X = observed score, T = true score, E = error)

25
Q

A reliability estimate of a speed test should be based on performance from two independent testing periods using what?

A

1) test-retest reliability 2) alternate-forms reliability 3) split-half reliability from two separately timed half tests. * if a split-half procedure is used, the obtained reliability coefficient is for a half test and should be adjusted using the Spearman-Brown formula

26
Q

a measure of the precision of an observed test score; it provides an estimate of the amount of error inherent in an observed score or measurement

A

Standard error of measurement (SEM)

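Under CTT the SEM is usually computed from the test's standard deviation and reliability coefficient as SEM = SD·√(1 − r); a minimal sketch, with illustrative values:

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r_xx), where r_xx is the reliability coefficient."""
    return sd * math.sqrt(1 - reliability)

# e.g., a scale with SD = 15 and reliability = .96:
print(round(standard_error_of_measurement(15, 0.96), 2))  # 3.0
```

Note the inverse relationship from card 6: as reliability rises toward 1, the SEM shrinks toward 0.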
27
Q

variance from true differences

A

True variance

28
Q

if two scores each contain error such that in each case the true score could be higher or lower, then we would want the two scores to be further apart before we conclude that there is a significant difference between them.

A

Standard error of the difference

29
Q

A study in which developers examine the usefulness of test scores in helping the test user make decisions

A

Decision study

30
Q

a range or band of test scores that is likely to contain the true score

A

Confidence interval

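Such a band is commonly built as observed score ± z·SEM; a minimal sketch assuming a 95% band (z = 1.96) and illustrative numbers:

```python
def confidence_interval(observed: float, sem: float, z: float = 1.96):
    """Return the (lower, upper) band likely to contain the true score."""
    return (observed - z * sem, observed + z * sem)

# Observed score of 100 on a scale whose SEM is 3:
lo, hi = confidence_interval(100, 3.0)
print(round(lo, 2), round(hi, 2))  # 94.12 105.88
```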
31
Q

The probability of endorsing or selecting an item response indicative of higher levels of theta should increase as the underlying level of theta increases

A

Monotonicity

32
Q

typically designed to be equivalent with respect to variables such as content and level of difficulty

A

Alternate forms reliability

33
Q

Why is CTT the most widely used model of measurement?

A

Because of its simplicity, especially when one considers the complexity of other proposed models of measurement. Its assumptions are rather easily met and therefore applicable to many measurement situations.

34
Q

It signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured

A

Discrimination

35
Q

an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.

A

Parallel forms reliability

36
Q

the degree of agreement or consistency between two or more scorers with regard to a particular measure

A

Inter-scorer reliability

37
Q

The proportion of the total variance attributed to true variance. The greater the proportion of total variance attributed to true variance, the more reliable the test.

A

Reliability

38
Q

What is a useful feature of IRT?

A

It enables test users to better understand the range over theta for which an item is most useful in discriminating among groups of test takers.

39
Q

It is a specific application of a more general formula to estimate the reliability of a test that is lengthened or shortened by a number of items.

A

Spearman-Brown formula

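The general formula is r_new = nr / (1 + (n − 1)r), where n is the factor by which the test's length changes; a minimal sketch:

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when a test's length is multiplied by n.

    r: reliability of the current test
    n: lengthening factor (n = 2 doubles the test; n < 1 shortens it)
    """
    return (n * r) / (1 + (n - 1) * r)

# Doubling a test whose reliability is .70:
print(round(spearman_brown(0.70, 2), 3))  # 0.824
```

The same function answers the "how many items are needed" question on card 16: solve for the n that yields the desired reliability, then multiply by the current item count.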
40
Q

a test designed to provide an indication of where a test taker stands with respect to some variable or criterion, such as an educational or vocational objective. * should contain material that has been mastered in a hierarchical fashion.

A

Criterion-referenced test

41
Q

a test's reliability is conceived of as an objective measure of how precisely the score assesses the domain from which the test draws a sample

A

Domain sampling theory

42
Q

an obtained measurement of intelligence would not be expected to vary significantly as a function of time, and either the test-retest or alternate-forms method would be appropriate

A

Status characteristic

43
Q

What is the Kuder-Richardson formula 20?

A

the statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong. * if the test is heterogeneous, KR-20 will yield lower reliability estimates than the split-half method

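The KR-20 computation, (k/(k−1))(1 − Σpq/σ²), can be sketched as follows; the response matrix is made up for illustration:

```python
def kr20(responses):
    """KR-20 = (k/(k-1)) * (1 - sum(p*q) / total-score variance).

    responses: one list of 0/1 item scores per testtaker.
    """
    n = len(responses)
    k = len(responses[0])
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n
    total_var = sum((t - mean_total) ** 2 for t in totals) / n
    sum_pq = 0.0
    for item in range(k):
        p = sum(person[item] for person in responses) / n  # proportion passing
        sum_pq += p * (1 - p)                              # q = 1 - p
    return (k / (k - 1)) * (1 - sum_pq / total_var)

# Four testtakers, three right/wrong items (invented data):
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```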
44
Q

A reliability estimate is based on the correlation between scores on two halves of the test and is then adjusted using the Spearman-Brown formula

A

Split-half reliability

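The two steps named here — correlate the half-test totals, then apply the Spearman-Brown correction 2r/(1 + r) — can be sketched as follows (Pearson r computed by hand to stay dependency-free; the scores are invented):

```python
def split_half_reliability(half_a, half_b):
    """Correlate two half-test total scores, then apply the
    Spearman-Brown correction for the full-length test: 2r / (1 + r)."""
    n = len(half_a)
    ma = sum(half_a) / n
    mb = sum(half_b) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(half_a, half_b)) / n
    sa = (sum((a - ma) ** 2 for a in half_a) / n) ** 0.5
    sb = (sum((b - mb) ** 2 for b in half_b) / n) ** 0.5
    r_half = cov / (sa * sb)          # Pearson r between the two halves
    return (2 * r_half) / (1 + r_half)

# Odd-half and even-half totals for four testtakers (invented data):
print(round(split_half_reliability([10, 12, 9, 15], [11, 13, 9, 14]), 3))  # 0.968
```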
45
Q

a test containing items that measure a single trait.

A

Homogeneous test

46
Q

appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time

A

Test-retest reliability

47
Q

What are the three approaches to the estimation of reliability?

A

1) test-retest 2) alternate or parallel forms 3) internal or inter-item consistency

48
Q

the degree of the relationship between various forms of a test, which can be evaluated by means of an alternate-forms or parallel-forms coefficient.

A

Coefficient of equivalence

49
Q

an estimate of the extent to which different forms of the same test have been affected by item sampling or other error.

A

Alternate forms reliability

50
Q

the procedures of this theory provide a way to model the probability that a person with X ability will be able to perform at a level of Y.

A

Item response theory (IRT) (latent-trait theory)

51
Q

A statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant.

A

Standard error of the difference

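When both scores come from scales with the same standard deviation, the usual computation is SED = SD·√(2 − r₁ − r₂), equivalently √(SEM₁² + SEM₂²); a minimal sketch with illustrative values:

```python
import math

def standard_error_of_difference(sd: float, r1: float, r2: float) -> float:
    """SED = SD * sqrt(2 - r1 - r2), for two tests sharing the same SD.

    r1, r2: reliability coefficients of the two tests.
    """
    return sd * math.sqrt(2 - r1 - r2)

# Two tests with SD = 15, each with reliability .90:
print(round(standard_error_of_difference(15, 0.90, 0.90), 2))  # 6.71
```

As card 28 suggests, the SED is always larger than either test's SEM alone, because both scores contribute error to the difference.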
52
Q

What is local independence?

A

means that a) there is a systematic relationship between all of the test items and b) that relationship has to do with the theta level of the testtaker

53
Q

the preferred statistic for obtaining an estimate of internal consistency reliability

A

Cronbach's alpha

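Coefficient alpha, (k/(k−1))(1 − Σσᵢ²/σ²_total), generalizes KR-20 beyond right/wrong items; a sketch (data made up for illustration):

```python
def cronbach_alpha(responses):
    """alpha = (k/(k-1)) * (1 - sum(item variances) / total-score variance).

    responses: one list of item scores per testtaker (any numeric scale).
    """
    n = len(responses)
    k = len(responses[0])

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [variance([p[i] for p in responses]) for i in range(k)]
    total_var = variance([sum(p) for p in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# With dichotomous data, alpha reduces to KR-20:
print(cronbach_alpha([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))  # 0.75
```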
54
Q

the extent to which the population of voters in the study actually was representative of voters in the election. Researchers may have gotten the factors right but did not include enough people in their sample to draw the conclusions that they did.

A

Sampling error

55
Q

if the variance of either variable in a correlational analysis is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher

A

Inflation of range

56
Q

correlating pairs of scores obtained from equivalent halves of a single test administered once.

A

Split-half reliability

57
Q

What is the Rasch model?

A

a reference to an IRT model with very specific assumptions about the underlying distribution. (each item is assumed to have an equivalent relationship with the construct being measured by the test)

58
Q

a measure that focuses on the degree of difference that exists between item scores. * the APD index is not connected to the number of items on a measure

A

Average proportional distance

59
Q

the true score (or classical) model of measurement. It is the most widely used and accepted model in the psychometric literature today.

A

Classical test theory (CTT)

60
Q

a test where some items are so difficult that no test taker is able to obtain a perfect score

A

Power test

61
Q

It represents the influence of particular facets on the test score

A

Coefficient of generalizability

62
Q

a test composed of items that measure more than one trait

A

Heterogeneous test

63
Q

a reliability estimate is based on the correlation between the two total scores on the two forms

A

Alternate forms reliability

64
Q

the simplest way to determine the degree of consistency among scorers in the scoring of a test

A

Coefficient of inter-scorer reliability

65
Q

variance from irrelevant, random sources

A

Error variance

66
Q

the tool used to estimate or infer the extent to which an observed score deviates from a true score

A

Standard error of measurement (SEM)

67
Q

test items or questions that can be answered with only one of two alternative responses

A

Dichotomous test items

68
Q

test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct

A

Polytomous test items

69
Q

What are the assumptions of using IRT?

A

1) unidimensionality - posits that the set of items measures a single continuous latent construct, referred to as theta. Theta is a reference to the degree of the underlying ability or trait the test taker is presumed to bring to the test 2) local independence 3) monotonicity

70
Q

What is the difference between CTT and domain sampling theory?

A

CTT seeks to estimate the portion of a test score that is attributable to error. Domain sampling theory seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score.

71
Q

When the interval between testing is greater than six months, it is the estimate of test-retest reliability

A

Coefficient of stability

72
Q

a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process. ex: noise

A

Random error