Unit 5 (from quizlet) Flashcards

1
Q

based on the idea that a person’s test scores vary from testing to testing because of variables in the testing situation. Given the exact conditions of all the facets in the universe, the exact same test score should be obtained.

A

Generalizability theory

2
Q

an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test

A

Test-retest reliability

3
Q

The degree of correlation among all items on a scale.

Calculated from a single administration of a single form of a test - useful in assessing homogeneity of the test.

A

Inter-item consistency

4
Q

What are IRT models?

A

Rasch model, dichotomous test items, and polytomous test items

5
Q

a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences

A

Dynamic characteristic

6
Q

The relationship between SEM and the reliability of the test is […];

A

inverse

the higher the reliability of the test, the lower the SEM
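
Note: the inverse relationship can be checked numerically. The sketch below assumes the common textbook formula SEM = SD * sqrt(1 - r), where SD is the standard deviation of test scores and r is the reliability coefficient; the function name and the SD of 15 are illustrative, not from the deck.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Return SEM = SD * sqrt(1 - r), assuming the textbook formula."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical test with a standard deviation of 15:
# as reliability rises, the printed SEM shrinks.
for r in (0.50, 0.75, 0.91, 0.99):
    print(r, round(standard_error_of_measurement(15, r), 2))
```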

7
Q

A study examining how generalizable scores from a particular test are if the test is administered in different situations. It examines how much of an impact different facets of the universe have on the test score.

A

Generalizability study

8
Q

potential sources of error variance.

The examiner’s physical appearance and demeanor are some factors for consideration here. On an oral examination, some examiners may unwittingly provide clues by emphasizing key words as they pose questions.

A

Examiner-related variables

9
Q

a value that, according to CTT, genuinely reflects an individual’s ability (or trait) level as measured by a particular test

A

True score

10
Q

Sources of error variance

A

test construction,
administration,
scoring, and/or interpretation

11
Q

interviewers may not have been trained properly, the wording may have been ambiguous, or the items may have somehow been biased.

A

methodological error

12
Q

pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medications. A test taker may make a mistake in entering a test response

A

Testtaker variables

13
Q

What does error refer to?

A

The component of the observed test score that does not have to do with the testtaker’s ability.

14
Q

all of the factors associated with the process of measuring some variable, other than the variable being measured

A

Measurement error

15
Q

if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower

A

Restriction of range

16
Q

Could also be used to determine the number of items needed to attain a desired level of reliability

A

Spearman-Brown formula
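
Note: a minimal sketch of this use, assuming the general Spearman-Brown relation solved for the length factor, n = r_desired * (1 - r_existing) / (r_existing * (1 - r_desired)). The function names and the example numbers are hypothetical.

```python
import math


def length_factor(r_existing: float, r_desired: float) -> float:
    """Factor by which the test length must change to reach r_desired."""
    for r in (r_existing, r_desired):
        if not 0.0 < r < 1.0:
            raise ValueError("reliabilities must be strictly between 0 and 1")
    return (r_desired * (1.0 - r_existing)) / (r_existing * (1.0 - r_desired))


def items_needed(current_items: int, r_existing: float, r_desired: float) -> int:
    """Number of comparable items needed, rounded up to a whole item."""
    raw = current_items * length_factor(r_existing, r_desired)
    # Round first so floating-point noise does not push an exact value up by one.
    return math.ceil(round(raw, 9))


# Hypothetical 10-item test with reliability .60, target reliability .80.
print(items_needed(10, 0.60, 0.80))
```

The estimate assumes the added items are equivalent in content and difficulty to the existing ones.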

17
Q

assign odd-numbered items to one half of the test and even-numbered items to the other half.

A

Odd-even reliability
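
Note: a small sketch of the odd-even split, using made-up right/wrong (1/0) item scores for five test takers. The data and function names are illustrative only; the resulting half-test correlation would normally still be adjusted with the Spearman-Brown formula.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def odd_even_half_scores(item_scores):
    """Sum odd-numbered items (1, 3, 5, ...) and even-numbered items per person."""
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    return odd, even


# Hypothetical data: one row per test taker, one column per item.
scores = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
odd_half, even_half = odd_even_half_scores(scores)
print(odd_half, even_half, round(pearson_r(odd_half, even_half), 3))
```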

18
Q

a test containing items of uniform level of difficulty so that, when given generous time limits, all test takers should be able to complete all the test items correctly

A

Speed test

19
Q

What are the problems with CTT?

A
  • all items are presumed to be contributing equally to the score total.
  • CTT favors the development of longer rather than shorter tests
20
Q

It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice.

A

Split-half reliability

21
Q

an index of reliability, a proportion that indicates the ratio between the true score variance and the total variance.

A

Reliability coefficient

22
Q

What does the Spearman-Brown formula allow?

A

a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
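
Note: a minimal sketch of the adjustment, assuming the general Spearman-Brown formula r_SB = n * r / (1 + (n - 1) * r), with n = 2 for a test built from two halves. The function name and the half-test correlation of .60 are hypothetical.

```python
def spearman_brown(r: float, n: float = 2.0) -> float:
    """Estimated reliability of a test n times as long as the one yielding r.

    With n = 2, r is the correlation between two halves and the result
    estimates the reliability of the whole test.
    """
    return (n * r) / (1.0 + (n - 1.0) * r)


# Hypothetical correlation of .60 between the two halves.
print(round(spearman_brown(0.60), 2))
```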

23
Q

a source of error measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.

A

Systematic error

24
Q

formula for error

A

X=T+E

(X represents observed score, T represents true score, E represents error)

25
Q

A reliability estimate of a speed test should be based on performance from two independent testing periods using what?

A

1) test-retest reliability
2) alternate-forms reliability
3) split-half reliability from two separately timed half tests.

  • if a split-half procedure is used, the obtained reliability coefficient is for a half test and should be adjusted using the Spearman-Brown formula
26
Q

it provides a measure of the precision of an observed test score. It provides an estimate of the amount of error inherent in an observed score or measurement

A

Standard error of measurement (SEM)

27
Q

variance from true differences

A

True variance

28
Q
  • if two scores each contain error such that in each case the true score could be higher or lower, then we would want the two scores to be further apart before we conclude that there is a significant difference between them.
A

Standard error of the difference

29
Q

A study in which developers examine the usefulness of test scores in helping the test user make decisions

A

Decision study

30
Q

a range or band of test scores that is likely to contain the true score

A

Confidence interval
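
Note: one common way to build such a band, sketched under two assumptions: the interval is centered on the observed score, and its half-width is z * SEM with SEM = SD * sqrt(1 - r). The function name, the z value of 1.96 (roughly 95%), and the example numbers are illustrative.

```python
import math


def confidence_interval(observed: float, sd: float, reliability: float,
                        z: float = 1.96) -> tuple[float, float]:
    """Return (lower, upper) bounds of observed +/- z * SEM."""
    sem = sd * math.sqrt(1.0 - reliability)
    margin = z * sem
    return observed - margin, observed + margin


# Hypothetical observed score of 100 on a test with SD 15 and reliability .91.
lower, upper = confidence_interval(100, 15, 0.91)
print(round(lower, 2), round(upper, 2))
```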

31
Q

The probability of endorsing or selecting an item response indicative of higher levels of theta should increase as the underlying level of theta increases

A

Monotonicity

32
Q
  • typically designed to be equivalent with respect to variables such as content and level of difficulty
A

Alternate forms reliability

33
Q

Why is CTT the most widely used model of measurement?

A

Because of its simplicity, especially when one considers the complexity of other proposed models of measurement. Its assumptions are rather easily met, so it is applicable to many measurement situations.

34
Q

It signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured

A

Discrimination

35
Q

an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.

A

Parallel form reliability

36
Q

the degree of agreement or consistency between two or more scorers with regard to a particular measure

A

Inter-scorer reliability

37
Q

The proportion of the total variance attributed to true variance.

The greater the proportion of total variance attributed to true variance, the more reliable the test.

A

Reliability

38
Q

What is a useful feature of IRT?

A

It enables test users to better understand the range over theta for which an item is most useful in discriminating among groups of test takers.

39
Q

It is a specific application of a more general formula to estimate the reliability of a test that is lengthened or shortened by a number of items.

A

Spearman-Brown formula

40
Q

a test designed to provide an indication of where a test taker stands with respect to some variable or criterion, such as an educational or vocational objective.

  • should contain material that has been mastered in a hierarchical fashion.
A

Criterion referenced test

41
Q

a test’s reliability is conceived of as an objective measure of how precisely the score assesses the domain from which the test draws a sample

A

Domain sampling theory

42
Q

a characteristic such as intelligence: the obtained measurement would not be expected to vary significantly as a function of time, and either the test-retest or the alternate-forms method would be appropriate

A

Static characteristic

43
Q

What is the Kuder-Richardson formula 20?

A

the statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong.

  • if the test is heterogeneous, KR-20 will yield lower reliability estimates than the split-half method
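
Note: a sketch of the computation, assuming the usual statement of the formula, KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores), where k is the number of items, p is the proportion passing each item, and q = 1 - p. The population variance (dividing by the number of test takers) is used here as an assumption, and the data are made up.

```python
def kr20(item_scores):
    """KR-20 for a matrix of 0/1 item scores (rows = test takers)."""
    n = len(item_scores)
    k = len(item_scores[0])
    totals = [sum(row) for row in item_scores]
    mean_total = sum(totals) / n
    # Population variance of total scores (an assumption of this sketch).
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion passing item j
        sum_pq += p * (1.0 - p)
    return (k / (k - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical right/wrong data: five test takers, six items.
data = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(kr20(data), 3))
```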
44
Q

A reliability estimate is based on the correlation between scores on two halves of the test and is then adjusted using the Spearman-Brown formula

A

Split-half reliability

45
Q

a test containing items that measure a single trait.

A

Homogeneous test

46
Q
  • appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time
A

Test-retest reliability

47
Q

What are the three approaches to the estimation of reliability?

A

1) test-retest
2) alternate or parallel forms
3) internal or inter-item consistency

48
Q

the degree of the relationship between various forms of a test, which can be evaluated by means of an alternate-forms or parallel-forms coefficient of reliability.

A

Coefficient of equivalence

49
Q

an estimate of the extent to which these different forms of the same test have been affected by item sampling error or other error.

A

Alternate forms reliability

50
Q

the procedures of this theory provide a way to model the probability that a person with X ability will be able to perform at a level of Y.

A

Item response theory (IRT) (latent-trait theory)

51
Q

A statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant.

A

Standard error of the difference
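
Note: a minimal sketch, assuming the textbook formula in which the standard error of the difference is the square root of the sum of the two squared SEMs; when both scores are on the same scale with standard deviation SD, this is equivalent to SD * sqrt(2 - r1 - r2). The function names and numbers are hypothetical.

```python
import math


def se_difference(sem_1: float, sem_2: float) -> float:
    """Standard error of the difference from two standard errors of measurement."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


def se_difference_from_reliabilities(sd: float, r_1: float, r_2: float) -> float:
    """Same quantity when both tests share one score scale with standard deviation sd."""
    return sd * math.sqrt(2.0 - r_1 - r_2)


# Hypothetical: two tests on a scale with SD 10 and reliabilities .90 and .80.
print(round(se_difference_from_reliabilities(10, 0.90, 0.80), 2))
```

A difference between two scores is then compared against a multiple of this value (for example about two standard errors) before it is treated as significant.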

52
Q

What is local independence?

A

means that a) there is a systematic relationship between all of the test items and b) that relationship has to do with the theta level of the testtaker

53
Q

the preferred statistic for obtaining an estimate of internal consistency reliability

A

Cronbach’s alpha
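
Note: a sketch of the computation, assuming the standard formula alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The ratings below are made-up 1-to-5 responses, and the function name is illustrative.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a matrix of item scores (rows = test takers)."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across test takers.
    item_variances = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    # Variance of the total (summed) scores.
    total_variance = pvariance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical ratings: five respondents, four items scored 1 to 5.
ratings = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
print(round(cronbach_alpha(ratings), 3))
```

For items scored 0/1, this formula reduces to KR-20, since the variance of a dichotomous item is p * q.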

54
Q

the extent to which the population of voters in the study actually was representative of voters in the election.

Researchers may have gotten factors right, but did not include enough people in their sample to draw the conclusion that they did.

A

Sampling error

55
Q

if the variance of either variable in a correlational analysis is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher

A

Inflation of range

56
Q

correlating pairs of scores obtained from equivalent halves of a single test administered once.

A

Split-half reliability

57
Q

What is the Rasch model?

A

a reference to an IRT model with very specific assumptions about the underlying distribution.

(each item is assumed to have an equivalent relationship with the construct being measured by the test)

58
Q

a measure that focuses on the degree of difference that exists between item scores.

  • the APD index is not connected to the number of items on a measure
A

Average proportional distance

59
Q

the true score (or classical) model of measurement. It is the most widely used and accepted model in the psychometric literature today.

A

Classical test theory (CTT)

60
Q

a test where some items are so difficult that no test taker is able to obtain a perfect score

A

Power test

61
Q

It represents the influence of particular facets on the test score

A

Coefficient of generalizability

62
Q

a test composed of items that measure more than one trait

A

Heterogeneous test

63
Q
  • a reliability estimate is based on the correlation between the two total scores on the two forms
A

Alternate forms reliability

64
Q

the simplest way to determine the degree of consistency among scorers in the scoring of a test

A

Coefficient of inter-scorer reliability

65
Q

variance from irrelevant, random sources

A

Error variance

66
Q

the tool used to estimate or infer the extent to which an observed score deviates from a true score

A

Standard error of measurement (SEM)

67
Q

test items or questions that can be answered with only one of two alternative responses

A

Dichotomous test items

68
Q

test items or questions with three or more alternative responses, where only one is scored correct or scored as being consistent with a targeted trait or other construct

A

Polytomous test items

69
Q

What are the assumptions of using IRT?

A

1) unidimensionality - posits that the set of items measures a single continuous latent construct. This construct is referred to as theta. Theta is a reference to the degree of the underlying ability or trait the test taker is presumed to bring to the test

2) local independence

3) monotonicity

70
Q

What is the difference between CTT and domain sampling theory?

A

CTT - seeks to estimate the portion of a test score that is attributable to error

Domain Sampling Theory - seeks to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score

71
Q

the estimate of test-retest reliability when the interval between testings is greater than six months

A

Coefficient of stability

72
Q

a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process. ex: noise

A

Random error