Unit 5 (from Quizlet) Flashcards
based on the idea that a person’s test scores vary from one testing to another because of variables in the testing situation. Given the exact same conditions across all the facets in the universe, the exact same test score should be obtained.
Generalizability theory
an estimate of reliability obtained by correlating pairs of scores from the sample people on two different administrations of the same test
Test-retest reliability
The degree of correlation among all items on a scale.
Calculated from a single administration of a single form of a test - useful in assessing homogeneity of the test.
Inter-item consistency
What are IRT models?
the Rasch model, models for dichotomous test items, and models for polytomous test items
a trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experiences
Dynamic characteristic
The relationship between SEM and the reliability of the test is […];
inverse
the higher the reliability of the test, the lower the SEM
A study examining how generalizable scores from a particular test are if the test is administered in different situations. It examines how much of an impact different facets of the universe have on the test score.
Generalizability study
potential sources of error variance.
The examiner’s physical appearance and demeanor are some factors for consideration here. On an oral examination, some examiners may unwittingly provide clues by emphasizing key words as they pose questions.
Examiner-related variables
a value that according to CTT, genuinely reflects an individual’s ability (or trait) level as measured by a particular test
True score
Sources of error variance
test construction,
administration,
scoring, and/or interpretation
interviewers may not have been trained properly, the wording may have been ambiguous, or the items may have somehow been biased.
methodological error
pressing emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medications. A test taker may make a mistake in entering a test response
Testtaker variables
What does error refer to?
The component of the observed test score that does not have to do with the testtaker’s ability.
all of the factors associated with the process of measuring some variable, other than the variable being measured
Measurement error
if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower
Restriction of range
Could also be used to determine the number of items needed to attain a desired level of reliability
Spearman-Brown formula
assign odd-numbered items to one half of the test and even-numbered items to the other half.
Odd-even reliability
a test containing items of uniform level of difficulty so that, when given generous time limits, all test takers should be able to complete all the test items correctly
Speed test
What problems with CTT?
- all items are presumed to be contributing equally to the score total.
- CTT favors the development of longer rather than shorter tests
It is a useful measure of reliability when it is impractical or undesirable to assess reliability with two tests or to administer a test twice.
Split-half reliability
an index of reliability, a proportion that indicates the ratio between the true score variance and the total variance.
Reliability coefficient
What does the Spearman-Brown formula allow?
a test developer or user to estimate internal consistency reliability from a correlation of two halves of a test.
a source of error measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Systematic error
formula for error
X=T+E
(x=observed score, T represents true score, E represents error)
A reliability estimate of a speed test should be based on performance from two independent testing periods using what?
1) test-retest reliability
2) alternate-forms reliability
3) split-half reliability from two separately timed half tests.
- if a split-half procedure is used, the obtained reliability coefficient is for a half test and should be adjusted using the Spearman-Brown formula
it provides a measure of the precision of an observed test score. It provides an estimate of the amount of error inherent in an observed score or measurement
Standard error of measurement (SEM)
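The SEM card above follows the standard formula SEM = SD · √(1 − r). A minimal Python sketch of that relationship (the function name and the SD-15 example values are illustrative, not from the cards):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r_xx)."""
    return sd * math.sqrt(1 - reliability)

# e.g., a scale with SD = 15 and reliability .91:
print(sem(15, 0.91))  # ≈ 4.5
```

Note how raising the reliability shrinks the SEM, matching the inverse relationship stated earlier in these cards.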
variance from true differences
True variance
- if two scores each contain error such that in each case the true score could be higher or lower, then we would want the two scores to be further apart before we conclude that there is a significant difference between them.
Standard error of the difference
A study in which developers examine the usefulness of test scores in helping the test user make decisions
Decision study
a range or band of test scores that is likely to contain the true score
Confidence interval
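A common way to build such a band is observed score ± z · SEM. A hedged Python sketch (the observed score, SD, and reliability values are made up for illustration):

```python
import math

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score likely to contain the true score."""
    sem = sd * math.sqrt(1 - reliability)
    return observed - z * sem, observed + z * sem

# observed score 100 on a scale with SD = 15, reliability .91 (SEM = 4.5):
low, high = confidence_interval(100, 15, 0.91)
print(round(low, 2), round(high, 2))  # ≈ 91.18 108.82
```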
The probability of endorsing or selecting an item response indicative of higher levels of theta should increase as the underlying level of theta increases
Monotonicity
- typically designed to be equivalent with respect to variables such as content and level of difficulty
Alternate forms reliability
Why is CTT then most widely used model of measurement?
Because of simplicity, especially when one considers the complexity of other proposed models of measurement. Assumptions are rather easily met and therefore applicable to so many measurement situations
It signifies the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or whatever it is that is being measured
Discrimination
an estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
Parallel form reliability
the degree of agreement or consistency between two or more scorers with regard to a particular measure
Inter-scorer reliability
The proportion of the total variance attributed to true variance.
The greater the proportion of total variance attributed to true variance, the more reliable the test.
Reliability
What is a useful feature of IRT?
It enables test users to better understand the range over theta for which an item is most useful in discriminating among groups of test takers.
It is a specific application of a more general formula to estimate the reliability of a test that is lengthened or shortened by a number of items.
Spearman-Brown formula
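The general form is r_SB = n·r / (1 + (n − 1)·r), where n is the factor by which the test is lengthened (or shortened, for n < 1). A small sketch with illustrative values:

```python
def spearman_brown(r, n):
    """Predicted reliability when test length is changed by factor n:
    r_sb = n*r / (1 + (n - 1)*r)."""
    return n * r / (1 + (n - 1) * r)

print(spearman_brown(0.60, 2))    # doubling a half-test r of .60 → 0.75
print(spearman_brown(0.90, 0.5))  # halving a test with r of .90 → ≈ 0.82
```

The same formula, solved for n, gives the number of items needed to reach a target reliability, as the earlier card notes.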
a test designed to provide an indication of where a test taker stands with respect to some variable or criterion, such as an educational or vocational objective.
- should contain material that has been mastered in a hierarchical fashion.
Criterion referenced test
a test’s reliability is conceived of as an objective measure of how precisely the score assesses the domain from which the test draws a sample
Domain sampling theory
an obtained measurement of a stable trait such as intelligence would not be expected to vary significantly as a function of time, and either the test-retest or alternate-forms method would be appropriate
Status characteristic
What is the Kuder-Richardson formula 20?
the statistic of choice for determining the inter-item consistency of dichotomous items, primarily those items that can be scored right or wrong.
- if the test is heterogeneous, KR-20 will yield lower reliability estimates than the split-half method
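KR-20 can be written as (k / (k − 1)) · (1 − Σpq / σ²), where p is the proportion passing each item, q = 1 − p, and σ² is the variance of the total scores. A sketch using a made-up 4-person, 3-item response matrix:

```python
def kr20(item_scores):
    """KR-20 for dichotomous (0/1) items; each row is one testtaker."""
    n = len(item_scores)
    k = len(item_scores[0])
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n  # proportion passing item j
        sum_pq += p * (1 - p)
    totals = [sum(row) for row in item_scores]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / var_total)

responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(responses))  # 0.75
```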
A reliability estimate is based on the correlation between scores on two halves of the test and is then adjusted using the Spearman-Brown formula
Split-half reliability
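Putting the two steps together: correlate the half-test totals, then apply the Spearman-Brown correction with n = 2. A sketch with illustrative toy data:

```python
import math

def pearson(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def split_half_reliability(odd_totals, even_totals):
    r_half = pearson(odd_totals, even_totals)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown adjustment, n = 2

odd = [1, 2, 3, 4]   # each testtaker's total on odd-numbered items
even = [1, 3, 2, 4]  # each testtaker's total on even-numbered items
print(split_half_reliability(odd, even))  # r_half = .80 → ≈ 0.89
```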
a test containing items that measure a single trait.
Homogeneous test
- appropriate when evaluating the reliability of a test that purports to measure something that is relatively stable over time
Test-retest reliability
What are the three approaches to the estimation of reliability?
1) test-retest
2) alternate or parallel forms
3) internal or inter-item consistency
the degree of the relationship between various forms of a test; can be evaluated by means of an alternate-forms or parallel-forms procedure.
Coefficient of equivalence
an estimate of the extent to which these different forms of the same test have been affected by item sampling or other error.
Alternate forms reliability
the procedures of this theory provide a way to model the probability that a person with X ability will be able to perform at a level of Y.
Item response theory (IRT) (latent-trait theory)
A statistical measure that can aid a test user in determining how large a difference should be before it is considered statistically significant.
Standard error of the difference
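When both scores come from scales with the same standard deviation, SED = √(SEM₁² + SEM₂²) = SD · √(2 − r₁ − r₂). A sketch (the SD-15 values are illustrative):

```python
import math

def standard_error_of_difference(sd, r1, r2):
    """SED = sqrt(SEM1^2 + SEM2^2), assuming both scores share the same SD."""
    return sd * math.sqrt(2 - r1 - r2)

# two scores on scales with SD = 15, each with reliability .91:
print(round(standard_error_of_difference(15, 0.91, 0.91), 2))  # ≈ 6.36
```

The SED is larger than either score's SEM alone, which is why two scores must be further apart before their difference is called significant.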
What is local independence?
means that (a) any systematic relationship among the test items is due solely to (b) the theta level of the testtaker; once theta is accounted for, responses to different items are independent of one another
the preferred statistic for obtaining an estimate of internal consistency reliability
Cronbach’s alpha
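Coefficient alpha can be computed as (k / (k − 1)) · (1 − Σσ²ᵢ / σ²_total), where σ²ᵢ are the individual item variances. A sketch using a made-up 4-person, 3-item matrix of polytomous scores:

```python
def cronbach_alpha(item_scores):
    """alpha = (k/(k-1)) * (1 - sum of item variances / total-score variance)."""
    k = len(item_scores[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var_sum = sum(var([row[j] for row in item_scores]) for j in range(k))
    totals = [sum(row) for row in item_scores]
    return (k / (k - 1)) * (1 - item_var_sum / var(totals))

ratings = [[2, 3, 3], [3, 3, 4], [4, 4, 4], [5, 4, 5]]
print(round(cronbach_alpha(ratings), 2))  # 0.9
```

For dichotomous (0/1) items this formula reduces to KR-20.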
the extent to which the population of voters in the study actually was representative of voters in the election.
Researchers may have gotten the factors right but did not include enough people in their sample to draw the conclusions that they did.
Sampling error
if the variance of either variable in a correlational analysis is inflated by the sampling procedure, then the resulting correlation coefficient tends to be higher
Inflation of range
correlating pairs of scores obtained from equivalent halves of a single test administered once.
Split-half reliability
What is the Rasch model?
a reference to an IRT model with very specific assumptions about the underlying distribution.
(each item is assumed to have an equivalent relationship with the construct being measured by the test)
a measure that focuses on the degree of difference that exists between item scores.
- the APD index is not connected to the number of items on a measure
Average proportional distance
the true score (or classical) model of measurement. It is the most widely used and accepted model in the psychometric literature today.
Classical test theory (CTT)
a test where some items are so difficult that no test taker is able to obtain a perfect score
Power test
It represents the influence of particular facets on the test score
Coefficient of generalizability
a test composed of items that measure more than one trait
Heterogeneous test
- a reliability estimate is based on the correlation between the two total scores on the two forms
Alternate forms reliability
the simplest way to determine the degree of consistency among scorers in the scoring of a test
Coefficient of inter-scorer reliability
variance from irrelevant, random sources
Error variance
the tool used to estimate or infer the extent to which an observed score deviates from a true score
standard error of measurement (SEM)
test items or questions that can be answered with only one of two alternative responses
Dichotomous test items
test items or questions with three or more alternative responses, where only one is scored correct or scores as being consistent with a targeted trait or other construct
Polytomous test items
What are the assumptions of using IRT?
1) unidimensionality - posits that the set of items measures a single continuous latent construct. This construct is referred to as theta. Theta is a reference to the degree of the underlying ability or trait the testtaker is presumed to bring to the test
2) local independence
3) monotonicity
What is the difference between CTT and domain sampling theory?
CTT - seek to estimate the portion of a test score that is attributable to error
Domain Sampling Theory - seek to estimate the extent to which specific sources of variation under defined conditions are contributing to the test score
When the interval between testings is greater than six months, this is the term for the estimate of test-retest reliability
Coefficient of stability
a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process. ex: noise
Random error