Reliability and Validity Flashcards
Assumes that each person has a true score that would be obtained if there were no errors in measurement.
Classical Test Score Theory
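In symbols, the classical model assumes each observed score X decomposes as X = T + E (true score plus random measurement error), so reliability can be expressed as the proportion of observed-score variance due to true scores: r = σ²(T) / σ²(X).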
assumes that the items that have been selected for any one test are just a sample of items from an infinite domain of potential items
Domain Sampling Theory
the process of choosing test items that are appropriate to the content domain of the test
domain sampling
this model considers the problems created by using a limited number of items to represent a larger and more complicated construct
the domain sampling model
with this approach, the computer focuses on the range of item difficulty that best assesses an individual’s ability level.
item response theory
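As a sketch of the underlying idea: in the simplest IRT model (the one-parameter logistic, or Rasch, model), the probability that an examinee of ability θ answers an item of difficulty b correctly is P(correct) = 1 / (1 + e^−(θ − b)); an adaptive test repeatedly selects items whose difficulty b is close to the current estimate of θ, where the items are most informative.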
refers to the degree to which scores from a test are stable and results are consistent
reliability
test reliability is usually estimated in one of three ways
- test-retest method
- method of parallel forms
- method of internal consistency
in the ____, we consider the consistency of the test results when the test is administered on different occasions
test-retest method
using the ____, we evaluate the test across different forms of the test
method of parallel forms
we examine how people perform on similar subsets of items selected from the same form of the measure with the _____
method of internal consistency
this effect occurs when the first testing session influences scores from the second session
carryover effect
compares two equivalent forms of a test that measure the same attribute
parallel forms / equivalent forms reliability
- a test given and divided into halves that are scored separately
- the results of one half of the test are then compared with the results of the other
split half method
Source of Error and Method for:
same test given at two points in time
Source of Error: Time Sampling
Method: Test-Retest Method
Source of Error and Method for:
correlation between scores obtained on the two occasions
Source of Error: Time Sampling
Method: Test-Retest Method
different items used to assess the same attribute
item sampling
correlation between equivalent forms of the test that have different items
item sampling
determined by dividing the total set of items relating to a construct of interest into halves and comparing the results obtained from the two subsets of items thus created
split half reliability
also known as Cronbach’s alpha
coefficient alpha
- a measure of internal consistency, that is, how closely related a set of items are as a group
- it is considered to be a measure of scale reliability
coefficient alpha
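For a test of k items, coefficient alpha is computed as α = (k / (k − 1)) · (1 − Σ σᵢ² / σₓ²), where σᵢ² is the variance of item i and σₓ² is the variance of the total test score.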
used to estimate the reliability of binary measurements
KR20 (Kuder and Richardson Formula 20)
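KR-20 is the special case of coefficient alpha for items scored 0/1: KR20 = (k / (k − 1)) · (1 − Σ pᵢqᵢ / σₓ²), where pᵢ is the proportion of examinees passing item i, qᵢ = 1 − pᵢ, and σₓ² is the total-score variance.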
- takes into account chance agreement
- defined as (observed agreement - expected agreement)/(1-expected agreement)
Kappa Statistics
best method for assessing the level of agreement among several observers
kappa statistics
value of kappa when two measurements agree only at the chance level
0
value of kappa when two measurements agree perfectly
1.0
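An illustrative (made-up) calculation: if two raters agree on 85% of cases and chance agreement is 50%, then κ = (0.85 − 0.50) / (1 − 0.50) = 0.70.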
range in which reliability estimates are good enough for most purposes in basic research
.70 and .80
what to do about low reliability?
increase the number of items according to the domain sampling model
the ____ the sample, the more likely that the test will represent the true characteristic
larger
- can be applied to correct for half-length
- allows one to estimate what the correlation between the two halves would have been if each half had been the length of the whole test
spearman-brown formula
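In its half-length form the formula is corrected r = 2r / (1 + r), where r is the correlation between the two halves; the general “prophecy” form for a test lengthened n times is rₙ = n·r / (1 + (n − 1)·r). A minimal Python sketch of the split-half procedure with this correction, using invented 0/1 item responses (not data from any real test):

```python
import numpy as np

def split_half_reliability(scores):
    """Split-half reliability with the Spearman-Brown correction.

    scores: 2-D array-like, rows = examinees, columns = items.
    Splits the items into odd/even halves, correlates the two
    half scores, then corrects the half-length correlation up
    to the full test length.
    """
    scores = np.asarray(scores, dtype=float)
    half1 = scores[:, 0::2].sum(axis=1)   # odd-numbered items
    half2 = scores[:, 1::2].sum(axis=1)   # even-numbered items
    r_half = np.corrcoef(half1, half2)[0, 1]
    return 2 * r_half / (1 + r_half)      # Spearman-Brown formula

# illustrative (invented) responses: 5 examinees, 6 items
data = [
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
print(round(split_half_reliability(data), 2))
```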
can be defined as the agreement between a test score or measure and the quality it is believed to measure
validity
answers the question, “does the test measure what it is supposed to measure?”
validity
3 types of evidence for validity
- construct-related
- criterion-related
- content-related
is the mere appearance that a measure has validity
face validity
the only type of evidence besides face validity that is logical rather than statistical
content validity
describes the failure to capture important components of a construct
construct underrepresentation
occurs when scores are influenced by factors irrelevant to the construct
construct irrelevant variance
tells us just how well a test corresponds with a particular criterion
criterion validity evidence
standard against which the test is compared
criterion
the forecasting function of tests is usually a form of criterion validity evidence known as ______
predictive validity evidence
the relationship between a test and a criterion is usually expressed as a correlation called _________ __________
validity coefficient
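In practice this is usually a Pearson correlation, r = cov(X, Y) / (σₓσᵧ), between test scores X and criterion scores Y, e.g., correlating an aptitude test with later job-performance ratings.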
established through a series of activities in which a researcher simultaneously defines some construct and develops the instrumentation to measure it
construct validity evidence
involves assembling evidence about what a test means
construct validation
when a measure correlates well with other tests believed to measure the same construct, ___________ ___________ for validity is obtained
convergent evidence
- also called divergent validation
- demonstration of uniqueness
- to demonstrate validity, a test should have low correlations with measures of unrelated constructs; that is, evidence of what the test does not measure
discriminant evidence
refers to standardized tests that are designed to compare and rank test takers in relation to one another
norm-referenced test
the process of evaluating (or grading) the learning of students against a set of pre-specified qualities or criteria, without reference to the achievement of others
criterion-referenced test
indicates that the measure does not represent a construct other than the one for which it was derived.
discriminant evidence
simple guidelines for item writing
- define clearly what you want to measure
- generate an item pool
- avoid exceptionally long items
- keep the level of reading difficulty appropriate for those who will complete the scale
- avoid double-barreled items that convey two or more ideas at the same time
- consider mixing positively and negatively worded items