Reliability and Validity Flashcards

1
Q

Assumes that each person has a true score that would be obtained if there were no errors in measurement.

A

Classical Test Score Theory
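
As a minimal sketch of this idea (not part of the original cards), the simulation below assumes the usual classical decomposition, observed score = true score + random error, and estimates reliability as the ratio of true-score variance to observed-score variance. The function name, the normal distributions, and every numeric setting are illustrative assumptions.

```python
import random
from statistics import pvariance


def simulate_scores(n_people=20000, true_sd=10.0, error_sd=5.0, seed=0):
    """Simulate true and observed scores under X = T + E (assumed model)."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
    # Error is drawn independently of the true score, with mean zero.
    observed_scores = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return true_scores, observed_scores


true_scores, observed_scores = simulate_scores()
# Reliability as the share of observed-score variance due to true scores.
reliability = pvariance(true_scores) / pvariance(observed_scores)
print(round(reliability, 2))
```

With these settings the theoretical value is 100 / (100 + 25) = 0.80, so the printed estimate should land close to that.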

2
Q

assumes that the items that have been selected for any one test are just a sample of items from an infinite domain of potential items

A

Domain Sampling Theory

3
Q

the process of choosing test items that are appropriate to the content domain of the test

A

domain sampling

4
Q

this model considers the problems created by using a limited number of items to represent a larger and more complicated construct

A

the domain sampling model

5
Q

with this approach, a computer focuses on the range of item difficulty that best helps assess an individual’s ability level

A

item response theory

6
Q

refers to the degree to which scores from a test are stable and results are consistent

A

reliability

7
Q

test reliability is usually estimated in one of three ways

A
  • test-retest method
  • method of parallel forms
  • method of internal consistency
8
Q

in the ____, we consider the consistency of the test results when the test is administered on different occasions

A

test-retest method
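
A minimal sketch of how a test-retest estimate could be computed, assuming reliability is taken as the Pearson correlation between scores from the two occasions. The helper `pearson_r` and the score lists are hypothetical illustrations, not data from the cards.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same six people on two occasions.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

test_retest_r = pearson_r(time1, time2)
print(round(test_retest_r, 3))
```

A value near 1 suggests that people kept roughly the same rank order across occasions; carryover effects can inflate this figure.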

9
Q

using the ____, we evaluate the test across different forms of the test

A

method of parallel forms

10
Q

we examine how people perform on similar subsets of items selected from the same form of the measure with the _____

A

method of internal consistency

11
Q

this effect occurs when the first testing session influences scores from the second session

A

carryover effect

12
Q

compares two equivalent forms of a test that measure the same attribute

A

parallel forms / equivalent forms reliability

13
Q
  • a test given and divided into halves that are scored separately
  • the results of one half of the test are then compared with the results of the other
A

split half method

14
Q

Source of Error and Method for:
same test given at two points in time

A

Source of Error: Time sampling
Method: Test-Retest Method

15
Q

correlation between scores obtained on the two occasions
* source of error
* method

A

Source of Error: Time Sampling

Method: Test-Retest

16
Q

different items used to assess the same attribute

A

item sampling

18
Q

correlation between equivalent forms of the test that have different items

A

item sampling

19
Q

determined by dividing the total set of items relating to a construct of interest into halves and comparing the results obtained from the two subsets of items thus created

A

split half reliability

20
Q

also known as Cronbach’s alpha

A

coefficient alpha

21
Q
  • a measure of internal consistency, that is, how closely related a set of items are as a group
  • it is considered to be a measure of scale reliability
A

coefficient alpha
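
A minimal sketch of coefficient alpha, assuming the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name and the response matrix are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """item_scores: one row per respondent, one column per item."""
    k = len(item_scores[0])
    # Variance of each item across respondents.
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of each respondent's total score.
    total_var = pvariance([sum(row) for row in item_scores])
    # Population variances are used throughout; using sample variances
    # for both terms would give the same ratio.
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 5 respondents x 3 items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

The closer the items move together across respondents, the closer alpha gets to 1.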

22
Q

used to estimate the reliability of binary measurements

A

KR20 (Kuder and Richardson Formula 20)
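
A minimal sketch of KR20 for right/wrong (0/1) items, assuming the usual form KR20 = k/(k-1) * (1 - sum(p*q) / variance of total scores), where p is the proportion answering an item correctly and q = 1 - p. The function name and the data are hypothetical.

```python
from statistics import pvariance


def kr20(binary_scores):
    """binary_scores: one row per respondent, each entry 0 or 1."""
    n = len(binary_scores)
    k = len(binary_scores[0])
    # p = proportion of respondents scoring 1 on each item.
    p_values = [sum(row[i] for row in binary_scores) / n for i in range(k)]
    sum_pq = sum(p * (1 - p) for p in p_values)
    # Population variance of total scores, to match p*q as an item variance.
    total_var = pvariance([sum(row) for row in binary_scores])
    return k / (k - 1) * (1 - sum_pq / total_var)


# Hypothetical 5 respondents x 4 items.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
kr20_value = kr20(answers)
print(round(kr20_value, 2))
```

For this data set the sum of p*q is 0.8 and the total-score variance is 2, giving 4/3 * (1 - 0.4) = 0.80.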

23
Q
  • takes into account chance agreement
  • defined as (observed agreement - expected agreement)/(1-expected agreement)
A

Kappa Statistics
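
The definition above can be sketched for the simplest case of two raters and nominal categories (Cohen's kappa). The function name and ratings are hypothetical, expected agreement is assumed to come from each rater's category proportions, and the extension to several observers is not shown.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Kappa = (observed - expected) / (1 - expected) for two raters."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the raters' proportions per category.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    # Undefined (division by zero) if both raters use a single category.
    return (observed - expected) / (1 - expected)


rater_a = ["yes"] * 5 + ["no"] * 5
rater_b = ["yes"] * 4 + ["no"] * 5 + ["yes"]
print(round(cohens_kappa(rater_a, rater_b), 2))
```

Here observed agreement is 0.8 and expected agreement is 0.5, so kappa is 0.3 / 0.5 = 0.6.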

24
Q

best method for assessing the level of agreement among several observers

A

kappa statistics

25
value of kappa when two measurements agree **only at the chance level**
0
26
value of kappa when two measurements agree **perfectly**
1.0
27
range in which reliability estimates are **good enough** for most purposes in basic research
between .70 and .80
28
what to do about low reliability?
increase the number of items according to the domain sampling model
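
A minimal sketch of how lengthening a test might be expected to change reliability, assuming the general Spearman-Brown prophecy formula and that the added items are comparable to the existing ones. The function names and the numbers are hypothetical.

```python
def predicted_reliability(current_r, length_factor):
    """Reliability expected if the test is made length_factor times as long."""
    return length_factor * current_r / (1 + (length_factor - 1) * current_r)


def length_factor_needed(current_r, desired_r):
    """How many times longer the test must be to reach desired_r."""
    return desired_r * (1 - current_r) / (current_r * (1 - desired_r))


# Hypothetical test with reliability .60.
print(round(predicted_reliability(0.60, 2), 2))
print(round(length_factor_needed(0.60, 0.80), 2))
```

Under these assumptions, doubling the test raises the estimate from .60 to .75, and reaching .80 would take a test about 2.67 times the current length.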
29
the ____ the sample, the more likely that the test will represent the true characteristic
larger
30
  • can be applied **to correct for half-length**
  • allows us to estimate what the correlation between the two halves would have been if each half had been the length of the whole test
Spearman-Brown formula
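
A minimal sketch of a split-half estimate with the half-length correction, assuming an odd-even split of the items and the correction r_full = 2 * r_half / (1 + r_half). The function names and the right/wrong data are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_reliability(item_scores):
    """Return (half-test correlation, corrected full-length estimate)."""
    # Odd-even split: items 1, 3, ... versus items 2, 4, ...
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Hypothetical 5 respondents x 4 items scored 0/1.
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r_half, r_full = split_half_reliability(answers)
print(round(r_half, 3), round(r_full, 3))
```

For this data the two halves correlate at about .786, and the corrected full-length estimate is .88.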
31
can be defined as the **agreement between a test score** or measure **and the quality it is believed to measure**
validity
32
answers the question, **"does the test measure what it is supposed to measure?"**
validity
33
three types of validity evidence
  1. construct-related
  2. criterion-related
  3. content-related
34
is the **mere appearance** that a measure has validity
face validity
35
the only type of evidence besides face validity that is **logical rather than statistical**
content validity
36
describes the **failure to capture** important components of a **construct**
construct underrepresentation
37
occurs when scores are influenced by factors **irrelevant** to the construct
construct-irrelevant variance
38
tells us just **how well a test corresponds with a particular criterion**
criterion validity evidence
39
**standard** against which the test is compared
criterion
40
the **forecasting function of tests** is a form of criterion validity evidence known as ______
predictive validity evidence
41
the **relationship between a test and a criterion** is usually expressed as a correlation called _________ __________
validity coefficient
42
established through a series of activities in which a researcher **simultaneously defines some construct and develops the instrumentation to measure it**
construct validity evidence
43
involves **assembling** evidence about what a test means
construct validation
44
when a measure **correlates well** with other tests believed to measure the same construct, ___________ ___________ for validity is obtained
convergent evidence
45
  • also called divergent validation
  • a demonstration of uniqueness
  • to provide this evidence, a test should have **low correlations with measures of unrelated constructs**; that is, evidence of what the test does not measure
discriminant evidence
46
refers to standardized tests that are designed to **compare and rank test takers** in relation to one another
norm-referenced test
47
the process of evaluating (or grading) the learning of students against a set of pre-specified qualities or criteria, **without reference to the achievement of others**
criterion-referenced test
48
indicates that the measure **does not represent a construct** other than the one for which it was derived.
discriminant evidence
49
simple guidelines for item writing
  • **define clearly** what you want to measure
  • **generate an item pool**
  • **avoid exceptionally long items**
  • keep the level of **reading difficulty appropriate for those who will complete the scale**
  • **avoid double-barreled items** that convey two or more ideas at the same time
  • **consider mixing positively and negatively worded items**