Reliability and Validity Flashcards

1
Q

Assumes that each person has a true score that would be obtained if there were no errors in measurement.

A

Classical Test Score Theory

2
Q

assumes that the items that have been selected for any one test are just a sample of items from an infinite domain of potential items

A

Domain Sampling Theory

3
Q

the process of choosing test items that are appropriate to the content domain of the test

A

domain sampling

4
Q

this model considers the problems created by using a limited number of items to represent a larger and more complicated construct

A

the domain sampling model

5
Q

with this approach, a computer focuses on the range of item difficulty that best helps assess an individual’s ability level

A

item response theory

6
Q

refers to the degree to which scores from a test are stable and results are consistent

A

reliability

7
Q

test reliability is usually estimated in one of three ways

A
  • test-retest method
  • method of parallel forms
  • method of internal consistency
8
Q

in the ____, we consider the consistency of the test results when the test is administered on different occasions

A

test-retest method

9
Q

using the ____, we evaluate the test across different forms of the test

A

method of parallel forms

10
Q

we examine how people perform on similar subsets of items selected from the same form of the measure with the _____

A

method of internal consistency

11
Q

this effect occurs when the first testing session influences scores from the second session

A

carryover effect

12
Q

compares two equivalent forms of a test that measure the same attribute

A

parallel forms / equivalent forms reliability

13
Q
  • a test is given and divided into halves that are scored separately
  • the results of one half of the test are then compared with the results of the other
A

split half method

14
Q

Source of Error and Method for:
same test given at two points in time

A

Source of Error: Time sampling
Method: Test-Retest Method

15
Q

correlation between scores obtained on the two occasions
* source of error
* method

A

Source of Error: Time Sampling

Method: Test-Retest
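
As a rough illustration of the test-retest method, the sketch below correlates scores from two occasions using the Pearson correlation. The function name `pearson_r` and the score lists are hypothetical, made up for this example only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people on two occasions.
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 13, 14, 19, 21]

# The test-retest reliability estimate is the correlation between occasions.
test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 3))
```

A value near 1 suggests scores are stable over time; carryover effects between sessions can inflate it.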

16
Q

different items used to assess the same attribute

A

item sampling

18
Q

correlation between equivalent forms of the test that have different items

A

item sampling

19
Q

determined by dividing the total set of items relating to a construct of interest into halves and comparing the results obtained from the two subsets of items thus created

A

split half reliability
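
A minimal sketch of split-half reliability, assuming an odd-even split of the items (other splits, such as first half versus second half, are equally possible). The names `split_half_r` and `responses` and the data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

def split_half_r(item_scores):
    """Correlate each person's score on odd-position items with even-position items."""
    odd_half = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd_half, even_half)

# Hypothetical data: rows are people, columns are items scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
half_r = split_half_r(responses)
print(round(half_r, 3))
```

Because each half is only half as long as the full test, this correlation tends to underestimate the reliability of the whole test.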

20
Q

also known as Cronbach’s alpha

A

coefficient alpha

21
Q
  • a measure of internal consistency, that is, how closely related a set of items are as a group
  • it is considered to be a measure of scale reliability
A

coefficient alpha
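
One common way to compute coefficient alpha is alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The sketch below follows that formula; the function name `cronbach_alpha` and the ratings are hypothetical, and it assumes sample variances throughout.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Coefficient alpha for a people-by-items table of scores."""
    k = len(item_scores[0])                      # number of items
    columns = list(zip(*item_scores))            # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical 1-5 ratings: rows are people, columns are items.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 3],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))
```

The more closely the items move together across people, the larger the total-score variance is relative to the item variances, and the closer alpha gets to 1.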

22
Q

used to estimate the reliability of binary measurements

A

KR20 (Kuder and Richardson Formula 20)
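
KR-20 is usually written as (k / (k - 1)) * (1 - sum of p*q / variance of total scores), where p is the proportion passing each item and q = 1 - p. The sketch below assumes items scored 0 or 1 and uses the population variance of total scores; the function name `kr20` and the data are hypothetical.

```python
from statistics import pvariance

def kr20(item_scores):
    """KR-20 for a people-by-items table of 0/1 scores."""
    n_people = len(item_scores)
    k = len(item_scores[0])                      # number of items
    pq_sum = 0.0
    for col in zip(*item_scores):
        p = sum(col) / n_people                  # proportion answering the item correctly
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - pq_sum / total_var)

# Hypothetical data: rows are people, columns are items scored 1 (right) or 0 (wrong).
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]
reliability = kr20(responses)
print(round(reliability, 3))
```

For 0/1 items, p*q is the item's population variance, so KR-20 can be read as coefficient alpha specialized to binary items.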

23
Q
  • takes into account chance agreement
  • defined as (observed agreement - expected agreement)/(1-expected agreement)
A

Kappa Statistics
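
The definition on this card can be sketched for the simplest case of two raters (Cohen's kappa). The function name `cohens_kappa` and the ratings are hypothetical; expected agreement is assumed to come from each rater's own category proportions.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """(observed agreement - expected agreement) / (1 - expected agreement)."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability both raters pick the same category independently.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of four cases.
perfect = cohens_kappa(["yes", "no", "yes", "no"], ["yes", "no", "yes", "no"])
chance = cohens_kappa(["yes", "yes", "no", "no"], ["yes", "no", "yes", "no"])
print(perfect, chance)
```

Note that the formula is undefined when expected agreement equals 1, for example when both raters use a single category for every case.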

24
Q

best method for assessing the level of agreement among several observers

A

kappa statistics

25
Q

value of kappa when two measurements agree only at the chance level

A

0

26
Q

value of kappa when two measurements agree perfectly

A

1.0

27
Q

range in which reliability estimates are good enough for most purposes in basic research

A

between .70 and .80

28
Q

what to do about low reliability?

A

increase the number of items according to the domain sampling model
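
To get a feel for how many items might be needed, a rearranged Spearman-Brown formula gives the factor N by which a test must be lengthened: N = r_d(1 - r_o) / (r_o(1 - r_d)), where r_o is the observed reliability and r_d the desired reliability. This sketch assumes the added items come from the same domain and behave like the existing ones; the function names and numbers are hypothetical.

```python
from math import ceil

def lengthening_factor(observed_r, desired_r):
    """How many times longer the test must be to reach the desired reliability."""
    return (desired_r * (1 - observed_r)) / (observed_r * (1 - desired_r))

def items_needed(current_items, observed_r, desired_r):
    """Total items required, rounded up to a whole item."""
    return ceil(current_items * lengthening_factor(observed_r, desired_r))

# Hypothetical case: a 20-item test with reliability .70, aiming for .90.
factor = lengthening_factor(0.70, 0.90)
total_items = items_needed(20, 0.70, 0.90)
print(round(factor, 2), total_items)
```

In this made-up case the test would need to be almost four times as long, which shows why raising an already moderate reliability can be costly.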

29
Q

the ____ the sample, the more likely that the test will represent the true characteristic

A

larger

30
Q
  • can be applied to correct for half-length
  • allows you to estimate what the correlation between the two halves would have been if each half had been the length of the whole test
A

spearman-brown formula
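
For the half-length correction, the Spearman-Brown formula is corrected r = 2r / (1 + r), where r is the correlation between the two halves. A minimal sketch, with a hypothetical function name and input value:

```python
def spearman_brown_half(half_r):
    """Estimate full-length reliability from the correlation between two halves."""
    return (2 * half_r) / (1 + half_r)

# Hypothetical correlation of .60 between the two halves of a test.
corrected = spearman_brown_half(0.60)
print(round(corrected, 2))
```

The corrected value is higher than the half-test correlation, reflecting that longer tests tend to be more reliable.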

31
Q

can be defined as the agreement between a test score or measure and the quality it is believed to measure

A

validity

32
Q

answers the question, “does the test measure what it is supposed to measure?”

A

validity

33
Q

3 types of evidence in validity

A
  1. construct-related
  2. criterion-related
  3. content-related
34
Q

is the mere appearance that a measure has validity

A

face validity

35
Q

the only type of evidence besides face validity that is logical rather than statistical

A

content validity

36
Q

describes the failure to capture important components of a construct

A

construct underrepresentation

37
Q

occurs when scores are influenced by factors irrelevant to the construct

A

construct irrelevant variance

38
Q

tells us just how well a test corresponds with a particular criterion

A

criterion validity evidence

39
Q

standard against which the test is compared

A

criterion

40
Q

the forecasting function of tests is usually a form of criterion validity evidence known as ______

A

predictive validity evidence

41
Q

the relationship between a test and a criterion is usually expressed as a correlation called _________ __________

A

validity coefficient

42
Q

established through a series of activities in which a researcher simultaneously defines some construct and develops the instrumentation to measure it

A

construct validity evidence

43
Q

involves assembling evidence about what a test means

A

construct validation

44
Q

when a measure correlates well with other tests believed to measure the same construct, ___________ ___________ for validity is obtained

A

convergent evidence

45
Q
  • also called divergent validation
  • demonstration of uniqueness
  • to provide this evidence for validity, a test should have low correlations with measures of unrelated constructs; this is evidence for what the test does not measure
A

discriminant evidence

46
Q

refers to standardized tests that are designed to compare and rank test takers in relation to one another

A

norm-referenced test

47
Q

the process of evaluating (or grading) the learning of students against a set of pre-specified qualities or criteria, without reference to the achievement of others

A

criterion-referenced test

48
Q

indicates that the measure does not represent a construct other than the one for which it was derived.

A

discriminant evidence

49
Q

simple guidelines for item writing

A
  • define clearly what you want to measure
  • generate an item pool
  • avoid exceptionally long items
  • keep the level of reading difficulty appropriate for those who will complete the scale
  • avoid double-barreled items that convey two or more ideas at the same time
  • consider mixing positively and negatively worded items