Week 7: Reliability, Validity, & Utility Flashcards

1
Q

Find the magnitude of error and develop ways to minimize it

A

Presence of Error

2
Q

Tests that are relatively free from measurement error are deemed to be…

A

reliable

3
Q

Less error =

A

High reliability

4
Q

Error exists because we only obtain a sample of…

A

behavior

5
Q

who pioneered reliability assessment?

A

Charles Spearman

6
Q

other pioneers

A

– De Moivre
– Pearson
– Kuder and Richardson
– Cronbach

7
Q

CTT: X =

A

T + E

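
As a note on the CTT equation, a minimal simulation (the true score and error spread are made-up values) shows why measuring instruments are imperfect and why repeated administrations produce different results:

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

def observed_score(true_score, error_sd=3.0):
    """CTT: observed score X = true score T + random error E."""
    return true_score + random.gauss(0, error_sd)

T = 100  # a test taker's (unknowable) true score -- hypothetical value
administrations = [observed_score(T) for _ in range(5)]

# Random error makes every administration differ, but the errors
# average out around the true score over many repetitions.
print([round(x, 1) for x in administrations])
```
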
8
Q

Measuring instruments are ____

A

imperfect

9
Q

observed score is almost always different from the ____ ability/characteristic

A

true

10
Q

____ of measurement are random

A

Errors

11
Q

Because of random error, repeated application produces…

A

different results

12
Q

Problem created by using a limited number of items to represent a larger, more complicated construct

A

Domain Sampling Model

13
Q

Task in reliability analysis is to estimate how much ______ is made by using a test score from the shorter test as estimate of the true ability

A

error

14
Q

the ratio of variance of the observed score on the shorter test and the variance of the long-run true score

A

Reliability

15
Q

Reliability can be estimated by ______ the observed score with the true score

A

correlating

16
Q

True scores (T) are not available, so we estimate what ___

A

they would be

17
Q

To estimate reliability, we create many randomly _____

A

parallel tests

18
Q

focuses on item difficulty to assess ability

A

Item Response Theory

19
Q

Parallel tests are the same tests measuring…

A

the same concepts

20
Q

Reliability is related to…

A

consistency

21
Q

Reliability Coefficient

A

is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance

22
Q

is 0.7 an accepted coefficient?

A

yes

23
Q

what value should a reliability coefficient not go beyond?

A

0.95

24
Q

what is the critical coefficient?

A

0.6

25
Q

What are sources of error?

A
  • Test Construction
  • Test Administration
  • Test Scoring and Interpretation

26
Q

What is under Test Construction?

A

Item sampling; content sampling

27
Q

an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test

A

Test-Retest Reliability Estimate

28
Q

When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as the coefficient of…

A

stability

29
Q

exist when, for each form of the test, the means and the variances of observed test scores are equal

A

Parallel forms

30
Q

simply different versions of a test that have been constructed so as to be parallel

A

Alternate Forms

31
Q

coefficient of equivalence

A

Parallel-Forms and Alternate-Forms Reliability Estimates

32
Q

obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once

A

Split-Half Reliability Estimates

33
Q

What is the correct order for split-half?
a. Calculate a Pearson r between scores on the two halves of the test.
b. Divide the test into equivalent halves.
c. Adjust the half-test reliability using the Spearman-Brown formula

A

b-a-c

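
The b-a-c order above can be sketched in code. The item scores and the six test takers here are made up; the Spearman-Brown step uses the standard correction rSB = 2r / (1 + r):

```python
def pearson_r(x, y):
    """Pearson correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical 0/1 item scores: rows = test takers, columns = items.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]

# b. Divide the test into equivalent halves (odd- vs. even-numbered items).
odd_half = [sum(row[0::2]) for row in scores]
even_half = [sum(row[1::2]) for row in scores]

# a. Calculate a Pearson r between scores on the two halves.
r_half = pearson_r(odd_half, even_half)

# c. Adjust the half-test reliability with the Spearman-Brown formula.
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))  # the corrected estimate exceeds the half-test r
```
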
34
Q

refers to the degree of correlation among all the items on a scale

A

Inter-item consistency

35
Q

A measure of inter-item consistency is calculated from _____ of a single form of a test

A

a single administration

36
Q

measures a single trait

A

Homogeneous Test

37
Q

The more homogeneous the test is, the better the…

A

internal consistency

38
Q

Where test items are highly homogeneous, KR20 and split-half reliability estimates will be similar.

A

Kuder-Richardson formulas

39
Q

is the statistic of choice for determining the inter-item consistency of ______, primarily those items that can be scored right or wrong (such as multiple-choice items).

A

dichotomous items

40
Q

If test items are more ______, KR20 will yield lower reliability estimates than the split-half method.

A

heterogeneous

41
Q

Dichotomous items include 3 or more choices

A

False, it only includes 2 choices (i.e., yes or no, true or false)

42
Q

rKR20 stands for?

A

Kuder-Richardson formula 20 reliability coefficient

43
Q

k is the…

A

number of test items

44
Q

σ² is the…

A

variance of total test scores

45
Q

p is the proportion of test takers who…

A

pass the item

46
Q

q is the proportion of people who…

A

fail the item

47
Q

Σ pq is the sum of the pq products…

A

over all items

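
Putting the symbols from the preceding cards together, rKR20 = (k / (k - 1)) * (1 - Σpq / σ²). A minimal sketch with a made-up 0/1 score matrix:

```python
def kr20(scores):
    """KR-20 for dichotomously scored (0/1) items:
    rKR20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)."""
    k = len(scores[0])                      # k: number of test items
    n = len(scores)
    totals = [sum(row) for row in scores]
    mean_t = sum(totals) / n
    var_total = sum((t - mean_t) ** 2 for t in totals) / n  # sigma^2
    sum_pq = 0.0
    for item in range(k):
        p = sum(row[item] for row in scores) / n  # proportion passing the item
        q = 1 - p                                 # proportion failing the item
        sum_pq += p * q                           # accumulate sum of pq over all items
    return (k / (k - 1)) * (1 - sum_pq / var_total)

# Hypothetical 0/1 responses: rows = test takers, columns = items.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
]
print(kr20(scores))  # about 0.8 for this hypothetical data
```
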
48
Q

the mean of all possible split-half correlations, corrected by the Spearman-Brown formula

A

Coefficient Alpha

49
Q

are coefficient alpha items also dichotomous?

A

no, they are non-dichotomous items

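
Coefficient alpha replaces Σpq in KR-20 with the sum of the item variances, which is why it also works for non-dichotomous items. A minimal sketch with made-up 1-5 ratings:

```python
def coefficient_alpha(scores):
    """alpha = (k / (k - 1)) * (1 - sum of item variances / variance of totals).
    Items need not be dichotomous (e.g., 1-5 Likert ratings)."""
    k = len(scores[0])   # number of items
    n = len(scores)      # number of respondents

    def variance(values):
        m = sum(values) / n
        return sum((v - m) ** 2 for v in values) / n

    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 1-5 ratings: rows = respondents, columns = scale items.
ratings = [
    [5, 4, 5, 4],
    [4, 3, 4, 3],
    [3, 4, 3, 3],
    [2, 3, 2, 2],
    [1, 2, 3, 1],
]
print(round(coefficient_alpha(ratings), 2))  # about 0.92 for this hypothetical data
```
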
50
Q

𝑟α is coefficient…

A

alpha

51
Q

To increase reliability, increase the number of…

A

items or observations

52
Q

To increase reliability, eliminate items that are…

A

unclear

53
Q

To increase reliability, _____ the conditions under which the test is taken

A

standardize

54
Q

To increase reliability, ____ the degree of difficulty of the tests.

A

moderate

55
Q

To increase reliability, minimize the effects of…

A

external events

56
Q

To increase reliability, standardize ___

A

instructions

57
Q

To increase reliability, maintain consistent…

A

scoring procedures

58
Q

Test-retest is a measure of…

A

stability

59
Q

parallel or alternate forms is a measure for…

A

equivalence

60
Q

A type of reliability assessed by administering the same test at two different times to the same group of participants?

A

Test- Retest

61
Q

A type of reliability administered with two forms of the test to the same group of participants

A

parallel or alternate forms

62
Q

inter-rater is a measure of…

A

agreement

63
Q

internal consistency is the measure of…

A

how consistently each item measures the same underlying construct.

64
Q

A type of reliability where there are two or more raters that will rate behaviors then determine the amount of agreement between them.

A

Inter-rater

65
Q

A type of reliability assessed by correlating performance on each item with overall performance across participants

A

Internal Consistency

66
Q

Statistical coefficient of test-retest and parallel or alternate forms

A

Correlation (Pearson’s r or Spearman’s rho)

67
Q

Statistical Computation for Inter-rater

A

Percentage agreement
Kappa coefficient
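
Both inter-rater statistics can be sketched with made-up codes from two raters; kappa corrects the raw percentage agreement for chance agreement, kappa = (po - pe) / (1 - pe):

```python
from collections import Counter

# Hypothetical behavior codes assigned by two raters to the same 10 observations.
rater_1 = ["on", "off", "on", "on", "off", "on", "on", "off", "on", "off"]
rater_2 = ["on", "off", "on", "off", "off", "on", "on", "off", "on", "on"]
n = len(rater_1)

# Percentage agreement: proportion of observations coded identically.
p_o = sum(a == b for a, b in zip(rater_1, rater_2)) / n

# Chance agreement: product of each rater's marginal proportions, summed over categories.
c1, c2 = Counter(rater_1), Counter(rater_2)
p_e = sum((c1[cat] / n) * (c2[cat] / n) for cat in set(rater_1) | set(rater_2))

# Cohen's kappa corrects observed agreement for chance agreement.
kappa = (p_o - p_e) / (1 - p_e)
print(p_o, round(kappa, 3))
```
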

68
Q

Statistical Computation for Internal Consistency

A

Cronbach’s Alpha
Kuder-Richardson
Ordinal/Composite

69
Q

Alpha is an…

A

index

70
Q

Usually, an internal consistency value of ____ is deemed as appropriate.

A

.70

71
Q

However, as much as possible, a newly developed test should not obtain a very high internal consistency of…

A

.90 and above

72
Q

0.95 internal consistency =

A

Redundant

73
Q

Nature of the Test

A

– Homogeneity versus heterogeneity of test items
– Dynamic versus static characteristics
– Speed tests versus power tests

74
Q

compares the proportions of responses from two or more populations with regard to a dichotomous variable (e.g., male/female, yes/no) or a variable with more than two outcome categories. Assumes that all items are equally effective in measuring the construct of interest.

A

Homogeneity

75
Q

the degree to which a test measures different factors; such tests measure more than one trait.

A

heterogeneity

76
Q

characteristics that are fixed, unchanging properties of a system or component that affect its reliability (constant)

A

Static

77
Q

time-dependent properties that change during the operation or usage of a system or component (changes over time)

A

Dynamic

78
Q

measures how quickly a system, process, or individual can complete a task or respond to a stimulus (time-based, how fast you could answer or finish something)

A

Speed Test

79
Q

measures the maximum capacity, strength, or intensity of a system, process or individual (entails level of difficulty)

A

Power Test

80
Q

The agreement between a test score or measure and the quality it is believed to measure.

A

Validity

81
Q

judgment based on evidence about the appropriateness of _____ drawn from test scores.

A

inferences

82
Q

the process of gathering and evaluating evidence about validity

A

validation studies (e.g., local validation studies)

83
Q

Validity: Trinitarian Model

A

a. CONTENT VALIDITY
b. CRITERION-RELATED VALIDITY
c. CONSTRUCT VALIDITY

84
Q

Based on face value, it appears to measure what it purports to measure

A

Face Validity

85
Q

Extent to which a test assesses all the important aspects of a phenomenon that it purports to measure

A

Content Validity

86
Q

2 types of Criterion Validity

A

Concurrent Validity
Predictive Validity

87
Q

extent to which a test yields the same results as other, established measures of the same behavior, thoughts, or feelings

A

Concurrent Validity

88
Q

good at predicting how a person will think, act, or feel in the future

A

Predictive Validity

89
Q

extent to which a test measures what it is supposed to measure and not something else altogether

A

Construct Validity

90
Q

Is face validity a true measure of validity?

A

no

92
Q

There is no evidence in face validity

A

true

93
Q

Says that something is true when it is actually false
Ex.: a man takes a pregnancy test and the result comes out positive

A

False-positive

94
Q

Says that something is false when it is actually true
Ex.: a woman’s pregnancy test comes out negative, but when she gets checked by an OB-GYN it turns out she is pregnant

A

False-negative

95
Q

Two concepts of Content Validity

A
  • construct under-representation
  • construct-irrelevant variance

96
Q

Failure to capture important components of the construct

A

Construct under-representation

97
Q

Scores are influenced by factors irrelevant to the construct

A

Construct-irrelevant variance

98
Q

how a test corresponds to a particular criterion

A

Criterion validity

99
Q

predictor and criterion

A

predictive

100
Q

Relationship between a test and a criterion

A

Validity Coefficient

101
Q

Validity coefficients of .60 are rare; .30 to .40 are usually considered…

A

high

102
Q

Statistical significance:

A

less than 5 chances in 100 that the result occurred by chance (p < .05)

103
Q

In evaluating coefficients, look for changes in the cause of the

A

relationship

104
Q

criterion should be…

A

valid and reliable

105
Q

you need to consider if the sample size is

A

adequate

106
Q

Do not confuse the criterion with the…

A

predictor

107
Q

consider if there is variability in the…

A

criterion and the predictor

108
Q

consider if there is evidence for

A

validity generalization

109
Q

Consider differential…

A

prediction

110
Q

Something built by mental synthesis

A

Construct

111
Q

Involves assembling evidence about what a test means; shows the relationship between the test and other measures

A

Construct Validity

112
Q

Correlation between two tests believed to measure the same construct

A

Convergent Evidence

113
Q

– Divergent validation
– The test measures something unique
– Low correlations with unrelated constructs

A

Discriminant Evidence

114
Q

ability to produce consistent scores that measure stable characteristics

A

Reliability

115
Q

which stable characteristics the test scores measure

A

Validity

116
Q

It is theoretically _____ to develop a reliable test that is not valid.

A

possible

117
Q

If a test is not reliable, its potential validity is…

A

limited

118
Q

The usefulness or practical value of testing to improve efficiency, or of a training program or intervention

A

Utility

119
Q

what are the 3 main factors that affect a test’s utility?

A
  • psychometric soundness
  • cost
  • benefits

120
Q

reliability and validity

A

Psychometric Soundness

121
Q

economic
financial
budget-related

A

cost

122
Q

_____ of testing justify the costs of administering, scoring, and interpreting the test.

A

benefits

123
Q

a family of techniques that entail a cost– benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment

A

Utility Analysis