Week 7: Reliability, Validity, & Utility Flashcards

1
Find the magnitude of error and develop ways to minimize it
Presence of Error
2
Tests that are relatively free from measurement error are deemed to be…
reliable
3
Less error =
high reliability
4
Error exists because we only obtain a sample of…
behavior
5
Who pioneered reliability assessment?
Charles Spearman
6
Other pioneers of reliability assessment
– De Moivre
– Pearson
– Kuder and Richardson
– Cronbach
7
CTT (Classical Test Theory): X =
T + E
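A quick worked illustration of the equation above (numbers assumed for the example): an examinee whose true score is 80 and who picks up 4 points of random error obtains an observed score of 84.

```latex
X = T + E \quad\Rightarrow\quad 84 = 80 + 4
```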
8
Measuring instruments are ____
imperfect
9
observed score is almost always different from the ____ ability/characteristic
true
10
____ of measurement are random
Errors
11
Because of random error, repeated application produces…
different results
12
Problem created by using a limited number of items to represent a larger, more complicated construct
Domain Sampling Model
13
Task in reliability analysis is to estimate how much ______ is made by using a test score from the shorter test as an estimate of the true ability
error
14
the ratio of the variance of the observed score on the shorter test to the variance of the long-run true score
Reliability
15
Reliability can be estimated by ______ the observed score with the true score
correlating
16
True scores (T) are not available, so we estimate what ___
they would be
17
To estimate reliability, we create many randomly _____
parallel tests
18
focuses on item difficulty to assess ability
Item Response Theory
19
Parallel tests are the same tests measuring…
the same concepts
20
Reliability is related to…
consistency
21
Reliability Coefficient
is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
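In symbols, the same definition in standard CTT notation (σ²T = true-score variance, σ²E = error variance, σ²X = total observed variance):

```latex
r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```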
22
Is 0.7 an acceptable reliability coefficient?
yes
23
What value should a reliability coefficient not go beyond?
0.95
24
What is the critical coefficient?
0.6
25
What are sources of error?
* Test Construction * Test Administration * Test Scoring and Interpretation
26
What is under Test Construction?
Item sampling; content sampling
27
an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
Test-Retest Reliability Estimate
28
When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as the coefficient of...
stability
29
exist when, for each form of the test, the means and the variances of observed test scores are equal
Parallel forms
30
simply different versions of a test that have been constructed so as to be parallel
Alternate Forms
31
coefficient of equivalence
Parallel-Forms and Alternate-Forms Reliability Estimates
32
obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
Split-Half Reliability Estimates
33
What is the correct order for split-half? a. Calculate a Pearson r between scores on the two halves of the test. b. Divide the test into equivalent halves. c. Adjust the half-test reliability using the Spearman-Brown formula
b-a-c
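A minimal Python sketch of the b-a-c procedure above, assuming an examinee-by-item score matrix; the odd/even item split used here is just one common way of forming "equivalent" halves.

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Split-half reliability for a rows=examinees, cols=items score matrix."""
    # b. Divide the test into equivalent halves (odd/even item split here).
    half1 = items[:, 0::2].sum(axis=1)
    half2 = items[:, 1::2].sum(axis=1)
    # a. Calculate a Pearson r between scores on the two halves.
    r_hh = np.corrcoef(half1, half2)[0, 1]
    # c. Adjust the half-test r with the Spearman-Brown formula.
    return 2 * r_hh / (1 + r_hh)
```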
34
refers to the degree of correlation among all the items on a scale
Inter-item consistency
35
A measure of inter-item consistency is calculated from _____ of a single form of a test
a single administration
36
measures a single trait
Homogeneous Test
37
The more homogeneous the test is, the better the...
internal consistency
38
Where test items are highly homogeneous, split-half reliability estimates will be similar to estimates from the ____
Kuder-Richardson formulas
39
KR-20 is the statistic of choice for determining the inter-item consistency of ______, primarily those items that can be scored right or wrong (such as multiple-choice items).
dichotomous items
40
If test items are more ______, KR20 will yield lower reliability estimates than the split-half method.
heterogeneous
41
Dichotomous items include 3 or more choices
False; they include only 2 choices (e.g., yes or no, true or false)
42
What does rKR20 stand for?
Kuder-Richardson formula 20 reliability coefficient
43
k is the...
number of test items
44
σ² is the...
variance of total test scores
45
p is the proportion of test takers who...
pass the item
46
q is the proportion of people who...
fail the item
47
Σ pq is the sum of the pq products...
over all items
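Assembling the symbols from cards 42-47 gives the KR-20 formula in its standard form:

```latex
r_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{\sigma^2}\right)
```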
48
the mean of all possible split-half correlations, corrected by the Spearman-Brown formula
Coefficient Alpha
49
Are coefficient alpha items also dichotomous?
No; they are non-dichotomous items
50
rα is coefficient...
alpha
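For comparison, the usual form of coefficient alpha: the summed item variances Σσᵢ² (σᵢ², the variance of item i, is a symbol not defined in the cards above) take the place of Σpq in KR-20, which is what lets alpha handle non-dichotomous items.

```latex
r_{\alpha} = \frac{k}{k-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma^2}\right)
```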
51
To increase reliability, increase the number of...
items or observations
52
To increase reliability, eliminate items that are...
unclear
53
To increase reliability, _____ the conditions under which the test is taken
standardize
54
To increase reliability, ____ the degree of difficulty of the tests.
moderate
55
To increase reliability, minimize the effects of...
external events
56
To increase reliability, standardize ___
instructions
57
To increase reliability, maintain consistent...
scoring procedures
58
Test-retest is a measure of...
stability
59
Parallel- or alternate-forms reliability is a measure of...
equivalence
60
A type of reliability measured by administering the same test at two different times to the same group of participants
Test- Retest
61
A type of reliability administered with two forms of the test to the same group of participants
parallel or alternate forms
62
inter-rater is a measure of...
agreement
63
internal consistency is the measure of...
how consistently each item measures the same underlying construct.
64
A type of reliability in which two or more raters rate behaviors and the amount of agreement between them is determined
Inter-rater
65
A type of reliability done by correlating performance on each item with overall performance across participants
Internal Consistency
66
Statistical coefficient of test-retest and parallel or alternate forms
Correlation (Pearson r or Spearman's rho)
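A minimal SciPy sketch of both coefficients, with made-up scores for the same group tested twice:

```python
from scipy import stats

# Hypothetical scores for the same eight examinees at two test dates.
time1 = [12, 15, 11, 18, 14, 16, 10, 17]
time2 = [13, 14, 12, 19, 13, 17, 11, 16]

r, _ = stats.pearsonr(time1, time2)     # interval/ratio scores
rho, _ = stats.spearmanr(time1, time2)  # ordinal scores
print(f"test-retest: Pearson r = {r:.2f}, Spearman's rho = {rho:.2f}")
```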
67
Statistical Computation for Inter-rater
Percentage of agreement; kappa coefficient
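A minimal sketch of both computations, using hypothetical codes from two raters (scikit-learn's cohen_kappa_score supplies the chance-corrected kappa):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical behavior codes assigned by two raters to the same ten cases.
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

percent_agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)
kappa = cohen_kappa_score(rater_a, rater_b)  # agreement corrected for chance
print(f"agreement = {percent_agreement:.0%}, kappa = {kappa:.2f}")
```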
68
Statistical Computation for Internal Consistency
Cronbach's alpha; Kuder-Richardson; ordinal/composite reliability
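And a small sketch of Cronbach's alpha computed directly from its formula, again assuming an examinee-by-item score matrix:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a rows=examinees, cols=items score matrix."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```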
69
Alpha is an...
index
70
Usually, an internal consistency value of ____ is deemed as appropriate.
.70
71
However, as much as possible, a newly developed test should not obtain a very high internal consistency of...
.90 and above
72
0.95 internal consistency =
Redundant
73
Nature of the Test
– Homogeneity versus heterogeneity of test items – Dynamic versus static characteristics – Speed versus power tests
74
compares the proportions of responses from two or more populations with regard to a dichotomous variable (e.g., male/female, yes/no) or a variable with more than two outcome categories. Assumes that all items are equally effective in measuring the construct of interest.
Homogeneity
75
the degree to which a test measures different factors; such tests measure more than one trait
Heterogeneity
76
characteristics that are fixed, unchanging properties of a system or component that affect its reliability (constant)
Static
77
time-dependent properties that change during the operation or usage of a system or component (change over time)
Dynamic
78
measures how quickly a system, process, or individual can complete a task or respond to a stimulus (time-based: how fast you can answer or finish something)
Speed Test
79
measures the maximum capacity, strength, or intensity of a system, process or individual (entails level of difficulty)
Power Test
80
The agreement between a test score or measure and the quality it is believed to measure.
Validity
81
judgment based on evidence about the appropriateness of _____ drawn from test scores.
inferences
82
the process of gathering and evaluating evidence about validity
validation studies (i.e. local validation studies)
83
Validity: Trinitarian Model
a. CONTENT VALIDITY b. CRITERION-RELATED VALIDITY c. CONSTRUCT VALIDITY
84
Based on face value, the test appears to measure what it purports to measure
Face Validity
85
Extent to which a test assesses all the important aspects of a phenomenon that it purports to measure
Content Validity
86
2 types of Criterion Validity
Concurrent Validity and Predictive Validity
87
extent to which a test yields the same results as other, established measures of the same behavior, thoughts, or feelings
Concurrent Validity
88
good at predicting how a person will think, act, or feel in the future
Predictive Validity
89
extent to which a test measures what it is supposed to measure and not something else altogether
Construct Validity
90
Is face validity a true measure of validity?
no
92
There is no evidence in face validity
true
93
Says that something is true when it is actually false. Ex.: a man takes a pregnancy test and the result is positive
False-positive
94
Says that something is false when it is actually true. Ex.: a woman's pregnancy test is negative, but when she gets checked by an OB-GYN, the result is positive
False-negative
95
Two concepts of Content Validity
* Construct under-representation * Construct-irrelevant variance
96
Failure to capture important components of the construct
Construct under-representation
97
Scores are influenced by factors irrelevant to the construct
Construct-irrelevant variance
98
how a test corresponds to a particular criterion
Criterion Validity
99
Validity that involves a predictor and a criterion
predictive
100
Relationship between a test and a criterion
Validity Coefficient
101
Validity coefficients above .60 are rare; .30 to .40 are usually considered...
high
102
Statistical significance:
fewer than 5 chances in 100 that the result arose by chance (p < .05)
103
In evaluating coefficients, look for changes in the cause of the
relationship
104
criterion should be...
valid and reliable
105
you need to consider if the sample size is
adequate
106
Do not confuse the criterion with the...
predictor
107
consider if there is variability in the...
criterion and the predictor
108
consider if there is evidence for
validity generalization
109
Consider differential...
prediction
110
Something built by mental synthesis
Construct
111
Involves assembling evidence about what a test means; shows the relationship between the test and other measures
Construct Validity
112
Correlation between two tests believed to measure the same construct
Convergent Evidence
113
– Divergent validation – The test measures something unique – Low correlations with unrelated constructs
Discriminant Evidence
114
ability to produce consistent scores that measure stable characteristics
Reliability
115
which stable characteristics the test scores measure
Validity
116
It is theoretically _____ to develop a reliable test that is not valid.
possible
117
If a test is not reliable, its potential validity is...
limited
118
The usefulness or practical value of testing to improve efficiency, or of a training program or intervention
Utility
119
What are the 3 main factors that affect a test's utility?
* psychometric soundness * cost * benefits
120
reliability and validity
Psychometric Soundness
121
economic, financial, budget-related
cost
122
_____ of testing justify the costs of administering, scoring, and interpreting the test.
benefits
123
a family of techniques that entail a cost– benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment
Utility Analysis