Week 7: Reliability, Validity, & Utility Flashcards
Find the magnitude of error and develop ways to minimize it
Presence of Error
Tests that are relatively free from measurement error are deemed to be…
reliable
Less error =
High reliability
Error exists because we only obtain a sample of…
behavior
Who pioneered reliability assessment?
Charles Spearman
Other pioneers
– De Moivre
– Pearson
– Kuder and Richardson
– Cronbach
CTT: X =
T + E
Measuring instruments are ____
imperfect
observed score is almost always different from the ____ ability/characteristic
true
____ of measurement are random
Errors
Because of random error, repeated application produces…
different results
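A minimal sketch of this in Python (the true score of 100 and the error SD of 5 are invented purely for illustration): a fixed true score plus fresh random error gives a different observed score on every administration.

```python
import random

# CTT: X = T + E. The true score T is fixed; the random error E
# changes on every administration, so the observed score X varies.
T = 100                     # hypothetical true score (illustration only)
random.seed(42)             # reproducible example

for administration in range(1, 6):
    E = random.gauss(0, 5)  # random error; an SD of 5 is assumed here
    X = T + E
    print(f"Administration {administration}: X = {X:.1f}")
```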
Problem created by using a limited number of items to represent a larger, more complicated construct
Domain Sampling Model
Task in reliability analysis is to estimate how much ______ is made by using a test score from the shorter test as an estimate of the true ability
error
the ratio of the variance of the observed score on the shorter test to the variance of the long-run true score
Reliability
Reliability can be estimated by ______ the observed score with the true score
correlating
T is not available, so we estimate what ___
they would be
To estimate reliability, we create many randomly _____
parallel tests
focuses on item difficulty to assess ability
Item Response Theory
Parallel tests are the same tests measuring…
the same concepts
Reliability is related to…
consistency
Reliability Coefficient
is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
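A minimal simulation of that ratio in Python (the true score SD of 15 and error SD of 5 are invented for illustration; in practice true scores are unobservable):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(100, 15, size=10_000)  # hypothetical true scores
E = rng.normal(0, 5, size=10_000)     # random measurement error
X = T + E                             # observed scores

# Reliability coefficient: true score variance / total observed variance
# Theoretical value here: 15**2 / (15**2 + 5**2) = 225 / 250 = 0.90
print(f"estimated reliability = {T.var() / X.var():.3f}")
```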
Is 0.7 an acceptable coefficient?
yes
What value should a reliability coefficient not go beyond?
0.95
What is the critical coefficient?
0.6
What are sources of error?
- Test Construction
- Test Administration
- Test Scoring and Interpretation
What is under Test Construction?
Item sampling; content sampling
an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
Test-Retest Reliability Estimate
When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as the coefficient of…
stability
exist when, for each form of the test, the means and the variances of observed test scores are equal
Parallel forms
simply different versions of a test that have been constructed so as to be parallel
Alternate Forms
coefficient of equivalence
Parallel-Forms and Alternate-Forms Reliability Estimates
obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
Split-Half Reliability Estimates
What is the correct order for split-half?
a. Calculate a Pearson r between scores on the two halves of the test.
b. Divide the test into equivalent halves.
c. Adjust the half-test reliability using the Spearman-Brown formula
b-a-c
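A minimal sketch of this b-a-c procedure in Python (the 0/1 response matrix and the odd/even split are invented for illustration):

```python
import numpy as np

# Hypothetical responses: rows = test takers, columns = items (1 = correct)
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
])

# (b) Divide the test into equivalent halves (odd vs. even items here).
half1 = scores[:, 0::2].sum(axis=1)
half2 = scores[:, 1::2].sum(axis=1)

# (a) Calculate a Pearson r between scores on the two halves.
r_half = np.corrcoef(half1, half2)[0, 1]

# (c) Adjust the half-test reliability with the Spearman-Brown formula.
r_full = (2 * r_half) / (1 + r_half)
print(f"half-test r = {r_half:.3f}, corrected r = {r_full:.3f}")
```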
refers to the degree of correlation among all the items on a scale
Inter-item consistency
A measure of inter-item consistency is calculated from _____ of a single form of a test
a single administration
measures a single trait
Homogeneous Test
The more homogeneous the test is, the better the…
internal consistency
Where test items are highly homogeneous, KR20 and split-half reliability estimates will be similar.
Kuder-Richardson formulas
is the statistic of choice for determining the inter-item consistency of ______, primarily those items that can be scored right or wrong (such as multiple-choice items).
dichotomous items
If test items are more ______, KR20 will yield lower reliability estimates than the split-half method.
heterogeneous
Dichotomous items include 3 or more choices
False, they only include 2 choices (e.g., yes or no, true or false)
What does rKR20 stand for?
Kuder-Richardson formula 20 reliability coefficient
k is the…
number of test items
σ² is the…
variance of total test scores
p is the proportion of test takers who…
pass the item
q is the proportion of people who…
fail the item
Σ pq is the sum of the pq products…
over all items
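Putting those pieces together, rKR20 = (k / (k - 1)) * (1 - Σpq / σ²). A minimal Python sketch, with a made-up matrix of right/wrong responses:

```python
import numpy as np

def kr20(items):
    """KR-20 for a matrix of dichotomous (0/1) item scores."""
    k = items.shape[1]                   # k: number of test items
    p = items.mean(axis=0)               # p: proportion passing each item
    q = 1 - p                            # q: proportion failing each item
    total_var = items.sum(axis=1).var()  # variance of total test scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Hypothetical responses (illustration only): rows = takers, columns = items
responses = np.array([
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
])
print(f"KR-20 = {kr20(responses):.3f}")  # 0.750 for this toy data
```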
the mean of all possible split-half correlations, corrected by the Spearman-Brown formula
Coefficient Alpha
Are coefficient alpha items also dichotomous?
No, they are non-dichotomous items
rα is coefficient…
alpha
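The formula parallels KR-20 but uses item variances, so it also works for non-dichotomous items: rα = (k / (k - 1)) * (1 - Σ(item variances) / variance of total scores). A minimal Python sketch with made-up 5-point ratings:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a matrix of item scores (not restricted to 0/1)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert-type ratings (illustration only)
ratings = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
])
print(f"alpha = {cronbach_alpha(ratings):.3f}")
```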
To increase reliability, increase the number of…
items or observations
To increase reliability, eliminate items that are…
unclear
To increase reliability, _____ the conditions under which the test is taken
standardize
To increase reliability, ____ the degree of difficulty of the tests.
moderate
To increase reliability, minimize the effects of…
external events
To increase reliability, standardize…
instructions
To increase reliability, maintain consistent…
scoring procedures
Test-retest is a measure of…
stability
Parallel or alternate forms are a measure of…
equivalence
A type of reliability assessed by administering the same test at two different times to the same group of participants?
Test-Retest
A type of reliability administered with two forms of the test to the same group of participants
parallel or alternate forms
inter-rater is a measure of…
agreement
internal consistency is the measure of…
how consistently each item measures the same underlying construct.
A type of reliability in which two or more raters rate behaviors and the amount of agreement between them is then determined.
Inter-rater
A type of reliability assessed by correlating performance on each item with overall performance across participants
Internal Consistency
Statistical coefficient of test-retest and parallel or alternate forms
Correlation (Pearson r or Spearman’s rho)
Statistical Computation for Inter-rater
Percentage of agreement
Kappa coefficient
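A minimal Python sketch of both statistics, with made-up ratings from two hypothetical raters (kappa corrects the raw percentage of agreement for agreement expected by chance):

```python
from collections import Counter

# Hypothetical categorical ratings from two raters (illustration only)
rater_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes"]
n = len(rater_a)

# Percentage of agreement: share of cases where the two raters match
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Kappa: (observed agreement - chance agreement) / (1 - chance agreement)
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
kappa = (p_observed - p_chance) / (1 - p_chance)

print(f"agreement = {p_observed:.2f}, kappa = {kappa:.2f}")
```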
Statistical Computation for Internal Consistency
Cronbach’s Alpha
Kuder-Richardson
Ordinal/Composite
Alpha is an…
index
Usually, an internal consistency value of ____ is deemed as appropriate.
.70
However, a newly developed test should, as much as possible, not have a very high internal consistency of…
.90 and above
0.95 internal consistency =
Redundant
Nature of the Test
– Homogeneity versus heterogeneity of test items
– Dynamic versus static characteristics
– Speed tests versus power tests
the degree to which a test measures a single factor; assumes that all items are equally effective in measuring the construct of interest
Homogeneity
the degree to which a test measures different factors, these tests measure more than one trait.
heterogeneity
characteristics that are fixed, unchanging properties of a system or component that affects its reliability (constant)
Static
time-dependent properties that change during the operation or usage of a system or component (change over time)
Dynamic
measures how quickly a system, process, or individual can complete a task or respond to a stimulus (time-based, how fast you could answer or finish something)
Speed Test
measures the maximum capacity, strength, or intensity of a system, process or individual (entails level of difficulty)
Power Test
The agreement between a test score or measure and the quality it is believed to measure.
Validity
judgment based on evidence about the appropriateness of _____ drawn from test scores.
inferences
the process of gathering and evaluating evidence about validity
validation studies (i.e. local validation studies)
Validity: Trinitarian Model
a. CONTENT VALIDITY
b. CRITERION-RELATED VALIDITY
c. CONSTRUCT VALIDITY
Based on face value, it appears to measure what it purports to measure
Face Validity
Extent to which a test assesses all the important aspects of a phenomenon that it purports to measure
Content Validity
2 types of Criterion Validity
Concurrent Validity
Predictive Validity
extent to which a test yields the same results as other, established measures of the same behavior, thoughts, or feelings
Concurrent Validity
good at predicting how a person will think, act, or feel in the future
Predictive Validity
extent to which a test measures what it is supposed to measure and not something else altogether
Construct Validity
Is face validity a true measure of validity?
no
There is no evidence in face validity
true
Says that something is true when it is actually false
Ex.: a man takes a pregnancy test and the result comes back positive
False-positive
Says that something is false when it is actually true
Ex.: a woman’s pregnancy test comes back negative, but when she gets checked by an OB-GYN, it turns out positive
False-negative
Two concepts of Content Validity
- construct under-representation
- construct-irrelevant variance
Failure to capture important components of the construct
Construct under-representation
Scores are influenced by factors irrelevant to the construct
Construct-irrelevant variance
how a test corresponds to a particular criterion
Criterion Validity
predictor and criterion
predictive
Relationship between a test and a criterion
Validity Coefficient
Validity coefficients of .60 and above are rare; .30 to .40 are usually considered…
high
Statistical significance:
fewer than 5 chances in 100 (p < .05)
In evaluating coefficients, look for changes in the cause of the
relationship
criterion should be…
valid and reliable
you need to consider if the sample size is
adequate
Do not confuse the criterion with the…
predictor
consider if there is variability in the…
criterion and the predictor
consider if there is evidence for
validity generalization
Consider differential…
prediction
Something built by mental synthesis
Construct
Involves assembling evidence about what a test means; shows the relationship between the test and other measures
Construct Validity
Correlation between two tests believed to measure the same construct
Convergent Evidence
– Divergent validation
– The test measures something unique
– Low correlations with unrelated constructs
Discriminant Evidence
ability to produce consistent scores that measure stable characteristics
Reliability
which stable characteristics the test scores measure
Validity
It is theoretically _____ to develop a reliable test that is not valid.
possible
If a test is not reliable, its potential validity is…
limited
The usefulness or practical value of testing to improve efficiency, or of a training program or intervention
Utility
What are the 3 main factors that affect a test’s utility?
- psychometric soundness
- cost
- benefits
reliability and validity
Psychometric Soundness
economic
financial
budget-related
cost
_____ of testing justify the costs of administering, scoring, and interpreting the test.
benefits
a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment
Utility Analysis
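One well-known member of this family is the Brogden-Cronbach-Gleser model; a minimal sketch in Python, where every figure below is invented purely for illustration:

```python
# Brogden-Cronbach-Gleser utility estimate: the dollar benefit of selecting
# with a valid test, minus the cost of testing. All numbers are made up.
n_selected = 10        # people hired using the test
n_applicants = 50      # applicants tested
tenure_years = 2.0     # average time hires stay on the job
validity = 0.40        # criterion-related validity coefficient (r_xy)
sd_y = 12_000.0        # dollar value of one SD of job performance (SD_y)
mean_z_selected = 1.1  # mean standardized test score of those hired
cost_per_test = 25.0   # cost to administer, score, and interpret one test

benefit = n_selected * tenure_years * validity * sd_y * mean_z_selected
cost = n_applicants * cost_per_test
print(f"estimated utility gain = ${benefit - cost:,.2f}")
```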