Week 7: Reliability, Validity, & Utility Flashcards
Find the magnitude of error and develop ways to minimize it
Presence of Error
Tests that are relatively free from measurement error are deemed to be…
reliable
Less error =
High reliability
Error exists because we only obtain a sample of…
behavior
Who pioneered reliability assessment?
Charles Spearman
Other pioneers
– De Moivre
– Pearson
– Kuder and Richardson
– Cronbach
CTT: X =
T + E
Measuring instruments are ____
imperfect
observed score is almost always different from the ____ ability/characteristic
true
____ of measurement are random
Errors
Because of random error, repeated application produces…
different results
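A minimal sketch of this in Python (the true score of 100 and the error SD of 5 are invented purely for illustration): a fixed true score plus fresh random error gives a different observed score on every administration.

```python
import random

# CTT: X = T + E. The true score T is fixed; the random error E
# changes on every administration, so the observed score X varies.
T = 100                     # hypothetical true score (illustration only)
random.seed(42)             # reproducible example

for administration in range(1, 6):
    E = random.gauss(0, 5)  # random error; an SD of 5 is assumed here
    X = T + E
    print(f"Administration {administration}: X = {X:.1f}")
```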
Problem created by using a limited number of items to represent a larger, more complicated construct
Domain Sampling Model
Task in reliability analysis is to estimate how much ______ is made by using a test score from the shorter test as an estimate of the true ability
error
the ratio of the variance of the observed score on the shorter test to the variance of the long-run true score
Reliability
Reliability can be estimated by ______ the observed score with the true score
correlating
T is not available, so we estimate what ___
they would be
To estimate reliability, we create many randomly _____
parallel tests
focuses on item difficulty to assess ability
Item Response Theory
Parallel tests are the same tests measuring…
the same concepts
Reliability is related to…
consistency
Reliability Coefficient
is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
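A minimal simulation of that ratio in Python (the true score SD of 15 and error SD of 5 are invented for illustration; in practice true scores are unobservable):

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.normal(100, 15, size=10_000)  # hypothetical true scores
E = rng.normal(0, 5, size=10_000)     # random measurement error
X = T + E                             # observed scores

# Reliability coefficient: true score variance / total observed variance
# Theoretical value here: 15**2 / (15**2 + 5**2) = 225 / 250 = 0.90
print(f"estimated reliability = {T.var() / X.var():.3f}")
```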
Is 0.7 an acceptable coefficient?
yes
What value should a reliability coefficient not go beyond?
0.95
What is the critical coefficient?
0.6
What are sources of error?
- Test Construction
- Test Administration
- Test Scoring and Interpretation
What is under Test Construction?
Item sampling; content sampling
an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test
Test-Retest Reliability Estimate
When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as the coefficient of…
stability
exist when, for each form of the test, the means and the variances of observed test scores are equal
Parallel forms
simply different versions of a test that have been constructed so as to be parallel
Alternate Forms
coefficient of equivalence
Parallel-Forms and Alternate-Forms Reliability Estimates
obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once
Split-Half Reliability Estimates
What is the correct order for split-half?
a. Calculate a Pearson r between scores on the two halves of the test.
b. Divide the test into equivalent halves.
c. Adjust the half-test reliability using the Spearman-Brown formula
b-a-c
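A minimal sketch of this b-a-c procedure in Python (the 0/1 response matrix and the odd/even split are invented for illustration):

```python
import numpy as np

# Hypothetical responses: rows = test takers, columns = items (1 = correct)
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 0, 1],
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 1, 1],
])

# (b) Divide the test into equivalent halves (odd vs. even items here).
half1 = scores[:, 0::2].sum(axis=1)
half2 = scores[:, 1::2].sum(axis=1)

# (a) Calculate a Pearson r between scores on the two halves.
r_half = np.corrcoef(half1, half2)[0, 1]

# (c) Adjust the half-test reliability with the Spearman-Brown formula.
r_full = (2 * r_half) / (1 + r_half)
print(f"half-test r = {r_half:.3f}, corrected r = {r_full:.3f}")
```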
refers to the degree of correlation among all the items on a scale
Inter-item consistency
A measure of inter-item consistency is calculated from _____ of a single form of a test
a single administration
measures a single trait
Homogeneous Test
The more homogeneous the test is, the better the…
internal consistency
Where test items are highly homogeneous, KR20 and split-half reliability estimates will be similar.
Kuder-Richardson formulas
is the statistic of choice for determining the inter-item consistency of ______, primarily those items that can be scored right or wrong (such as multiple-choice items).
dichotomous items
If test items are more ______, KR20 will yield lower reliability estimates than the split-half method.
heterogeneous
Dichotomous items include 3 or more choices
False, they only include 2 choices (e.g., yes or no, true or false)
What does rKR20 stand for?
Kuder-Richardson formula 20 reliability coefficient
k is the…
number of test items
σ² is the…
variance of total test scores
p is the proportion of test takers who…
pass the item
q is the proportion of people who…
fail the item
Σ pq is the sum of the pq products…
over all items
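Putting those pieces together, rKR20 = (k / (k - 1)) * (1 - Σpq / σ²). A minimal Python sketch, with a made-up matrix of right/wrong responses:

```python
import numpy as np

def kr20(items):
    """KR-20 for a matrix of dichotomous (0/1) item scores."""
    k = items.shape[1]                   # k: number of test items
    p = items.mean(axis=0)               # p: proportion passing each item
    q = 1 - p                            # q: proportion failing each item
    total_var = items.sum(axis=1).var()  # variance of total test scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Hypothetical responses (illustration only): rows = takers, columns = items
responses = np.array([
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
])
print(f"KR-20 = {kr20(responses):.3f}")  # 0.750 for this toy data
```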
the mean of all possible split-half correlations, corrected by the Spearman-Brown formula
Coefficient Alpha
Are coefficient alpha items also dichotomous?
No, they are non-dichotomous items
rα is coefficient…
alpha
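The formula parallels KR-20 but uses item variances, so it also works for non-dichotomous items: rα = (k / (k - 1)) * (1 - Σ(item variances) / variance of total scores). A minimal Python sketch with made-up 5-point ratings:

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a matrix of item scores (not restricted to 0/1)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert-type ratings (illustration only)
ratings = np.array([
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
])
print(f"alpha = {cronbach_alpha(ratings):.3f}")
```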
To increase reliability, increase the number of…
items or observations
To increase reliability, eliminate items that are…
unclear
To increase reliability, _____ the conditions under which the test is taken
standardize
To increase reliability, ____ the degree of difficulty of the tests.
moderate
To increase reliability, minimize the effects of…
external events
To increase reliability, standardize…
instructions
To increase reliability, maintain consistent…
scoring procedures
Test-retest is a measure of…
stability
Parallel or alternate forms are a measure of…
equivalence
A type of reliability assessed by administering the same test at two different times to the same group of participants?
Test-Retest
A type of reliability administered with two forms of the test to the same group of participants
parallel or alternate forms
inter-rater is a measure of…
agreement
internal consistency is the measure of…
how consistently each item measures the same underlying construct.
A type of reliability in which two or more raters rate behaviors and the amount of agreement between them is then determined.
Inter-rater
A type of reliability assessed by correlating performance on each item with overall performance across participants
Internal Consistency
Statistical coefficient of test-retest and parallel or alternate forms
Correlation (Pearson r or Spearman’s rho)
Statistical Computation for Inter-rater
Percentage of agreement
Kappa coefficient
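A minimal Python sketch of both statistics, with made-up ratings from two hypothetical raters (kappa corrects the raw percentage of agreement for agreement expected by chance):

```python
from collections import Counter

# Hypothetical categorical ratings from two raters (illustration only)
rater_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "yes"]
n = len(rater_a)

# Percentage of agreement: share of cases where the two raters match
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Kappa: (observed agreement - chance agreement) / (1 - chance agreement)
counts_a, counts_b = Counter(rater_a), Counter(rater_b)
p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
kappa = (p_observed - p_chance) / (1 - p_chance)

print(f"agreement = {p_observed:.2f}, kappa = {kappa:.2f}")
```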
Statistical Computation for Internal Consistency
Cronbach’s Alpha
Kuder-Richardson
Ordinal/Composite
Alpha is an…
index
Usually, an internal consistency value of ____ is deemed as appropriate.
.70
However, a newly developed test should, as much as possible, not have a very high internal consistency of…
.90 and above
0.95 internal consistency =
Redundant
Nature of the Test
– Homogeneity versus heterogeneity of test items
– Dynamic versus static characteristics
– Speed tests versus power tests
the degree to which a test measures a single factor; assumes that all items are equally effective in measuring the construct of interest
Homogeneity
the degree to which a test measures different factors, these tests measure more than one trait.
heterogeneity
characteristics that are fixed, unchanging properties of a system or component that affects its reliability (constant)
Static
time-dependent properties that change during the operation or usage of a system or component (change over time)
Dynamic
measures how quickly a system, process, or individual can complete a task or respond to a stimulus (time-based, how fast you could answer or finish something)
Speed Test
measures the maximum capacity, strength, or intensity of a system, process or individual (entails level of difficulty)
Power Test
The agreement between a test score or measure and the quality it is believed to measure.
Validity
judgment based on evidence about the appropriateness of _____ drawn from test scores.
inferences
the process of gathering and evaluating evidence about validity
validation studies (i.e. local validation studies)
Validity: Trinitarian Model
a. CONTENT VALIDITY
b. CRITERION-RELATED VALIDITY
c. CONSTRUCT VALIDITY
Based on face value, it appears to measure what it purports to measure
Face Validity
Extent to which a test assesses all the important aspects of a phenomenon that it purports to measure
Content Validity
2 types of Criterion Validity
Concurrent Validity
Predictive Validity
extent to which a test yields the same results as other, established measures of the same behavior, thoughts, or feelings
Concurrent Validity
good at predicting how a person will think, act, or feel in the future
Predictive Validity
extent to which a test measures what it is supposed to measure and not something else altogether
Construct Validity
Is face validity a true measure of validity?
no
There is no evidence in face validity
true
Says that something is true when it is actually false
Ex.: a man takes a pregnancy test and the result comes back positive
False-positive
Says that something is false when it is actually true
Ex.: a woman’s pregnancy test comes back negative, but when she gets checked by an OB-GYN, it turns out positive
False-negative
Two concepts of Content Validity
- construct under-representation
- construct-irrelevant variance
Failure to capture important components of the construct
Construct under-representation
Scores are influenced by factors irrelevant to the construct
Construct-irrelevant variance
how a test corresponds to a particular criterion
Criterion Validity
predictor and criterion
predictive
Relationship between a test and a criterion
Validity Coefficient
Validity coefficients of .60 and above are rare; .30 to .40 are usually considered…
high
Statistical significance:
fewer than 5 chances in 100 (p < .05)
In evaluating coefficients, look for changes in the cause of the
relationship
criterion should be…
valid and reliable
you need to consider if the sample size is
adequate
Do not confuse the criterion with the…
predictor
consider if there is variability in the…
criterion and the predictor
consider if there is evidence for
validity generalization
Consider differential…
prediction
Something built by mental synthesis
Construct
Involves assembling evidence about what a test means; shows the relationship between the test and other measures
Construct Validity
Correlation between two tests believed to measure the same construct
Convergent Evidence
– Divergent validation
– The test measures something unique
– Low correlations with unrelated constructs
Discriminant Evidence
ability to produce consistent scores that measure stable characteristics
Reliability
which stable characteristics the test scores measure
Validity
It is theoretically _____ to develop a reliable test that is not valid.
possible
If a test is not reliable, its potential validity is…
limited
The usefulness or practical value of testing to improve efficiency, or of a training program or intervention
Utility
What are the 3 main factors that affect a test’s utility?
- psychometric soundness
- cost
- benefits
reliability and validity
Psychometric Soundness
economic
financial
budget-related
cost
_____ of testing justify the costs of administering, scoring, and interpreting the test.
benefits
a family of techniques that entail a cost-benefit analysis designed to yield information relevant to a decision about the usefulness and/or practical value of a tool of assessment
Utility Analysis
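One well-known member of this family is the Brogden-Cronbach-Gleser model; a minimal sketch in Python, where every figure below is invented purely for illustration:

```python
# Brogden-Cronbach-Gleser utility estimate: the dollar benefit of selecting
# with a valid test, minus the cost of testing. All numbers are made up.
n_selected = 10        # people hired using the test
n_applicants = 50      # applicants tested
tenure_years = 2.0     # average time hires stay on the job
validity = 0.40        # criterion-related validity coefficient (r_xy)
sd_y = 12_000.0        # dollar value of one SD of job performance (SD_y)
mean_z_selected = 1.1  # mean standardized test score of those hired
cost_per_test = 25.0   # cost to administer, score, and interpret one test

benefit = n_selected * tenure_years * validity * sd_y * mean_z_selected
cost = n_applicants * cost_per_test
print(f"estimated utility gain = ${benefit - cost:,.2f}")
```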