RELIABILITY Flashcards

1
Q

measure of the accuracy of a test or measuring instrument obtained by measuring the same individuals twice and computing the correlation of the two sets of measures

  • It is used to determine how much of the variability in observed scores is due to true differences in the construct being measured, as opposed to measurement error.

Ex: Imagine a test designed to measure job satisfaction. If the reliability coefficient is 0.85, it suggests that 85% of the variance in test scores is due to true differences in job satisfaction, while 15% is due to random or measurement error.

A

reliability coefficient

2
Q

this assumes that each person has a true score that would be obtained if there were no errors in measurement

A

classical test theory

3
Q

refers to, collectively, all of the factors associated with the process of measuring some variable, other than the variable being measured

  • If a thermometer reads 1°C too high (systematic error) and the readings fluctuate slightly (random error), these combined are referred to as___

“Both types of error”

A

measurement error

4
Q

source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process

  • Measuring heart rate with slight variations each time due to changes in the subject’s movement or the observer’s timing.

“Unpredictable fluctuations”

A

random error

5
Q

a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value

  • A weighing scale that always adds 2 kg to the actual weight introduces a systematic error.

“bias in one direction”

A

systematic error

6
Q

item sampling or content sampling - terms that refer to variation among items within a test as well as to variation among items between tests

A

test construction

7
Q

sources of error variance that occur during test administration may
influence the test taker’s attention or motivation

A

test administration

8
Q

scorers and scoring systems are potential sources of error variance

A

test scoring and interpretation

9
Q

an estimate of reliability obtained by correlating pairs of scores from the SAME PEOPLE on TWO DIFFERENT ADMINISTRATIONS of the same test

  • Cannot be used for dynamic traits (characteristics that change over time)

-To determine whether a test produces consistent results across time.

A

test-retest reliability
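As a concrete illustration, a test-retest coefficient is just the Pearson correlation between two administrations of the same test given to the same people. This is a minimal sketch; all the scores below are invented, not data from these cards.

```python
# Minimal sketch: test-retest reliability as the Pearson correlation
# between two administrations. All scores below are invented.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [10, 12, 14, 16, 18, 20]   # first administration
time2 = [11, 12, 13, 17, 17, 21]   # same people, retested later

r_tt = pearson_r(time1, time2)     # the test-retest coefficient
```

In practice this would be computed with a statistics package (e.g., scipy.stats.pearsonr), but the arithmetic is the same.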

10
Q

estimate of test-retest reliability when the interval between testing is greater than six months

• If a personality inventory is administered today and then again after six months, the correlation between the scores would represent the___

A

coefficient of stability

11
Q

occurs when the first testing session influences the results
of the second session, and this can affect the test-retest
reliability of a psychological measure

  • When the retest interval is shorter than the recommended 15 days

A

carry over effect

12
Q

-a type of carryover effect wherein the scores on the second test administration are higher than they were on the first

• A student taking the same cognitive test multiple times might score higher because they’ve become familiar with the questions, not because their cognitive abilities have improved.

  • When the administrations are too close together (Ex: 5 days apart)

A

practice effect

13
Q

uses one set of questions divided into two equivalent sets (“forms”), where both sets contain questions that measure the same construct, knowledge, or skill

  • This means they have the same difficulty level, the same number of items, and measure the construct in an identical way.
A

parallel forms reliability

14
Q
  • an estimate of the extent to which these different forms of the same test have been affected by item
    sampling error, or other error.
  • means different versions of a test that have been constructed so as to be parallel
  • The forms are equivalent in content but may not be statistically identical.
A

alternate forms reliability

15
Q

one of the most rigorous and burdensome assessments of reliability, since test developers have to create two forms of the same test; practical constraints also make it difficult to retest the same group of individuals

A

limitations of parallel and alternate forms reliability

16
Q

obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once

Three (3) steps:
Step 1. Divide the test into equivalent halves.
Step 2. Calculate a Pearson r between scores on the two halves of the test.
Step 3. Adjust the half-test reliability using the Spearman–Brown formula.

A

split-half reliability
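The three steps can be sketched in code. This is a minimal illustration with an invented 0/1 score matrix (rows are test takers, columns are items), using an odd-even split to form the halves:

```python
# Sketch of split-half reliability: odd-even split, Pearson r between
# half scores, then the Spearman-Brown adjustment. Data are invented.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(scores):
    """scores[person][item]; returns the Spearman-Brown adjusted coefficient."""
    # Step 1: divide the test into equivalent halves (odd-even scheme)
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    # Step 2: Pearson r between scores on the two halves
    r_half = pearson_r(odd, even)
    # Step 3: adjust to full-test length with the Spearman-Brown formula
    return 2 * r_half / (1 + r_half)

scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r_full = split_half_reliability(scores)
```

Note how Step 3 raises the half-test correlation, since reliability estimates grow with test length.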

17
Q

a statistic that allows a test developer to estimate what the correlation between the two halves would have been if each half had been the length of the whole test and had equal variances

A

Spearman–Brown formula

18
Q

the degree to which a test contains items that measure a single trait

A

homogeneity

19
Q

the degree to which a test measures different
factors

A

heterogeneity

20
Q

the statistic used for calculating the reliability of a test in which the items are dichotomous (scored 0 or 1); above 0.50 is a reasonable reliability coefficient, and the test is considered homogeneous if the reliability coefficient is above 0.90

A

Kuder–Richardson formula 20 (KR-20)
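As an illustration, KR-20 can be computed directly from a matrix of 0/1 item scores: (k/(k−1)) × (1 − Σpq/σ²_total). This sketch uses population (n-denominator) variance; the data are invented.

```python
# Sketch of KR-20 for dichotomous items. The 0/1 matrix is invented.

def kr20(scores):
    """scores[person][item], each entry 0 or 1."""
    n = len(scores)                    # number of test takers
    k = len(scores[0])                 # number of items
    totals = [sum(row) for row in scores]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n   # proportion scoring 1
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_t)

scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
rel = kr20(scores)
```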

21
Q
  • the preferred statistic for obtaining an estimate of internal consistency reliability
  • may be thought of as the mean of all possible split-half
    correlations, corrected by the Spearman–Brown Formula
  • appropriate for use on tests containing non-dichotomous
    items
A

coefficient alpha (Cronbach’s Alpha)
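As a sketch, coefficient alpha can be computed from item variances and total-score variance: α = (k/(k−1)) × (1 − Σσ²_item/σ²_total). The Likert-style responses below are invented.

```python
# Sketch of Cronbach's alpha for non-dichotomous items.
# The 1-5 Likert responses are invented for illustration.

def cronbach_alpha(scores):
    """scores[person][item]; items need not be dichotomous."""
    n = len(scores)
    k = len(scores[0])

    def var(values):                   # population (n-denominator) variance
        m = sum(values) / n
        return sum((v - m) ** 2 for v in values) / n

    sum_item_vars = sum(var([row[j] for row in scores]) for j in range(k))
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

likert = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 2, 1],
    [4, 3, 4],
]
alpha = cronbach_alpha(likert)
```

For dichotomous (0/1) items this formula reduces to KR-20.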

22
Q

the degree of agreement or
consistency between two or more scorers (or judges or raters) with regard to a particular measure

A

interrater reliability

23
Q

the best method for assessing the level of agreement between raters is __

  • When there are two or more raters
A

kappa statistic

24
Q

___ tests have reasonably higher internal consistency than heterogeneous tests (those measuring multiple factors or traits)
- Cronbach's alpha is used when there is only 1 factor
- If there are 2 or more factors, factor analysis is involved

Ex: all participants share the same trait, like age group or occupation, ensuring the group is uniform for analysis.

A

homogeneity

25
Q

The best reliability estimate for a test measuring a dynamic characteristic (e.g., anxiety, happiness) is ___

A

internal consistency

26
Q

For a ___ characteristic, the test-retest or alternate/parallel-forms method would be appropriate

A

static
27
Q

a modification of the domain sampling theory which indicates that a person's test scores may vary from testing to testing because of variables in the testing situation

Ex: Imagine a teacher is evaluating student performance using essays. Several factors might affect the scores. Raters: different teachers might grade differently. Tasks: the essay prompts might vary in difficulty. Occasions: scores might change based on when the essays are written.

A

generalizability theory

28
Q

a way to analyze responses to tests or questionnaires with the goal of improving measurement accuracy and reliability

Ex: Imagine a math test with items ranging in difficulty. Using IRT, researchers can analyze which items are most effective at distinguishing between students of varying skill levels.

A

item response theory
29
Q

- also known as Standard Error of Scores
- provides a measure of the precision of an observed test score
- provides an estimate of the amount of error inherent in an observed score of measurement

A

standard error of measurement

30
Q

The higher the ___, the lower the ___, and vice versa

A

SEM; reliability
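This inverse relationship can be sketched numerically with the usual formula SEM = SD × √(1 − r). The standard deviation and reliability values below are invented (an IQ-style scale is assumed):

```python
import math

# Sketch: standard error of measurement from the test's standard deviation
# and its reliability coefficient. The numbers are invented.

def sem(sd, reliability):
    return sd * math.sqrt(1 - reliability)

error = sem(sd=15, reliability=0.89)   # roughly 5 score points
# Higher reliability -> lower SEM, and vice versa:
assert sem(15, 0.95) < sem(15, 0.70)
```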
31
Q

For basic research, a reliability coefficient of ___ to ___ is acceptable; for clinical research, ___ or better is acceptable

A

0.70 and 0.80; 0.90
32
Q

Specifically measures internal consistency for dichotomous items (e.g., "True/False", "Correct/Incorrect", and multiple choice)

A

KR-20
33
Q

Estimates reliability after splitting a test into halves (commonly used in split-half reliability) or when adjusting for the effect of extending/shortening the test length.

Ex: Splitting a vocabulary test into two halves to check reliability, or calculating reliability if the test length were doubled.

A

Spearman–Brown formula
34
Q

1. Has a time limit (only general knowledge is measured); test-retest can be used (but not for dynamic traits)
2. Not strict with time; increasing level of difficulty

A

Speed test; Power test
35
Q

The score itself is not analyzed; what is examined is how the test taker RESPONDS to each item
  • Used in computerized adaptive testing (CAT)

A

Item response theory
36
Q

Generalizes the score to other settings (Ex: a score of BSP 2-4 to a score of 2-6)

A

Generalizability theory
38
Q

Ideal duration for test-retest reliability (Application: personality test, aptitude test, speed test, intelligence test, achievement test)

A

15 days or more
39
Q

1 test but divided in half (ex: 100 items divided into 2)
- Set A and Set B
- Set A and Set B have the same difficulty
- PEARSON r is used
- Same participants

A

Parallel forms
40
Q

PEARSON'S r is used in ___
  • The resulting statistic is the ___

A

Test-retest reliability; coefficient of stability
41
Q

Correlates items with each other
- used in test development
- Ex: Anxiety scale (question 1 correlated with question 2, and so on up to 50)
- CRONBACH'S ALPHA is the statistical tool
- MCDONALD'S OMEGA (usually used in schools instead of Cronbach's)
- Used when every item is intended to measure the construct being measured
- Items can be few, as long as there are no fewer than 3 questions

A

Internal consistency
42
Q

Create a test and divide it into 2 halves (divided equally)
- SAME participants, single administration
- ODD-EVEN SCHEME is the way of splitting items
- The SPEARMAN–BROWN formula is used to adjust the half-test correlation

A

Split-half reliability
43
Q

Used when the test items have the same difficulty level (rarely used)
- A low ITEM DIFFICULTY INDEX = the test is hard
- A high index = the test is easy

A

KR-21
44
Q

When there are only 2 raters

A

Cohen's Kappa
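A minimal sketch of Cohen's kappa for two raters: observed agreement corrected for the agreement expected by chance. The categorical ratings below are invented.

```python
# Sketch of Cohen's kappa: (p_observed - p_expected) / (1 - p_expected).
# The yes/no ratings are invented for illustration.

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_exp = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_obs - p_exp) / (1 - p_exp)

a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
kappa = cohens_kappa(a, b)
```

Unlike simple percent agreement, kappa is 0 when raters agree no more often than chance alone would predict.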