RELIABILITY Flashcards

1
Q

measure of the accuracy of a test or measuring instrument obtained by measuring the same individuals twice and computing the correlation of the two sets of measures

  • It is used to determine how much of the variability in observed scores is due to true differences in the construct being measured, as opposed to measurement error.

Ex: Imagine a test designed to measure job satisfaction. If the reliability coefficient is 0.85, it suggests that 85% of the variance in test scores is due to true differences in job satisfaction, while 15% is due to random or measurement error.

A

reliability coefficient

2
Q

this assumes that each person has a true score that would be obtained if there were no errors in measurement

A

classical test theory

3
Q

refers to, collectively, all of the factors associated with the process of measuring some variable, other than the variable being measured

  • If a thermometer reads 1°C too high (systematic error) and the readings fluctuate slightly (random error), these combined are referred to as___

“Both types of error”

A

measurement error

4
Q

source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process

  • Measuring heart rate with slight variations each time due to changes in the subject’s movement or the observer’s timing.

“Unpredictable fluctuations”

A

random error

5
Q

a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value

  • A weighing scale that always adds 2 kg to the actual weight introduces a systematic error.

“bias in one direction”

A

systematic error

6
Q

item sampling or content sampling - terms that refer to variation among items within a test as well as to variation among items between tests

A

test construction

7
Q

sources of error variance that occur during test administration may
influence the test taker’s attention or motivation

A

test administration

8
Q

scorers and scoring systems are potential sources of error variance

A

test scoring and interpretation

9
Q

an estimate of reliability obtained by correlating pairs of scores from the SAME PEOPLE on TWO DIFFERENT ADMINISTRATIONS of the same test

  • Cannot be used for dynamic traits (characteristics that change over time)

-To determine whether a test produces consistent results across time.

A

test-retest reliability
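As a concrete illustration, a test-retest coefficient is just the Pearson correlation between two administrations of the same test given to the same people. This is a minimal sketch; all the scores below are invented, not data from these cards.

```python
# Minimal sketch: test-retest reliability as the Pearson correlation
# between two administrations. All scores below are invented.

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [10, 12, 14, 16, 18, 20]   # first administration
time2 = [11, 12, 13, 17, 17, 21]   # same people, retested later

r_tt = pearson_r(time1, time2)     # the test-retest coefficient
```

In practice this would be computed with a statistics package (e.g., scipy.stats.pearsonr), but the arithmetic is the same.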

10
Q

estimate of test-retest reliability when the interval between testing is greater than six months

• If a personality inventory is administered today and then again after six months, the correlation between the scores would represent the___

A

coefficient of stability

11
Q

occurs when the first testing session influences the results
of the second session, and this can affect the test-retest
reliability of a psychological measure

  • When the retest interval is shorter than the recommended 15 days

A

carry over effect

12
Q

-a type of carryover effect wherein the scores on the second test administration are higher than they were on the first

• A student taking the same cognitive test multiple times might score higher because they’ve become familiar with the questions, not because their cognitive abilities have improved.

  • When the administrations are too close together (Ex: 5 days apart)

A

practice effect

13
Q

uses one set of questions divided into two equivalent sets (“forms”), where both sets contain questions that measure the same construct, knowledge, or skill

  • This means they have the same difficulty level, the same number of items, and measure the construct in an identical way.
A

parallel forms reliability

14
Q
  • an estimate of the extent to which these different forms of the same test have been affected by item
    sampling error, or other error.
  • means different versions of a test that have been constructed so as to be parallel
  • The forms are equivalent in content but may not be statistically identical.
A

alternate forms reliability

15
Q

one of the most rigorous and burdensome assessments of reliability, since test developers have to create two forms of the same test; practical constraints also make it difficult to retest the same group of individuals

A

limitations of parallel and alternate forms reliability

16
Q

obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered once

Three (3) steps:
Step 1. Divide the test into equivalent halves.
Step 2. Calculate a Pearson r between scores on the two halves of the test.
Step 3. Adjust the half-test reliability using the Spearman–Brown formula.

A

split-half reliability
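The three steps can be sketched in code. This is a minimal illustration with an invented 0/1 score matrix (rows are test takers, columns are items), using an odd-even split to form the halves:

```python
# Sketch of split-half reliability: odd-even split, Pearson r between
# half scores, then the Spearman-Brown adjustment. Data are invented.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half_reliability(scores):
    """scores[person][item]; returns the Spearman-Brown adjusted coefficient."""
    # Step 1: divide the test into equivalent halves (odd-even scheme)
    odd = [sum(row[0::2]) for row in scores]
    even = [sum(row[1::2]) for row in scores]
    # Step 2: Pearson r between scores on the two halves
    r_half = pearson_r(odd, even)
    # Step 3: adjust to full-test length with the Spearman-Brown formula
    return 2 * r_half / (1 + r_half)

scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
r_full = split_half_reliability(scores)
```

Note how Step 3 raises the half-test correlation, since reliability estimates grow with test length.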

17
Q

a statistic that allows a test developer to estimate what the correlation between the two halves would have been if each half had been the length of the whole test and had equal variances

A

Spearman–Brown formula

18
Q

the degree to which a test contains items that measure a single trait

A

homogeneity

19
Q

the degree to which a test measures different
factors

A

heterogeneity

20
Q

the statistic used for calculating the reliability of a test in which the items are dichotomous (scored 0 or 1); above 0.50 is a reasonable reliability coefficient, and the test is considered homogeneous if the reliability coefficient is above 0.90

A

Kuder–Richardson formula 20 (KR-20)
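As an illustration, KR-20 can be computed directly from a matrix of 0/1 item scores: (k/(k−1)) × (1 − Σpq/σ²_total). This sketch uses population (n-denominator) variance; the data are invented.

```python
# Sketch of KR-20 for dichotomous items. The 0/1 matrix is invented.

def kr20(scores):
    """scores[person][item], each entry 0 or 1."""
    n = len(scores)                    # number of test takers
    k = len(scores[0])                 # number of items
    totals = [sum(row) for row in scores]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n   # proportion scoring 1
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var_t)

scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
rel = kr20(scores)
```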

21
Q
  • the preferred statistic for obtaining an estimate of internal consistency reliability
  • may be thought of as the mean of all possible split-half
    correlations, corrected by the Spearman–Brown Formula
  • appropriate for use on tests containing non-dichotomous
    items
A

coefficient alpha (Cronbach’s Alpha)
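As a sketch, coefficient alpha can be computed from item variances and total-score variance: α = (k/(k−1)) × (1 − Σσ²_item/σ²_total). The Likert-style responses below are invented.

```python
# Sketch of Cronbach's alpha for non-dichotomous items.
# The 1-5 Likert responses are invented for illustration.

def cronbach_alpha(scores):
    """scores[person][item]; items need not be dichotomous."""
    n = len(scores)
    k = len(scores[0])

    def var(values):                   # population (n-denominator) variance
        m = sum(values) / n
        return sum((v - m) ** 2 for v in values) / n

    sum_item_vars = sum(var([row[j] for row in scores]) for j in range(k))
    total_var = var([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

likert = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 2, 1],
    [4, 3, 4],
]
alpha = cronbach_alpha(likert)
```

For dichotomous (0/1) items this formula reduces to KR-20.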

22
Q

the degree of agreement or
consistency between two or more scorers (or judges or raters) with regard to a particular measure

A

interrater reliability

23
Q

the best method for assessing the level of agreement between raters is __

  • When there are two or more raters
A

kappa statistic

24
Q

___ tests have reasonably higher internal consistency than heterogeneous tests (those measuring multiple factors or traits)
- Cronbach's alpha is used when there is only 1 factor
- If there are 2 or more factors, factor analysis is involved

Ex: all participants share the same trait, like age group or occupation, ensuring the group is uniform for analysis.

A

homogeneity

25
Q

The best reliability estimate for a test measuring a dynamic characteristic (e.g., anxiety, happiness) is ___

A

internal consistency

26
Q

For a ___ characteristic, the test-retest or alternate/parallel-forms method would be appropriate

A

static
27
Q

a modification of the domain sampling theory which indicates that a person's test scores may vary from testing to testing because of variables in the testing situation

Ex: Imagine a teacher is evaluating student performance using essays. Several factors might affect the scores. Raters: different teachers might grade differently. Tasks: the essay prompts might vary in difficulty. Occasions: scores might change based on when the essays are written.

A

generalizability theory

28
Q

a way to analyze responses to tests or questionnaires with the goal of improving measurement accuracy and reliability

Ex: Imagine a math test with items ranging in difficulty. Using IRT, researchers can analyze which items are most effective at distinguishing between students of varying skill levels.

A

item response theory
29
Q

- also known as Standard Error of Scores
- provides a measure of the precision of an observed test score
- provides an estimate of the amount of error inherent in an observed score of measurement

A

standard error of measurement

30
Q

The higher the ___, the lower the ___, and vice versa

A

SEM; reliability
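This inverse relationship can be sketched numerically with the usual formula SEM = SD × √(1 − r). The standard deviation and reliability values below are invented (an IQ-style scale is assumed):

```python
import math

# Sketch: standard error of measurement from the test's standard deviation
# and its reliability coefficient. The numbers are invented.

def sem(sd, reliability):
    return sd * math.sqrt(1 - reliability)

error = sem(sd=15, reliability=0.89)   # roughly 5 score points
# Higher reliability -> lower SEM, and vice versa:
assert sem(15, 0.95) < sem(15, 0.70)
```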
31
Q

For basic research, a reliability coefficient of ___ to ___ is acceptable; for clinical research, ___ or better is acceptable

A

0.70 and 0.80; 0.90
32
Q

Specifically measures internal consistency for dichotomous items (e.g., "True/False", "Correct/Incorrect", and multiple choice)

A

KR-20
33
Q

Estimates reliability after splitting a test into halves (commonly used in split-half reliability) or when adjusting for the effect of extending/shortening the test length.

Ex: Splitting a vocabulary test into two halves to check reliability, or calculating reliability if the test length were doubled.

A

Spearman–Brown formula
34
Q

1. Has a time limit (only general knowledge is measured); test-retest can be used (but not for dynamic traits)
2. Not strict with time; increasing level of difficulty

A

Speed test; Power test
35
Q

The score itself is not analyzed; what is examined is how the test taker RESPONDS to each item
  • Used in computerized adaptive testing (CAT)

A

Item response theory
36
Q

Generalizes the score to other settings (Ex: a score of BSP 2-4 to a score of 2-6)

A

Generalizability theory
38
Q

Ideal duration for test-retest reliability (Application: personality test, aptitude test, speed test, intelligence test, achievement test)

A

15 days or more
39
Q

1 test but divided in half (ex: 100 items divided into 2)
- Set A and Set B
- Set A and Set B have the same difficulty
- PEARSON r is used
- Same participants

A

Parallel forms
40
Q

PEARSON'S r is used in ___
  • The resulting statistic is the ___

A

Test-retest reliability; coefficient of stability
41
Q

Correlates items with each other
- used in test development
- Ex: Anxiety scale (question 1 correlated with question 2, and so on up to 50)
- CRONBACH'S ALPHA is the statistical tool
- MCDONALD'S OMEGA (usually used in schools instead of Cronbach's)
- Used when every item is intended to measure the construct being measured
- Items can be few, as long as there are no fewer than 3 questions

A

Internal consistency
42
Q

Create a test and divide it into 2 halves (divided equally)
- SAME participants, single administration
- ODD-EVEN SCHEME is the way of splitting items
- The SPEARMAN–BROWN formula is used to adjust the half-test correlation

A

Split-half reliability
43
Q

Used when the test items have the same difficulty level (rarely used)
- A low ITEM DIFFICULTY INDEX = the test is hard
- A high index = the test is easy

A

KR-21
44
Q

When there are only 2 raters

A

Cohen's Kappa
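A minimal sketch of Cohen's kappa for two raters: observed agreement corrected for the agreement expected by chance. The categorical ratings below are invented.

```python
# Sketch of Cohen's kappa: (p_observed - p_expected) / (1 - p_expected).
# The yes/no ratings are invented for illustration.

def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_exp = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (p_obs - p_exp) / (1 - p_exp)

a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
kappa = cohens_kappa(a, b)
```

Unlike simple percent agreement, kappa is 0 when raters agree no more often than chance alone would predict.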