Psychometrics: reliability Flashcards

1
Q

What is a reliable test?

A
  • consistency in measurement
  • the precision with which the test score measures achievement

2
Q

What is reliability?

A
  • the desired consistency or reproducibility of test scores (does it give the same accurate measurement each time it is used?)
  • no test is free from error
3
Q

Reliability formula

A

X = T + e

X - the observed score
T - the true score
e - the error
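A made-up worked example of the formula: if a person's true score is T = 50 and the random error on one sitting is e = +3, the observed score is X = 50 + 3 = 53; on another sitting the error might be e = -2, giving X = 48, even though T has not changed.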

4
Q

The Four Assumptions of Classical Test Score Theory

A
  1. Each person has a true score that we could obtain if there were no measurement error
  2. there is measurement error, but this error is random
  3. the true score of an individual doesn't change with repeated applications of the same test, even though their observed score does
  4. the distribution of random errors, and thus of observed test scores, will be the same for all people
5
Q

Standard Error of measurement (SEM)

A

-estimates how much measurement error a test has by working out how much, on average, an observed score on the test differs from the true score
(it is the standard deviation of the error distribution)

6
Q

Problems with Classical Test Score Theory

A
  1. Population dependent
  2. Test dependent
  3. Assumption of equal measurement error for everyone
7
Q

Domain Sampling Model

A
  • a central concept of Classical Test Theory
  • we can't ask all possible questions on a test, so we only use a sample of test items
  • using fewer test items can introduce error
  • as the sample of items gets larger, the estimate becomes more accurate
8
Q

4 Types of reliability

A
  1. Test-retest reliability
  2. Parallel forms reliability
  3. Internal consistency
  4. Inter-rater reliability
9
Q

Test-retest reliability

A
  • give someone a test and then give them the same test again later on
  • if the two sets of scores are highly correlated, the test has good test-retest reliability (see the sketch below)
  • correlation between the 2 scores = coefficient of stability
  • time sampling
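A minimal sketch of the calculation in Python, assuming NumPy and made-up scores for six people tested twice; the numbers are illustrative only:

    import numpy as np

    # Hypothetical scores for 6 people at time 1 and time 2 (made-up data)
    time1 = np.array([12, 18, 25, 30, 22, 15])
    time2 = np.array([14, 17, 27, 29, 21, 16])

    # Pearson correlation between the two administrations
    # = coefficient of stability (test-retest reliability)
    r = np.corrcoef(time1, time2)[0, 1]
    print(f"coefficient of stability: {r:.2f}")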
10
Q

Issues with test-retest

A
  • can it be used when measuring mood/stress?
  • scores may increase simply because people have done the test before (practice effects)
  • what if the thing being measured changes?
  • what if an event happens between test administrations that changes the thing being tested?
11
Q

Parallel forms reliability

A
  • 2 forms of the same test (questionnaires with different items)
  • correlation between the two = coefficient of equivalence
  • item sampling
12
Q

Ways to change test in parallel forms reliability

A
  • question response alternatives are reworded
  • item order is changed
  • wording of questions is changed
13
Q

Issues with parallel forms reliability

A
  • what if different forms are given at two different times?
  • do you give the form to the same or different people?
  • what if people work out how to answer the one form from doing the other form?
  • do you already have two forms of the test, or do you need to develop a second form of the same test?
14
Q

Internal Consistency

A

-do different items within a test all measure the same thing, to an extent?

15
Q

Examples of internal consistency tests

A
  • split-half reliability
  • KR20
  • coefficient alpha
16
Q

Split-half reliability

A
  • test is split in half and each half is scored separately
  • total scores for the two halves are correlated (see the sketch below)
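A minimal sketch of split-half scoring in Python, assuming a made-up people-by-items response matrix; the odd/even split shown is only one of many possible splits:

    import numpy as np

    # Hypothetical item responses: 5 people (rows) x 6 items (columns), made-up data
    scores = np.array([
        [1, 0, 1, 1, 0, 1],
        [1, 1, 1, 1, 1, 1],
        [0, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 1, 1],
        [0, 1, 0, 1, 0, 1],
    ])

    # Split the test into odd and even items and score each half
    half1 = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5
    half2 = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6

    # Correlate total scores on the two halves
    r_half = np.corrcoef(half1, half2)[0, 1]
    print(f"split-half correlation: {r_half:.2f}")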

17
Q

advantage of split-half reliability

A

-only need one test (don't need 2 forms)

18
Q

challenge of split-half reliability

A

-how to divide the test into equivalent halves

19
Q

issues with split-half reliability

A
  • by splitting the test, each half has fewer items, and fewer items means lower reliability
  • the correlation changes depending on how the items are split into halves
20
Q

Spearman-Brown formula

A

the solution to the problem with split tests: each half will have reduced reliability compared to the full-length test (see the sketch below)
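A small sketch of the standard two-half Spearman-Brown correction (the formula, 2r / (1 + r), is standard background rather than something stated on this card), where r is the correlation between the two halves:

    def spearman_brown(r_half):
        # Spearman-Brown corrected reliability of the full-length test,
        # given the correlation between its two halves
        return (2 * r_half) / (1 + r_half)

    print(spearman_brown(0.70))  # roughly 0.82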

21
Q

Coefficient/Cronbach’s Alpha

A
  • estimates the consistency of responses to different scale items
  • takes the average of all possible split-half correlations for a test
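A minimal sketch of one standard way to compute coefficient alpha in Python, from item variances and total-score variance rather than by literally averaging all split-half correlations; the response matrix is made up:

    import numpy as np

    # Hypothetical responses: 5 people (rows) x 4 items (columns), made-up data
    items = np.array([
        [3, 4, 3, 4],
        [2, 2, 3, 2],
        [5, 4, 5, 5],
        [1, 2, 1, 2],
        [4, 4, 3, 4],
    ])

    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores

    # Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / total variance)
    alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
    print(f"alpha: {alpha:.2f}")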
22
Q

What do the coefficient results mean? Cronbach's A

A

0- no consistency in measurement

1- perfect consistency in measurement

23
Q

what level of reliability is appropriate? Cronbach’s A

A
  • .7 - exploratory research
  • .8 - basic research
  • .9 - applied scenarios
24
Q

Cronbach’s alpha can be affected by

A
  1. multidimensionality
  2. bad test items
  3. number of items
25
Q

Inter-rater reliability

A
  • measures how consistently 2 or more judges agree on rating something
  • assessed by correlating the raters' scores
26
Q

Cohen’s kappa

A

-for 2 judges/raters
-ranges from 1 (perfect agreement) to -1 (agreement less than would be expected by chance)
>0.75 - excellent agreement
0.4-0.7 - satisfactory
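A minimal sketch of Cohen's kappa for two raters in Python, computed from observed and chance agreement; the ratings are made-up categorical labels:

    from collections import Counter

    # Hypothetical categorical ratings by two judges for 10 cases (made-up data)
    rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
    rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]

    n = len(rater1)

    # Observed agreement: proportion of cases where the raters give the same label
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n

    # Expected agreement by chance, from each rater's marginal label proportions
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum((c1[label] / n) * (c2[label] / n) for label in c1)

    # Cohen's kappa: agreement beyond chance, relative to the maximum possible
    kappa = (p_o - p_e) / (1 - p_e)
    print(f"kappa: {kappa:.2f}")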

27
Q

Fleiss’ kappa

A

for 2 or more judges/raters

28
Q

Intra-class correlation (ICC)

A

used for inter-rater reliability when rating interval and ordinal measurements

29
Q

ICC vs Cohen/Fleiss kappa

A
  • ICC for continuous data (interval and ordinal)
  • kappa for observations in a category (nominal/categorical data)

30
Q

SEM calculation

A

SEM = s * sqrt(1 - r)

s - standard deviation of the test scores
r - reliability of the test
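A made-up worked example: with s = 15 and r = 0.91, SEM = 15 * sqrt(1 - 0.91) = 15 * 0.3 = 4.5.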

31
Q

confidence intervals using SEM

A
  • z score for a 95% confidence interval = 1.96
  • lower bound = X - 1.96 * SEM
  • upper bound = X + 1.96 * SEM
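Continuing the made-up example above (observed score X = 100, SEM = 4.5): lower bound = 100 - 1.96 * 4.5 ≈ 91.2, upper bound = 100 + 1.96 * 4.5 ≈ 108.8.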
32
Q

Factors influencing reliability

A
  1. number of items in scale
  2. variability of the sample (reliability estimates are higher with a more heterogeneous sample)
  3. extraneous variables (testing situation, ambiguous items, unstandardised procedures, perceived demand effect)
33
Q

how to improve reliability

A
  1. item analysis
  2. Use identical instructions
  3. Eliminate questions that evoke inconsistent responses
  4. Cover entire range of the dimension
  5. Clear conceptualization
  6. Standardization
  7. Inter-rater training
  8. Use more precise measurement
  9. Use multiple indicators
  10. Pilot-testing