Ch 3 Flashcards

1
Q

Define Reliability

A

The consistency or reproducibility of test scores or assessment data

When an assessment is reliable, its results are dependable and meaningful

Also: the ability of test scores to be interpreted in a consistent and dependable manner across multiple administrations

2
Q

Caveats when considering the reliability of an instrument

A

1) Reliability refers to the RESULTS a test produces, not the test itself
2) An instrument that shows one type of reliability is not necessarily reliable in every other way
3) Results from tests are rarely consistent all of the time

3
Q

Classical Test Theory

A

aka TRUE SCORE MODEL

X = T + E
(Observed score X = true score T plus random error E)

- The model can be used to test the reliability, difficulty, and discriminatory properties of test items/scales

As test administrators: it is our responsibility to limit measurement error as best we can

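A minimal Python sketch of the model (the sample size, score scale, and normal-error assumption are illustrative choices, not from the chapter): simulate true scores, add random error, and observe that the reliability ratio defined on a later card falls out of the variances.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 1_000  # hypothetical number of test-takers

# T: each person's true standing on the construct
true_scores = rng.normal(loc=100, scale=15, size=n)

# E: random (unsystematic) error from administration, scoring, mood, etc.
error = rng.normal(loc=0, scale=5, size=n)

# Classical Test Theory: X = T + E
observed = true_scores + error

# Reliability = Var(T) / Var(X); here ~ 15^2 / (15^2 + 5^2) = 0.90
print(true_scores.var() / observed.var())
```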
4
Q

What are the two types of measurement error?

A
  1. Systematic - when a test consistently measures something other than what it's supposed to (aka imperfect construct validity), e.g., computer skills interfering with assessing math skills
  2. Unsystematic - aka random error - a collection of factors that contribute to variation in scores, including test construction, administration, and scoring
    - could also be related to individual characteristics of the test-taker
5
Q

What are the sources of measurement error?

A

CATT (Cats can measure)
(Content, Administration, Time, Test-Taker)

Time Sampling Error (3 contributing factors; see card 7)
Content Sampling Error
Test Administration Error
Test-Taker Variables

6
Q

Define Time Sampling Error

(source of measurement error)

A

Results from repeated administrations of a test to the same person

How much it matters largely depends on the construct being assessed:

  • personality = stable
  • emotional state = more variable
  • CONSIDER how likely the construct is to vary naturally
  • CONSIDER the time interval between administrations
7
Q

What three factors may impact time sampling error?

A
  1. Carryover effect: when the score on the first administration impacts scores on subsequent administrations
  2. Practice effect: when scores improve because the test-taker becomes more familiar/comfortable with the content being assessed
  3. Fatigue: performance may decrease as the test-taker becomes tired of repeated testing
8
Q

Content Sampling Error

Source of measurement error

A

Aka Domain Sampling Error

When test items don't fully reflect the construct being measured

  • it is very hard for any finite set of items to capture everything the test was designed to measure
  • MOST COMMON source of error observed in test scores
9
Q

Test Administration Error

Source of measurement error

A
  • deviation from the test protocol
  • unforeseen events that occur within the testing environment, e.g., power outage, fire drill, etc.

10
Q

Test Taker Variables

sources of measurement error

A

Individual differences in test-takers that can't be accounted for by the administrator
Can include motivation, fatigue, anxiety, ability, illness, mood, etc.

11
Q

How is reliability measured?

A

The Correlation Coefficient

This number indicates the strength of the relationship between the variance of the true scores and the variance of the observed scores:

Reliability = Variance(true scores) / Variance(observed scores)

12
Q

How do you interpret a correlation coefficient score?

measure of reliability

A

.83

-> 83% of the observed score variance reflects TRUE score variance, not measurement error

-1 = perfect negative correlation
0 = no correlation
+1 = perfect positive correlation
13
Q

Is -.95 or +.85 the stronger correlation?

A

-.95

The sign only shows the direction of the relationship; the absolute value shows its strength, and .95 > .85

The remaining percentage of variance = error

14
Q

How is reliability assessed/estimated?

A

a) Test-retest
b) Alternate forms
c) Internal consistency
d) Inter-rater reliability

15
Q

Test-Retest Reliability

way to estimate reliability

A

MOST COMMON method

Used to assess how stable/reliable a score is over time
A set of participants is tested using the SAME test on two separate occasions

*Carryover effects can be significant

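A hedged sketch of the test-retest logic, under the same illustrative assumptions as the earlier simulation (shared true scores, independent errors, and no carryover or practice effects modeled): correlating two administrations of the same test recovers the reliability.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 500
true_scores = rng.normal(100, 15, size=n)  # a stable construct

# Same true scores on both occasions, independent random errors each time
time1 = true_scores + rng.normal(0, 5, size=n)
time2 = true_scores + rng.normal(0, 5, size=n)

# Test-retest reliability = correlation between the two administrations
print(np.corrcoef(time1, time2)[0, 1])  # ~0.90, matching Var(T)/Var(X)
```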
16
Q

Alternate Forms Reliability

way to estimate reliability

A

Uses different but equivalent versions of a test

Also called Item Sampling (all items are drawn from the same pool of questions)

Forms should have similar means, variances, item difficulty and correlations with other measures

Can be a good way to overcome limitations of test-retest analysis

17
Q

Internal Consistency

way to estimate reliability

A

Goal: to see if items in a test are consistent with each other; do they represent a singular construct?

  1. Split-half reliability
  2. Kuder-Richardson
  3. Cronbach’s alpha
18
Q

Split-half reliability

way to measure internal consistency

A

Divide the test into two comparable halves

Calculate the correlation between the results of the two halves, then correct it for full test length using:

Spearman-Brown prophecy formula

SPLIT half = SPEARMAN-Brown

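A minimal sketch of the procedure (the odd/even split and the people-by-items array layout are my assumptions): correlate the two halves, then apply the Spearman-Brown prophecy formula to estimate the reliability of the full-length test.

```python
import numpy as np

def split_half_reliability(items: np.ndarray) -> float:
    """Split-half reliability of an (n_people, n_items) score matrix."""
    half_a = items[:, 0::2].sum(axis=1)  # odd-numbered items
    half_b = items[:, 1::2].sum(axis=1)  # even-numbered items
    r_half = np.corrcoef(half_a, half_b)[0, 1]
    # Spearman-Brown prophecy formula: corrects the half-test
    # correlation up to the full test length
    return 2 * r_half / (1 + r_half)
```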
19
Q

Kuder-Richardson formulas

internal consistency measure

A

- Uses a statistical procedure to estimate split-half reliability

KR-20: uses actual scores on each item
KR-21: uses the mean score across items (assumes items are of equal difficulty)

Can only be used with dichotomous response sets (e.g., true/false)

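A sketch of KR-20 using its standard formula (the people-by-items layout is an assumption; input must be 0/1 data).

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """KR-20 for a dichotomous (0/1) (n_people, n_items) score matrix."""
    k = items.shape[1]
    p = items.mean(axis=0)                     # proportion passing each item
    item_var_sum = (p * (1 - p)).sum()         # sum of item variances p*q
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)
```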
20
Q

Cronbach’s alpha

A

Also: Coefficient Alpha

Can be used with Likert-type items (nondichotomous data)

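A matching sketch of coefficient alpha (same assumed array layout; unlike KR-20 it accepts nondichotomous item scores).

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for an (n_people, n_items) score matrix."""
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()  # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)       # variance of total scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)
```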
21
Q

Inter-rater reliability

A

Assesses the consistency of scores assigned by different test administrators/raters

Expressed as the level of agreement vs. level of disagreement between raters

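The card only names level of agreement, so here is the simplest illustrative index, percent agreement (chance-corrected statistics such as Cohen's kappa exist, but this card doesn't cover them).

```python
import numpy as np

def percent_agreement(rater_a: np.ndarray, rater_b: np.ndarray) -> float:
    """Proportion of cases on which two raters assign the same score."""
    return float(np.mean(rater_a == rater_b))

# e.g., two raters scoring ten essays on a 1-3 scale
a = np.array([1, 2, 2, 3, 1, 2, 3, 3, 2, 1])
b = np.array([1, 2, 3, 3, 1, 2, 3, 2, 2, 1])
print(percent_agreement(a, b))  # 0.8
```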
22
Q

How to interpret a reliability coefficient

A

All of the methods of estimating reliability will give you a reliability coefficient between -1 and +1

Best: .90; generally acceptable: .80

Very high: .90+
High: .80+
Acceptable: .70+
Questionable: .60+
Unacceptable: below .60
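A tiny helper encoding this card's rule-of-thumb bands (the cutoffs come straight from the card).

```python
def interpret_reliability(r: float) -> str:
    """Map a reliability coefficient to the rule-of-thumb labels above."""
    if r >= 0.90:
        return "very high"
    if r >= 0.80:
        return "high"
    if r >= 0.70:
        return "acceptable"
    if r >= 0.60:
        return "questionable"
    return "unacceptable"

print(interpret_reliability(0.83))  # high
```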
23
Q

What is the Standard Error of Measurement (SEM)?

A

The standard deviation of the (assumed normal) distribution of scores a person would obtain over repeated testing

A person's true score is likely to fall within -2 to +2 SEM of the observed score (95% confidence interval)

The smaller the SEM -> less variance in scores -> higher degree of reliability

Inverse relationship between SEM and reliability

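A sketch using the standard SEM formula, SEM = SD x sqrt(1 - reliability); the formula is standard psychometrics but isn't spelled out on this card, and the numbers are illustrative.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 15, reliability = .89 -> SEM ~ 5 points
s = sem(15, 0.89)
observed = 110
print(observed - 2 * s, observed + 2 * s)  # ~95% CI for the true score
```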
24
Q

How to increase the reliability of a test

(4 ways)

A

LOPH
(Length, Optimal time, Population, Heterogeneity)

(Improve reliability and reduce measurement error)

  1. Increase test length (see the Spearman-Brown sketch after this list)
    - more questions = more reliable
    - increases internal consistency (as long as good questions are added)
  2. Make sure the test is designed for the population you want to use it with
    - age, vocabulary, education level
  3. Increase heterogeneity of the test group
    - the more similar test-takers are, the more similar their scores will be; restricted score variance lowers the reliability coefficient
  4. Use an optimal time interval between tests
    - the time interval between tests plays a huge role in how reliable results are
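For item 1, the general form of the Spearman-Brown prophecy formula (the split-half card names the formula; this general form is standard but isn't spelled out in the deck) predicts how reliability changes when a test is lengthened by a factor n.

```python
def spearman_brown(r: float, n: float) -> float:
    """Predicted reliability when a test is lengthened by a factor of n."""
    return n * r / (1 + (n - 1) * r)

# Doubling a test with reliability .70: 2(.70) / (1 + .70) = ~.82
print(round(spearman_brown(0.70, 2), 2))  # 0.82
```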