Module 2: Norms and Reliability Flashcards
What is Classical Test Theory (CTT)?
CTT is a model for understanding measurement.
CTT is based on the True Score Model:
- For each person, the observed score on a test comprises two components: Observed score (X) = True score (T) + Error (E)
What is a true score?
True score is a person’s actual true ability level (i.e. measured without error).
What is error?
Error is the component of an observed score unrelated to the test taker's true ability or the trait being measured.
True variance and Error variance thus refer to the variability in a collection/population of test scores.
What is reliability?
Reliability refers to consistency in measurement.
- According to CTT: reliability is the proportion of total variance attributable to true variance
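As a sketch of this variance decomposition, a small simulation (hypothetical trait with mean 100, SD 15, and error SD 5) shows reliability as the ratio of true variance to observed variance:

```python
import random

random.seed(0)

# Simulate the CTT model: each observed score X = T + E.
true_scores = [random.gauss(100, 15) for _ in range(10_000)]  # hypothetical true scores
errors = [random.gauss(0, 5) for _ in range(10_000)]          # random error, mean 0
observed = [t + e for t, e in zip(true_scores, errors)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability = true variance / total (observed) variance.
# With SD(T) = 15 and SD(E) = 5, the theoretical value is 225 / 250 = 0.90.
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))
```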
What is test administration error?
Test administration: variation due to the testing environment
- Test-taker variables (e.g., arousal, stress, physical discomfort, lack of sleep, drugs, medication)
- Examiner variables (e.g., physical appearance, demeanour)
What is test scoring and interpretation error?
Test scoring and interpretation:
Variation due to differences in scoring and interpretation
What are methodological errors?
Variation due to poor training, unstandardized administration, unclear questions, biased questions.
CCT True-score Model vs. Alternative
- True Score Model of measurement (based on CCT) is simple, intuitive, and thus widely used
- Another widely used model of measurement is Item Response Theory (IRT)
- CTT's assumptions are more readily met than IRT's, and it assumes only two components of measurement (true score and error)
- But, CTT assumes all items on a test have an equal ability to measure the underlying construct of interest.
Item Response Theory (IRT)
- IRT provides a way to model the probability that a person with X ability level will correctly answer a question that is ‘tuned’ to that ability level.
What does IRT incorporate and consider?
- IRT incorporates considerations of item Difficulty and discrimination
o Difficulty relates to an item not being easily accomplished, solved, or comprehended.
o Discrimination refers to the degree to which an item differentiates among people with higher or lower levels of the trait, ability, or construct being measured.
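A sketch of this idea using the two-parameter logistic (2PL) IRT model, where a is the item's discrimination and b its difficulty (the numbers below are hypothetical):

```python
import math

def p_correct(theta, a, b):
    """2PL IRT model: probability that a person with ability theta answers
    correctly an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# A person whose ability exactly matches the item's difficulty: 50% chance.
print(p_correct(theta=0.0, a=1.5, b=0.0))   # 0.5
# Ability above the item's difficulty raises the probability.
print(round(p_correct(theta=1.0, a=1.5, b=0.0), 2))   # 0.82
# A less discriminating item separates these ability levels less sharply.
print(round(p_correct(theta=1.0, a=0.5, b=0.0), 2))   # 0.62
```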
Reliability estimates
Because a person’s true score is unknown, we use different mathematical methods to estimate the reliability of tests.
Common examples include:
- Test-retest reliability
- Parallel and alternate forms reliability
- Internal consistency reliability
o E.g., split-half, inter-item correlation, Cronbach's alpha
- Interrater/interscorer reliability
Test-retest reliability
Test-retest reliability is an estimate of reliability over time
- Obtained by correlating pairs of scores from the same people on two administrations of the same test at different times
- Appropriate for stable variables (e.g., personality)
- Estimates tend to decrease as time passes
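For example (hypothetical scores), test-retest reliability is just the Pearson correlation between the two administrations:

```python
# Test-retest reliability: correlate scores from the same people on two
# administrations of the same test (hypothetical data).
time1 = [12, 15, 9, 20, 17, 11, 14, 18]
time2 = [13, 14, 10, 19, 18, 10, 15, 17]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

test_retest = pearson_r(time1, time2)
print(round(test_retest, 2))
```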
Parallel and Alternate Forms Reliability
- Parallel forms: two versions of a test are parallel if, in both versions, the means and variances of test scores are equal
- Alternate forms: there is an attempt to create two forms of a test, but they do not meet the strict requirements of parallel forms
- Obtained by correlating the scores of the same people measured with the different forms.
Split half reliability
Obtained by correlating pairs of scores from equivalent halves of a single test administered once.
Entails three steps:
- Step 1: Divide the test into two halves
- Step 2: Correlate scores on the two halves of the test.
- Step 3: Generalise the half-test reliability to the full-test reliability using the Spearman-Brown formula.
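The three steps can be sketched with hypothetical item scores (odd- vs even-numbered items as the two halves):

```python
# One administration of a 6-item test: one row of 0/1 item scores per person
# (hypothetical data).
items = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]

# Step 1: divide the test into two halves (here: odd- vs even-numbered items).
odd_half = [sum(row[0::2]) for row in items]
even_half = [sum(row[1::2]) for row in items]

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Step 2: correlate scores on the two halves.
r_half = pearson_r(odd_half, even_half)

# Step 3: generalise to full-test reliability with the Spearman-Brown formula.
r_full = (2 * r_half) / (1 + r_half)
print(round(r_half, 2), round(r_full, 2))
```

Note that the Spearman-Brown correction always raises the half-test correlation, because the full test is twice as long as each half.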
Inter-item correlation
The degree of relatedness of items on a test; able to gauge the homogeneity of a test.
Kuder-Richardson formula 20
The statistic of choice for determining the inter-item consistency of dichotomous items.
Coefficient alpha
The mean of all possible split-half correlations, corrected by the Spearman-Brown formula. The most popular approach to internal consistency. Values range from 0 to 1.
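A minimal computation of coefficient alpha from a person-by-item matrix of hypothetical scores (with dichotomous 0/1 items, alpha is equivalent to KR-20):

```python
# Person-by-item matrix of dichotomous (0/1) scores (hypothetical data).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

k = len(scores[0])  # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])  # variance of total scores

# Coefficient alpha = (k / (k - 1)) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))
```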
Interrater/InterScorer Reliability
The degree of agreement/consistency between two or more scorers (or judges or raters).
- Often used with behavioural measures
- Guards against biases or idiosyncrasies in scoring
- Obtained by correlating scores from different raters:
o Use intraclass correlation for continuous measures
o Use Cohen’s Kappa for categorical measures
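For categorical ratings, Cohen's kappa corrects observed agreement for agreement expected by chance (the ratings below are hypothetical):

```python
# Two raters assigning categorical codes to the same eight observations
# (hypothetical ratings).
rater1 = ["yes", "yes", "no", "yes", "no", "no", "yes", "no"]
rater2 = ["yes", "no", "no", "yes", "no", "yes", "yes", "no"]

n = len(rater1)
observed = sum(a == b for a, b in zip(rater1, rater2)) / n

# Chance agreement: probability both raters pick the same category by chance.
categories = set(rater1) | set(rater2)
chance = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)

kappa = (observed - chance) / (1 - chance)
print(observed, kappa)   # 0.75 0.5
```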
Choosing Reliability Estimates
The nature of the test will often determine the reliability metric, e.g.,
- Are the test items homogeneous or heterogeneous in nature?
- Is the characteristic, ability, or trait being measured presumed to be dynamic or static?
- Is the range of test scores restricted or unrestricted?
- Is the test a speed test (how many items can you complete in a set time) or a power test (items of increasing difficulty)?
- Is the test criterion-referenced (you must reach a threshold to pass)?
Otherwise, you can select whatever you think is appropriate.
How do we account for reliability in a single score?
- Our reliability coefficient tells us about error in our test in general
- We can use this reliability estimate to understand how confident we can be in a single observed score for one person
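Concretely, the reliability coefficient gives the standard error of measurement, SEM = SD × sqrt(1 − reliability), which puts a confidence interval around a single observed score (the numbers below are hypothetical):

```python
# Standard error of measurement: SEM = SD * sqrt(1 - reliability).
# All numbers below are hypothetical.
sd = 15             # test's standard deviation (an IQ-style scale)
reliability = 0.91  # the test's reliability coefficient
score = 106         # one person's observed score

sem = sd * (1 - reliability) ** 0.5

# Approximate 95% confidence interval around the observed score.
lower = score - 1.96 * sem
upper = score + 1.96 * sem
print(round(sem, 2), round(lower, 1), round(upper, 1))   # 4.5 97.2 114.8
```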
Standard Error of the Difference (SED)
The SED is a measure of how large a difference in test scores would need to be to be considered 'statistically significant'.
Helps with three questions (Note: test 1&2 must be on the same scale)
- How did Person A's performance on test 1 compare with their own performance on test 2?
- How did Person A's performance on test 1 compare with Person B's performance on test 1?
- How did Person A's performance on test 1 compare with Person B's performance on test 2?
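A sketch with hypothetical numbers: each test's SEM is SD × sqrt(1 − reliability), and the SED combines the two:

```python
# SED between two scores on the same scale:
# SED = sqrt(SEM1**2 + SEM2**2), with each SEM = SD * sqrt(1 - reliability).
# All numbers below are hypothetical.
sd = 10
r_test1, r_test2 = 0.90, 0.84

sem1 = sd * (1 - r_test1) ** 0.5
sem2 = sd * (1 - r_test2) ** 0.5
sed = (sem1 ** 2 + sem2 ** 2) ** 0.5

# A difference larger than about 1.96 * SED is 'significant' at the .05 level.
min_difference = 1.96 * sed
print(round(sed, 2), round(min_difference, 1))
```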
Standardization
is the process of administering tests to representative samples to establish norms.
Sampling
the process of selecting a portion of the intended population for the test; the population has at least one common, observable characteristic.
Stratified-random sampling
is a sampling design that divides the population into subgroups (strata) and randomly samples from each, so that every member of the population has an opportunity of being included in the sample.
Purposive sample
is arbitrarily selecting a sample believed to be representative of the population.
Incidental/convenience
sample that is convenient or available for use. May not be representative of the population.
o Generalisation of findings from convenience samples must be made with caution.
Process of developing norms:
Having obtained the normative sample:
- Administer the test with standard set of instructions
- Recommend a setting for test administration
- Collect and analyse data
- Summarize data using descriptive statistics including measures of central tendency and variability
- Provide a detailed description of the standardization and administration protocol
Types of Norms
Percentiles: the percentage of people in the normative sample whose score was below a particular raw score.
- Percentiles are popular because they are easily calculated and interpreted.
- Problem: real differences between raw scores may be minimized near ends of distribution and exaggerated in the middle of the distribution.
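Calculating a percentile rank is straightforward (the normative scores below are hypothetical):

```python
# Percentile rank: percentage of the normative sample scoring below a raw score.
normative_sample = [8, 10, 11, 11, 12, 13, 13, 14, 15, 18]  # hypothetical norms

def percentile_rank(raw, norms):
    below = sum(score < raw for score in norms)
    return 100 * below / len(norms)

print(percentile_rank(13, normative_sample))   # 50.0 (five of ten scores fall below 13)
```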
Age norms: average performance of normative sample segmented by age.
Grade norms: average performance of normative sample segmented by grade.
Subgroup: a normative sample can be segmented by any criteria initially used in selecting sample.
National norms: derived from normative sample that was nationally representative of the population.
National anchor norms: equivalency table for scores on two different tests. Allows common comparison.
Local norms: provide normative information with respect to the local population's performance on some test.
The normal curve
The normal curve is a bell-shaped, smooth, mathematically defined curve that is symmetrical about its mean.
Standard Scores
Standard score: a raw score converted from its original scale to a new scale with a predefined mean and standard deviation.
Z-score
Z-Score: conversion of a raw score into a number indicating how many standard deviation units the raw score is below or above the mean
T-scores
T-Scores: aka ‘fifty plus or minus ten scale’ – scale has set mean = 50 and standard deviation = 10
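Both conversions in one sketch (the normative mean and SD below are hypothetical):

```python
# Converting one raw score to a z-score and a T-score,
# given hypothetical normative statistics.
mean, sd = 40, 8
raw = 52

z = (raw - mean) / sd   # SD units above (+) or below (-) the mean
t = 50 + 10 * z         # T-score: mean 50, SD 10
print(z, t)             # 1.5 65.0
```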
Culture and Inference
- In selecting a test for use, responsible test users should research all available norms to check if norms are appropriate for use with your patient
- When interpreting test results, it helps to know about the culture and era of the test-taker
- It is important to conduct culturally informed assessment