Week 2 Flashcards: Norms and Reliability
What is the main premise of classical test theory, and how does it relate to reliability?
CTT says that every person’s observed score is made up of their true score (on the trait) plus error. For a population, total variance = true variance + error variance. Reliability is the proportion of total variance that is true variance (true variance divided by total variance). That is, reliability is directly driven by true variance, but note: we can only ever estimate the true variance.
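In standard CTT notation, the card above is:
$$X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad r_{xx} = \frac{\sigma_T^2}{\sigma_X^2}$$
where X is the observed score, T the true score, E the error, and r_xx the reliability coefficient.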
Describe measurement error.
Measurement error is also known as error variance. It is made up of both systematic error (predictable and constant) and random error (unpredictable, unrelated noise). Random error is less of a worry because it should balance out across measurements and leave the mean roughly unchanged. Systematic error is more of a worry, but if you know what is causing it, you can adjust your numbers accordingly.
List common sources of measurement error
- Test Construction
- Test Administration
- Test Scoring and Interpretation
- Sampling Error
- Methodological Errors
Describe Test Construction error
Variation due to differences in items on the same test or between tests
Describe Test Administration error
Variation due to testing environment
(test-taker: anxiety, stress, drugs, sleep, physical discomfort)
(Examiner: appearance, demeanour)
Describe Test Scoring and Interpretation error
Variation due to scoring and interpretation, e.g. scoring a video of a mother’s warmth behaviours towards an aggressive child
Describe Sampling Error
Variation due to the representativeness of the sample, e.g. gathering a sample that doesn’t represent the population (such as only educated people)
Describe Methodological Errors
Variation due to poor training, unstandardised administration, unclear questions, biased questions
What is the difference between CTT and IRT?
CTT assumes just two components to measurement (true score and error) and that all items have an equal ability to measure the target in question.
IRT examines items individually, which makes it very powerful for understanding how much information each item provides about the latent trait, and it can reveal different levels of the latent trait being examined.
IRT incorporates considerations of item difficulty and discrimination. Can you describe what they mean in the context of IRT?
Difficulty relates to how hard an item is to complete, solve or comprehend.
Discrimination refers to the degree to which an item differentiates between high and low levels of the construct, e.g. if the discrimination slope is steep, the item is good at discriminating between different levels.
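One way to make both parameters concrete is the two-parameter logistic (2PL) model, a standard IRT model (not named in the card itself):
$$P(\text{correct} \mid \theta) = \frac{1}{1 + e^{-a(\theta - b)}}$$
where θ is the latent trait level, b is the item’s difficulty (the trait level at which the probability of success is 50%), and a is its discrimination (the steepness of the slope).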
List the common estimates of reliability.
- Test-retest reliability
- Parallel and Alternate Forms Reliability
- Internal consistency reliability (split-half, inter item correlation, Cronbach’s alpha)
- Inter-rater/ inter-scorer reliability
Describe Test-retest reliability
Estimate of reliability over time/ the consistency of a test over time
How? Correlate pairs of scores from the same people, on the same test, at different time points (see the sketch after this card)
Good for? Stable variables e.g. Personality
Bad? Estimates tend to decrease as time passes
Not good for fluctuating variables e.g. Mood
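A minimal sketch of this computation in Python (the scores below are hypothetical, purely for illustration):
```python
import numpy as np

# Hypothetical scores for the same five people at two time points
time1 = np.array([12, 15, 9, 20, 17])
time2 = np.array([13, 14, 10, 19, 18])

# Test-retest reliability is estimated as the Pearson correlation
# between the paired scores
r = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability estimate: {r:.2f}")
```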
Describe Parallel and Alternate Forms Reliability
if the MEANS and VARIANCE are equal in both versions of a test = PARALLEL
If not = ALTERNATE
How? Correlate the scores of the same people measured by the different forms
E.g. does cognitive function improve over time? Use two different versions of the Montreal Cognitive Assessment (MoCA), so the patient can’t use answers from the first version to help them on the second.
Describe split-half (internal consistency)
How? Correlate equivalent halves of the one test with each other, then generalise the half-test reliability to the full-test internal consistency reliability using the Spearman-Brown formula. By changing the n (length) of your final test, you can manipulate its predicted reliability:
$$r_{SB} = \frac{n \, r_{half}}{1 + (n - 1)\, r_{half}}$$
where r_half is the correlation between the two halves and n is the factor by which the test is lengthened (n = 2 when going from half- to full-test length).
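A quick worked example with hypothetical numbers: if the two halves correlate at r_half = 0.70, the predicted full-test reliability is
$$r_{SB} = \frac{2 \times 0.70}{1 + (2 - 1) \times 0.70} = \frac{1.40}{1.70} \approx 0.82$$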
Describe inter-item consistency/ correlation (internal consistency)
The degree of relatedness of items on a test: HOMOGENEITY. Calculated as the average of the correlations between every pair of items.
Describe Kuder-Richardson Formula 20 (internal consistency)
Statistic of choice for determining the inter-item consistency of dichotomous (binary) items. I.e. yes/ no
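For reference, the standard KR-20 formula (not spelled out in the card):
$$KR\text{-}20 = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right)$$
where k is the number of items, p_i the proportion answering item i “yes”/correctly, q_i = 1 − p_i, and σ²_X the variance of total test scores.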
Describe coefficient/ Cronbach’s Alpha (internal consistency).
Conceptually, the mean of all possible split-half correlations, corrected by the S-B formula. A very popular approach for internal consistency. Values range from 0 to 1.
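The usual computational form is:
$$\alpha = \frac{k}{k - 1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right)$$
where k is the number of items, σ²_i the variance of item i, and σ²_X the variance of total scores. KR-20 is the special case of alpha for dichotomous items.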
Describe Inter-rater/ inter-scorer reliability
Degree of agreement/consistency between two or more scorers. Often used in behavioural measures; aims to guard against biases or idiosyncrasies in scoring. Obtained by correlating scores from different raters: use the INTRACLASS correlation for CONTINUOUS measures (it allows you to adjust for systematic differences between raters) and COHEN’S KAPPA for CATEGORICAL measures.
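Cohen’s kappa corrects raw agreement for chance:
$$\kappa = \frac{p_o - p_e}{1 - p_e}$$
where p_o is the observed proportion of agreement between raters and p_e the proportion of agreement expected by chance.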
What should you consider when choosing reliability estimates?
- Homogenous/ Hetero
- Dynamic/ static (how will it change over time)
- Restricted/ not (restriction of range, or lack of it, affects your correlation)
- speed/ power test (a speed test is likely to be homogeneous; a power test is likely to be heterogeneous)
- criterion-/ not criterion-referenced
Why might we want to consider reliability of a single test score?
For example, in the clinical setting we want to know how much to trust one person’s score on our test. We can use our reliability coefficient to generalise to a single score.
How do you use reliability of tests to get precision?
- Standard Error of Measurement (SEM)
- estimates how close a single observed score is to the true score, i.e. its precision/amount of error
- generally, the higher the reliability, the lower the SEM
- estimates the extent of deviation between observed and true score
- SEM CHANGES BASED ON SD and RELIABILITY OF TEST
- Standard Error of the Difference (SED)
- estimates whether the difference between two test scores is statistically significant
- MUST use standardised variables, i.e. compare the apples with the apples, or convert the oranges to apples
(see the formulas sketched below)
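The standard formulas, with a hypothetical worked example:
$$SEM = SD\sqrt{1 - r_{xx}}, \qquad SED = \sqrt{SEM_1^2 + SEM_2^2}$$
where SD is the test’s standard deviation and r_xx its reliability coefficient. For instance, if an IQ-style test has SD = 15 and r_xx = 0.91 (made-up numbers), then SEM = 15 × √0.09 = 4.5 points.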
Explain the difference between norm-referenced and criterion-referenced tests.
Norm-referenced tests compare a single person’s test score to a normative sample, e.g. an IQ test. Criterion-referenced tests compare a single person’s test score to a pre-determined standard criterion/threshold, e.g. passing a first aid course or driving test.
What does Standardisation in sampling to develop norms mean?
It means the process of administering a test to a representative sample in order to establish norms
What does SAMPLING in sampling to develop norms mean?
The selection of the intended population for the test, i.e. a group that has at least one observable characteristic in common