Week 2 Flashcards
What is a raw score?
The score on an individual test -
not individually meaningful
needs to be compared to a standard (norm referencing)
What is norm referencing?
Relating a raw score to a standard to give it meaning.
Eg: a parent giving meaning to their child’s spelling test score. If the child scores 55, that may not seem good on its own, but its meaning changes depending on whether the majority of children scored less than 30 or more than 70.
Criterion referencing
reference to a standard
a way of giving meaning to a test score by specifying the standard that needs to be reached in relation to a limited set of behaviours
Non-linear transformation
a transformation that preserves the order but not the equivalence of distance of the original scores
Most common form of non-linear transformation is the percentile
Derived score
A derived score is a numerical description of an individual’s performance in terms of norms
Standard score (z-score)
the distance of a score in a normal distribution from the mean expressed as a ratio of the SD of the distribution
The z-score is a standard score, which transforms the score into SD units. If a z-score is -3.2, we know the score sits 3.2 SDs below the mean, i.e. beyond the third SD.
Percentile
an expression of the position (rank) of a score in a distribution of all scores by dividing the distribution into 100 equal parts; also known as ‘centile’
Calculation: number of values below the raw score, divided by the number of all raw scores, multiplied by 100.
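The calculation above can be sketched in a few lines of Python. This is a minimal illustration only: the function name and the class scores are made up for the example and do not come from any test manual.

```python
def percentile_rank(raw_score, all_scores):
    """Percentage of scores in all_scores that fall below raw_score."""
    below = sum(1 for s in all_scores if s < raw_score)
    return 100 * below / len(all_scores)

# Hypothetical spelling-test scores for a class of ten children.
class_scores = [22, 25, 28, 29, 31, 40, 55, 61, 68, 72]

# Six of the ten scores are below 55, so 55 sits at the 60th percentile.
print(percentile_rank(55, class_scores))  # 60.0
```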
Z-score
a linear transformation of test scores that expresses the distance of each score from the mean of the distribution of scores in units of the SD of the distribution
To calculate, subtract the mean from the raw score and divide by the SD: z = (raw score - mean) / SD.
Typically good to use because, as a linear transformation, it retains the order and relative distances of the raw scores
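As a rough sketch of the z-score calculation, assuming the population SD is the one wanted (the notes do not say whether to use the population or sample SD) and using made-up scores:

```python
from statistics import mean, pstdev

def z_scores(raw_scores):
    """Convert each raw score to its distance from the mean in SD units."""
    m = mean(raw_scores)
    sd = pstdev(raw_scores)  # assumption: population SD
    return [(x - m) / sd for x in raw_scores]

# Hypothetical raw scores with mean 5 and population SD 2.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(scores))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The order of the scores and the relative gaps between them are unchanged; only the units differ.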
Linear transformation
a transformation that preserves both the order and the equivalence of distance of the original set of scores. The simplest example is adding a constant to all the raw scores. Any operation can be performed, so long as the straight-line relationship is preserved, e.g. adding 100 to all raw scores and then dividing by 2.
Most common form of linear transformation is the z-score (SD)
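The "+100 then /2" example can be checked directly. The sketch below uses made-up raw scores and a hypothetical helper name; it shows that the order is kept and that the gaps between scores keep the same ratio to one another.

```python
def linear_transform(scores, add=100, divide_by=2):
    """Apply the same 'add a constant, then divide' rule to every score."""
    return [(x + add) / divide_by for x in scores]

raw = [10, 20, 40]            # hypothetical raw scores
new = linear_transform(raw)   # [55.0, 60.0, 70.0]

# The second gap is twice the first gap both before and after the transformation.
raw_ratio = (raw[2] - raw[1]) / (raw[1] - raw[0])
new_ratio = (new[2] - new[1]) / (new[1] - new[0])
print(new, raw_ratio, new_ratio)  # [55.0, 60.0, 70.0] 2.0 2.0
```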
Norm referencing
compared to a representative sample
a way of giving meaning to a test score by relating it to the performance of an appropriate reference group for the person e.g. class results
Deviation IQ
a method that allows an individual’s score to be compared with same-age peers; the score is reported as distance from the mean in SD units.
Used in the WAIS-IV.
Sten score
a point on a scale that has 5 units above and 5 units below the mean, which is set at 5.5 with a standard deviation of 2
Cattell used this in the 16PF
T-score
a score standardised to a distribution with a mean of 50 and a standard deviation of 10
Used in MMPI where there are no wrong answers - answers either indicate the trait or they do not
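Deviation IQ, sten and T-scores are all linear rescalings of the z-score, so they can be sketched with one small function each. The function names are hypothetical. Two assumptions are not stated in these notes: the deviation IQ uses a mean of 100 and an SD of 15 (the usual Wechsler convention), and stens are rounded to whole numbers and limited to the range 1 to 10.

```python
def t_score(z):
    """T-score: mean 50, SD 10."""
    return 50 + 10 * z

def sten(z):
    """Sten: mean 5.5, SD 2, reported as a whole number from 1 to 10."""
    raw = 5.5 + 2 * z
    return min(10, max(1, round(raw)))  # assumption: round, then clip to 1..10

def deviation_iq(z):
    """Deviation IQ, assuming mean 100 and SD 15 (not stated in the notes)."""
    return 100 + 15 * z

z = 1.2  # hypothetical person scoring 1.2 SDs above the mean
print(round(t_score(z), 1), sten(z), round(deviation_iq(z), 1))  # 62.0 8 118.0
```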
Derived score (or norms)
Allows us to ascertain an individual’s position relative to a standardisation (or normative) sample
Provide comparable measures that permit a comparison across different tests.
Percentile scores
Percentage of people in the standardisation sample who fall below a particular raw score.
Advantages: easy to compute, readily understood, universally applicable
Disadvantages: inequality of units
Standard scores
To compare individuals who have taken the same test
To compare scores across different tests with different distributions
Normalised standard scores
Linear standard scores will be comparable only if they come from similar distributions
To ensure comparability > normalise
Norms
Specific to the population from which they are derived
Often tests come from samples of WEIRD populations
The responsibility lies with the psychologist to understand the sample and interpret test scores in light of any limitations.
What are the three factors that determine objectivity of a test?
Administration (same materials, instructions, time limits)
Scoring (templates provided, computerised etc.)
Interpretation (directions about how to interpret scores)
A test manual will typically specify the directions to ensure uniformity in administration.
What are the two criteria of a good test?
Reliability (consistency and/or dependability) and
Validity (accuracy or that the test will measure what it says it will)
consistency on its own is not sufficient to ensure accuracy
Correlation
The relationship between two variables.
The co-efficient expresses the degree of linear relationship and ranges from -1 to +1
Correlation Co-efficient
Degree of linear relationship between two variables
rxy = Σ(zx zy)/N
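The formula above can be followed step by step in code: convert both variables to z-scores, multiply the pairs, and average the products. This is a minimal sketch with made-up data; it assumes the population SD, which is what makes dividing by N correct.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Correlation as the mean of the products of paired z-scores."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)  # population SD, to match division by N
    zx = [(v - mx) / sx for v in x]
    zy = [(v - my) / sy for v in y]
    return sum(a * b for a, b in zip(zx, zy)) / n

# Hypothetical data: y rises perfectly with x, then falls perfectly with x.
x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 2))   # 1.0
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 2))   # -1.0
```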
Reliability co-efficient
The correlation between scores (rxx) on two administrations of a test
The extent to which scores on one test administration will generalise to other administrations of the same test.
The reliability co-efficient shows the proportion of raw score variance (total variance) explained by true score variance i.e. if rxx = .8, then 80% of the raw score variance is due to true score variance and 20% is due to error variance
True Score Theory (Spearman, 1904)
If the same test is given to an individual an infinite number of times, the obtained scores will be normally distributed.
the mean of that distribution is the true score
the SD of that distribution is the standard error of measurement.
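To connect the reliability co-efficient with the standard error of measurement, the sketch below splits the total variance using rxx and then takes the square root of the error variance. The formula SEM = SD × √(1 - rxx) is the standard one but is not written out in these notes, and the SD of 15 is a made-up value.

```python
import math

def partition_variance(total_variance, rxx):
    """Split total (raw score) variance into true score and error variance."""
    true_var = rxx * total_variance
    error_var = (1 - rxx) * total_variance
    return true_var, error_var

def standard_error_of_measurement(sd, rxx):
    """SEM = SD * sqrt(1 - rxx), i.e. the square root of the error variance."""
    return sd * math.sqrt(1 - rxx)

sd = 15     # hypothetical SD of obtained scores
rxx = 0.8   # reliability co-efficient from the example above

true_var, error_var = partition_variance(sd ** 2, rxx)
sem = standard_error_of_measurement(sd, rxx)
print(round(true_var, 2), round(error_var, 2), round(sem, 2))  # 180.0 45.0 6.71
```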
True score reliability
Obtained (raw) score = true score + error; correspondingly, total (raw score) variance = true score variance + error variance
What are the types of reliability?
Test-retest
Alternate forms
Internal consistency > split half > Kuder-Richardson and Co-efficient-Alpha
Scorer (or interviewer)
Test-retest reliability
The same test is given to the same group of people on two different occasions
The rxx is the correlation between the scores on the two different occasions (coefficient of stability)
error variance (1-rxx) is attributable to changes in testing conditions and test-takers between the two occasions (time sampling)
The longer the time between the two tests, the lower the reliability (allows more room for uncontrollable changes)
Alternate forms
Can be immediate or delayed.
Two versions of the test, constructed in an identical way with different content, completed by the same group of people.
rxx is the correlation between the scores on the two forms of the test (co-efficient of equivalence)
error variance (1-rxx) is attributable to differences in content and time (when delayed).
Internal consistency and split half reliability
a single test is administered once and then split into two halves
rxx is the correlation between the person’s scores on each half of the test (co-efficient of consistency).
error variance (1 - rxx) is due to content.
Limitations -
Deciding the split
Speeded tests
Correlating the two halves effectively shortens the test, so the correlation underestimates the reliability of the full-length test
Spearman-Brown Formula
Looks at the effect on reliability of lengthening or shortening the test
Longer tests tend to be more reliable.
Estimates how long a test needs to be to obtain a desired level of reliability
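The Spearman-Brown formula itself is not written out in these notes; the standard form is r_new = n × r / (1 + (n - 1) × r), where n is the factor by which the test length changes. A minimal sketch with hypothetical function names and made-up reliabilities:

```python
def spearman_brown(r, n):
    """Predicted reliability when the test is made n times as long."""
    return n * r / (1 + (n - 1) * r)

def length_factor_needed(r, target):
    """How many times longer the test must be to reach the target reliability."""
    return target * (1 - r) / (r * (1 - target))

half_r = 0.6                                 # hypothetical correlation between two halves
full_r = spearman_brown(half_r, 2)           # step up to the full-length test
factor = length_factor_needed(full_r, 0.9)   # lengthening needed to reach .90

print(round(full_r, 2), round(factor, 2))  # 0.75 3.0
```

Setting n = 2 is the usual correction for split-half reliability, since each half is only half the length of the real test.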
Kuder Richardson 20
Used only with dichotomously scored items, e.g. true/false
Coefficient Alpha
Used for multiple choice items and items rated on scale with more than two options
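Neither formula is given in these notes, so the sketch below uses the standard textbook forms: alpha = k/(k - 1) × (1 - Σ item variances / total score variance), and KR-20 replaces each item variance with p × q (proportion passing × proportion failing). The response data are made up, and population variances are assumed throughout.

```python
from statistics import pvariance

def coefficient_alpha(item_scores):
    """item_scores: one row per person, one column per item."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def kr20(item_scores):
    """KR-20 for items scored 0 or 1 only."""
    n = len(item_scores)
    k = len(item_scores[0])
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum_pq / total_var)

# Hypothetical true/false responses: 5 people, 4 items (1 = keyed answer).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(coefficient_alpha(responses), 2), round(kr20(responses), 2))  # 0.8 0.8
```

With dichotomous items the two give the same value, which is why KR-20 can be treated as a special case of coefficient alpha.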
Inter-rater reliability
Two scorers (raters) provide scores for an individual on a test
Used when the scores on the test require subjectivity on the part of the scorer