Psychometrics in Neuropsychological Assessment (Strauss, 2006) Flashcards
How do you calculate a z-score?
(obtained score - sample mean) / sample SD
What is the mean and SD of a Z-score?
Mean = 0 SD = 1
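The z-score formula above can be sketched in Python (the sample scores are illustrative; `statistics.stdev` gives the sample SD):

```python
from statistics import mean, stdev

def z_score(x, scores):
    """(obtained score - sample mean) / sample SD."""
    return (x - mean(scores)) / stdev(scores)

# Illustrative sample with a mean of 100
sample = [85, 90, 95, 100, 105, 110, 115]
print(z_score(100, sample))            # a score at the mean -> 0.0
print(round(z_score(110, sample), 2))  # just under one SD above the mean
```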
What is a Z-score?
A type of standard score
What does a Z-score quantify?
How many SDs a score is from the mean
What is a t-score?
Another linear transformation of a raw score
What is the mean and SD of a t-score?
Mean = 50 SD = 10
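Because a t-score is just a linear rescaling of z to a mean of 50 and SD of 10, the conversion is one line; a minimal sketch:

```python
def t_score(z):
    """Rescale a z-score to the T metric: mean 50, SD 10."""
    return 50 + 10 * z

print(t_score(0.0))   # a score at the mean -> 50.0
print(t_score(-1.5))  # 1.5 SDs below the mean -> 35.0
```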
Why do we use standard scores?
By virtue of conversion to a common metric, they facilitate the comparison of scores across measures
What must the distribution of the tests be for us to use standardized scores?
Approximately normal
What are two things that must be considered before comparing test scores?
- The reliability of the 2 measures
- Their intercorrelation
How can you calculate the prevalence value from a z-score?
- Look up the z-score (e.g., -4) in a z-score table to find its estimated frequency (e.g., .0000317)
- Divide 1 by that frequency (e.g., 1/.0000317 ≈ 31,560)
Thus, the estimated prevalence of a z-score of -4 is about 1 in 31,560
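Instead of a printed z-table, the lower-tail frequency can be computed from the standard normal CDF (via `math.erf`); a sketch of the steps above:

```python
import math

def prevalence_from_z(z):
    """Return the estimated lower-tail frequency for a z-score
    and the corresponding '1 in N' prevalence."""
    freq = 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF
    return freq, round(1 / freq)

freq, one_in = prevalence_from_z(-4)
print(freq)    # roughly .00003
print(one_in)  # roughly 1 in 31,600
```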
A test with a normal distribution in the general population may show extreme skew or other divergence from normality when administered to a population that differs considerably from the average individual. Give an example.
Vocab test being negatively skewed when administered to doctoral students in literature vs positively skewed when given to preschoolers who recently immigrated
When a new test is constructed, how can non-normality be corrected?
By examining the distribution of scores on the prototype test, adjusting test properties and resampling until a normal distribution is reached
What is another way of saying negatively skewed (in terms of testing)?
low ceiling
What is another way of saying positively skewed (in terms of testing)?
high floor
Will a large N correct for non-normality of an underlying population distribution?
No - a larger sample will only produce a more normal distribution if the underlying population distribution from which the sample was obtained is normal
What factors may lead to non-normal test score distributions?
- The existence of discrete subpopulations within the general population with differing abilities
- Ceiling or floor effects
- Treatment effects that change the location of means, medians and modes and affect variability and distribution shape
Small samples may yield non-normal distributions due to what?
Random sampling effects
What is a formal measure of asymmetry?
Skewness
What is the skew value of a true normal distribution?
0
What will have a skew value NEAR 0?
A non-normal but symmetric distribution
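A minimal sketch of the skewness statistic described above, using the population-form Fisher-Pearson coefficient (the data sets are made up for illustration):

```python
from statistics import mean, pstdev

def skewness(scores):
    """Fisher-Pearson coefficient: 0 for symmetric data, negative when
    the left tail is heavier, positive when the right tail is."""
    m, s, n = mean(scores), pstdev(scores), len(scores)
    return sum((x - m) ** 3 for x in scores) / (n * s ** 3)

print(skewness([1, 2, 3, 4, 5]))            # symmetric -> 0.0
print(round(skewness([1, 1, 1, 2, 10]), 2))  # heavy right tail -> positive
print(round(skewness([10, 10, 10, 9, 1]), 2))  # heavy left tail -> negative
```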
What do negative skew values indicate?
That the left tail of the distribution is heavier than the right
What does skewness tell us about the mean and median?
If there is skewness, the mean and median are not identical because the mean no longer falls at the midpoint of the rank order
As a result, z-scores will not accurately translate into sample percentile rank values
What increases as skew increases?
Error in mapping z-scores to sample percentile ranks
What kind of distributions often have significant skew? Give an example.
Truncated distributions often have significant skew
Truncated distributions often occur when range is restricted: e.g., reaction time
Floor and ceiling effects may be defined as what?
The presence of truncated tails in the context of limitations in range of item difficulty
What does a high floor mean?
All items difficult
What does a low ceiling mean?
All items easy
What does multimodality refer to?
The presence of more than one peak in a frequency distribution
Non-normality has major implications for interpreting and comparing standard scores - elaborate.
Standardized scores derived by linear transformation will not correspond to sample percentiles and the degree of divergence can be quite high
Normalizing transformations introduce error - when is it acceptable to normalize scores?
1) If they come from a large and representative sample
2) If any deviation from normality arises from defects in the test rather than characteristics of the sample
Normalizing transformations introduce error, what is the preferable way to handle non-normality in scores?
Preferable to adjust score distributions prior to normalization by modifying test content rather than statistically transforming non-normal scores into a normal distribution
Define reliability
The consistency of measurement of a given test; it can be defined in several ways, including consistency within itself (internal consistency reliability) and consistency over time (test-retest reliability)
What do indices of reliability show?
The degree to which a test is free from measurement error
Internal consistency reliability
The test's reliability with itself
Reliability coefficients are influenced by what two things?
- Test characteristics
- Sample characteristics
Reliability coefficients are influenced by test characteristics - list some examples of test characteristics
Length, item type, item homogeneity, influence of guessing
Reliability coefficients are influenced by sample characteristics - list some examples of sample characteristics
Sample size, range, variability
Test clarity is closely related to reliability. What is meant by test clarity?
- Clearly written
- Easily understood instructions
- Standardized administration conditions
- Explicit scoring rules that minimize subjectivity
- A process for training raters
What is internal reliability?
The extent to which items within a test measure the same cognitive domain or construct
A measure of the inter-correlation of items, internal reliability is an estimate of the correlation between randomly parallel test forms, and by extension, correlation between test scores and true scores.
Describe and provide examples of the major sources of measurement error
Time sampling error- is associated with the fluctuation in test scores obtained from repeated testing of the same individual.
Content-Sampling Error- is the term used to label the error that results from selecting test items that inadequately cover the content area that the test is supposed to evaluate.
Also: Quality of test items, test length, test-taker variables, and lastly the test administration.
Do longer or shorter tests yield higher reliability estimates?
Longer
What is split-half reliability?
Correlating two halves of items from the same test - a type of internal reliability
What is another term for split half reliability?
Spearman-Brown reliability coefficient
What is the error variance associated with split-half reliability?
Content sampling
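A sketch of split-half reliability as described above: correlate the two half-test scores, then apply the Spearman-Brown correction 2r/(1+r) to estimate full-length reliability (the odd/even half scores are made up):

```python
from statistics import mean

def pearson(xs, ys):
    """Product-moment correlation between two score lists."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs) *
           sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den

def split_half_reliability(half1, half2):
    """Spearman-Brown corrected correlation between two test halves."""
    r_half = pearson(half1, half2)
    return (2 * r_half) / (1 + r_half)

odd = [10, 12, 9, 14, 11]    # scores on odd-numbered items
even = [11, 13, 8, 13, 12]   # scores on even-numbered items
print(round(split_half_reliability(odd, even), 2))
```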
What is the Kuder-Richardson reliability coefficient?
A type of internal reliability - used for yes/no items or heterogeneous tests where split half is necessary
What is the error variance associated with Kuder-Richardson?
Content sampling and content heterogeneity
What is the coefficient alpha?
General estimate of reliability based on all the possible ways of splitting up test items - an internal reliability coefficient
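Coefficient (Cronbach's) alpha can be computed directly from an items-by-examinees score matrix using the standard formula k/(k-1) * (1 - sum of item variances / variance of total scores); the item data below is illustrative:

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one inner list per item, scores across examinees.
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))."""
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]  # per-examinee totals
    item_var = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

items = [
    [3, 4, 2, 5, 4],  # item 1 across 5 examinees
    [2, 4, 2, 5, 3],  # item 2
    [3, 5, 1, 4, 4],  # item 3
]
print(round(cronbach_alpha(items), 2))
```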
What are the 3 internal reliability coefficients?
- Split-half or Spearman-Brown
- Coefficient alpha
- Kuder-Richardson
What is the type of error variance associated with test-retest reliability?
Time sampling
What are the advantages, in terms of reliability, of speed tests?
- Very high internal reliability estimates
- Reliabilities can also be computed for specific time intervals
What is another term for test-retest reliability?
Temporal stability
What is test-retest reliability?
An estimate of the correlation between two test scores from the same test administered at 2 different times
What are you looking for in test-retest reliability?
Little change over time - this shows that there are no differential effects of prior exposure
What kind of tests will show lower test-retest reliability? What kind will show higher?
Tests measuring dynamic (changeable) abilities will show lower test-retest reliability than tests measuring more trait-like or stable abilities
What kind of situational variables can impact an individual’s test score over time on the same test?
Examinee state, examiner state, examiner identity, environmental conditions
What are 3 sources of bias in test-retest situations?
- Intervening variables (e.g., surgery)
- Practice effects
- Demographic considerations (Age, education etc.)
What are two types of statistical error associated with test-retest reliability?
Measurement error and regression to the mean
What is the type of coefficient associated with test-retest reliability?
stability coefficient
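The stability coefficient is just the product-moment correlation between the two administrations; a minimal sketch with made-up time-1 and time-2 scores:

```python
from statistics import mean

def stability_coefficient(time1, time2):
    """Pearson correlation between scores from two administrations
    of the same test to the same examinees."""
    m1, m2 = mean(time1), mean(time2)
    num = sum((a - m1) * (b - m2) for a, b in zip(time1, time2))
    den = (sum((a - m1) ** 2 for a in time1) *
           sum((b - m2) ** 2 for b in time2)) ** 0.5
    return num / den

t1 = [95, 100, 105, 110, 120]  # first administration
t2 = [97, 103, 104, 112, 121]  # retest
print(round(stability_coefficient(t1, t2), 2))
```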
What is the purpose of alternate forms reliability?
It attempts to eliminate confounding effects of practice when a test has to be administered more than once
Types of error associated with alternate forms reliability
Time sampling error and content sampling error
Why do alternate forms not necessarily eliminate effects of prior exposure?
Because exposure to stimuli and procedures can introduce carry over effects too
(these effects are less for tests assessing acquired knowledge)
What is inter-rater reliability?
Reliability of administration and scoring
How do you evaluate inter-rater reliability?
- Percentage agreement
- Kappa
- Product-moment correlation
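Two of the indices above, percentage agreement and Cohen's kappa (agreement corrected for chance), can be sketched as follows (the pass/fail ratings are invented for illustration):

```python
def rater_agreement(r1, r2):
    """Return (percentage agreement, Cohen's kappa) for two raters'
    categorical ratings of the same cases."""
    n = len(r1)
    cats = set(r1) | set(r2)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n          # observed agreement
    p_e = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)  # chance agreement
    return p_o, (p_o - p_e) / (1 - p_e)

ratings1 = ["pass", "pass", "fail", "pass", "fail", "pass"]
ratings2 = ["pass", "pass", "fail", "fail", "fail", "pass"]
p_o, kappa = rater_agreement(ratings1, ratings2)
print(round(p_o, 2), round(kappa, 2))
```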
What is Generalizability/G-theory?
Reliability is evaluated by decomposing test score variance into between- and within-group variance using the general linear model
What is between groups variance considered to be an estimate of?
True score variance
What is within group variance considered to be an estimate of?
Error variance
What is the coefficient associated with G-theory?
Generalizability coefficient - the ratio of estimated true variance to the sum of the estimates of true variance and estimated error variance
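The generalizability coefficient is just the variance ratio described above; a toy sketch with assumed (illustrative) variance estimates:

```python
def generalizability_coefficient(true_var, error_var):
    """Estimated true (between-person) variance divided by
    true variance plus error (within-person) variance."""
    return true_var / (true_var + error_var)

# Assumed variance estimates, for illustration only
print(generalizability_coefficient(80.0, 20.0))  # -> 0.8
```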
Does high reliability indicate high validity?
No
Will reliability stay the same across populations?
No - it may vary
Can a test have a single level of reliability?
No - it is composed of several different kinds - the importance of each type depends on the test
What is more important - reliability or validity?
Both important, but usually preferable to choose a test of slightly lesser reliability if it can be demonstrated that the test is associated with a meaningfully higher level of validity
What does high internal consistency indicate?
That the test is measuring a single construct
What kind of reliability is considered essential?
High test-retest reliability is considered essential unless the test is measuring state variables that are expected to fluctuate
How do you interpret a reliability coefficient?
A reliability coefficient can be interpreted directly in terms of the percentage of score variance attributable to different sources (e.g., r = .90 means 90% of score variance is true score variance)
How do you estimate true score variance?
By accounting for all the sources of error variance (test-retest, alternate form, inter-rater), you can estimate true score variance
What are the 5 categories of reliability coefficient magnitudes?
- Very high (.90+)
- High (.80-.89)
- Adequate (.70-.79)
- Marginal (.60-.69)
- Low (&lt;.60)
What is a low magnitude reliability coefficient?
Below .60
What is a marginal reliability coefficient?
.60-.69
What is an adequate reliability coefficient?
.70-.79
What is a high reliability coefficient?
.80-.89
What is a very high reliability coefficient?
.90+
What is the important limitation of reliability coefficients?
They do not provide complete information on the reproducibility of individual test scores
What is a true score?
The score an examinee would obtain on a measure in the absence of any measurement error - these are never known, but rather estimated
Conceptually defined as the mean score an examinee would obtain across an infinite number of randomly parallel forms of a test
Obtained scores
The actual score yielded by the test
The sum of the true score and error
Estimated true scores
The sum of the mean score of the group and the deviation of the person’s score from the normative mean weighted by test reliability
As test reliability approaches unity (r = 1.0), estimated true scores approach what?
Obtained scores (X)
As test reliability approaches 0, estimated true scores approach what?
The mean test score
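The estimated true score definition above corresponds to the standard formula T' = mean + r(X - mean); a sketch (the scores and reliability are illustrative):

```python
def estimated_true_score(obtained, group_mean, reliability):
    """T' = mean + r * (obtained - mean): obtained scores regress toward
    the normative mean in proportion to the test's unreliability."""
    return group_mean + reliability * (obtained - group_mean)

print(estimated_true_score(130, 100, 1.0))  # perfectly reliable -> obtained score
print(estimated_true_score(130, 100, 0.0))  # zero reliability -> group mean
print(estimated_true_score(130, 100, 0.9))  # partial regression to the mean
```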
In the SEM formula, what kind of reliability coefficient is used?
reliability coefficient of the test (usually internal reliability)
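The SEM itself is computed as SD * sqrt(1 - r); a minimal sketch using an IQ-style metric (SD = 15) and an assumed reliability:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r): the expected spread of obtained
    scores around an examinee's true score."""
    return sd * math.sqrt(1 - reliability)

# SD 15 (IQ-style metric), reliability .91 assumed for illustration
print(round(standard_error_of_measurement(15, 0.91), 2))  # -> 4.5
```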
What is validity?
the degree to which a test actually measures what it is intended to measure
What are the three main types of validity?
- Content validity
- Criterion validity
- Construct validity
What are two sub-types of construct validity?
- Convergent
- Divergent
What are two sub-types of criterion validity?
- Concurrent
- Predictive
How do you evaluate construct validity?
Correlation with other tests, factor analysis, internal consistency, convergent and discriminant validation, structural equation modelling
What is test sensitivity?
The proportion of COI+ examinees who are correctly identified as such by a test
(high sensitivity tests yields high levels of correct positives)
What is test specificity?
The proportion of COI- examinees who are correctly identified as such by a test
(high specificity test yields high levels of correct rejections)
What is positive predictive power?
The probability that someone with a positive test result has the COI
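Sensitivity, specificity, and positive predictive power can all be read off a 2x2 table of test results against condition-of-interest (COI) status; a sketch with invented counts:

```python
def classification_stats(tp, fp, tn, fn):
    """Sensitivity, specificity, and positive predictive power
    from true/false positive and negative counts."""
    sensitivity = tp / (tp + fn)  # COI+ examinees correctly flagged
    specificity = tn / (tn + fp)  # COI- examinees correctly rejected
    ppv = tp / (tp + fp)          # P(COI+ | positive test result)
    return sensitivity, specificity, ppv

# Invented counts for illustration
sens, spec, ppv = classification_stats(tp=45, fp=10, tn=90, fn=5)
print(round(sens, 2), round(spec, 2), round(ppv, 2))
```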
The likelihood that a profile of test scores will exceed criteria for abnormality increases as:
- The number of tests in a battery increases
- The z score cutoff used to classify a test score as abnormal decreases
- The number of abnormal test scores required to reach criteria decreases
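Under the simplifying assumption that tests in a battery are independent, the probability that at least one score falls below the abnormality cutoff is 1 - (1 - p)^n, which grows quickly with battery size; a sketch:

```python
def p_at_least_one_abnormal(p_single, n_tests):
    """Assuming independent tests, probability that at least one of
    n_tests scores falls below the abnormality cutoff."""
    return 1 - (1 - p_single) ** n_tests

# p = .05 per test (roughly a z <= -1.645 cutoff)
print(round(p_at_least_one_abnormal(0.05, 1), 3))   # one test
print(round(p_at_least_one_abnormal(0.05, 10), 3))  # ten-test battery
```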