Week 8: Interpretation of test results Flashcards
A person’s raw score has little meaning without which two things?
- A comparison to a normative sample
- A method for interpreting the meaning of the comparison
What do we call a measure that can be used to compare values from different data sets?
Relative standing
What is it called when interpreting and communicating test performance depends on having an appropriate comparative sample and a common ‘language’ of descriptions?
Rule of thumb
What is a positively skewed distribution?
More scores fall below the mean compared to above the mean (left side higher and right side lower)
What is a negatively skewed distribution?
More scores fall above the mean compared to below the mean (left side lower and right side higher)
What happens when scores from a normative sample are not normally distributed (2)?
- The mean and median are not identical
- Z-scores will not accurately translate into sample percentile rank values
A … sample size will produce a … normal distribution, but only if the underlying characteristic is normally distributed in the population
Larger, more
When can a truncated distribution (where the starting point is not 0) occur (2)?
- When scores are restricted at one side of the distribution
- When specific subgroups are purposefully excluded from inclusion in the normative sample
A truncated distribution of scores can lead to (3)?
- Identification of normal individuals as low functioning
- Difficulty estimating the severity of impaired performance
- An increase in number of persons identified as impaired
When is it useful to compare scores between tests (2)?
- The raw score distributions for tests that are being compared are approximately normal in the population
- The scores that are being compared are derived from similar samples
When comparing test scores, it is important to consider the … of two measures and their …
Reliability, intercorrelation
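Why both quantities matter can be illustrated with the classical formula for the reliability of a difference score. This is a minimal sketch, assuming both tests are on the same standardized scale with equal variances; the function name and the example values are hypothetical and not taken from the course material.

```python
def difference_score_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Reliability of the difference X - Y between two test scores.

    r_xx, r_yy: reliabilities of the two tests.
    r_xy: correlation between the two tests.
    Assumes equal score variances (e.g. both tests expressed as z-scores).
    """
    if r_xy >= 1.0:
        raise ValueError("r_xy must be below 1 for the formula to be defined")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


# Two equally reliable tests (hypothetical values), weakly vs strongly correlated.
weakly_correlated = difference_score_reliability(0.90, 0.90, 0.30)
strongly_correlated = difference_score_reliability(0.90, 0.90, 0.80)
print(round(weakly_correlated, 2), round(strongly_correlated, 2))
```

With these made-up values, the difference score is much less reliable when the two tests correlate strongly, even though each test on its own is equally reliable in both cases.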
The relationship between normative scores and percentiles is linear/non-linear
Non-linear
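The non-linearity can be seen by converting z-scores to percentiles. This is a small sketch, assuming the normative scores follow a standard normal distribution; the helper name `z_to_percentile` is illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def z_to_percentile(z: float) -> float:
    """Percentage of the normative sample expected to score at or below z."""
    return 100.0 * standard_normal.cdf(z)


for z in (0.0, 1.0, 2.0, 3.0):
    print(f"z = {z:+.1f} -> percentile {z_to_percentile(z):.1f}")
```

Each step is one standard deviation, yet the step from z = 0 to z = 1 covers about 34 percentile points while the step from z = 2 to z = 3 covers only about 2. Equal differences in z-scores therefore do not correspond to equal differences in percentile rank.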
What is defined as the presence of truncated tails in the context of limitations in range of item difficulty?
Ceiling and floor effects
What does a high floor in scores mean?
When a large proportion of the examinees obtain raw scores at or near the lowest possible score
What indicates a high floor in test scores?
That the test lacks a sufficient number and range of easier items
Floor and ceiling effects can lead to?
Misinterpretation of results
What does extrapolation entail?
The action of estimating or concluding something by assuming that existing trends will continue. When norms fall short in terms of range, this technique is often used
Comparison of performance across tests is affected by: (5)
- Measurement error
- Score magnitude
- Extreme scores
- Ceiling and floor effects
- Extrapolation/interpolation of derived scores
It is important to carefully consider how to interpret isolated low scores. The likelihood of obtaining low scores increases when (3)?
- The number of tests increases
- The cut-off for defining low scores becomes more liberal (less strict)
- With lower levels of baseline cognitive functioning
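The three factors above can be sketched numerically. This is a simplified illustration, assuming normally distributed scores and statistically independent tests. Real test batteries are intercorrelated, so actual probabilities will differ. The function name and the `baseline_z` parameter are hypothetical.

```python
from statistics import NormalDist


def prob_at_least_one_low(n_tests: int, cutoff_z: float, baseline_z: float = 0.0) -> float:
    """Probability of at least one score at or below cutoff_z across n_tests.

    baseline_z is the person's assumed true level in z-score units.
    Assumes independent tests, each with a standard deviation of 1.
    """
    p_low_single = NormalDist(mu=baseline_z, sigma=1.0).cdf(cutoff_z)
    return 1.0 - (1.0 - p_low_single) ** n_tests


# More tests: 5 versus 20 tests at a cut-off of z = -2.
print(prob_at_least_one_low(5, -2.0), prob_at_least_one_low(20, -2.0))
# More liberal cut-off: z = -1 instead of z = -2.
print(prob_at_least_one_low(10, -2.0), prob_at_least_one_low(10, -1.0))
# Lower baseline functioning: true level one SD below the normative mean.
print(prob_at_least_one_low(10, -2.0), prob_at_least_one_low(10, -2.0, baseline_z=-1.0))
```

In each printed pair the second value is larger, which matches the three points on the card.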
The degree of agreement between different people that are observing or assessing the same thing = (Inter-rater reliability/Test-retest reliability/Parallel-forms reliability/Internal consistency reliability)
Inter-rater reliability
Measures the consistency of results when you repeat the same measurement at a different point in time = (Inter-rater reliability/Test-retest reliability/Parallel-forms reliability/Internal consistency reliability)
Test-retest reliability
Measures the correlation between two equivalent versions of a test. This can help to avoid practice effects, but the versions should be equivalent = (Inter-rater reliability/Test-retest reliability/Parallel-forms reliability/Internal consistency reliability)
Parallel-forms reliability
The correlation between items within a test that are meant to measure the same construct = (Inter-rater reliability/Test-retest reliability/Parallel-forms reliability/Internal consistency reliability)
Internal consistency reliability
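One common index of internal consistency is Cronbach's alpha. The cards do not name a specific index, so treat this as an assumed example rather than the course's method. The data below are invented.

```python
from statistics import variance


def cronbach_alpha(item_scores: list[list[float]]) -> float:
    """Cronbach's alpha.

    item_scores holds one list per item; each list contains that item's
    score for every respondent, in the same respondent order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score per respondent, summed across items.
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Three hypothetical items answered by five respondents.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
print(round(cronbach_alpha(items), 2))
```

Items that rise and fall together across respondents, as in this made-up data, give a high alpha.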
What is validity?
Validity is the degree to which a test is measuring what it was intended to measure
What is reliability?
The consistency of a measure (whether the results can be reproduced under the same conditions)
What is sensitivity?
Sensitivity is the probability of a positive test, given that the person is affected
What is specificity?
The probability of a negative test, given that a person is healthy
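Both definitions can be computed from the counts in a 2x2 table of test result against true status. This is a minimal sketch; the counts are hypothetical.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """P(positive test | person is affected)."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """P(negative test | person is healthy)."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical study: 100 affected people, 80 of whom test positive;
# 200 healthy people, 180 of whom test negative.
sens = sensitivity(true_positives=80, false_negatives=20)
spec = specificity(true_negatives=180, false_positives=20)
print(sens, spec)
```

Sensitivity uses only the affected group and specificity uses only the healthy group, so neither depends on how common the condition is in the sample.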
What does a p-value NOT measure (3)?
- Does not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone
- They do not provide a good measure of evidence regarding a model or hypothesis
- They do not measure the size of an effect or the importance of a result
What does a p-value measure?
The probability, under a specified statistical model (such as the null hypothesis), that a statistical summary of the data would be equal to or more extreme than its observed value
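A small worked example may help make this concrete. The sketch below assumes a simple model, a fair coin, and computes an exact two-sided p-value for a given number of heads. The scenario and function name are illustrative only.

```python
from math import comb


def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value under the model 'each trial succeeds with p = 0.5'.

    Sums the probability of every outcome at least as far from the expected
    count (trials / 2) as the observed count.
    """
    observed_distance = abs(successes - trials / 2)
    extreme_outcomes = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - trials / 2) >= observed_distance
    )
    return extreme_outcomes / 2 ** trials


# 9 heads in 10 flips: outcomes 0, 1, 9 and 10 heads are at least as extreme.
p_value = two_sided_binomial_p(9, 10)
print(p_value)
```

The result describes how unusual the data are if the fair-coin model holds. It is not the probability that the coin is fair, and it says nothing about how large the bias is.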
What is circular analysis?
Circular analysis is any form of analysis that retrospectively selects features of the data to characterise the dependent variables, resulting in a distortion of the resulting statistical test
= based on data that was selected for showing the effect of interest or a related effect
What is p-hacking?
The misreporting of true effect sizes in published studies. It occurs when researchers try out several statistical analyses and then selectively report those that produce significant results
What is a spurious correlation?
Occurs when two factors appear causally related to one another but are not. Spurious correlations most commonly arise if one or several outliers are present for one of the two variables
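The effect of a single outlier on a correlation can be sketched as follows. The data are invented for illustration, and Pearson's r is computed by hand so the example needs no external libraries.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance_sum / sqrt(ss_x * ss_y)


# Six made-up observations with essentially no relationship.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 1, 2, 2, 1]
r_without_outlier = pearson_r(x, y)

# The same data plus one extreme observation at (20, 20).
r_with_outlier = pearson_r(x + [20], y + [20])

print(round(r_without_outlier, 2), round(r_with_outlier, 2))
```

Here the correlation moves from roughly -0.10 to roughly 0.96 after adding one point, although the other six observations are unchanged.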
Is the test fully representative of what it aims to measure, refers to which validity?
Content validity
Evaluates how accurately a test measures the outcome it was designed to measure, for now or in the future, refers to?
Criterion-related validity
Which two types of criterion-related validity are there?
Concurrent validity (the ability of a test to predict an event in the present) and predictive validity (the ability of a test to predict some event or outcome in the future)
Does the test measure the concept that it is intended to measure, refers to which validity?
Construct validity