Ch 8 Psychometrics, Test Design and Stats Flashcards
Standard error of measurement
SD of random errors around the true score
Any test score of an individual consists of what?
a true score
a random error
Random errors around the true score have what distribution?
normal distribution and mean of 0 over infinite trials
Are all errors in tests random?
No. Some are systematic
Descriptive stats describe what
the main features of the collected data, quantitatively
What are measures of central tendency
mean, median, mode
What are measures of variability
SD, variance, and interquartile range
Tighter distribution of variability means?
High reliability
Kurtosis
describes whether a distribution of data is peaked or flat
Leptokurtic distribution means…
Peaked distribution of data
Platykurtic distribution means…
Flat distribution of data
Skew
tendency of scores to cluster to higher or lower end of distribution
Cluster at Higher end of distribution means positive or negative skew?
negative
cluster at Lower end of distribution means positive or negative skew?
positive
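As a quick check on the direction of skew, the sketch below computes a moment-based skewness coefficient for two made-up score sets. The function name `skewness` and both data sets are hypothetical illustrations, not part of the original cards.

```python
from statistics import mean, median, pstdev

def skewness(scores):
    """Moment-based skewness: the average cubed z score (population formula)."""
    m = mean(scores)
    s = pstdev(scores)
    return sum(((x - m) / s) ** 3 for x in scores) / len(scores)

# Hypothetical scores on a 0-10 scale.
high_cluster = [2, 7, 8, 9, 9, 10, 10, 10]  # most scores at the high end
low_cluster = [0, 0, 0, 1, 1, 2, 3, 8]      # most scores at the low end

for name, scores in (("high_cluster", high_cluster), ("low_cluster", low_cluster)):
    print(name, "skew:", round(skewness(scores), 2),
          "mean:", mean(scores), "median:", median(scores))
```

Scores clustered at the high end give a negative coefficient and a mean below the median, because the long low tail pulls the mean down. The mirror-image data set gives the opposite pattern.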
Examples of generalized linear model (GLM)
logistic regression
models fit by maximum likelihood estimation (an estimation method, not a model in itself)
Examples of general linear model are
ANOVA
ANCOVA
linear regression
DV follows what distribution in general linear model
normal distribution
DV follows what distribution in GLM (generalized linear model)?
error distribution other than normal
Item Response Theory
also called latent trait theory
refers to models that explain the relationship between latent traits (unobservable attribute) and manifestations (i.e. observed outcomes)
focuses on item level characteristics rather than on test level characteristics
Simplest IRT model - Rasch model
built on assumption that the most parsimonious and effective predictor of a trait is the relationship between the difficulty of an item and the ability of a person
used to measure latent traits like attitude or ability; It shows the probability of an individual getting a correct response on a test item.
uses item characteristic curve (ICC)
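The Rasch model is usually written in logistic form, with the probability of a correct response depending only on the gap between person ability and item difficulty. The sketch below assumes that standard form. The function name `rasch_probability` and the ability values are hypothetical.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch (one-parameter logistic) model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Points along one item characteristic curve (item difficulty fixed at 0.0).
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(rasch_probability(theta, 0.0), 3))
```

When ability equals difficulty the probability is 0.5, and it rises along an S-shaped curve as ability increases. That curve is the ICC for the item.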
Bayesian model has which 3 elements
prior probability distribution
likelihood function
available new data
uses probability to represent all uncertainty within the model
The 3 elements according to Bayesian model can produce what?
posterior probability
What makes Bayesian models unique
incorporate prior info into a statistical model
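A minimal sketch of how the prior and the likelihood of new data combine into a posterior probability, for the simple case of one hypothesis versus its complement. The function name `posterior` and the numbers are hypothetical illustrations.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Bayes' rule for a hypothesis H and its complement."""
    # Total probability of observing the data under either state.
    evidence = prior * p_data_given_h + (1 - prior) * p_data_given_not_h
    return prior * p_data_given_h / evidence

# Hypothetical values: prior belief of 0.10; the observed data are four times
# as likely if H is true (0.80) as if H is false (0.20).
print(round(posterior(0.10, 0.80, 0.20), 3))
```

The same prior with uninformative data (equal likelihoods under both states) leaves the posterior equal to the prior.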
Normal Distribution is also known as
bell curve
Gaussian distribution
the classic way scores are expected to fall
Measures of central tendency include
mean, median, mode
What will alter the rank of mean, median, mode
Skewed distributions
if skewed to the right (positively skewed), mode < median < mean
if skewed to the left (negatively skewed), mean < median < mode
What is SD?
square root of variance
spread/dispersion of a dataset relative to its mean
What is variance?
average of squared differences of each observation in a distribution from the mean
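The two definitions above can be sketched directly. This version divides by N (the population formula), matching the "average of squared differences" wording. Sample statistics usually divide by N - 1 instead. The function names and the data are hypothetical.

```python
import math

def variance(scores):
    """Average squared deviation from the mean (population formula, divides by N)."""
    m = sum(scores) / len(scores)
    return sum((x - m) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    """Square root of the variance, in the same units as the scores."""
    return math.sqrt(variance(scores))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with a mean of 5
print(variance(scores), standard_deviation(scores))
```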
First moment of distribution?
Mean
Second moment of distribution?
Variance
Third moment of distribution?
Skewness
Fourth moment of distribution?
Kurtosis
When are transformations used?
to change overall shape of underlying data to address issue of non normal distribution
Mean of Standard Score
100
SD of standard score
15
Mean of T score
50
SD of T score
10
Mean of Scaled Score
10
SD of scaled score
3
Mean of Z score
0
SD of Z score
1
Mean of Stanine
5
SD of stanine
2
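Because each metric above is a linear rescaling of the z score, any score can be converted by going through z. The sketch below uses the means and SDs from the cards. The `SCALES` table and `convert` function are hypothetical names, and stanines are left out because they are reported as whole-number bands from 1 to 9.

```python
# (mean, SD) for each metric, taken from the cards above.
SCALES = {
    "z": (0, 1),
    "T": (50, 10),
    "standard": (100, 15),
    "scaled": (10, 3),
}

def convert(score, from_scale, to_scale):
    """Convert a score between metrics by passing through the z score."""
    from_mean, from_sd = SCALES[from_scale]
    to_mean, to_sd = SCALES[to_scale]
    z = (score - from_mean) / from_sd
    return to_mean + z * to_sd

print(convert(115, "standard", "T"))  # one SD above the mean on both metrics
print(convert(7, "scaled", "z"))      # one SD below the mean
```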
Mean of percentile
50
Percentage of scores between the mean and 1 SD?
34.13%
Percentage of scores between 1 SD and 2 SD?
13.59%
Percentage of scores between 2 SD and 3 SD?
2.14%
Percentage of scores beyond 3 SD?
0.13%
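The percentages on these cards are areas under the normal curve, so they can be checked against the normal cumulative distribution function. A minimal sketch using Python's standard library follows. The `bands` dictionary and `percentile_rank` helper are hypothetical names.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Area under the curve in each band on one side of the mean.
bands = {
    "mean to 1 SD": z.cdf(1) - z.cdf(0),
    "1 SD to 2 SD": z.cdf(2) - z.cdf(1),
    "2 SD to 3 SD": z.cdf(3) - z.cdf(2),
    "beyond 3 SD": 1 - z.cdf(3),
}
for label, area in bands.items():
    print(f"{label}: {area:.2%}")

def percentile_rank(z_score):
    """Percentage of the distribution falling at or below a given z score."""
    return 100 * z.cdf(z_score)
```

Adding the bands cumulatively gives percentile ranks: a score 1 SD above the mean sits at roughly the 84th percentile (50 + 34.13).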
Reliability
consistency of results
indicates the degree to which individual differences in test scores can be attributed to true differences
Reliability can be expressed in terms of
reliability coefficient 0-1
Reliability can be expressed as the ratio of…
true variance to total variance
Relationship between reliable measures and sensitivity to change
perfectly reliable measures cannot detect change
TRADE OFF between the 2
Name 5 types of reliability
Test-retest reliability
Alternate forms reliability
Split-half reliability
Inter-item reliability
Interrater reliability
Test-retest reliability looks at
stability of scores on repeated administrations
can be affected by test-retest interval and practice effects
Error variance means
random fluctuation in performance from one administration to another
alternate forms reliability
equivalence of scores across parallel versions of a test
consistency of response to diff sample of items tapping the same knowledge
split half reliability
internal consistency
split test in diff ways using a single admin
correlation between half of the test scores and the other half
inter item reliability
Kuder-Richardson formula
consistency between multiple items measuring the same construct
interrater reliability
agreement in scoring of the same test material by different scorers
4 validity types
content validity
predictive validity
concurrent validity
construct validity
content validity
The extent to which a measure is a representative sample of the subject matter or behavior under investigation.
construct validity
The extent to which a measure accurately assesses the construct or latent attribute that it is intended to measure.
concurrent validity
The extent to which the results of a test or measurement correspond to those of a previously established and related measure, collected at the same point in time
predictive validity
The degree to which a test score predicts future behavior or performance on an accepted criterion measure
structural equation modeling (SEM)
Any of a range of multivariate statistical analysis methods which examine the structural relationship between measured and latent variables.
test sensitivity
The ability of a test to correctly classify an individual as having a disease or condition.
test specificity
The ability of a test to correctly determine the absence of a disease or condition.
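Both definitions reduce to simple proportions from a 2x2 table of test results against true status. The sketch below shows the arithmetic. The function names and counts are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people WITH the condition whom the test flags as positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of people WITHOUT the condition whom the test clears as negative."""
    return true_neg / (true_neg + false_pos)

# Hypothetical sample: 50 people with the condition, 100 without.
print(sensitivity(true_pos=45, false_neg=5))   # 45 of 50 detected
print(specificity(true_neg=80, false_pos=20))  # 80 of 100 correctly cleared
```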
item response theory
A statistical theory and a set of related methods which model the relationship between test item performance, test taker ability, and test item characteristics.
principal component analysis
A statistical method for reducing the dimensionality of a data set of interrelated variables into its underlying dimensions, or principal components, using an orthogonal transformation.
exploratory factor analysis
A form of factor analysis used to explore the possible underlying factor structure and latent constructs of a set of observed variables, without a predetermined model.
Confirmatory Factor Analysis
A form of factor analysis which is used to verify the fit of a hypothesized factor structure of observed variables and their underlying latent constructs.
Threats to validity
history
testing interval
order of test admin
regression to the mean
multiple comparisons
situational variables
Threat to internal validity - History
education, reading, age, handedness, gender, race
Threat to internal validity - testing interval
artifact from duration of testing or practice effects
Threat to internal validity - order of test admin
fatigue
exposure to test before
Threat to external validity - regression to the mean
- samples far from the mean on the first set of scores will be closer to the mean on the second set
- random variance affecting the samples in the second measurement is independent of the random variance affecting the first
Threat to external validity - multiple comparisons
comparing time 1 and time 2; differences may be due to chance and not a clinically relevant finding
Threat to external validity - situation variable
medication, mood, effort, sleep etc
Construct validity evaluated via which 2 techniques?
convergent
discriminant
convergent validity
when 2 or more approaches to measurement of some trait are positively correlated
divergent (discriminant) validity
low correlation b/w 2 similar approaches to measurements of different traits
multitrait multimethod matrix
composition of correlation coefficients of 2 or more traits and 2 or more methods
contains 4 types of correlation
purpose is to measure construct validity
positive predictive power
proportion of the time we are right when we state that a condition is present based on a test result
negative predictive power
proportion of time we are right in stating on the basis of our test that someone does NOT have a condition
likelihood ratio
likelihood that a given test result would be expected in a patient with the target disorder compared to the likelihood that that same result would be expected in a patient without the target disorder.
used for interpreting diagnostic tests
LR tells you 1) how likely a patient has a disease or condition, 2) the utility of a diagnostic test
The higher the ratio, the more likely they have the disease or condition.
likelihood ratio of 1
test result is just as likely in those with and without condition - useless test result
positive likelihood ratio values > 1
positive test result is indicative of the presence of the condition
what is a desired negative likelihood ratio?
between 0 and 1
as the likelihood ratio moves further away from 1, it means that…
the test provides more useful information in detection of a specific condition
what does an optimal likelihood ratio depend on?
the test or measurement characteristics
benefit of using likelihood ratios over predictive values?
prevalence of the condition does not affect the statistic
pre-test probability
the estimated probability that a patient has a condition prior to knowing a test result
- base rate of a condition
post-test probability
probability that the patient has the condition given a positive test result
how well the test rules in the condition
incremental validity
the extent to which the use of the test improves post-test probability with respect to the pre-test probability
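A minimal sketch of how likelihood ratios link pre-test and post-test probability, assuming the usual definitions: LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity, and post-test odds = pre-test odds x LR. All function names and numbers are hypothetical.

```python
def positive_lr(sens, spec):
    """How much more likely a positive result is in people with the condition."""
    return sens / (1 - spec)

def negative_lr(sens, spec):
    """How much more likely a negative result is in people with the condition."""
    return (1 - sens) / spec

def post_test_probability(pre_test_prob, likelihood_ratio):
    """Convert probability to odds, apply the LR, and convert back."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Hypothetical test: sensitivity 0.90, specificity 0.80, base rate 0.20.
lr_pos = positive_lr(0.90, 0.80)
lr_neg = negative_lr(0.90, 0.80)
print(round(lr_pos, 2), round(lr_neg, 3))
print(round(post_test_probability(0.20, lr_pos), 3))  # after a positive result
print(round(post_test_probability(0.20, lr_neg), 3))  # after a negative result
```

The gap between the pre-test value (0.20) and each post-test value is one way to see how much the test result adds.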
Goal for performance validity test
maximize specificity
Goal for identifying persons showing any degree of impairment in a domain of function
maximize sensitivity
Receiver operating characteristic (ROC)
visualizes the performance of a test by plotting sensitivity against 1 - specificity across cutoff scores
The ability of a test to discriminate diseased cases from normal cases
Area under ROC curve
AUC - measure that reflects the overall accuracy of a test's predictions and can be used to compare the detection accuracy of assessment tools
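One way to sketch an ROC curve is to sweep every possible cutoff, record sensitivity and 1 - specificity at each, and take the area with the trapezoid rule. The example assumes higher scores indicate impairment. The function names and score lists are hypothetical.

```python
def roc_points(impaired, normal):
    """(1 - specificity, sensitivity) at each cutoff; a score >= cutoff counts as positive."""
    cutoffs = sorted(set(impaired) | set(normal), reverse=True)
    points = [(0.0, 0.0)]  # strictest cutoff: nobody is classified positive
    for cutoff in cutoffs:
        sens = sum(x >= cutoff for x in impaired) / len(impaired)
        false_pos_rate = sum(x >= cutoff for x in normal) / len(normal)
        points.append((false_pos_rate, sens))
    return points

def area_under_curve(points):
    """Trapezoid rule over consecutive ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical scores for two small groups.
impaired_scores = [6, 7, 8, 9]
normal_scores = [3, 4, 5, 7]
print(area_under_curve(roc_points(impaired_scores, normal_scores)))
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups.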
SEM
measure of error variance around a single true score
how to calculate SEM
SD of the error distribution around true score
takes into account the SD of the test and test’s reliability coefficient
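The usual formula that combines these two ingredients is SEM = SD x sqrt(1 - reliability). The sketch below applies it and builds a 95% confidence interval around an observed score. The function names and values are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band of +/- z SEMs around an observed score (z = 1.96 for 95%)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test on the standard-score metric (SD = 15).
for r in (0.91, 0.75, 0.50):
    sem = standard_error_of_measurement(15, r)
    low, high = confidence_interval(100, 15, r)
    print(f"r = {r}: SEM = {sem:.2f}, 95% CI = {low:.1f} to {high:.1f}")
```

As reliability drops, the SEM grows and the confidence interval widens.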
What happens to SEM and SEE when reliability increases?
SEM and SEE decrease
What happens to CI if reliability is very poor
it is very large
2 things to determine whether norms are psychometrically sound
is it normally distributed in the population
is the standardization sample representative of the population that I am comparing my pt’s performance to?
low score interpretation for a specific domain is based on the assumption of
central limit theorem
reliable change index
minimum magnitude of change required for psychometric certainty that 2 scores actually differ
diff b/w efficacy and effectiveness
parallels the difference between statistical significance and clinically meaningful difference
efficacy
stat difference - whether an intervention produces expected result under ideal circumstances
effectiveness
clinical meaningfulness - benefit of an intervention under real world conditions
calculation of reliable change index
uses SE of difference and computes z score for the difference between the individual’s tests based on normal probability distribution
RCI needs to fall outside what range to reflect a significant difference?
+/- 1.96 (p < .05, two-tailed)
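A minimal sketch of one common RCI calculation, the Jacobson-Truax form, in which the standard error of the difference is sqrt(2 x SEM^2). This is an assumption about which variant is meant, and it applies no correction for practice effects. The function names and scores are hypothetical.

```python
import math

def reliable_change_index(score_1, score_2, sd, reliability):
    """z score for the retest difference, using SE_diff = sqrt(2 * SEM**2)."""
    sem = sd * math.sqrt(1 - reliability)
    se_diff = math.sqrt(2 * sem ** 2)
    return (score_2 - score_1) / se_diff

def is_reliable_change(rci, critical_z=1.96):
    """True when the RCI is more extreme than +/- critical_z."""
    return abs(rci) > critical_z

# Hypothetical retest on a scale with SD = 15 and reliability = 0.91.
rci = reliable_change_index(100, 115, sd=15, reliability=0.91)
print(round(rci, 2), is_reliable_change(rci))
```

A change is treated as reliable only when the absolute value of the RCI exceeds 1.96, so with the same test a 10-point change (RCI about 1.57) would not meet the criterion.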
discriminant analysis
the process of using a score profile to decide whether a patient belongs to one group or another
descriptive discriminant analysis
to describe differences between 2 or more groups on a set of measures
predictive discriminant analysis
to classify subjects into groups on the basis of a set of measures
How is regression used in interpreting test scores?
estimate premorbid levels
assess change in functioning (predict retest score)
logistic regression in test interpretation
allows one to determine the probability that a score belongs to one group or another
generate formulas using multiple variables from one test to differentiate between groups
prevalence is also known as
base rate
what is base rate
proportion of a population that has a particular condition at a given time or over a given period (the number of new cases arising within a period is incidence)
signal detection theory
used to characterize response styles in recognition memory testing
standard error of measurement
measure of variability of scores obtained on a test relative to true score
reliable test has small standard error of measurement
standard error of the mean (SEM)
how much the sample mean varies from sample to sample around the true mean of the population
SD of the sampling distribution of the sample mean
larger sample -> smaller SE of the mean
SD divided by sq rt of sample size
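The last two lines of this card can be sketched in a couple of lines of arithmetic. The function name and the values are hypothetical.

```python
import math

def standard_error_of_mean(sd, n):
    """SD divided by the square root of the sample size."""
    return sd / math.sqrt(n)

# Same SD, increasing sample size: the standard error shrinks.
for n in (25, 100, 400):
    print(n, standard_error_of_mean(15, n))
```

Quadrupling the sample size halves the standard error, because the sample size enters through its square root.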
difference between SD and SE
SD is used to figure out how “spread out” a data set is.
Standard error (SE), or standard error of the mean (SEM), describes how precisely a sample mean estimates the population mean.
Difference between generalized linear model and general linear model
The general linear model requires that the response variable follows the normal distribution whilst the generalized linear model is an extension of the general linear model that allows the specification of models whose response variable follows different distributions
difference between SD and variance
Variance is the average squared deviation of data points from the mean, expressed in squared units
SD is the square root of the variance, expressed in the same units as the data
Sn-Nout
Negative test score on a high Sensitivity test rules OUT a diagnosis
Sp-Pin
Positive test score on a high Specificity test rules IN a diagnosis
PPV tells us
the likelihood that someone has the condition when the test score is positive for the condition
NPV tells us
the likelihood that someone does NOT have the condition when the test score is negative for the condition
Prevalence rates affect
PPV and NPV
Prevalence rates do not affect
sensitivity and specificity of a measure
When prevalence rate of a disease decreases
PPV also decreases
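The dependence of predictive values on prevalence follows from Bayes' rule. The sketch below holds sensitivity and specificity fixed and varies only the base rate. The function names and numbers are hypothetical.

```python
def positive_predictive_value(sens, spec, prevalence):
    """P(condition | positive test) for a given base rate."""
    true_pos = sens * prevalence
    false_pos = (1 - spec) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

def negative_predictive_value(sens, spec, prevalence):
    """P(no condition | negative test) for a given base rate."""
    true_neg = spec * (1 - prevalence)
    false_neg = (1 - sens) * prevalence
    return true_neg / (true_neg + false_neg)

# Hypothetical test with sensitivity 0.90 and specificity 0.80.
for prevalence in (0.50, 0.10, 0.01):
    ppv = positive_predictive_value(0.90, 0.80, prevalence)
    npv = negative_predictive_value(0.90, 0.80, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With the test's own properties unchanged, PPV falls and NPV rises as the condition becomes rarer.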
post test probability of a positive test is equal to
PPV
Pre test probability is the likelihood of…
having the disease before performing the test
Parametric statistical modeling
define SOC based on central limit theorem
estimate performance of an individual relative to a group
norm referenced
Bayesian statistical modeling
define SOC to develop individual comparison standard
Pathognomonic sign
a sign whose presence means that a particular disease, condition, or impairment is present beyond any doubt
(e.g. apraxia, aphasia, agnosia, hemiparesis, spatial neglect)