Intro to Psych Assessment Flashcards
What does the MMPI Measure
Mental Health Needs - eg. psychological disorders
What is the Mean and SD of the MMPI
M = 50, SD = 10 (scores expressed as T-scores)
What is the most popularly used IQ test?
The Stanford-Binet V
What is the Mean and SD of the Stanford-Binet?
M = 100, SD = 15
What does the NEO PI-R measure
Personality variables (big 5)
How often are many psychological tests updated
Approx. every 10 years
Why are tests updated regularly
Because cultural and social values change and the tests need to be updated to reflect these
What is meant by Reliability?
Consistency of measurement of a test
What is Psychological Research?
Psych Research aims to make generalisations about a population from a sample of people.
What is Psychological Assessment?
A special kind of psych research which seeks to make generalisations about a specific individual from a sample of n = 1
What is test-retest reliability?
When scores from the same test administered at different time points are highly correlated (ie. r close to 1)
What is the importance of a CI as it relates to test scores?
Because it provides a range of values between which you’d expect the test score to fall.
Why is consistency (reliability) important?
Because we want to know that our assessment of needs at one point is similar to our assessment of your needs at a future time . The test should not just refer to your needs in the moment, but generalise to your needs across time.
What is Psychological Testing?
The process of administering one or more psychological tests.
What is validity?
Assesses the usefulness of inferences made from test scores.
How are testing and theory interconnected?
They are two aspects of the same thing, with good theory we can develop good tests and vice versa.
Psychometric theory is cumulative
There has been no serious rebuttal to psychometric theory.
Why is variance in psychological testing a good thing?
Because a test with good reliability that produces a wider spread of scores allows us to identify individuals more easily than a test with lower variability.
When is variance in psychological testing a bad thing?
When it is caused by error/noise (measurement error) rather than by actual individual differences.
What two types of variance exist in psychological testing?
1. Individual variability (good)
2. Error/random variability, aka measurement error (bad)
What is a good way of measuring reliability?
The extent to which a test correlates with itself. High correlation = good reliability
Test-Retest reliability.
What are the two components of an observed score?
X (observed score) = T (true score) + E (error)
What are the assumptions of general model of reliability?
- Mean error of measurement = 0 (because it is random)
- True scores and errors are uncorrelated: r TE = 0
- Errors on different measurements are uncorrelated: r E1E2 = 0
What are the formulas for test scores from classical test theory?
X=T+E
and
σ^2X = σ^2T + σ^2E (variance version)
What is the formula for the reliability coefficient?
r tt = σ^2T / σ^2X (= true score variance/observed score variance)
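The variance decomposition above can be checked with a short simulation sketch (Python; the true-score SD of 15 and error SD of 5 are hypothetical numbers chosen so that r tt should come out near 0.9):

```python
import random

random.seed(0)

# Simulate classical test theory: X = T + E, with E random,
# mean zero, and uncorrelated with T (the model's assumptions).
true_scores = [random.gauss(100, 15) for _ in range(10_000)]
errors = [random.gauss(0, 5) for _ in range(10_000)]
observed = [t + e for t, e in zip(true_scores, errors)]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# r_tt = true-score variance / observed-score variance,
# expected near 15^2 / (15^2 + 5^2) = 0.90
r_tt = variance(true_scores) / variance(observed)
print(round(r_tt, 2))
```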
What statistical technique is commonly used to determine reliability?
Correlation
If r tt = 0.9, what percentage of variance in observed scores is attributed to the true scores?
= 90%
Note that this gives the same value as R squared would in a traditional correlation/regression. But test/retest reliability interprets this differently.
What values can r tt range between?
0 and 1 (0 = no reliability, 1 = perfect test-retest reliability)
How is measurement error determined using r tt?
1 - value of r tt = proportion of variance caused by measurement error.
What is the main aim of test developers?
Reducing measurement error!
What is the most common/simple way of measuring reliability?
The test-retest reliability
What is the reliability of gender questionnaires?
r tt = 0.95
Average reliability for individual items in a test
r tt = 0.25 (so having many items in a test important as a means of reducing measurement error).
What is used to quantify the CI around a person's test score?
The standard error of measurement, which describes the distribution of scores an individual would obtain if they were tested an infinite number of times.
What is the formula for standard error of measurement?
σmeas = σx√(1-rtt)
What is the SEM (standard error of measurement) when r tt = 0
SEM = σx√(1-0) = σx
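A minimal sketch of the SEM formula (Python; the IQ-style SD of 15 is a hypothetical example):

```python
import math

def sem(sd, r_tt):
    """Standard error of measurement: sigma_x * sqrt(1 - r_tt)."""
    return sd * math.sqrt(1 - r_tt)

# An IQ-style test (SD = 15) with reliability 0.9:
print(round(sem(15, 0.9), 2))  # 4.74
# With zero reliability the SEM equals the test's SD:
print(sem(15, 0.0))  # 15.0
```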
As reliability decreases, variability ________?
As reliability decreases, variability increases
What is the predicted true score?
A score that the CI is based around. It is corrected for the unreliability in the observed score (so it is always closer to the population mean).
What happens to predicted true score when observed score is below the mean?
The predicted true score goes up (towards the population mean)
What happens to the predicted true score when observed score is above the mean?
The predicted true score goes down (towards the mean).
How does the reliability coefficient (r tt) relate to the predicted true score?
The predicted true score moves toward the mean by the unreliable proportion (1 - rtt) of the observed score's distance from the mean, e.g. if rtt is 0.9, the PTS moves 10% of the way toward the mean; if rtt is 0.6, it moves 40% of the way.
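The regression toward the mean can be written as PTS = mean + rtt x (observed - mean); a sketch with hypothetical IQ-style numbers:

```python
def predicted_true_score(observed, mean, r_tt):
    """Pull the observed score toward the mean by the unreliable proportion."""
    return mean + r_tt * (observed - mean)

# Mean 100, r_tt = 0.9: a score of 120 moves 10% of the way back to the mean
print(predicted_true_score(120, 100, 0.9))  # 118.0
# A score below the mean moves up toward it:
print(predicted_true_score(80, 100, 0.9))   # 82.0
```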
How long is the interval between retesting for test-retest reliability?
No more than 6 months. Often over a few weeks or days.
What are the limitations of test-retest reliability testing?
Re-testing may be influenced by practice effects, and tests that require problem solving will advantage performance on the re-test.
What kinds of tests are best suited to test-retest reliability?
Sensory Discrimination and Motor Tests that are not affected by repetition.
What is test-retest reliability?
When the same test is given at two different time points and the correlation between the two sets of scores is determined to see how reliable the test is.
What is the reliability coefficient?
r tt (the correlation between test scores at two different time points)
What is Alternate-Form Reliability?
When the same person is tested at two time points using two different, but equivalent tests. Reliability coefficient is obtained by correlating score on first and second tests.
Why is Alternate-form testing often better than test-retest reliability?
Because it avoids practice effects and cheating.
What are the limitations of Alternate Form Testing?
- Can still be influenced by practice effects if they are large (reduces but doesn’t eliminate their influence).
- Alternate forms are not available for all tests (or can be difficult to produce).
What effect will practice effects have on reliability coefficient?
r tt will reduce as practice effects increase (but it is not affected by small practice effects)
How does the length of a test affect its reliability?
The longer a test (more items in it), the more reliable it is.
What does the Spearman-Brown formula estimate?
The effect of lengthening or shortening a test on its reliability (the reliability coefficient)
What is split-half reliability?
When the scores from a test are split in half and the two ‘halves’ correlated with each other.
Gets around the problem of not having an alternate-form test, by using two halves of one test in this way.
How can tests be divided for split-half reliability?
- Odd or Even Numbers
- First half and second half, but this is not recommended as it can be influenced by fatigue effects and inconsistencies in question difficulty.
What is the estimated reliability coefficient?
r nn = n x r tt / (1 + (n - 1) x r tt), where n = number of items in new test / number of items in old test.
Product of the Spearman-Brown Prophecy Formula.
What happens to the product of the Spearman-Brown Prophecy formula as r tt increases?
It produces a negatively accelerating growth curve or a “diminishing returns” function.
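A sketch of the Spearman-Brown prophecy formula (Python; the reliabilities and lengthening factors are hypothetical), showing the diminishing-returns curve:

```python
def spearman_brown(r_tt, n):
    """Predicted reliability when a test is lengthened by a factor n."""
    return (n * r_tt) / (1 + (n - 1) * r_tt)

# Quadrupling a 25-item test to 100 items, starting from r_tt = 0.6:
print(round(spearman_brown(0.6, 4), 2))  # 0.86
# Diminishing returns: each doubling adds less reliability
for n in (1, 2, 4, 8):
    print(n, round(spearman_brown(0.25, n), 2))
```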
What is internal (inter-item) consistency?
The average correlation between each item and all the other items in the test.
What are two ways of measuring internal consistency?
1. KR-20 (Kuder-Richardson): for dichotomous items, ie. yes/no
2. Cronbach's Alpha (Coefficient Alpha): for items with more than two ordered response options, eg. high/medium/low
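As a sketch, coefficient alpha can be computed directly from item scores (Python; the item data below are hypothetical):

```python
def cronbach_alpha(items):
    """Coefficient alpha for item-score lists (one list per item)."""
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(items)  # number of items
    totals = [sum(scores) for scores in zip(*items)]  # per-person totals
    return (k / (k - 1)) * (1 - sum(variance(i) for i in items) / variance(totals))

# Three hypothetical 5-point items answered by five people:
items = [
    [4, 3, 5, 2, 4],
    [4, 2, 5, 3, 4],
    [5, 3, 4, 2, 5],
]
print(round(cronbach_alpha(items), 2))  # high alpha: the items hang together
```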
What is a speed test?
One in which individual differences depend on speed of performance. Items are uniform and easy but not enough time provided for anyone to answer all questions.
What is a power test?
One in which individual differences depend on ability to solve difficult problems. Enough time to complete test given but some items too difficult for anyone to solve. Items start easy and finish hard.
What are the 5 methods for testing reliability?
- Test-Retest Reliability
- Alternate-Form Reliability
- Split-Half Reliability
- Kuder-Richardson and Coefficient Alpha
- Scorer (Inter-Rater) Reliability
What is the Standard Error of Measurement?
The standard deviation of the distribution of scores an individual would obtain over repeated testings.
What is used to establish a confidence interval around expected scores on a test?
The Standard Error of Measurement (SEM) and the Predicted True Score (PTS).
CIs are centred around the PTS, not the observed scores and are therefore asymmetrical to the observed score.
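A sketch combining the PTS and SEM into a confidence interval (Python; the IQ-style numbers are hypothetical):

```python
import math

def confidence_interval(observed, mean, sd, r_tt, z=1.96):
    """95% CI centred on the predicted true score, not the observed score."""
    pts = mean + r_tt * (observed - mean)  # regressed toward the mean
    sem = sd * math.sqrt(1 - r_tt)         # standard error of measurement
    return pts - z * sem, pts + z * sem

# Mean 100, SD 15, r_tt = 0.9, observed score 130 -> PTS = 127,
# so the interval sits asymmetrically below the observed 130:
lo, hi = confidence_interval(130, 100, 15, 0.9)
print(round(lo, 1), round(hi, 1))  # 117.7 136.3
```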
How many applications of a test are required when assessing internal consistency (KR20 or Coefficient Alpha)?
One
What do KR20 and Coefficient Alpha Measure?
Internal Consistency: the correlation between one test item and all of the other items
In a test with 20 items, how many correlations is KR20 or Coefficient Alpha made up of?
20
What is confirmatory factor analysis?
Factor analysis that allows a theoretical model to be specified and tested (hypothesis testing)
What is exploratory factor analysis?
An ‘old’ type of factor analysis that did not allow for null hypothesis testing.
What is face validity?
What a test appears to measure to the person taking the test
What is content validity?
how adequately a test samples behaviour representative of the universe of behaviour that the test was designed to sample (eg. exam that covers all of course material)
What is criterion related validity?
how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest (the criterion)
What are two forms of criterion-related validity?
1. Concurrent Validity
2. Predictive Validity
What is concurrent validity?
degree to which a test score is related to some criterion measure obtained at the same time (eg. correlation between test scores and expert rater on someone’s psychological wellbeing)
What is predictive validity?
an index of the degree to which a test score predicts some criterion measure (eg. do SAT scores predict performance at University?)
What is a criterion?
the standard against which a test or test score is evaluated
Characteristics of a criterion
- Relevant
- Valid
- Uncontaminated
What is the validity coefficient
correlation between test scores and scores on a criterion measure
What value should the validity coefficient have?
No fixed value is defined; it should be high enough that inferences about the criterion drawn from the test scores are useful.
What do KR-20 and Cronbach’s Alpha set?
An upper limit for the reliability of a test.
How does the internal consistency coefficient relate to the test-retest reliability coefficient?
The internal consistency coefficient can be much higher than the test-retest reliability coefficient.
Is the internal consistency coefficient the same as the test-retest reliability coefficient?
No, they can be different, so good internal consistency does not guarantee good test-retest reliability (r tt).
What is test standardisation
When a test is administered and controlled in the same way each time it is administered.
What are the disadvantages of using computerised tests?
Many test takers prefer the ‘human’ element in the assessment process.
What is convergent validity?
Different tests measuring same constructs should correlate highly.
What is divergent/discriminant validity?
When tests measuring different constructs do NOT correlate
What can cause low correlation between two tests?
Unreliability in test scores can attenuate (weaken) the correlation between the two tests.
What is disattenuation?
A formula used to remove the influence of the unreliability of test scores on the correlation between two tests.
What is the disattenuation formula?
r xy / √(r xx × r yy)
What is the effect of test unreliability on correlation between two tests?
The more unreliability, the more the correlation is likely to be attenuated by the unreliability (ie. the correlation will be lower than it should be because of the influence of the unreliability)
How can you tell if there is good convergent validity between tests?
The disattenuated correlation between the two is high (rxy = close to 1)
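A sketch of the disattenuation correction (Python; the correlations and reliabilities are hypothetical):

```python
import math

def disattenuate(r_xy, r_xx, r_yy):
    """Correct an observed correlation for unreliability in both tests."""
    return r_xy / math.sqrt(r_xx * r_yy)

# Two tests of the same construct correlate 0.56; each has reliability 0.8.
# Removing the attenuation shows the underlying correlation is stronger:
print(round(disattenuate(0.56, 0.8, 0.8), 2))  # 0.7
```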
What is ‘n’ in the Spearman-Brown prophecy formula?
n = the factor by which the test is being lengthened
n = number of items in ‘new’ version of test / number of items in ‘old’ test
eg. when increasing number of test items from 25 to 100 n = 100/25 = 4
How many items are usually in a good IQ test?
500 - 1000
How long would it take to administer a good IQ test
Approx 1 hour
What is the minimum reliability of a test for individual assessment?
0.9
What is the minimum reliability for a test of group research?
0.7
Can you use tests with reliability lower than 0.9?
Yes, so long as you know how to interpret the SEM and use it to create CIs (though the CI will be less precise)
How does the reliability of a test affect its CI
The more reliable, the more precise the CI
What is the CHC Model
The Cattell-Horn-Carroll model of assessing cognitive abilities.
How many cognitive abilities are listed at level 2 of the CHC model
10
80 - 90% of variance in cognitive abilities can be assessed using how many of the CHC categories?
5 (out of 10)
What are the 5 main categories of cognitive abilities in the CHC?
- Fluid Intelligence (Gf)
- Crystallised Intelligence (Gc)
- Short-Term Memory (Gsm)
- Long-Term Retrieval (Glr)
- Processing Speed (Gs)
What are the 5 lesser categories in the CHC Model?
- Quantitative knowledge
- Visual processing
- Auditory processing
- Correct Decision Speed
- Reading/Writing
What is fluid intelligence (Gf)?
Problem solving abilities (executive function), ability to derive creative solutions from available information.
What is Crystallised intelligence (Gc)?
Acquired knowledge that grows over time, such as educational/cultural knowledge.
What is short-term memory (Gsm)?
Working memory, ability to hold information in mind to guide our behaviour over short intervals.
What is long-term retrieval (Glr)?
Ability to learn from experience and use info from long-term storage. Based on evolutionarily old parts of the brain such as the hippocampus and limbic system.
What is processing speed (Gs)?
The speed at which we can solve problems.
How many of the CHC categories are typically used in a Wechsler test?
4 (does not include long-term retrieval - this is an add-on)
What is the cut score on the BDI-II test?
A score of 20
What effect does the prevalence (base rate) have on sensitivity and specificity?
In cases of extreme frequency (v. high or low base rates), the sensitivity and specificity of the test can be inaccurate.
What is positive predictive power?
The probability that someone with a positive test result actually has the condition.
= a / (a + b) (row ratio)
Ratio of true positives/all people with a positive test result.
What is negative predictive power?
The probability that someone with a negative test result actually doesn’t have the condition.
= d / (d + c) (row ratio)
Ratio of true negatives/ all people with a negative test result
What is sensitivity?
The proportion of people who have a condition that are correctly identified by the test (column ratio = a / (a + c))
Ratio of true positives/all people with the condition
What is specificity?
The proportion of people without the target condition for whom the test is also negative (column ratio = d / (d + b))
Ratio of true negatives/all people without the condition.
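The four ratios can be laid out together on one 2×2 table (Python sketch; the counts are hypothetical):

```python
# Hypothetical 2x2 diagnostic table:
#                 condition present   condition absent
# test positive        a = 80              b = 20
# test negative        c = 20              d = 180
a, b, c, d = 80, 20, 20, 180

sensitivity = a / (a + c)  # column ratio: true positives / all with the condition
specificity = d / (d + b)  # column ratio: true negatives / all without it
ppp = a / (a + b)          # row ratio: true positives / all positive results
npp = d / (d + c)          # row ratio: true negatives / all negative results

print(sensitivity, specificity, ppp, npp)  # 0.8 0.9 0.8 0.9
```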
Below what level of PPP should a test not be used?
PPP must be greater than 0.5 (50%); if lower, the test shouldn't be used to diagnose.
When prevalence is low, what happens to NPP and PPP?
The PPP becomes very low, while the NPP remains acceptable (so most people who are diagnosed with the condition don't actually have it)
What are common ways of misinterpreting NPP and PPP?
They are not fixed values, but depend on the prevalence in the test setting. This is especially problematic when prevalence in a study setting is around 50%, while it would normally be much lower in a professional/population setting (eg. the child abuser example from Cohen & Swerdlik - as base rate decreased, false positives increased and false negatives decreased)
For two tests both with same sensitivity and specificity, what happens as base rate decreases?
The number of false positives increases.
PPP of the test decreases (so likelihood of having the condition if you’ve been diagnosed is not high)
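The base-rate effect can be sketched by holding sensitivity and specificity fixed while prevalence falls (Python; the figures are hypothetical):

```python
def ppp_at_base_rate(sensitivity, specificity, base_rate, n=10_000):
    """PPP for a fixed test as the prevalence (base rate) changes."""
    with_condition = n * base_rate
    without = n - with_condition
    true_pos = sensitivity * with_condition
    false_pos = (1 - specificity) * without
    return true_pos / (true_pos + false_pos)

# Same test (sensitivity = specificity = 0.9) at falling base rates:
for rate in (0.5, 0.1, 0.01):
    print(rate, round(ppp_at_base_rate(0.9, 0.9, rate), 2))
# PPP collapses as the condition becomes rarer,
# even though the test itself has not changed.
```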
What is the standard error of measurement equal to when rtt = 0?
When rtt = 0, the SEM is equal to the standard deviation of the test (ie. 15 if it was an IQ test)