Lecture 2 - Dr Greg Yelland (DN) (incomplete) Flashcards
validity
- How well a test measures what it purports to measure
- important implications regarding
- the appropriateness of inferences made, and
- the actions taken on the basis of measurements
precision
- sensitivity & specificity
- always a compromise between sensitivity & specificity
- screening is usually done with a sensitive test
- then a highly specific test is used to determine who actually has dementia
- 3:00
accuracy
- test needs to be accurate
6:30
reliability
- stability of measurement
- measurement is stable over time & within itself
7:20
what are the three components of reliability?
1) inter-rater reliability - more to do with scoring than the nature of tests
2) test-retest reliability - should get the same score when doing the same test twice
3) internal consistency - within the test, people should score consistently
- items should be equally good at measuring what they are trying to measure
7:50

What is test reliability?
- this is not scorer reliability
- test-retest - stability over time
- internal consistency
- homogeneous - all items testing just one factor (e.g., anxiety)
- should be equally good at assessing that factor
- need to be aware of how many factors/behaviours a test is measuring
- if it is intended to measure one factor, then it should only measure one
10: 00
What is reliability?
- the proportion of total variance (σ²) made up of the true variance (σ²_tr)
- variability in test scores: σ² = σ²_tr + σ²_e
- a test score is always made up of
true score + error
X = T + E
- error is made up of random error & systematic error
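A minimal sketch (an illustration, not from the lecture) of X = T + E: simulate true scores plus random error and check that reliability comes out as true variance over total variance. All numbers are made up.

```python
# Sketch: reliability as the proportion of total variance that is true variance.
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(100, 15, size=10_000)   # T: stable trait values
error = rng.normal(0, 5, size=10_000)            # E: random error
observed = true_scores + error                   # X = T + E

reliability = true_scores.var() / observed.var() # sigma^2_tr / sigma^2
print(round(reliability, 3))                     # ~0.9, i.e. 15^2 / (15^2 + 5^2)
```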
Whenever we are talking about reliability & validity, we are talking about…
correlation or correlation coefficients
- i.e., how well things are correlated on different aspects
e.g., with: - test-retest (looking at the correlation between the first & second time the test is taken)
- internal consistency (looking at the correlation between different items on the test)
15:30
What are some sources of error variance?
- Test Construction
- Test Administration
- Environment
- Test-Taker Variables
- Examiner-Related Variables
- Test Scoring/Interpretation
each can contain both random & systematic error
16:20
What is the difference between systematic & random error variance?
- Systematic - a constant, or proportionate, source of error in variables other than the target variable
- should not affect the variance in scores
- Random - caused by unpredictable fluctuations & inconsistencies in variables other than the target variable
Systematic changes should not affect reliability; unpredictable changes will affect the correlation; the more robust the test is to fluctuation, the greater the reliability.
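A small sketch (my own illustration, not from the lecture) of this distinction: a constant systematic shift leaves the correlation between two sets of scores at 1.0, while random noise drags it down. All values are hypothetical.

```python
# Sketch: systematic error preserves the correlation; random error reduces it.
import numpy as np

rng = np.random.default_rng(1)
scores = rng.normal(50, 10, size=1_000)

systematic = scores + 5                               # constant shift, e.g. a hot room
random_err = scores + rng.normal(0, 10, size=1_000)   # unpredictable fluctuations

print(np.corrcoef(scores, systematic)[0, 1])  # 1.0  - correlation unaffected
print(np.corrcoef(scores, random_err)[0, 1])  # ~0.7 - correlation reduced
```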
How does error occur in test construction?
- from the way you select or sample test items
- ideally, all items perform consistently in the same way (the way you intended them to)
systematic error - could come from an ambiguous question: some people may respond one way and others another
random error - there may be one or two questions where someone does not have enough experience to give the standard response to the item
17:00
How can error occur during test administration?
Environmental Variables
Test-Taker Variables
Examiner-Related Variables
How do test-takers contribute to error?
Test-Taker Variables
- during test administration
- differences between the people taking the tests
systematic - e.g., testing people of different ages & not taking age into account
random - age, personality, etc.
issue:
- we don't necessarily want to minimise this by only testing 10-year-olds, because then the test is only relevant to 10-year-olds
solution:
test 10-year-olds, 11-year-olds, 12-year-olds, etc., then create norms for the different ages (age norms) - this takes care of the variable by having different normative data for each age
20:00
How does the test environment contribute to error?
- during test administration
- one person may be tested in a noisy environment, another in a quiet one
- testing in a group or individually
- these differences affect test scores
How can examiners contribute to error?
- during test administration
- examiner humanness - e.g., the examiner may be exhausted by the last test and skip bits to hurry it up
How can test scoring/interpretation contribute to error?
- subjectively scored tests have greater error (because they rely on subjective judgements)
- moving toward computer-based scoring to remove this source of error
- computer-based scoring cannot be used when it is the quality of the response that is judged (qualitative)
- there is much more error in qualitative than in quantitative scoring
22:35
What should we aim for with regard to error & reliability?
aim to remove systematic error and minimise random error so we get better reliability
24:35
What are some reliability estimates?
- test-retest
- parallel forms/alternate forms
24:50
What is a test-retest reliability estimate?
- same test taken twice - then see how well the scores are correlated
- issue of how long an interval between testing?
- the shorter the interval, the higher the test-retest reliability, because there are lots of things that can change in an individual over time
- systematic changes should not affect test-retest reliability e.g., hot room, cold room (everyone affected equally) (26:50)
- random changes will affect correlation (test-retest reliability) (27:15)
- the more robust the test is to fluctuation, the more reliable it is
e.g., a test that is not affected by time of day or amount of sleep is robust enough to wash those effects out (28:30)
- participant factors will affect test-retest reliability - experience, practice, fatigue, memory, motivation, morningness/eveningness
- as everyone differs in these areas, there is greater error variance
- practice effects - give a clue about what is going to happen next time the same test is taken - this may mean that we cannot use test-retest
24:45
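A minimal sketch of how a test-retest estimate is computed - just correlate the two administrations. The scores here are hypothetical.

```python
# Sketch: test-retest reliability = Pearson correlation between two sittings.
import numpy as np
from scipy.stats import pearsonr

first_sitting = np.array([12, 18, 9, 22, 15, 20, 11, 17])
second_sitting = np.array([13, 17, 10, 21, 16, 19, 12, 18])

r, _ = pearsonr(first_sitting, second_sitting)
print(f"test-retest reliability: r = {r:.2f}")
```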
When would we use Parallel or Alternate forms of a test?
- when we cannot use test-retest reliability
- e.g., due to practice effects giving the test-taker a clue about what will be on the test next time
What is a parallel forms or alternate forms reliability estimate?
- parallel vs. alternate
- parallel forms are better developed
- items have been selected so that the means & variances have been shown to be equal
- alternate forms - similar, but with no guarantee that the variance is the same (hence a source of error has been introduced)
- the testing process is similar to test-retest - do one test, then do the parallel or alternate form
- test sampling issues - the problem is the choice of items
- the best items are usually used in the original version (unless both forms are created at the same time)
30:50
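A quick sketch (hypothetical data) of the checks implied above: parallel forms should show matching means and variances, and scores on the two forms should correlate highly.

```python
# Sketch: checking that two forms behave as parallel forms.
import numpy as np

form_a = np.array([24, 31, 18, 27, 22, 29, 25, 20])
form_b = np.array([25, 30, 19, 28, 21, 28, 26, 21])

print(form_a.mean(), form_b.mean())            # means should be (near-)equal
print(form_a.var(ddof=1), form_b.var(ddof=1))  # variances should be (near-)equal
print(np.corrcoef(form_a, form_b)[0, 1])       # parallel-forms reliability estimate
```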
What is one of the biggest problems faced when using a parallel form or alternate form of a test?
- test sampling issues - the problem is the choice of items
- the best items are usually used when creating the initial version of the test
(unless both tests are created at the same time)
- identifying the source of error
- is it because the test is not stable over time, or because the different items (content) of the two forms are introducing error?
- is it stable over time? (external)
- is there internal consistency across the two forms? (internal)
33:50
Internal Consistency (Reliability)
- Split-Half testing
- split the test into two halves
- obtain the correlation coefficient between the two halves
What is the point of Split-Half testing?
- to obtain the internal consistency of the full version, via the Spearman-Brown formula
- the formula estimates the internal consistency of a test that is twice the length of each half
When is the Spearman-Brown formula used?
- to obtain the internal consistency of the full version of split-half tests
- it estimates the internal consistency of a test that is twice the length
- not used when a test measures more than one factor (heterogeneity)
- not appropriate for speed tests
- the test must be homogeneous when using the split-half method, because otherwise you could end up with an imbalanced distribution of the factors across the two halves
Spearman-Brown Split-Half Coefficient
r_SB = 2r_hh / (1 + r_hh)
e.g., with r_hh = 0.9:
r_SB = (2 × 0.9) / (1 + 0.9)
r_SB = 1.8 / 1.9
r_SB ≈ 0.947
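A small sketch tying the last few cards together: an odd-even split-half correlation, corrected with the Spearman-Brown formula. The item matrix (rows = people, columns = items) is made up.

```python
# Sketch: split-half reliability with the Spearman-Brown correction.
import numpy as np

items = np.array([[3, 4, 3, 5, 4, 4],
                  [1, 2, 2, 1, 2, 1],
                  [4, 5, 4, 4, 5, 5],
                  [2, 3, 2, 3, 2, 3],
                  [5, 4, 5, 5, 4, 5]])

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6

r_hh = np.corrcoef(odd_half, even_half)[0, 1]
r_sb = 2 * r_hh / (1 + r_hh)             # Spearman-Brown correction
print(f"r_hh = {r_hh:.3f}, r_SB = {r_sb:.3f}")
```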
When would we use Cronbach’s Alpha?
- when we need an estimate that takes account of all of the individual item variances, not just one split-half
- it estimates internal consistency for every possible split-half
- A generalised reliability coefficient for scoring systems that are graded by each item (sums all of them)
- used when items are graded (cannot be used with dichotomous items)
- Essentially an estimate of ALL possible test-retest or split-half coefficients.
α can range between 0 and 1 (ideally closer to 1)
- cannot measure multiple traits - the test must be homogeneous
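A sketch of Cronbach's alpha computed from its standard definition, α = k/(k-1) × (1 - Σσ²_item / σ²_total); the formula is not on this card, but it is the standard one. Data are hypothetical (rows = people, columns = graded items).

```python
# Sketch: Cronbach's alpha = k/(k-1) * (1 - sum(item variances) / total variance).
import numpy as np

items = np.array([[3, 4, 3, 5],
                  [1, 2, 2, 1],
                  [4, 5, 4, 4],
                  [2, 3, 2, 3],
                  [5, 4, 5, 5]])

k = items.shape[1]
sum_item_vars = items.var(axis=0, ddof=1).sum()   # sum of individual item variances
total_var = items.sum(axis=1).var(ddof=1)         # variance of the total score
alpha = (k / (k - 1)) * (1 - sum_item_vars / total_var)
print(f"alpha = {alpha:.3f}")
```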
When would we use Kuder-Richardson?
51:25
- when the test items are dichotomous (e.g., right/wrong)
- estimates every possible split-half or test-retest correlation
- mainly used as a split-half estimate
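A sketch of the Kuder-Richardson formula (KR-20), the dichotomous-item counterpart of alpha: KR-20 = k/(k-1) × (1 - Σpq / σ²_total), where p is the proportion answering each item correctly. Responses here are hypothetical right/wrong (1/0) data.

```python
# Sketch: KR-20 for dichotomously scored (1/0) items.
import numpy as np

items = np.array([[1, 1, 0, 1, 1],
                  [0, 1, 0, 0, 1],
                  [1, 1, 1, 1, 1],
                  [0, 0, 0, 1, 0],
                  [1, 0, 1, 1, 1]])

k = items.shape[1]
p = items.mean(axis=0)                     # proportion correct per item
q = 1 - p                                  # proportion incorrect per item
total_var = items.sum(axis=1).var(ddof=1)  # variance of the total score
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(f"KR-20 = {kr20:.3f}")
```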
What is the acceptable range of reliability?
53:35
Clinical – r > 0.85 acceptable
Research – r > ~0.7 acceptable
Reliabilities of Major Psychological Tests
INTERNAL CONSISTENCY
- WAIS – r = 0.87
- MMPI – r = 0.84
TEST-RETEST
- WAIS – r = 0.82
- MMPI – r = 0.74
summary of reliability
