Lecture 2 Validity & Reliability (Catherine) Flashcards
To provide an overview of the content of Lecture 2
Distinguish between precision, accuracy, reliability & validity in relation to measuring instruments
An instrument has:
- Precision if it has fineness of discrimination
- Accuracy if it gives the correct value and has no systematic bias
- Reliability if the instrument has measurement stability, with no substantial random fluctuations
- Validity if it measures what it purports to measure
Name the key types of Reliability
- Test-retest Reliability: Correlating pairs of scores on 2 different administrations of the same test
- Internal Consistency Reliability: split-half testing; Cronbach's alpha (non-dichotomous items); Kuder-Richardson (dichotomous items)
- Inter-scorer Reliability: The degree of agreement between scorers
What are the key challenges of test reliability?
- Stability over time?
- Internal consistency
- Test scores are made up of the true score plus error
- There is always variability in test scores as a result of error
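In standard classical test theory notation (the symbols below are the conventional ones, not taken from the lecture slides), this decomposition can be written as:

```latex
% Observed score = true score + random error
X = T + E
% T and E are assumed uncorrelated, so the variances add
\sigma_X^2 = \sigma_T^2 + \sigma_E^2
% Reliability: the proportion of observed-score variance due to true-score variance
r_{XX} = \frac{\sigma_T^2}{\sigma_X^2}
```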
What prevents scores being stable over time?
- Stability over time: the central problem is that the interpretation of individual scores changes when the test is administered multiple times
What does internal consistency mean?
The extent to which a psychological test is homogeneous (measures a single construct) or heterogeneous (measures more than one construct)
- The DASS measures depression, anxiety, & stress and is therefore heterogeneous
What are the types of error that are included in the final test score (Test scores are made up of the true score plus error)?
- Test Construction (item or content sampling)
- Test Administration (environment, test-taker & examiner related variables)
- Test Scoring & Interpretation (hand scoring or subjective judgements)
What is the main problem with Test Construction?
- Systematic error: an ambiguously worded question could be interpreted differently by 2 people
- Errors in item or content sampling
- Random error: e.g. Catherine is a morning person and Donna is an afternoon person; if both sit their exam in the morning, Catherine has an advantage
NB: Alternate forms can be used to identify this source of error, and internal consistency testing can identify fatigue as a source of error variance
What are the main problems with Test Administration?
- Inconsistent environmental factors (e.g. air con vs no air con)
- Test-taker variables (individual differences, such as age, not taken into account)
- Examiner-related error (fatigue, boredom, etc.)
NB: Can use Test-Retest to identify source of error
What are the main problems with Test Scoring & Interpretation?
- Hand scoring open to error
- Subjective judgements
- Computer-aided scoring cannot be used for qualitative data
List the different forms of Reliability Estimates
- Test-Retest Reliability
- Parallel Forms Reliability
- Alternate Forms Reliability
- Internal Consistency Reliability, using:
  - Split-Half Reliability
  - Cronbach's alpha
  - Kuder-Richardson
- Inter-Scorer Reliability
Which source of Error Variance does Test-Retest Reliability attempt to account for?
Test-Retest Reliability Testing attempts to account for Errors in Test Administration
What are the important considerations to successfully apply Test-Retest Reliability Testing?
- The Test is taken twice and the results are correlated
- It is important to have an appropriate amount of time between tests (this will vary depending on the type of test, e.g. the MSE needs 18 months)
- Systematic changes should not affect the correlation, as they shift everyone's scores equally (e.g. everyone tested in a cold room)
- Unpredictable changes will affect the correlation (e.g. mood, fatigue, motivation)
- A reliable test will be able to sustain greater levels of fluctuation (i.e. its retest correlation remains high despite such unpredictable changes)
What are the factors that affect Test-Retest Reliability?
Individual differences, experience, practice effects, memory, fatigue, and motivation.
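As a minimal computational sketch (the scores and variable names below are made up for illustration), the test-retest coefficient is simply the Pearson correlation between the two administrations:

```python
import numpy as np

# Hypothetical scores for the same 8 examinees on two administrations of a test
time1 = np.array([12, 15, 9, 20, 14, 11, 17, 13])
time2 = np.array([13, 14, 10, 19, 15, 10, 18, 12])

# Test-retest reliability: Pearson correlation between the two sets of scores
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"Test-retest reliability: r = {r_test_retest:.2f}")
```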
Which source of Error Variance do Parallel Forms or Alternate Forms Reliability Tests attempt to account for?
Parallel Forms or Alternate Forms Reliability Tests attempt to account for errors in Test Construction
When would a test administrator implement a Parallel Forms or Alternate Forms Reliability Test?
In a situation where it is not possible to conduct a Test-Retest Reliability test
In what ways is a Parallel Forms or Alternate Forms Reliability Test similar to a Test-Retest Reliability test?
- In both cases the participant completes two tests
- The aim of both is to minimise error variance
What are Parallel Forms Reliability Tests?
Parallel forms of a test exist when, for each form of the test, the means and variances of observed test scores are equal
What are Alternate Forms Reliability Tests?
Alternate forms are simply different forms of a test that have been constructed to be parallel. They are designed to be equivalent with regard to content and level of difficulty, but do not meet the same stringent criteria as parallel forms (so means & variances have not been made equivalent)
What is the main drawback with Alternate Form Tests of Reliability?
Because the means and variances have not been made equivalent (as they have in parallel forms), the test confounds are highly ambiguous: there are now two sources of error, time and content, whereas with parallel forms time is the only confound.
What methods can be employed to achieve internal consistency reliability?
Split-half reliability testing can be employed to achieve internal consistency reliability
What are the main considerations when implementing Split-half reliability testing?
- Ensure the test is split in a meaningful way, i.e. not into first and last halves (fatigue effects); an odd-even split is better
- If it is a heterogeneous test, ensure each construct is also split in a meaningful way
What statistical analysis does a test administrator employ to assess Split-Half reliability of a homogeneous test?
A test administrator can obtain a correlation coefficient for a homogeneous test using the Spearman-Brown formula
- The Spearman-Brown formula in effect converts the split-half correlation into an estimate for the full-length test
- It cannot be used for a heterogeneous test!
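As a minimal sketch (the item-score matrix below is made up), this is how an odd-even split-half coefficient and the Spearman-Brown correction, r_SB = 2 * r_hh / (1 + r_hh), fit together:

```python
import numpy as np

# Hypothetical item scores: rows = examinees, columns = items (graded 0-4)
scores = np.array([
    [3, 2, 4, 3, 2, 3, 4, 2],
    [1, 1, 2, 0, 1, 2, 1, 1],
    [4, 3, 4, 4, 3, 4, 4, 3],
    [2, 2, 1, 2, 3, 2, 2, 2],
    [0, 1, 0, 1, 0, 1, 0, 0],
])

# Odd-even split (avoids the fatigue effects of a first-half/last-half split)
odd_total = scores[:, 0::2].sum(axis=1)
even_total = scores[:, 1::2].sum(axis=1)

# Correlate the two half-test totals
r_half = np.corrcoef(odd_total, even_total)[0, 1]

# Spearman-Brown: estimate the full-length test's reliability from r_half
r_full = (2 * r_half) / (1 + r_half)
print(f"Half-test r = {r_half:.2f}; Spearman-Brown corrected r = {r_full:.2f}")
```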
What statistical analysis does a test administrator employ to assess Split-Half reliability of a heterogeneous test?
A test administrator can test the internal consistency reliability of any heterogeneous split-half test using Cronbach's alpha formula
- Cronbach's alpha is a generalised reliability coefficient for scoring systems in which each item is graded
(With the DASS we would need a Cronbach's alpha for each trait measured)
- It can be used for either a homogeneous or a heterogeneous test, but NOT for dichotomous answers (yes/no; true/false)
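A minimal sketch of the alpha computation, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), assuming a made-up matrix of graded responses:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for graded (non-dichotomous) items.

    items: rows = examinees, columns = items.
    """
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of examinees' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses to a 4-item scale graded 0-3
responses = np.array([
    [3, 2, 3, 3],
    [1, 1, 0, 1],
    [2, 3, 2, 2],
    [0, 1, 1, 0],
    [3, 3, 2, 3],
])
print(f"Cronbach's alpha = {cronbach_alpha(responses):.2f}")
```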
What statistical analysis does a test administrator employ to assess Split-Half reliability of a dichotomous test?
A test administrator can test the internal consistency reliability of any dichotomous split-half test using a Kuder-Richardson Formula
The Kuder-Richardson formula essentially provides an estimate of the average of all possible split-half coefficients for tests with yes/no or true/false answers
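For reference, the KR-20 formula in standard notation (not taken from the slides); it is Cronbach's alpha specialised to dichotomous items, since a yes/no item's variance is p_i * q_i:

```latex
% k = number of items, p_i = proportion answering item i correctly,
% q_i = 1 - p_i, \sigma_X^2 = variance of total test scores
KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right)
```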
What are acceptable Reliabilities for Clinical and Research situations?
Acceptable reliability for Clinical settings: r > 0.85
Acceptable reliability for Research settings: r > ~0.7
What are the Internal Consistency and Test-Retest Reliabilities of the WAIS and MMPI?
Internal Consistency of WAIS: r = 0.887
Internal Consistency of MMPI: r = 0.84
Test-Retest Reliabilities of the WAIS: r = 0.82
Test-Retest Reliabilities of the MMPI: r = 0.74
NB: The WAIS test-retest reliability is just outside the acceptable clinical limit
The MMPI is susceptible to change over time, as it is a personality inventory used with clinical patients, who are more likely to change over time
What type of reliability would a test administrator be assessing if they utilised the Kuder-Richardson, Cronbach's alpha or Spearman-Brown formulas?
The test administrator would be assessing a test's Internal Consistency
A Correlation coefficient can be used to check all other types of reliability except Internal Consistency. What are these types of reliability?
- Test-Retest Reliability
- Alternate Form Reliability
- Inter-scorer Reliability
What factors does a Test Administrator need to bear in mind when measuring reliability?
- Is the test measuring state or trait? (trait is more enduring)
- The range of possible responses (ideally 5-7 response options; a 0-10 scale is not ideal as people tend to cluster around the middle)
- Speeded tests: towards the end of the test, the test-taker may not have had time to attempt a number of items; this does not mean they would have answered incorrectly, only that they did not get time to answer
There are seven methods utilised to improve reliability, what are they?
- Quality of items (need to be clear, concise, homogeneous)
- Ensure consistent testing conditions
- Reduce Test-Retest time intervals
- Longer assessments
- Develop a robust scoring plan
- Test items for reliability & adapt the measure
- Ensure Validity
There are 3 classes of Validity, what are they?
- Internal Validity
- External Validity
- Test Validity
What is Internal Validity interested in?
Relevant to Experimental Validity
Confidence in making causal statements about study outcomes
What is External Validity interested in?
Relevant to Experimental Validity
Confidence you can generalise results to people outside of the study
What is Test Validity interested in?
Relevant to this Unit!!!!
Confidence that what you are measuring truly represents what you think you are measuring
What are the 3 forms of assessing test validity?
- Content Validity
- Criterion-Related Validity
- Construct Validity
There are 3 methods to assess each of these forms of validity, what are they?
- Scrutinise test contents
- Comparing Scores on this test to other tests
- Perform an analysis of how scores on this test relate to scores on other tests and theories