Week 3 Flashcards
What is reliability?
The degree to which a test or tool provides consistent results.
What is validity?
The extent to which a test measures the construct it is intended to measure.
Can tests be valid without being reliable?
No, but they can be reliable without being valid.
What is classical test theory?
Test (obtained) scores are a combination of the true score of the test taker plus a level of error.
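The true-score-plus-error idea can be sketched numerically. This is a hypothetical Python simulation (the true score of 50 and error spread of 5 are made-up illustration values): over many repeated administrations, random error averages out and the mean obtained score approaches the true score.

```python
import numpy as np

# Classical test theory: obtained score = true score + random error.
rng = np.random.default_rng(0)

true_score = 50.0                       # the person's fixed (unobservable) true score
errors = rng.normal(0, 5, size=10_000)  # random measurement error on each administration
observed = true_score + errors          # obtained scores vary around the true score

# Error averages out over many administrations, so the mean obtained
# score approximates the true score.
print(round(observed.mean(), 1))
```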
Which is more accurate: an obtained score with a large error component or one with a small error component?
The score with the small error component.
Describe item selection as a source of error
The sample of items chosen may not be equally reflective of every individual's true score.
Describe test administration as a source of error
General environmental conditions at the time of administration (e.g. temperature, lighting, noise) and temporary "states" of the test taker (e.g. fatigue, anxiety, distraction) influence scores.
Describe test scoring as a source of error
Come about when performance on a test is subjective to the test administrator. Especially problematic when subjectively scored e.g. projective tests, essay tests. –> error is less when tests have set scales and scoring systems
Describe systematic measurement error as a source of error
Occurs if, unknown to the test developer, the test consistently taps into something other than the attribute being tested.
Discuss Spearman's reliability coefficient
Ranges from 0 to 1; scores closer to 1 indicate a more reliable test.
What is domain sampling theory?
A true score could only be found if people responded to ALL items which represent the construct; this is lengthy and not always possible. Domain sampling theory therefore considers the problem of using only a sample of items to represent a construct.
What is the item response theory approach to test development?
It focuses on individual items rather than the test as a whole.
What is internal consistency? What does high internal consistency look like?
The extent to which a psychological test is homogeneous (measuring one construct) or heterogeneous.
High internal consistency = all items should correlate with one another.
What is the stability-over-time issue for reliability?
The interpretation of an individual's score changes when a test is administered on more than one occasion.
Describe test retest (stability)
determines relaibility - same test administered to the same group at two different time points. if the test is relaible there scores from each time point should be highly correlated.
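Test-retest reliability is just the correlation between the two administrations. A minimal sketch, assuming made-up scores for five people on a hypothetical test given at two time points:

```python
import numpy as np

# Test-retest reliability: correlate the same group's scores at two time points.
time1 = np.array([12, 18, 25, 31, 40])  # hypothetical scores, first administration
time2 = np.array([14, 17, 27, 30, 41])  # same people, second administration

r = np.corrcoef(time1, time2)[0, 1]  # Pearson correlation = reliability estimate
print(round(r, 2))
```

A high correlation (here close to 1) is what a reliable test should show when the construct is stable between administrations.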
When is test-retest not appropriate to determine reliability?
When the construct is not stable and changes rapidly. Emotion is not a good candidate; IQ is very good.
How do you maximise test-retest reliability?
Use a stable construct, no intervention, and a short time between testings.
Describe parallel/alternate forms reliability. What does high parallel/alternate forms reliability look like?
Two forms of the same test, developed with the same content and difficulty, are administered to the same group.
High reliability would be a strong correlation between scores on the two forms.
Describe the split-half method of reliability. What is an advantage of this method?
The test is divided into two halves and the halves are compared; there should be a strong correlation between them. If scores on two tests are the same, then scores on each half of one test should also be the same. This eliminates the need to create a second test to test reliability.
Does the split-half method over- or underestimate reliability? Why? Why is it better than parallel/alternate forms reliability testing?
It underestimates reliability because a smaller number of items is used in the correlation.
It is better because it eliminates the need to create a second test to test reliability.
What is the Spearman-Brown formula used for?
Used to estimate the reliability of the full-length test from the correlation between its halves, i.e. to correct the split-half estimate for test length (including when the halves are not the same length as the whole test).
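The split-half method and the Spearman-Brown correction can be sketched together. This assumes hypothetical item scores for five people on a six-item test; the halves are formed from odd and even items, and the correction r_full = k·r / (1 + (k−1)·r) with k = 2 adjusts the half-test correlation up to full length:

```python
import numpy as np

# Split-half reliability with the Spearman-Brown correction.
# Hypothetical data: rows are people, columns are six items.
scores = np.array([
    [5, 4, 5, 3, 4, 5],
    [2, 3, 2, 2, 3, 2],
    [4, 4, 3, 4, 4, 3],
    [1, 2, 1, 1, 1, 2],
    [3, 3, 4, 3, 3, 4],
])

odd_half = scores[:, ::2].sum(axis=1)    # items 1, 3, 5
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6

r_half = np.corrcoef(odd_half, even_half)[0, 1]  # correlation between the halves

# Spearman-Brown: estimate full-length reliability (k = 2 because the
# full test is twice as long as each half).
k = 2
r_full = k * r_half / (1 + (k - 1) * r_half)

print(round(r_half, 2), round(r_full, 2))
```

The corrected estimate is always higher than the raw half-test correlation, which is why the uncorrected split-half method underestimates reliability.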
What is Cronbach's alpha, and on which data is it used? What is the range, and when are items redundant?
It scores the reliability (internal consistency) of tests.
Used on tests with a graded scoring system (e.g. agree to disagree).
Range: 0 (items not similar) to 1 (items identical); .7 is adequate, .8 is good, .9 suggests redundancy.
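Cronbach's alpha has a simple closed form: α = (k/(k−1)) · (1 − Σ item variances / variance of total scores). A sketch with made-up Likert-style (1-5) responses from five people on four items:

```python
import numpy as np

# Cronbach's alpha for graded (e.g. 1-5 Likert) items.
# Hypothetical data: rows are respondents, columns are items.
scores = np.array([
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
], dtype=float)

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of respondents' total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 2))
```

Here the items track one another closely, so alpha lands near the top of the 0-1 range.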
When is the Kuder-Richardson 20 used?
To determine reliability for tests with dichotomously scored items (0 or 1).
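KR-20 is the dichotomous-item analogue of alpha, replacing item variances with p·q (proportion correct times proportion incorrect). A sketch with hypothetical right/wrong data for five test takers on five items:

```python
import numpy as np

# Kuder-Richardson 20 for dichotomously scored (0/1) items.
# Hypothetical data: rows are test takers, columns are items.
scores = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
], dtype=float)

k = scores.shape[1]
p = scores.mean(axis=0)                      # proportion answering each item correctly
q = 1 - p                                    # proportion answering incorrectly
total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(round(kr20, 2))
```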
What is content validity?
Does the test adequately represent all the possible items which measure the construct? If a unit spends half its time on maths and half on physics, the final exam should reflect this.
What is construct underrepresentation vs construct-irrelevant variance?
Underrepresentation = failure to capture important components of the construct.
Irrelevant variance = measuring things other than the construct.
What is criterion-related validity?
The extent to which the measure is related to an outcome, e.g. low self-esteem predicts depression; good school grades predict high performance at uni.
What makes a good criterion for testing criterion-related validity?
The criterion is reliable and appropriate.
The criterion is not contaminated by the test: if the criterion and test have similar items, the correlation will be artificially inflated.
What is concurrent validity?
A form of criterion-related validity.
The extent to which the measure in question corresponds with an outcome assessed at the same time, e.g. how does a clinical interview gauging anxiety level compare with an anxiety measure? It can be used to check how valid subjective tests are by comparing their outcome to a written test –> the outcomes should be the same.
What is predictive evidence?
A form of criterion-related validity.
How well the test predicts performance on a criterion, by comparing the measure in question with an outcome at a later time (school grades predict uni performance).
What is construct validity?
Establishes how well a test measures a psychological construct.
What is convergent evidence?
A form of construct validity.
Refers to the degree to which two constructs that should be theoretically related are in fact related (e.g. self-esteem and depression).
What is discriminant/divergent evidence?
A form of construct validity.
Demonstrates that the test is unique: low correlations should be observed with constructs that are unrelated to what the test is trying to measure.
What is factor analysis?
A form of construct validity.
Some items within a test may be highly related and form a set; others may not be related and form a different set. Multiple clusters or loadings indicate more than one construct.
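The "multiple clusters indicate more than one construct" idea can be illustrated with simulated data. This hedged sketch (all factor structure and loadings are invented for illustration) builds six items where three tap one hypothetical factor and three tap another, then counts how many eigenvalues of the item correlation matrix exceed 1, a common rule of thumb for the number of factors:

```python
import numpy as np

# Factor-analysis intuition: items from two distinct clusters produce
# two large eigenvalues in the correlation matrix.
rng = np.random.default_rng(1)
n = 1000
factor_a = rng.normal(size=n)  # hypothetical underlying construct A
factor_b = rng.normal(size=n)  # hypothetical underlying construct B

items = np.column_stack([
    factor_a + 0.3 * rng.normal(size=n),  # items 1-3 tap factor A
    factor_a + 0.3 * rng.normal(size=n),
    factor_a + 0.3 * rng.normal(size=n),
    factor_b + 0.3 * rng.normal(size=n),  # items 4-6 tap factor B
    factor_b + 0.3 * rng.normal(size=n),
    factor_b + 0.3 * rng.normal(size=n),
])

corr = np.corrcoef(items, rowvar=False)   # 6x6 item correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)
print(int((eigenvalues > 1).sum()))       # number of eigenvalues above 1
```

Two clusters of correlated items yield two eigenvalues above 1, signalling that the test measures more than one construct.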