2. Test worthiness and Statistics Flashcards
What is an important statistic in test creation and measures of test worthiness?
Correlation
what is the range for correlations?
-1 to +1
what are the ways to observe correlation?
Linear - line of best fit
Curvilinear - curved line of best fit
What question does shared variance address?
What factors the two variables share, i.e., how much of their variance they have in common
what is another term for shared variance?
The squared correlation coefficient (r²)
what is the only way to determine shared variance when you have a correlation coefficient?
Square the correlation coefficient; then you know how much variance is shared between the two variables
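As an illustrative worked example (the value of r is made up, not from the source): if two variables correlate at r = .70, the shared variance is .70² = .49, so the variables share 49% of their variance.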
what does reliability refer to?
Refers to how free the test is from measurement error; would you get the same, or a very close, score if you sat the test again? It's about internal consistency, reliability & dependability
what does reliability depend on?
the construction of the test & the environment it is administered in
why will there always be error?
There's no perfect test or environment, so there will always be some error, but we want to minimise it.
how is reliability usually reported?
correlation coefficient
why do different types of tests have different levels of reliability?
E.g., well-constructed achievement tests may have reliability coefficients of .90, but personality tests are often much lower (~.70) because the construct is abstract & potentially fluctuates
there are many ways to test reliability. What are some measures of reliability?
test-retest, alternative forms, internal consistency of measurement
what is test-retest reliability?
Give the test twice to the same group, usually a couple of weeks apart, then correlate the results
what does a higher correlation suggest in test-retest reliability?
The higher the correlation, the more reliable the test
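A minimal Python sketch of this calculation, using made-up scores (illustrative only):

    import numpy as np

    # hypothetical scores for the same six people, tested two weeks apart
    time1 = np.array([24, 31, 18, 27, 35, 22])
    time2 = np.array([26, 30, 20, 25, 36, 21])

    # the Pearson correlation between the two administrations
    # is the test-retest reliability estimate
    r = np.corrcoef(time1, time2)[0, 1]
    print(round(r, 2))  # close to 1 -> high test-retest reliability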
why do fluctuations in test-retest results occur?
Results can fluctuate depending on things such as the time between tests, forgetting information, learning more about the test content by studying in the interim, or being more familiar with the test format the second time around
When is test-retest reliability least likely to fluctuate?
when testing a stable construct (e.g. IQ)
what is ALTERNATIVE, PARALLEL OR EQUIVALENT FORMS RELIABILITY?
Making two or more versions of the same test
what issues does ALTERNATIVE, PARALLEL OR EQUIVALENT FORMS RELIABILITY prevent?
Stops issues like people remembering or studying particular answers between test and retest
what is the difficulty of ALTERNATIVE, PARALLEL OR EQUIVALENT FORMS RELIABILITY?
Hard to make the tests equal in terms of content & levels of difficulty, or to ensure administration is exactly the same
what must test developers demonstrate in order to have good ALTERNATIVE, PARALLEL OR EQUIVALENT FORMS RELIABILITY?
Test developer must demonstrate the versions are truly parallel
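A minimal sketch of how one might check this, with hypothetical scores (similar means & spreads suggest comparable difficulty; a high correlation suggests the two forms rank people the same way):

    import numpy as np

    # hypothetical scores for the same group on Form A and Form B
    form_a = np.array([14, 19, 11, 16, 20, 13])
    form_b = np.array([15, 18, 12, 17, 19, 14])

    # comparable means and standard deviations -> similar difficulty
    print(form_a.mean(), form_b.mean())
    print(form_a.std(ddof=1), form_b.std(ddof=1))

    # alternative-forms reliability: correlation between the two forms
    print(np.corrcoef(form_a, form_b)[0, 1])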
what does reliability as internal consistency measure?
Measures how test items relate to each other & the test as a whole
where does reliability as internal consistency look?
Looks within the test to measure reliability
• E.g., in a test measuring anxiety, respondents should answer items that tap aspects of anxiety in a similar way
what are the most common forms of internal consistency measure?
The most common forms of internal consistency (i.e., reliability) measures are split-half and Cronbach's alpha
What is split-half reliability?
Use one form of the test, administered at a single time. Split the test in two and correlate the scores on the two halves
what are the issues of the split-half reliability?
The test may get harder as it goes along, so the first half is not equal to the second. You can compare odd- and even-numbered questions instead, but the halves may still not be equal, & each half is a shorter test, which can decrease reliability
what equation can be used to compensate for shorter tests in split-half reliability?
Can use the Spearman-Brown equation to compensate for the shorter test: 2r / (1 + r)
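A short sketch of odd/even split-half reliability with the Spearman-Brown correction, using made-up 0/1 item scores:

    import numpy as np

    # hypothetical item scores: rows = respondents, columns = eight items
    items = np.array([
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1, 1, 1, 1],
        [0, 0, 0, 1, 0, 0, 1, 0],
        [1, 0, 1, 1, 1, 1, 0, 1],
    ])

    odd_half = items[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
    even_half = items[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

    # correlation between the two half-test scores
    r = np.corrcoef(odd_half, even_half)[0, 1]

    # Spearman-Brown correction for the full-length test: 2r / (1 + r)
    print((2 * r) / (1 + r))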
what do CRONBACH'S COEFFICIENT ALPHA AND KUDER-RICHARDSON attempt to rate?
They rate internal consistency by estimating the reliability of all possible split-half combinations, correlating each item with the total and averaging
what is Kuder-Richardson used with?
Kuder-Richardson is used with forced-choice format tests
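A minimal sketch of Cronbach's alpha computed from the item variances & total-score variance, with hypothetical Likert responses (the same formula applied to 0/1 right/wrong items gives KR-20):

    import numpy as np

    def cronbach_alpha(items):
        """items: respondents x items matrix of scores."""
        k = items.shape[1]                              # number of items
        sum_item_var = items.var(axis=0, ddof=1).sum()  # sum of item variances
        total_var = items.sum(axis=1).var(ddof=1)       # variance of total scores
        return (k / (k - 1)) * (1 - sum_item_var / total_var)

    # hypothetical responses (rows = respondents, columns = four items)
    responses = np.array([
        [4, 5, 4, 4],
        [2, 3, 2, 3],
        [5, 5, 4, 5],
        [1, 2, 1, 2],
        [3, 3, 4, 3],
    ])
    print(round(cronbach_alpha(responses), 2))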
what is validity?
The extent to which all the available evidence supports that the test is actually measuring what it is intended to measure.
what is validity essential for?
It is a central requirement for a test; without it, the test items/tasks would not have meaning
what are the categories of content validity?
face validity
what are the categories of construct validity?
- Criterion-related validity
  - Predictive validity
what is content validity testing?
Whether the content of a test reflects what the test is aiming to measure. On its own, it is sometimes enough evidence of validity
What is content validity not enough to ascertain test validity for?
Not enough to ascertain test validity beyond achievement-type tests, e.g., for more abstract constructs