Lecture Two Flashcards
1
Q
Remember correlations?
A
- An important stat in test creation and measures of worthiness
- -1 to 1
- linear
- line of best fit
- curvilinear
- assumptions underpinning correlation – why?
- How does variance impact on the coefficient?
- What about the shape of the line?
2
Q
Shared variance (squared coefficient)
A
- What are the factors that contribute to the shared variance?
- What is not shared?
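A minimal sketch of the squared coefficient, assuming two made-up lists of scores (the data and the helper name `pearson_r` are illustrative, not from the lecture). It computes r, then squares it to get the proportion of variance the two measures share; the remainder is what is not shared.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Made-up scores for five people on two measures (illustrative only)
test_a = [1, 2, 3, 4, 5]
test_b = [2, 1, 4, 3, 5]

r = pearson_r(test_a, test_b)
shared = r ** 2        # proportion of variance the two measures share
unshared = 1 - shared  # proportion not shared (other factors, error)

print(round(r, 2), round(shared, 2), round(unshared, 2))  # 0.8 0.64 0.36
```

So a correlation of .80 means 64% shared variance, not 80%.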
3
Q
Reliability
A
- Refers to how free the test is from measurement error: are you going to get the same, or a very close, score if you sit the test again? It's about consistency and dependability
- Depends on construction of test and environment administered in
- There's no perfect test or environment, so there will always be some error, but we want to minimise it
4
Q
Reliability
A
- Reliability usually reported as a correlation coefficient
- Different types of tests tend to have different levels of reliability, e.g. well-constructed achievement tests may have reliability coefficients of .90, but personality tests are often much lower (.70) because the concept is abstract and potentially fluctuates
- Many ways of measuring reliability, including test-retest, alternative forms and consistency of measurement
5
Q
Test-retest reliability
A
- Give test twice to same group, usually a few weeks apart
- Correlate results
- Higher the correlation, more reliable the test
- Results can fluctuate depending on things such as the time between the two sittings, forgetting information, learning more about the test contents by studying in the interim, and being more familiar with the test format the second time around
- Is less likely to fluctuate if testing a stable construct (e.g., IQ)
6
Q
Alternative, parallel or equivalent forms reliability
A
- Making two or more versions of the same test
- Stops issues like people remembering or studying particular answers between test-retest
- Hard to make the versions equal in terms of content and level of difficulty, or to ensure administration is exactly the same
- Test developer must demonstrate the versions are truly parallel
7
Q
Reliability as Internal consistency
A
- Measures how test items relate to each other and the test as a whole
- Looks within the test to measure reliability
- E.g., in a test that measures anxiety, respondents should answer items that tap aspects of anxiety in a similar way
- Most common forms of internal consistency (i.e. reliability) are split-half and Cronbach's alpha
8
Q
Split-half reliability
A
- Uses one form of the test, administered in a single sitting. Split the test in two and correlate the scores on the two halves
- Issues: test may get harder as you go along so first half is not equal to second
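- May compare odd- and even-numbered questions, but the halves still may not be equal, and each half is shorter than the full test, which can decrease reliability
- Can use the Spearman-Brown equation to compensate for the shorter test: corrected reliability = 2r / (1 + r), where r is the correlation between the two halves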
9
Q
Cronbach’s coefficient alpha and Kuder-Richardson
A
- Rate internal consistency by, in effect, estimating the reliability of all possible split-half combinations (correlating each item with the total and averaging)
- Kuder-Richardson is used with forced-choice format tests, where items are scored dichotomously (e.g. right/wrong)
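The lecture does not give a formula, so the sketch below assumes the common computational form of alpha: alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The data and the function name `cronbach_alpha` are hypothetical. With 0/1 items and population variances, as used here, the same calculation corresponds to the Kuder-Richardson (KR-20) case.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Alpha for a table of scores: rows = respondents, columns = items."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose to one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Hypothetical data: five respondents, six items scored 0/1
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # 0.83
```

Alpha rises when the items vary together (the total-score variance is large relative to the sum of the item variances) and falls when items behave independently.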
10
Q
Validity
A
- Validity – the extent to which all the available evidence supports that the test is actually measuring what it is intended to measure
- It is a central requirement for a test; without it, the test items/tasks would not have meaning
o Content validity
• Face validity
o Construct validity
• Criterion-related validity
• Predictive validity
11
Q
Content validity
A
- The content of a test reflects what the test is aiming to measure
- Sometimes content validity is enough validity for a test
- Not enough on its own to establish validity beyond achievement-type tests, e.g. for tests of more abstract constructs
12
Q
Content Validity Example
A
- Your results on the end of semester exam should reflect what you know about what has been covered in this unit
- So, there should be questions that equally represent the concepts introduced
o E.g. construct underrepresentation (a failure to capture an important aspect) or overrepresentation
- The questions should be worded in a way that is consistent with the language the concepts are taught in
13
Q
Face validity
A
- It refers to how the test looks, but this may be superficial
o E.g. the items in the test look as though they ought to measure what you are aiming to measure
- Some tests may look valid and not be, and others may not look valid but are
14
Q
Construct validity
A
- Constructs are theoretically driven ways of talking about certain features in the world
- Construct validity asks how well a test can give a construct meaning
o E.g. anxiety: it exists only insofar as the construct represents a set of behaviours, thoughts and feelings
o Construct irrelevance: scores are influenced by something other than what the test is supposed to measure e.g. anxiety or illness impacting exam score
15
Q
Construct validity
A
- Scientific evidence demonstrating that the construct (model, concept, idea, notion) is actually being measured by the test
- Most important when developing tests to measure abstract constructs like depression, anxiety, happiness, love, empathy
- Measured with statistical tools and methods