Reliability & Validity Flashcards
What is a conceptual variable?
The underlying construct you’re interested in measuring (e.g., intelligence, self-esteem).
What is reliability?
Reliability is the degree to which a measure is free of random error.
What is validity?
Validity is the degree to which a measure is free of systematic error.
What is a random error?
• Refers to chance variability in measurements
E.g., a measure of an individual’s IQ will vary randomly at different times depending on several factors, such as physical state (being tired one day and not concentrating) or errors in marking/recording answers
• Tends to be self-correcting (if you average across the measures, the result should tend towards the right answer)
• Is not biased in a particular direction
E.g., physical state of participants will vary from day to day but this variation will be random and will tend towards an ‘average’ physical state
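The self-correcting nature of random error can be illustrated with a short simulation (a hypothetical sketch: the true IQ of 100 and the noise spread of 10 are assumed values, not from the flashcards):

```python
import random

random.seed(42)

TRUE_IQ = 100   # hypothetical true score (illustrative value)
NOISE = 10      # spread of the random error (assumed)

# Each measurement = true score + unbiased random error drawn around zero
measurements = [TRUE_IQ + random.gauss(0, NOISE) for _ in range(1000)]

# Because random error is not biased in a particular direction,
# averaging many measurements converges toward the true score
average = sum(measurements) / len(measurements)
print(round(average, 1))
```

With only a handful of measurements the average can drift noticeably; with many, it settles close to the true score.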
What is systematic error?
• Error caused by other conceptual variables
• Tends to be biasing
• E.g., IQ test
• Time of day: Math scores higher in the morning, verbal higher in the afternoon
• Cultural knowledge: people more competent at answering questions from their own culture or in their own native language
• Classic example: US Army IQ test from early 20th century
They wanted to make the smartest people into officers, less smart people could be foot soldiers. They found that people born in US scored > UK > N. Europe > S. Europe > rest of world. However, the questions were biased e.g. “How many people are on a baseball side?”. Moreover, questions were written in English and not translated into other languages!
Briefly explain the difference between random error and systematic error.
Random error is caused by chance and tends to be self-correcting, whereas systematic error is caused by other conceptual variables and tends to be biasing.
How can we measure reliability?
- The degree to which a measure is free of random error
- Can be measured directly by repeating the test
- E.g., measure reliability of a ruler by using it twice and recording measures
- If it gives consistent results, it is reliable
- If it gives inconsistent results, it is unreliable
What is test-retest reliability and when is it a good method?
- Degree to which scores on the same measured variable correlate on measures taken at different times
- Perfect test-retest reliability leads to a correlation of 1.00, but if measures contain random error, the correlation will be < 1.00
This is a good method for some measures (e.g., distance, height), but not always for others (e.g., IQ, self-esteem).
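A minimal sketch of test-retest reliability as a correlation, using invented scores for six people measured on two occasions (the data are illustrative, not from the flashcards):

```python
import math

# Hypothetical scores for six people, measured at two different times
time1 = [98, 105, 110, 92, 120, 101]
time2 = [100, 103, 112, 95, 118, 99]

def pearson(x, y):
    """Pearson correlation coefficient between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(time1, time2)
print(round(r, 2))  # close to, but below, 1.00 because of random error
```

A correlation near 1.00 indicates high test-retest reliability; random error pulls it below 1.00.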
What could affect test-retest reliability and how can it be addressed?
- May be affected by reactivity, i.e., the effect that being observed has on a person
- If the same measures are taken twice (e.g., same questions on self-esteem test or IQ test), then people may show retesting effects
- This means they may remember how they responded the first time and duplicate those responses, believe a different response is required, or simply get bored and stop trying.
• This may be addressed using equivalent forms reliability
E.g., a maths test may give questions of equal difficulty that are slightly different, so the difficulty of the questions is identical but the same question isn’t repeated. For instance, the algebra question ‘solve 8 × 2y + 7 = 9’ in the first test is replaced by ‘solve 4 × 3y + 8 = 4’ in the second test.
What is a problem with test-retest reliability and what is it best at assessing?
A problem is that some conceptual variables are not expected to be stable within an individual. E.g., although a personality variable such as ‘extraversion’ tends to be relatively stable, a variable such as ‘mood’ can change within the same day. Thus, we can assess the test-retest reliability of ‘extraversion’ but not of ‘mood’.
Test-retest reliability is best assessed for things that are stable.
Why do self-reports tend to include a lot of items?
- Self-reports tend to include a lot of items because individual items tend not to have much reliability
- But with more and more items, the combined reliability tends to be higher
- E.g., you can compare this to measuring a coin coming up ‘heads’ with either one flip or a large number of flips. A large number of flips will result in a more reliable answer.
- By the same logic, for a measure of self-esteem, a large number of items will tend to give a more reliable measure than fewer items.
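The coin-flip analogy can be simulated directly (an illustrative sketch; the flip counts are arbitrary choices):

```python
import random

random.seed(0)

def heads_proportion(n_flips):
    """Estimate P(heads) for a fair coin from n_flips flips."""
    return sum(random.random() < 0.5 for _ in range(n_flips)) / n_flips

# Few "items" (flips): estimates bounce around the true value of 0.5
few = [heads_proportion(10) for _ in range(5)]
# Many "items": estimates cluster tightly around 0.5
many = [heads_proportion(10_000) for _ in range(5)]

spread_few = max(few) - min(few)
spread_many = max(many) - min(many)
print(spread_few > spread_many)  # more flips give a more consistent estimate
```

Just as more flips yield a more stable estimate of the coin, more items on a self-esteem scale yield a more reliable score.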
What is a true score and how is it related to reliability?
- A true score is an individual’s score on a measure if there was no error
- Actual score = true score + random error
- Reliability is the proportion of the variance in actual scores that reflects true scores (rather than random error)
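The actual = true + error relationship can be simulated to show reliability as a proportion of variance (a sketch with assumed standard deviations of 15 for true scores and 5 for error; these numbers are illustrative):

```python
import random

random.seed(1)

# Simulate: actual score = true score + random error
true_scores = [random.gauss(100, 15) for _ in range(5000)]
actual_scores = [t + random.gauss(0, 5) for t in true_scores]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

# Reliability = proportion of actual-score variance due to true scores
reliability = variance(true_scores) / variance(actual_scores)
print(round(reliability, 2))  # theoretical value: 15**2 / (15**2 + 5**2) = 0.90
```

With less random error the ratio approaches 1.00; with more, it falls toward 0.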
What is internal consistency?
- Reflects the extent to which scores on the items correlate with each other (and thus are measuring the same thing)
- E.g., if ten questions assess self-esteem and all ten give similar measures, the questionnaire has a high amount of internal consistency.
- But if the measures are basically random, then there is little internal consistency
What are the tests of internal consistency and how are they used?
Split-half test:
Measures correlation between scores on one half of the items and scores on the other half.
If the correlation is low, you still don’t know which items are the problem
Item-to-total correlations:
Measures correlation between individual items and mean score of all items
Cronbach’s coefficient alpha: Measures average correlation between scores for individual items on a scale
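Cronbach’s alpha can be computed from raw item scores using the standard formula alpha = (k / (k − 1)) × (1 − sum of item variances / variance of total scores); the responses below are invented for illustration:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha; scores is a list of respondents, each a list of item scores."""
    k = len(scores[0])            # number of items
    items = list(zip(*scores))    # transpose to per-item columns

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    totals = [sum(person) for person in scores]
    item_var_sum = sum(var(col) for col in items)
    return (k / (k - 1)) * (1 - item_var_sum / var(totals))

# Hypothetical 5-item self-esteem responses (1-5 scale) from four people
data = [
    [4, 4, 5, 4, 4],
    [2, 2, 2, 3, 2],
    [5, 5, 4, 5, 5],
    [3, 3, 3, 3, 4],
]
alpha = cronbach_alpha(data)
print(round(alpha, 2))  # high, because the items rise and fall together
```

Because each person answers all five items similarly, the items correlate strongly and alpha comes out high.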
What Cronbach’s alpha score would indicate a good/excellent amount of internal consistency?
Good = 0.8-0.9; Excellent = >0.9