Reliability & Validity Flashcards

1
Q

What is a conceptual variable?

A

The thing you’re interested in measuring.

2
Q

What is reliability?

A

Reliability is the degree to which a measure is free of random error

3
Q

What is validity?

A

Validity is the degree to which a measure is free of systematic error

4
Q

What is a random error?

A

• Refers to chance variability in measurements
E.g., a measure of an individual’s IQ will vary randomly at different times depending on several factors, such as physical state (being tired one day and not concentrating) or errors in marking/recording answers

• Tends to be self-correcting (if you average across the measures it should lean towards the right answers)

• Is not biased in a particular direction
E.g., physical state of participants will vary from day to day but this variation will be random and will tend towards an ‘average’ physical state

5
Q

What is systematic error?

A

• Error caused by other conceptual variables
• Tends to be biasing
• E.g., IQ test
• Time of day: Math scores higher in the morning, verbal higher in the afternoon
• Cultural knowledge: people more competent at answering questions from their own culture or in their own native language
• Classic example: US Army IQ test from early 20th century
They wanted to make the smartest people into officers, less smart people could be foot soldiers. They found that people born in US scored > UK > N. Europe > S. Europe > rest of world. However, the questions were biased e.g. “How many people are on a baseball side?”. Moreover, questions were written in English and not translated into other languages!

6
Q

Briefly explain the difference between random error and systematic error.

A

Random error is caused by chance and tends to be self-correcting, whereas systematic error is caused by other conceptual variables and tends to be biasing.

7
Q

How can we measure reliability?

A
  • The degree to which a measure is free of random error
  • Can be measured directly by repeating the test
  • E.g., measure reliability of a ruler by using it twice and recording measures
  • If it gives consistent results, it is reliable
  • If it gives inconsistent results, it is unreliable
8
Q

What is test-retest reliability and when is it a good method?

A
  • Degree to which scores on the same measured variable correlate on measures taken at different times
  • Perfect test-retest reliability would yield a correlation of 1.00, but if measures contain random error, the correlation will be < 1.00

This is a good method for some measures (e.g., distance, height), but not always for others (e.g., IQ, self-esteem).
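As a minimal sketch (with made-up scores), test-retest reliability can be estimated as the Pearson correlation between two administrations of the same measure:

```python
# Estimate test-retest reliability as the Pearson correlation between
# scores from two administrations of the same test (hypothetical data).
import statistics

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

time1 = [98, 105, 110, 120, 93, 101]    # scores at first testing
time2 = [100, 103, 112, 118, 95, 99]    # same people, retested later
print(round(pearson(time1, time2), 3))  # 0.975: high test-retest reliability
```

A correlation near 1.00 indicates little random error; lower values indicate lower reliability.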

9
Q

What could affect test-retest reliability, and how can it be addressed?

A
  • May be affected by reactivity, i.e., the effect that being observed has on a person
  • If the same measures are taken twice (e.g., same questions on a self-esteem test or IQ test), people may show retesting effects
  • This means they may remember how they responded the first time and duplicate those responses, believe a different response is required, or simply get bored and stop trying.

• This may be addressed using equivalent forms reliability
E.g., a maths test may use questions of equal difficulty that are slightly different, so the difficulty of the questions is matched but the same question isn’t repeated. For instance, the algebra question ‘solve 8 x 2y + 7 = 9’ in the first test is replaced by ‘solve 4 x 3y + 8 = 4’ in the second.

10
Q

What is a problem with test-retest reliability and what is it best at assessing?

A

A problem is that some conceptual variables are not expected to be stable within an individual E.g. although a personality variable such as ‘extraversion’ tends to be relatively stable, a variable such as ‘mood’ can change within the same day. Thus, we can assess test-retest reliability of ‘extraversion’ but not of ‘mood’.

Test-retest reliability is best assessed for things that are stable.

11
Q

Why do self-reports tend to include a lot of items?

A
  • Self-reports tend to include a lot of items because individual items tend not to have much reliability
  • But with more and more items, the combined reliability tends to be higher
  • E.g., you can compare this to estimating the probability of a coin coming up ‘heads’ with either one flip or a large number of flips. A large number of flips will give a more reliable answer.
  • By the same logic, for a measure of self-esteem, a large number of items will tend to give a more reliable measure than fewer items.
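The coin-flip logic can be simulated directly; a minimal sketch with hypothetical numbers:

```python
# Averaging many noisy items recovers the true score better than one item,
# mirroring the coin-flip analogy (simulated, hypothetical numbers).
import random

random.seed(0)
TRUE_SCORE = 50.0

def observed(n_items):
    # each item = true score + independent random error
    return sum(TRUE_SCORE + random.gauss(0, 10) for _ in range(n_items)) / n_items

one_item = [observed(1) for _ in range(1000)]
twenty_items = [observed(20) for _ in range(1000)]

def spread(xs):  # root-mean-square error around the true score
    return (sum((x - TRUE_SCORE) ** 2 for x in xs) / len(xs)) ** 0.5

print(spread(one_item) > spread(twenty_items))  # True: more items, less error
```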
12
Q

What is a true score and how is it related to reliability?

A
  • A true score is an individual’s score on a measure if there was no error
  • Actual score = true score + random error
  • Reliability is the proportion of the actual score that reflects the true score
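The decomposition above can be illustrated with a small simulation (hypothetical numbers): reliability is the share of observed-score variance that comes from the true score.

```python
# Actual score = true score + random error, so reliability can be seen as
# var(true) / var(observed) (simulated, hypothetical numbers).
import random

random.seed(1)
true_scores = [random.gauss(100, 15) for _ in range(10_000)]
observed = [t + random.gauss(0, 5) for t in true_scores]  # add random error

def var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

reliability = var(true_scores) / var(observed)
print(round(reliability, 2))  # ~0.90, i.e. 225 / (225 + 25)
```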
13
Q

What is internal consistency?

A
  • Reflects the extent to which scores on the items correlate with each other (and thus are measuring the same thing)
  • E.g., if ten questions assess self-esteem and all ten give similar measures, the questionnaire has a high amount of internal consistency.
  • But if the measures are basically random, then there is little internal consistency
14
Q

What are the tests of internal consistency and how are they used?

A

Split-half test:
Measures correlation between scores on one half of the items and scores on the other half.
If correlation is low, you still don’t know which items are the problem

Item-to-total correlations:
Measures correlation between individual items and mean score of all items

Cronbach’s coefficient alpha: Measures average correlation between scores for individual items on a scale
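As a sketch of the third test (with hypothetical ratings), Cronbach’s alpha can be computed from a respondents-by-items score matrix using the standard formula alpha = k/(k-1) x (1 - sum of item variances / variance of total scores):

```python
# Cronbach's alpha from a respondents-by-items score matrix
# (hypothetical self-esteem ratings on a 1-7 scale).
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(scores):            # scores[person][item]
    k = len(scores[0])                 # number of items
    item_vars = sum(variance(list(col)) for col in zip(*scores))
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_vars / total_var)

scores = [  # five people, four items
    [5, 6, 5, 6],
    [3, 3, 4, 3],
    [6, 7, 6, 7],
    [2, 2, 3, 2],
    [4, 5, 4, 5],
]
print(round(cronbach_alpha(scores), 2))  # 0.98: items agree closely
```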

15
Q

What Cronbach’s alpha score would indicate a good/excellent amount of internal consistency?

A
Good = 0.8 - 0.9
Excellent = > 0.9
16
Q

What Cronbach’s alpha score would indicate a poor/unacceptable amount of internal consistency?

A
Poor = 0.5 - 0.6
Unacceptable = < 0.5
17
Q

What is interrater reliability?

A

• When a behaviour is analysed, we typically use a number of judges to score that behaviour

• Works on the principle that lots of measurements are more reliable than fewer measurements
E.g. assessing prosocial behaviour in children
E.g. assessing verbal communication in brain damaged patients

  • Interrater reliability refers to the degree to which different judges give similar scores for the same behaviour/event
  • E.g., judging number of acts of kindness committed by children in a nursery
  • The idea is that the closer different judges’ scores are, the higher the interrater reliability
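A minimal sketch of the idea (with hypothetical codes): the simplest interrater check is percent agreement between two judges coding the same events; fuller analyses would use a chance-corrected statistic such as Cohen’s kappa.

```python
# Percent agreement between two judges coding the same six events
# (hypothetical codes; a chance-corrected index like Cohen's kappa
# is preferable in practice).
judge_a = ["kind", "kind", "neutral", "kind", "neutral", "kind"]
judge_b = ["kind", "neutral", "neutral", "kind", "neutral", "kind"]

agreement = sum(a == b for a, b in zip(judge_a, judge_b)) / len(judge_a)
print(round(agreement, 2))  # 0.83: judges agree on 5 of 6 events
```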
18
Q

What is construct validity?

A

Construct validity refers to the extent to which a measured variable actually measures the conceptual variable.
e.g., a written IQ test has construct validity only if given to people who are able to see, and who can read and write the language in which the test is given, etc.

19
Q

How is validity assessed?

A
  • Face validity
  • Internal validity
  • External validity
  • Content validity
  • Convergent validity
  • Discriminative validity
  • Criterion validity
20
Q

What is face validity?

A

• A subjective impression of validity using common sense
• E.g., a self-report of someone’s mood has high face validity (high first impression)
• However, measuring the size of someone’s shoes as an indicator of mood has low face validity
• High face validity is not enough as sometimes an item with high face validity can actually have low construct validity
E.g., a test of psychosis: Rate from 1 (strongly agree) to 7 (strongly disagree) – “Voices in my head should be obeyed at all times” may have high face validity but…
No-one who wants to avoid being labelled psychotic will answer ‘strongly agree’

21
Q

What is internal validity?

A
  • Internal validity refers to the extent to which a measure is free of systematic bias
  • A systematic bias in a measure will always weaken internal validity
  • E.g., If a bathroom scale adds thirty stone to your weight, it has a large systematic bias (not to mention a bad effect on your self-esteem!)
  • Such a bathroom scale has low internal validity
22
Q

What is external validity?

A

• External validity refers to the extent to which a measure can be generalised to other situations
• Is weakened when the effect of the independent variable (IV) depends on other factors
E.g., if you’re testing a drug treatment for depression and find it works well to treat severely depressed people you cannot assume from this it will work on those who are only mildly depressed. Thus, such a study has low external validity.

23
Q

What is content validity?

A

• This is particularly appropriate for tests of abilities
• It measures the extent to which the measured variable represents an appropriate sample of the domain being measured
E.g., a test of verbal skills that focuses on creative writing, but excludes comprehension, vocabulary and grammar, has little content validity as a measure of overall verbal skills.
(But has a lot of content validity as a test of creative writing).

24
Q

Why are convergent and discriminative validity more useful than face and content validity?

A

Face and content validity are useful but tend to be subjective. Validity can (and should) also be measured objectively based on data.

25
Q

What is convergent validity?

A

Convergent validity refers to the extent to which a measured variable relates to other measured variables designed to measure the same conceptual variable.
E.g., a person’s scores on different IQ tests should correlate highly with each other (if someone gets high scores on four out of five tests but a low score on the fifth, that fifth test may lack convergent validity).

26
Q

What is discriminative validity?

A
  • Discriminative validity is the extent to which a measured variable is found to be unrelated to other measured variables designed to assess different conceptual variables
  • E.g., a self-report test of colour preference should have little correlation with a self-report test of job satisfaction. If they did correlate, that would suggest low discriminative validity.
  • E.g., a test that purports to measure creativity should not correlate with a test that measures height
  • In order to improve the discriminative validity we need to consider the way a test fits into the big picture, considering other tests, other types of evaluations, etc.
  • The network of these associations is known as the nomological net.
27
Q

What is criterion validity?

A

• The idea is to associate a self-report measure with a more reliable behavioural measure
E.g., comparing scores on a self-report test of anxiety to physiological measures of anxiety (e.g., blood pressure, heart rate, respiration rate, etc.)

  • In this context, the physiological measure is called a criterion variable and the correlation between it and the self-report measures is known as criterion validity
  • It’s known as predictive validity when it is used to predict future events/behaviours
  • It’s known as concurrent validity when it involves the assessment of a self-report and a behavioural measure at the same time
28
Q

How can you optimise reliability and validity?

A
  1. Conduct a pilot experiment (start off with a smaller number of participants than you would plan for the full experiment and use this as a way of measuring the validity and reliability)
  2. Use multiple measures (the more times you take the measure, the more reliable the result will be, e.g. the more times you flip a coin)
  3. Ensure variability within measures (give people a range of multiple outcomes, the measures need to distinguish between different types of people)
  4. Phrase items in a clear and unambiguous way
  5. Try to ensure participants take the experiment seriously (as the experimenter you should act in a professional manner)
  6. Be conscious of face and content validity
  7. Whenever possible, rely on existing measures (often there are already existing validating scales to test anxiety, mood, self-esteem etc)