Lecture 12: Reliability and Validity Flashcards

1
Q

Observed vs latent constructs

A

Constructs (recap) = abstract features of interest within a population, such as intelligence, perseverance, or education.

Observed constructs = constructs that can be measured directly (e.g., height, weight, age, number of visits to the gym).

Latent constructs = constructs that can only be measured indirectly, through observed indicators such as questionnaire items (e.g., attitudes and opinions, which exist inside a participant’s head). These constructs require an operational definition, for example to determine what counts as “successful” or “extroverted”. Together, several observed indicators help us capture the underlying latent construct.

2
Q

Measurement error

A

The difference between the “true score” and the observed response

E.g., the question is “Do you enjoy talking to people?”
The response can be affected by random factors (e.g., a sleepless night) or by the questionnaire itself: respondents may interpret the question differently, and because extraversion is a broad term, a person’s extraversion can easily be overestimated or underestimated.

3
Q

Contra-indicative items

A

Statements whose wording is aligned with the construct are indicative.

Statements whose wording is opposed to the construct are contra-indicative OR reverse coded.
For example, if you measure deceitfulness, a question like “Honesty is the best policy in all cases” is an example of a reverse-coded question.
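A minimal sketch of reverse coding, assuming a 1-5 agreement scale (the scale bounds and the function name `reverse_code` are illustrative assumptions, not part of the lecture). A contra-indicative response is flipped so that a high value again indicates more of the construct:

```python
def reverse_code(score, low=1, high=5):
    """Flip a response on a low..high scale (assumed 1-5 by default)."""
    if not low <= score <= high:
        raise ValueError(f"score {score} is outside the {low}-{high} scale")
    return low + high - score

# Hypothetical response to "Honesty is the best policy in all cases".
honesty_response = 5  # strongly agree
# Strong agreement with an honesty statement means LOW deceitfulness.
deceitfulness_item = reverse_code(honesty_response)
print(deceitfulness_item)
```

On this assumed scale, strong agreement (5) becomes the lowest deceitfulness score (1), while the midpoint (3) is unchanged.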

4
Q

Sum scores

A

$X_{\text{sum}} = \sum_{i=1}^{k} X_i$
$\sum_{i=1}^{k}$ = sum over the individual item scores
$X_i$ = score on item i; k = number of items
Every answer is given a value (e.g., 1-3). After all the questions are answered, the values are added up to give a sum score.

5
Q

Limitations of sum scores

A

The size of the sum score depends on the number of items. If a person has a missing value for one of the items, they get no points for that item => therefore, they will score lower, even if that does not reflect their true level. This limitation can be addressed by calculating a mean score, whose value does not depend on the number of items, so missing values do not result in a lower score.

However, 2 limitations remain:
1) Each item is still considered to be equally important
2) Measurement error is still ignored
So mean scores are somewhat better than sum scores, but not perfect.
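The effect of a missing value can be illustrated with a small sketch. The responses below are hypothetical, missing answers are represented as `None` (an assumption), and the mean is taken over answered items only:

```python
def sum_score(responses):
    """Add up answered items; a missing item (None) contributes nothing."""
    return sum(r for r in responses if r is not None)

def mean_score(responses):
    """Average over answered items only, so missing items do not lower the score."""
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no items answered")
    return sum(answered) / len(answered)

# Two hypothetical respondents who answer identically, except one skips an item.
complete = [3, 3, 3, 3]
one_missing = [3, 3, 3, None]

print(sum_score(complete), sum_score(one_missing))
print(mean_score(complete), mean_score(one_missing))
```

The sum scores differ even though the two respondents answered every shared item the same way, while the mean scores agree. Both scores still weight every item equally and ignore measurement error.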

6
Q

Reliability

A

Does the instrument consistently measure the same thing?

Test-retest reliability: are individuals’ scores similar across multiple occasions? (E.g., is your extraversion score still the same a month later?)

Internal consistency: are scores across different questions similar for the same individual?

Inter-rater reliability: do different people report the same score for the same thing?
E.g., when you ask 2 classmates to rate you on a scale of extroversion, do they answer the same?

7
Q

Limitations of test-retest reliability

A

Learning effects: after participants have been exposed to your questionnaire once, they can react to it differently the second time

Memory effects => participants’ scores stay the same because they remember the questions and their earlier answers, not necessarily because their characteristics have stayed the same

People change over time => therefore, you should find an interval that is long enough for participants to be minimally affected by learning and memory, but short enough that the construct itself has not changed.

8
Q

Internal consistency

A

Measures the association among items within a test. It can be estimated using methods such as split halves or Cronbach’s Alpha.

9
Q

Split halves

A

1) Split the test into two halves
2) Correlate scores on the first half with scores on the second half
3) Apply a correction to estimate the reliability of the entire test based on the correlation between the split halves
4) $r' = \frac{2r}{1 + r}$, where r is the correlation between the two halves
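The steps above can be sketched as follows. The correction used is the Spearman-Brown formula, r' = 2r / (1 + r). The data are hypothetical, the helper names are illustrative, and splitting into first versus second half of the items is only one possible split (an assumption; odd versus even items is another common choice):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman_brown(r):
    """Estimate full-test reliability from the correlation between two halves."""
    return 2 * r / (1 + r)

def split_half_reliability(data):
    """data: one list of item scores per respondent, with an even number of items."""
    half = len(data[0]) // 2
    first = [sum(row[:half]) for row in data]    # step 1: score on first half
    second = [sum(row[half:]) for row in data]   # step 1: score on second half
    r = pearson_r(first, second)                 # step 2: correlate the halves
    return spearman_brown(r)                     # steps 3-4: apply the correction

# Hypothetical responses: 5 respondents, 4 items each.
data = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]
reliability = split_half_reliability(data)
print(round(reliability, 3))
```

Because each half contains only half the items, the raw correlation between halves underestimates the reliability of the full test; the correction adjusts upward for that.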

10
Q

Cronbach’s Alpha

A

Estimates internal consistency (whether you are measuring the same thing across all your questions).

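Computed as the number of items (k) multiplied by the average covariance between items, divided by the sum of the average item variance and (k - 1) times the average covariance: $\alpha = \frac{k\bar{c}}{\bar{v} + (k - 1)\bar{c}}$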

If you want to calculate Cronbach’s Alpha, contra-indicative items must be reverse coded.

Alpha increases with number of items + when items are more similar (this might lower content validity if our items are very similar).

Rules of thumb:

> .90 Excellent
.70 Acceptable
.60 Questionable
< .50 Unacceptable
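A minimal sketch of computing Cronbach’s Alpha, using the variance form alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), which is algebraically equivalent to the covariance-based description. The data are hypothetical and the function name is illustrative; any contra-indicative items are assumed to have been reverse coded already:

```python
from statistics import variance

def cronbach_alpha(data):
    """data: one list of item scores per respondent (same items, no missing values)."""
    k = len(data[0])                       # number of items
    items = list(zip(*data))               # transpose: one tuple of scores per item
    item_variances = [variance(col) for col in items]
    total_variance = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 5 respondents, 3 items on a 1-5 scale.
data = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
    [5, 5, 4],
]
alpha = cronbach_alpha(data)
print(round(alpha, 2))
```

When all items rank respondents identically, the total-score variance is dominated by shared variance and alpha approaches 1; when items are unrelated, alpha approaches 0.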

11
Q

Item-total correlation

A

We could compute a total scale score by adding or averaging item responses. Each item ought to measure the same construct as this total score. We can then observe the correlation between each item and the total score.
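As a rough illustration, the sketch below correlates each item with the sum score. The data are hypothetical and the helper names are illustrative. Note that this is the uncorrected version, in which the item itself is included in the total (an assumption; a corrected version would exclude it):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def item_total_correlations(data):
    """Return one correlation per item: item scores versus respondents' sum scores."""
    totals = [sum(row) for row in data]
    return [pearson_r(list(col), totals) for col in zip(*data)]

# Hypothetical responses: 5 respondents, 3 items. The third item follows a
# different pattern from the other two.
data = [
    [1, 2, 5],
    [2, 2, 1],
    [3, 4, 4],
    [4, 4, 2],
    [5, 5, 3],
]
correlations = item_total_correlations(data)
for number, r in enumerate(correlations, start=1):
    print(f"item {number}: r = {r:.2f}")
```

An item with a clearly lower item-total correlation than the others, like the third item here, may not be measuring the same construct as the rest of the scale.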

12
Q

Validity

A

Does the instrument measure what it intends to measure?

Face validity: at first glance, does the instrument appear to assess the correct construct? Is the wording clear, readable, understandable and unambiguous?
E.g., does “I enjoy swimming” measure extroversion?

Content validity: does the test cover all aspects of the construct?
Example of poor content validity: early intelligence tests. White Americans scored highest => widespread racial discrimination caused people from lower socio-economic backgrounds to score lower on these tests.

Criterion validity: is the test associated with outcomes or indicators of the construct it is designed to measure?
A scale should correlate with another validated scale, a behavioural measure of the construct (GPA correlates with intelligence), or an outcome of the construct (altruism correlates with giving to charity).
=> test of extraversion should predict number of friends

13
Q

Psychometrics

A

Aims to measure an underlying latent construct using multiple observed indicators.
