Reliability and Validity Flashcards
What is reliability?
Reliability refers to the consistency/repeatability of the results of a measurement. How reliable a measure is considered to be is relative and depends on the situation.
Types of reliability
- Observers: Inter-Observer reliability
- Observations: Internal (Split-half) reliability
- Occasions: Test-retest reliability
What is inter-observer reliability?
Inter-observer reliability is the degree to which observers agree upon an observation or judgement.
• Can be a frequency or a categorical judgement
How is inter-observer reliability tested?
Measured by looking at the shared relationship (i.e. correlation/agreement) between observers' judgements. Statistics include Cohen's kappa for categorical judgements or Pearson's correlation coefficient (r) for continuous ones; see the sketch below.
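A minimal sketch in Python (the rating data, numbers and variable names are made up for illustration, not taken from the course material) of how both statistics can be computed:

```python
# Illustrative sketch: two inter-observer reliability statistics with NumPy.
import numpy as np

# Continuous judgements (e.g. frequency counts) -> Pearson's r
rater_a = np.array([3, 5, 2, 8, 6, 4], dtype=float)
rater_b = np.array([4, 5, 1, 7, 6, 5], dtype=float)
r = np.corrcoef(rater_a, rater_b)[0, 1]  # correlation between the two raters

# Categorical judgements -> Cohen's kappa (agreement corrected for chance)
obs_a = np.array(["yes", "no", "yes", "yes", "no", "no"])
obs_b = np.array(["yes", "no", "no", "yes", "no", "yes"])
categories = np.unique(np.concatenate([obs_a, obs_b]))

p_observed = np.mean(obs_a == obs_b)  # proportion of exact agreements
p_chance = sum(np.mean(obs_a == c) * np.mean(obs_b == c) for c in categories)
kappa = (p_observed - p_chance) / (1 - p_chance)

print(f"Pearson's r = {r:.2f}, Cohen's kappa = {kappa:.2f}")
```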
What is internal reliability?
Internal reliability is the degree to which the individual items/observations in a multiple-item measure behave the same way, i.e. are they measuring the same thing?
How is internal reliability tested?
Tested by dividing the test into two halves and looking at the correlation between them. If the measure has high internal reliability, an individual's performance on the first half should correlate with their performance on the second half (see the sketch below).
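A minimal sketch in Python (assuming a made-up matrix of item scores; the participants, items and numbers are illustrative only) of the split-half approach:

```python
# Illustrative sketch: split-half reliability with NumPy.
import numpy as np

# Rows = participants, columns = items on a multiple-item measure.
scores = np.array([
    [4, 5, 3, 4, 5, 4],
    [2, 1, 2, 3, 1, 2],
    [5, 4, 5, 5, 4, 5],
    [3, 3, 2, 3, 4, 3],
    [1, 2, 1, 2, 2, 1],
], dtype=float)

# Split the items into two halves (here: odd- vs. even-numbered items)
# and total each participant's score on each half.
half_1 = scores[:, 0::2].sum(axis=1)
half_2 = scores[:, 1::2].sum(axis=1)

# Correlation between the halves = split-half (internal) reliability.
r_halves = np.corrcoef(half_1, half_2)[0, 1]

# Spearman-Brown correction (an extra step often applied in practice,
# not part of the card itself) adjusts for each half being only half
# the length of the full test.
r_full = (2 * r_halves) / (1 + r_halves)

print(f"split-half r = {r_halves:.2f}, corrected = {r_full:.2f}")
```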
What is test-retest reliability?
Test-retest reliability is the extent to which scores on a test/measure remain stable over time.
What is validity?
Validity refers to how well a measure or an operationalised variable corresponds to what it is supposed to measure/represent.
Types of validity
- Internal validity
- External validity
  - Population validity
  - Ecological validity
- Construct validity
- Content validity
- Criterion validity
Internal validity
• How convincing is the evidence for causality in a study/series of experiments?
• i.e. how strong is the inference that the independent variable and the dependent variable are causally related?
J.S. Mill: 3 requirements to establish causality
- Covariation
- Temporal Sequence
- Eliminating confounds (rival explanations/hypotheses)
  - e.g. the third-variable problem
Equifinality
Another threat to internal validity: most outcomes have multiple possible causes.
External validity
How well does a causal relationship hold across different people, settings, treatment variables, measurements and time? Two types: ecological and population validity.
Population validity
- How well do findings generalise from the study sample to other populations?
- e.g. making cross-cultural inferences from Western, Educated, Industrialized, Rich, Democratic (WEIRD) samples
- Cross-cultural differences have been found in tasks involving motivation, reasoning and even visual perception
- Müller-Lyer illusion: Americans vs. the San people of the Kalahari
Ecological Validity
How well do results of laboratory experiments generalise to real-life settings?
• E.g. aggression studies in the lab vs. in real life
• Bandura (1961, 1963): Bobo doll experiment
Construct validity
How well do your operationalised variables (independent and/or dependent) represent the hypothetical or abstract variables of interest?
Content validity
Degree to which the items or tasks adequately sample the target domain
• i.e. how well does a measure/task represent all the facets of a construct
Criterion validity
To what extent can a procedure be used to infer or predict some criterion (outcome)
Two types: concurrent and predictive
Concurrent validity
A type of criterion validity: to what extent can a procedure be used to infer a criterion measured at the same time (concurrently)?
Predictive validity
A type of criterion validity: to what extent can a procedure be used to predict a criterion measured in the future?
When do person confounds occur?
Person confounds occur when a variable appears to cause an outcome because people who are high or low on this variable also happen to be high or low on some individual-difference variable (e.g. a demographic characteristic) that is associated with the outcome variable of interest.
When do operational confounds occur?
Operational confounds occur when a measure designed to assess a specific construct such as depression, memory, or foot size inadvertently measures something else as well.
What is a confound?
A threat to internal validity that undermines a causal explanation.
What is an artifact?
A threat to external validity: a by-product of the testing procedure or sample that biases all results.
Unlike confounds, artifacts stay constant and are present in all groups.