Reliability and Validity Flashcards

1
Q

The two components of measurement

A

measurement = true score + error

(we want the measurement to be as close to the true score as possible)

2
Q

Ways to Reduce Error

A

Many participants

  • individual differences introduce error, but when it is averaged out across a large group of people the error becomes manageable

Many measurements

  • taking more than one measurement achieves a similar effect, helping to average out measurement error

High frequency or many occasions

  • measuring on many occasions allows us to be more confident that what we have measured reflects the true score, as in the sketch below
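A minimal sketch (hypothetical numbers, not from the lecture) of why many participants or many measurements reduce error: simulated observations scattered around a true score average out closer to that true score as the number of observations grows.

```python
# Sketch only: simulate noisy measurements of a single true score and show that
# the average gets closer to the true score as the number of observations grows.
import numpy as np

rng = np.random.default_rng(0)
true_score = 100.0   # hypothetical true value of the construct
error_sd = 15.0      # hypothetical spread of random measurement error

for n in (5, 50, 500, 5000):
    measurements = true_score + rng.normal(0.0, error_sd, size=n)
    observed_mean = measurements.mean()
    print(f"n={n:5d}  mean={observed_mean:7.2f}  "
          f"distance from true score={abs(observed_mean - true_score):.2f}")
```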
3
Q

Reliability

A

refers to the consistency/repeatability of the results of a measurement

  • How reliable a measure is considered to be is relative and depends on the situation
4
Q

Types of Reliability

A

Observers: Inter-Observer reliability

Observations: Internal (Split-half) reliability

Occasions: Test-retest reliability

5
Q

Inter-Observer Reliability

A

→ the degree to which observers agree upon an observation or judgement

  • It can be based on frequency or categorical judgements
  • measured with correlations (see the sketch below)
  • raters make their judgements independently; they do not rate together
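A minimal sketch of how inter-observer reliability can be quantified with a correlation; the two raters' scores below are invented for illustration.

```python
# Sketch only: correlate two raters' independent judgements of the same observations.
import numpy as np

rater_a = np.array([3, 5, 4, 2, 5, 1, 4, 3])  # rater A's independent judgements
rater_b = np.array([4, 5, 4, 2, 4, 2, 5, 3])  # rater B's judgements of the same cases

r = np.corrcoef(rater_a, rater_b)[0, 1]       # Pearson correlation between the raters
print(f"inter-observer reliability r = {r:.2f}")  # values near 1 mean strong agreement
```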
6
Q

Example of Poor Inter-Observer Reliability

A

Rating attractiveness

  • Attractiveness judgements lack commonality among different raters, and thus there is not strong inter-rater reliability
7
Q

Internal/Split-Half Reliability

A

→ the degree to which all of the specific items or observations in a multiple item measure behave the same way

  • High internal reliability shows the entire measure is consistently measuring what it should be

How:

  • divide the test into two halves, then compare the first half to the second half
  • determine the consistency between the halves, as in the sketch below
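A minimal sketch of the split-half procedure using simulated item scores; the Spearman-Brown correction at the end is a standard adjustment assumed here rather than something stated on the card.

```python
# Sketch only: split a simulated test into two halves and correlate the half-scores.
import numpy as np

rng = np.random.default_rng(1)
n_people, n_items = 30, 20
ability = rng.normal(0, 1, (n_people, 1))                  # each person's underlying level
items = ability + rng.normal(0, 0.5, (n_people, n_items))  # item scores = ability + noise

first_half = items[:, : n_items // 2].sum(axis=1)   # score on the first half of items
second_half = items[:, n_items // 2 :].sum(axis=1)  # score on the second half

r_half = np.corrcoef(first_half, second_half)[0, 1]
r_full = (2 * r_half) / (1 + r_half)   # Spearman-Brown estimate for the full-length test
print(f"split-half r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")
```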
8
Q

Example of Measuring Split-Half Reliability

A

Intelligence (split into three domains)
- Verbal intelligence, perceptual reasoning, working memory

  • If we split the test in half, e.g. the first 50 questions and the last 50 questions, the halves must be comparable, drawing equally from each domain (compare ‘like with like’)
  • Then we look at the scores from each half; a high correlation between the halves indicates good internal reliability
9
Q

Test-Retest Reliability

A

→ the reliability of a measure to produce the same results at different points in time or occasions

  • important to show that the test or measure consistently measures the construct we are interested in, provided no other variables have changed
10
Q

Visual Search Task Example (Test-Retest Reliability)

A

We need the measurement to remain constant over time

However, practice effects undermine test-retest reliability

  • to counteract this we should counterbalance the order of presentation, such as randomly assigning people to different orders (see the sketch below)
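A minimal sketch of counterbalancing the order of presentation, with hypothetical condition names: participants are shuffled and each possible order is used equally often.

```python
# Sketch only: assign shuffled participants evenly across all possible task orders.
import random
from itertools import permutations

conditions = ["visual search version A", "visual search version B"]  # hypothetical tasks
orders = list(permutations(conditions))   # every possible presentation order

random.seed(0)
participants = [f"P{i:02d}" for i in range(1, 9)]
random.shuffle(participants)              # random assignment of people to orders

assignments = {p: orders[i % len(orders)] for i, p in enumerate(participants)}
for p, order in sorted(assignments.items()):
    print(p, "->", " then ".join(order))
```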
11
Q

Brain Training Example (Test-Retest Reliability)

A

Programs claimed to improve the brain and slow down cognitive decline

The question is whether they work

  • Adrian Owen (2010) found some improvement on the trained tasks, but no evidence of transfer effects to untrained tasks; the gains applied only to the trained task (if brain performance had genuinely improved, it should carry over to other tasks, so the improvement reflects practice)
12
Q

practice effects

A

improvement in scores on a task that comes from repeating it; it does not translate into improvement on other tasks

  • an indicator of poor test-retest reliability: scores from the first and second administrations are not comparable
13
Q

replication

A

reliability of results across experiments

  • when variables and conditions stay the same
  • the more times a result is replicated, the more likely it is that the findings are accurate and not due to error
14
Q

Validity

A

→ refers to how well a measure or construct actually measures or represents what it claims to
→ relates to accuracy

15
Q

Types of Validity

A
  • measurement validity (construct, content or criterion validity)
  • internal validity
  • external validity (population or ecological validity)
16
Q

Measurement Validity

A

how well an operationalised variable corresponds to what it is supposed to measure

  • includes construct, content and criterion validity
17
Q

Construct Validity

A

→ how well do your operationalised variables (independent and/or dependent) represent the abstract variables of interest

  • are you measuring what you think/what you say you are measuring

Example: hunger in rats
- must consider measures such as the amount (weight) of food consumed, speed of running towards food, etc.

18
Q

Content Validity

A

→ degree to which the items or tasks on a multi-faceted measure accurately measure what it is supposed to measure (the target domain)

  • Many constructs are multi-faceted and sometimes multiple measures must be used to achieve adequate content validity

Example: extroversion measure with 40 different items

  • 30 are similar (based on social behaviour, excitement measure, feelings in group settings)
  • 10 are unrelated (favourite shops)
  • We need all domains to accurately measure the construct of interest
  • Critical in psych as many constructs require multi-domain measurements
19
Q

Difference between Content Validity and Internal Reliability

A

Content validity demonstrates that all of the items ACCURATELY measure the construct, whereas internal reliability relates to whether the items CONSISTENTLY measure the construct

20
Q

Criterion Validity

A

→ how well scores on one measure predict the outcome on another measure

  • two types: concurrent (now) and predictive (future)

concurrent: compares scores on two current measures to determine whether they are consistent (similarity supports validity)

predictive: predicts a future outcome from a current measure (e.g. if you see more of something now, will you purchase it in the future?), as in the sketch below
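A minimal sketch of criterion (predictive) validity as a correlation between a current measure and a later outcome; both sets of scores are invented for illustration.

```python
# Sketch only: predictive criterion validity = correlation between a measure now
# and an outcome measured later.
import numpy as np

test_scores_now = np.array([55, 62, 70, 48, 80, 66, 59, 73])        # current measure
outcome_later = np.array([3.1, 3.4, 4.0, 2.8, 4.5, 3.6, 3.0, 4.2])  # later outcome

r = np.corrcoef(test_scores_now, outcome_later)[0, 1]
print(f"predictive criterion validity r = {r:.2f}")  # high r: scores predict the outcome
```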

21
Q

Internal Validity

A

→ focussed on whether the research design and evidence allow us to demonstrate a clear causal relationship

  • can never be achieved in a correlational design
  • occurs when the research design can establish a clear and unambiguous explanation for the relationship of two variables
  • can other explanations be ruled out
  • are the variables accurately measuring the construct
  • does the design support the claim
22
Q

Requirements for Causality

A
  • J.S. Mill
  1. Covariation - is there evidence for a relationship between the variables
  2. Temporal Sequence - the cause must occur before the effect; to claim that one variable causes another, it has to be shown to come first
  3. Eliminating Confounds - explain or rule out other possible explanations
23
Q

External Validity

A

→ how well a causal relationship can be generalised/replicated across different people, settings, measurements etc

  • includes population validity and ecological validity
24
Q

Population Validity

A

how well your experimental findings can be replicated in a wider population

  • Aim to have the findings generalise from our experimental sample to the wider population
  • Difficult to obtain high external validity in controlled experimental settings
25
Q

Ecological Validity

A

how well you can generalise the results outside of a laboratory environment to the real world

  • Lab vs real world scenarios
26
Q

Artefacts

A

threats to external validity; features that apply only to the experimental conditions

27
Q

confounds

A

an extraneous variable that systematically varies with the independent variable and influences the dependent variable

  • otherwise known as a third variable problem
  • may have different confounds in experimental and control groups
28
Q

Accounting for Systematic Confounds

A

The best way to account for these systematic confounds is usually to manipulate the independent variable

  • This is why true experiments are the only type of experiment acceptable for causal claims
  • ‘False’ (non-true) experiments, which lack a control condition and manipulation of the variable of interest, are the most susceptible to confounds
  • Demonstrates the importance of control groups and of isolating the variable of interest
29
Q

control groups

A

you want to manipulate only the variable of interest between groups

  • Challenge is to keep everything else constant
  • Almost impossible to isolate and remove all other variables
30
Q

Threats to Internal Validity from the Experimenter

A

experimenter bias - confound which undermines the strength of a causal claim

  • May influence the way a dependent variable is scored
  • Can be intentional and unintentional
  • Behaving in a way that influences participants and confounds results
  • Previous knowledge of research may make someone more primed to believe one thing
31
Q

double blind procedures

A

neither the experimenter nor the participant knows which condition the participant is in

  • Prevents both unintentional and intentional bias when interpreting the data
32
Q

Threats to Internal Validity from the Participant

A

demand characteristics

  • participants identify the purpose of the study
  • behave in a certain way as a result of identifying the purpose of the study
  • can also relate to how people want to be seen, or the desire to nonconform

individual differences

  • Differences in characteristics
  • When individual differences cluster together systematically in one group, they effectively act as a third variable
  • Impossible to completely remove them so there is a need to control for them
33
Q

Demand characteristics arise when:

A
  • A feature of the study suggests what the purpose of the study is
  • The participant changes their behaviour due to this cue
  • People want to be viewed in good light
34
Q

Surveys and scales are prone to issues with demand characteristics

A
  • People often answer in a positive light
  • Difficult to hide the purpose of the study with questionnaire research
35
Q

Solutions to participant measurement effects

A
  • unobtrusive observation
  • indirect measure
  • deception/confederates
36
Q

Unobtrusive Observation

A
  • watching participants without them being aware
  • hidden camera
37
Q

Indirect Measure

A

Instead of directly measuring with the most obvious dependent variable you use a related variable to answer the same question/get the same results

38
Q

Deception/Confederates

A

People who work for the experimenter while pretending to be participants

39
Q

counterbalancing

A

dealing with time-related confounds

  • Counterbalance the order of tests; people tested in the morning may perform differently from those tested later in the afternoon, based on what they have already done that day
    - Control for time of day
  • Design experiments of reasonable length
  • Include breaks in the experimental design
40
Q

Artefacts

A

→ reduce external validity

  • Prevents you generalising your results
  • Unlike a confound, an artefact is something that is present in all groups being tested (confounds differ between the experimental and control groups, whereas artefacts affect both)
41
Q

Mere Measurement Effect

A

→ being aware that someone is observing or measuring your behaviour may change the way you behave

  • This is important for external validity as it undermines the ability to generalise lab results to wider population and context
  • Similar to demand characteristics, except that it affects all subjects in the experiment rather than being an individual difference variable
42
Q

Hawthorne Effect

A
  • The first widely noted measurement effect in humans
  • 30% increase in productivity in all of the different conditions
  • Occurred because the workers were all measured on the same day and knew about it the day before, so they changed their productivity when they showed up
43
Q

History Effects

A

→ the effect of a particular period of time may make an entire sample biased

- The data are influenced by the moment in time at which they were collected

  • Can’t generalise these findings to a wider population or different contexts
44
Q

Selection Bias

A

→ participants volunteering for a study who have a biased interest in the topic of research or the outcome of the study

  • People who like beards will join the study asking whether beards are attractive
45
Q

Non-Response Bias

A

→ problem for experiments that involve voluntary sign ups

  • People who are not interested in the topic tend not to respond, so a large portion of the population is lost to non-response bias
  • This undermines the external validity of the experiment; the limited population means the results cannot be generalised
46
Q

How to Manage Selection Bias

A
  • Use a random sample of the population (see the sketch below)
  • Does not eliminate all problems but reduces the likelihood of systematic biases in your data
  • Compulsory polls reduce sampling bias and produce results and data that are more widely applicable and less susceptible to biased groups, though they are still susceptible to demand characteristics
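A minimal sketch of drawing a simple random sample from a hypothetical population list, so that every member has an equal chance of being selected.

```python
# Sketch only: a simple random sample reduces systematic selection bias because
# inclusion no longer depends on who chooses to volunteer.
import random

population = [f"person_{i}" for i in range(10_000)]  # hypothetical sampling frame
random.seed(42)
sample = random.sample(population, k=200)            # random sample without replacement
print(sample[:5])
```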