Week 3 Reliability and Validity Flashcards

1
Q

Define reliability

A

The degree to which a test or tool produces consistent results when measuring the same thing.

e.g. a scale which measures a consistent weight each time is considered reliable.

2
Q

Define validity

A

The extent to which a test measures the construct it is intended to measure, e.g. a scale measures weight and nothing else; an IQ test measures intelligence.

3
Q

Why are reliability and validity important?

A
  • diagnosis
  • assessment of ability
  • treatment decisions and monitoring outcomes
  • research
4
Q

True or false, tests can be reliable without being valid.

A

TRUE

tests can consistently produce the same results but not accurately measure what you want them to.

5
Q

True or false, tests can be valid without being reliable.

A

FALSE

Tests cannot be valid without being reliable.

6
Q

Describe classical test theory (Charles Spearman)

A

States that test scores are the result of:

  • factors which contribute to consistency - stable characteristics under examination (“true scores”)
  • factors which contribute to inconsistency - characteristics of the test taker, or of the situation, that are not related to the characteristic being tested (errors of measurement/confounders)
7
Q

What is the formula for classical test theory?

A

X = T + e

X = obtained score
T = true score
e = errors of measurement

e.g. anxiety score on a test = (true) anxiety + error
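
A standard piece of classical test theory algebra (not on the card itself, but it follows directly from the model): if errors are random and uncorrelated with true scores, the variances add, and reliability can be expressed as the true-score share of observed variance:

```latex
X = T + e
\;\Rightarrow\;
\sigma^2_X = \sigma^2_T + \sigma^2_e,
\qquad
\text{reliability} = \frac{\sigma^2_T}{\sigma^2_X}
```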

8
Q

List the different sources of error

A

Item selection

Test administration

Test scoring

Systematic measurement error

9
Q

Describe the following source of error: item selection

A

sample of items chosen may not be reflective of every individual’s true score

10
Q

Describe the following source of error: test administration

A

general environmental conditions (e.g. temperature, lighting, noise) and the state/mood of the test taker

11
Q

Describe the following source of error: test scoring

A

Subjectively scored tests e.g. projective tests and essay exams

12
Q

Describe the following source of error: systematic measurement error

A

the test may consistently tap into something other than the attribute being tested

e.g. a test of introversion may actually test aspects of social anxiety without anyone knowing

13
Q

Explain domain sampling theory

A

A central concept in classical test theory:

With domain sampling, tests are constructed by randomly selecting a specified number of measures from a homogeneous, infinitely large pool.

A sample of items is reliable to the extent that the score it produces correlates highly with the true score for the whole domain.

14
Q

Are longer tests more reliable?

A

Technically, yes: according to domain sampling theory, longer tests include more items from the “universe” of possible items and therefore sample more aspects of the construct.

15
Q

What are two elements of reliability that are observed/tested?

A

Stability over time - the extent to which the test remains stable when it is administered on more than one occasion

Internal consistency - the extent to which a psychological test is homogeneous or heterogeneous

16
Q

Describe the test-retest (stability) measure of evaluating reliability.

A

the same test is administered to the same group at two different time points, and the two sets of scores are correlated.
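
A minimal sketch of how this correlation might be computed, assuming numpy and made-up scores for illustration:

```python
import numpy as np

# Hypothetical scores for the same six people at two time points
time1 = np.array([12, 18, 9, 22, 15, 17])
time2 = np.array([13, 17, 10, 21, 16, 18])

# Test-retest reliability is the correlation between the two administrations
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")
```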

17
Q

What are considerations/ limitations for the test-retest measure?

A
  • consider the strength of the correlation between the 2 scores
  • consider the time lapse between test administrations
  • practice effects, maturation, and treatment effects/setting all impact scores
18
Q

Test-retest is an appropriate measure for I___ and E___

It is inappropriate for S___ A__ and W__ of a b_

A

Intelligence and Extraversion (stable over time)

State anxiety and weight of a baby

19
Q

Describe the parallel or alternative forms measure of evaluating reliability.

A

two forms of the same test are developed, with different items selected according to the same rules, e.g. an alternative exam for PSY3041

20
Q

please select one of the two options:

  • same
  • different

parallel forms have ___ distribution across scores (means and variance equal)

A

same

21
Q

please select one of the two options:

  • same
  • different

alternate forms have ___ distribution of scores

A

different

means and variance may not be equal

22
Q

What are the similarities between parallel and alternate forms of reliability?

A
  • both are matched for content and difficulty
  • stable construct required
  • two tests administered to the same group (looking for strong correlations between the versions)
  • influenced by changes between testing times e.g. fatigue
  • additional source of error: item sampling/slightly different items
23
Q

Describe the split half method of evaluating reliability.

A

the test is divided into halves which are compared (randomly split, odd-even system, or top vs bottom).

rationale: if scores on 2 half-tests from a single administration are highly correlated, scores on 2 whole tests from separate administrations should also be highly correlated
- estimates of reliability will be smaller because of the smaller number of items
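
A minimal sketch of the odd-even split on a hypothetical person-by-item score matrix (simulated data, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
ability = rng.normal(size=(30, 1))                       # latent trait for 30 people
scores = ability + rng.normal(scale=0.8, size=(30, 10))  # 10 items (hypothetical)

odd_half = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, ...
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6, ...

# Correlation between the two half-tests; underestimates full-test reliability
r_half = np.corrcoef(odd_half, even_half)[0, 1]
print(f"half-test r = {r_half:.2f}")
```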

24
Q

What is the purpose of the Spearman-Brown formula/ correction?

A

As the reliability estimate based on the split half is smaller due to the smaller number of items, the Spearman-Brown formula is applied to estimate what the reliability would be if each half were the same length as the full test.

- this is a test of internal consistency
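
The correction itself, in its standard form; the general n-fold version also formalises why longer tests tend to be more reliable (card 14):

```latex
r_{\text{full}} = \frac{2\, r_{\text{half}}}{1 + r_{\text{half}}}
\qquad
r_{\text{new}} = \frac{n\, r}{1 + (n-1)\, r} \quad \text{(general $n$-fold form)}
```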

25
Q

Which reliability coefficient is used to measure internal consistency?

A

Cronbach’s alpha - a generalised reliability coefficient for scoring systems that are graded for each item.

  • it is the mean of ALL possible split-half correlations, corrected by the Spearman-Brown formula
  • ranges from 0 (no similarity) to 1 (perfectly identical)
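
A minimal sketch computing alpha from its standard variance formula, alpha = k/(k−1) · (1 − Σ item variances / variance of totals), on simulated data:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a (people x items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(1)
ability = rng.normal(size=(50, 1))
scores = ability + rng.normal(scale=0.8, size=(50, 8))  # hypothetical data
print(f"alpha = {cronbach_alpha(scores):.2f}")
```
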
26
Q

What are acceptable levels of reliability?

A

.70-.80 is acceptable or good

greater than .90 may indicate redundancy of items -> high reliability is important in clinical settings when making decisions, e.g. decision-making capacity assessment

27
Q

What is the standard error of measurement? (SEM)

A

allows for the estimation of the precision of a specific (individual) test score.

The larger the SEM, the less certain we are that the test score represents the true score.
- confidence intervals (CIs) are often used
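
A minimal sketch using the standard formula SEM = SD · √(1 − reliability); all numbers here are made up for illustration:

```python
import math

sd = 15.0           # test's standard deviation (hypothetical, IQ-like scale)
reliability = 0.90  # reliability coefficient, e.g. Cronbach's alpha

sem = sd * math.sqrt(1 - reliability)  # standard error of measurement
score = 110
low, high = score - 1.96 * sem, score + 1.96 * sem  # 95% confidence interval
print(f"SEM = {sem:.1f}; 95% CI = [{low:.1f}, {high:.1f}]")
```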

28
Q

A way of conceptualising validity is:

A test is valid to the extent that inferences made from it are, a___, m___ and u__.

A

A test is valid to the extent that inferences made from it are APPROPRIATE, MEANINGFUL and USEFUL.

  • validity must be relevant to the CONTEXT and POPULATION in which you are measuring a construct.
29
Q

What are the types of validity evidence?

A
  • Face validity
  • Content validity
  • Criterion related validity
    Predictive evidence
    Concurrent Evidence
  • Construct validity
    Convergent Evidence
    Discriminant evidence
30
Q

Explain face validity and how it’s measured

A

Does the test look like it measures the relevant construct?

  • social acceptability issue
  • must be an obvious link between construct and test items
  • assessed using test-taker’s opinion
31
Q

Explain content validity and how it’s measured

A

The extent to which the items on a test represent the universe of behaviour the test was designed to measure

e.g. anxiety test examines all aspects of anxiety not just affect

  • sampling issue
  • commonly used in vocational/achievement settings
  • assessed by expert opinion on the subject matter - logical deduction
32
Q

What are some issues for content-related validity

A

Construct underrepresentation
- failure to capture an important component of a construct, e.g. a depression scale which only measures emotions and thoughts but not behaviour

Construct-irrelevant variance
- measuring things other than the desired construct
- e.g. the wording of the scale may cause people to answer in a socially desirable way
33
Q

Explain criterion-related validity

A

extent to which a measure is related to an outcome (criterion)

e.g. high school marks used to predict university performance or relationship satisfaction used to predict separation

34
Q

What is concurrent evidence? (criterion-related validity)

A

a comparison between the measure in question and an outcome assessed at the same time

35
Q

What is predictive evidence? (criterion-related validity)

A

How well a test predicts performance on a criterion. Compares measure in question with an outcome assessed at a later time. e.g. ATAR score used to predict uni marks

36
Q

What is construct validity?

A

A multi-faceted process concerned with establishing how well a test measures a psychological construct.

37
Q

What is convergent evidence? (construct validity)

A

The degree to which two constructs that should be theoretically related are actually related, e.g. the relationship between low self-esteem and depression.

  • correlate test scores between the two measures
38
Q

What is discriminant evidence? (construct validity)

A

a.k.a. divergent evidence

aims to demonstrate that the test is unique. Low correlations should be observed with constructs which are unrelated to what the test is trying to measure.

  • also want to discern between similar but different constructs, e.g. self-esteem and self-efficacy
    e.g. scores on an anxiety measure should be different from depression scores if both are being assessed within the same test.
39
Q

What is factor analysis? (construct validity)

A

observe patterns of correlations among test items using factor analysis.

  • some items within a test may be highly related and form a set, whereas others may not be related to these and form a different set.
40
Q

Explain the two methods of factor analysis

A

Exploratory factor analysis
- used when you don’t know how many underlying constructs/clusters will be formed

Confirmatory factor analysis
- used on a pre-developed test, e.g. the DASS; the analysis confirms that the expected factors are actually being measured
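
A minimal exploratory sketch, assuming scikit-learn is available; real EFA/CFA workflows also involve choosing the number of factors and rotating loadings, which this skips:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
traits = rng.normal(size=(100, 2))  # two hypothetical latent constructs
loadings = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
items = traits @ loadings + rng.normal(scale=0.5, size=(100, 6))

fa = FactorAnalysis(n_components=2).fit(items)
print(np.round(fa.components_, 2))  # the two 3-item sets load on separate factors
```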