Reliability Versus Validity Lecture Dr Wofford Flashcards

Q

Test-Retest Reliability

A
  • Coefficient: test-retest reliability coefficient
  • Common with self-report survey instruments
    • i.e., a subject takes an identical test on two different occasions under identical testing conditions
  • Considerations:
    • Test-retest intervals
    • Carryover and testing effects

The test-retest interval is very important: how long has it been since the test was last taken? (For example, I probably could not retake the cognitive tests Dr. Esmat used in her study for a long time, if ever.)

Testing effect: did I do better because I learned the test, or because I actually improved? Carryover and testing effects can both distort the second score.
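A minimal sketch of how a simple test-retest coefficient might be computed, assuming made-up scores and using the Pearson correlation (the lecture does not name a specific statistic here, and the ICC is also commonly used for test-retest designs):

```python
# A minimal sketch of a test-retest reliability check (made-up data)
import numpy as np

# Scores for six subjects on two administrations of the same test
test_1 = np.array([24, 31, 28, 19, 35, 27])
test_2 = np.array([25, 30, 29, 21, 34, 26])

# Pearson correlation between the two occasions; values near 1.0
# mean scores stayed stable across the test-retest interval
r = np.corrcoef(test_1, test_2)[0, 1]
print(f"test-retest r = {r:.2f}")  # ~0.99 for this data
```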

Q

Internal Consistency

A

A type of reliability.

  • Generally used with questionnaires, written exams, and interviews (more common in qualitative research)
  • Uses correlations among all items in the scale
  • We want to see some relationship among the items on an exam or interview, since they should all measure the same attribute
  • Reliability coefficient: correlations among the items (see the sketch below)
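A minimal sketch of Cronbach's alpha, one standard way to summarize the correlations among items; the lecture only says "correlations," so using alpha here is an assumption, and the questionnaire data are made up:

```python
# Cronbach's alpha from a subjects x items score matrix (made-up data)
import numpy as np

# Rows = 5 subjects, columns = 5 items on a questionnaire
scores = np.array([
    [4, 5, 4, 4, 5],
    [2, 3, 2, 3, 2],
    [5, 4, 5, 5, 4],
    [3, 3, 4, 3, 3],
    [1, 2, 1, 2, 2],
])

k = scores.shape[1]                           # number of items
item_vars = scores.var(axis=0, ddof=1).sum()  # sum of the item variances
total_var = scores.sum(axis=1).var(ddof=1)    # variance of the total scores
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")  # ~0.96: items hang together well
```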
Q

Criterion-Related Validity

A
  • Most practical and objective approach to validity testing
  • Ability of one test to predict results on an external criterion
  • A high correlation indicates the test is valid relative to the external criterion
  • The external criterion must be valid, reliable, independent, and free from bias (make sure the gold standard really is a gold standard before using it for this)
    • May also be called the reference standard or gold standard
  • Can be tested using concurrent or predictive validity
Q

ICC

A
  • Reliability coefficient: intraclass correlation coefficient (ICC)
  • The reliability coefficient for rater reliability tests (both intra- and inter-rater)
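A minimal sketch of one common form, ICC(2,1) (two-way random effects, absolute agreement, single rater), built from ANOVA mean squares; the lecture does not specify which ICC form, so this choice is an assumption. The ratings are the classic Shrout and Fleiss (1979) example (six subjects rated by four raters):

```python
import numpy as np

# Rows = subjects, columns = raters
Y = np.array([
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
], dtype=float)

n, k = Y.shape
grand = Y.mean()
subj_means = Y.mean(axis=1)
rater_means = Y.mean(axis=0)

# Mean squares from the two-way ANOVA decomposition
MSR = k * ((subj_means - grand) ** 2).sum() / (n - 1)   # subjects (rows)
MSC = n * ((rater_means - grand) ** 2).sum() / (k - 1)  # raters (columns)
resid = Y - subj_means[:, None] - rater_means[None, :] + grand
MSE = (resid ** 2).sum() / ((n - 1) * (k - 1))          # error

# ICC(2,1): agreement of a single rater's scores with the others
icc = (MSR - MSE) / (MSR + (k - 1) * MSE + k * (MSC - MSE) / n)
print(f"ICC(2,1) = {icc:.2f}")  # ~0.29 for this example
```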
Q

Relationship between validity and reliability

A
  • Validity implies that a measurement is relatively free from error
    • This inherently means that a valid measurement is also reliable
  • A test can be reliable but not valid
  • A test cannot be valid but not reliable
Q

What is the validity counterpart to internal reliability?

A

Content Validity

Q

Random Errors

A

Random errors: due to chance; they affect scores in unpredictable ways.

  • Decreasing random error increases reliability
  • Reliability focuses on the amount of random error in a measurement
  • Example: fatigue
Q

Three main Types of Reliability

A
  1. Test-retest reliability: stability of the measuring instrument
  2. Rater reliability: stability of the human observer
    • Inter-rater versus intra-rater
  3. Internal consistency: extent to which items measure various aspects of the same characteristic and nothing extraneous
    • Used mainly with questionnaires
Q

Inter-rater reliability

A

Inter-rater reliability: variation between two or more raters who measure the same subject.

  • Best if all raters measure a response during one trial
  • Ensure blinding of the other assessors

Reliability coefficient: intraclass correlation coefficient (ICC)

Q

Three Sources of Measurement Error

A
  • The individual taking the measurements
    • Called tester or rater reliability
  • The measuring instrument introduces error
  • Variability of the measured characteristic
Q

Validity: Convergence and Discrimination

A

Convergent validity: two measures believed to reflect the same underlying phenomenon should give similar results, i.e., correlate highly.

  • Implies that the theoretical context behind the construct will be supported when the test is administered to different groups in different places at different times

Discriminant validity: different results (low correlations) are expected from measures believed to assess different characteristics. In other words, two measures that measure different things should not correlate well.

Construct validity is related to convergent and discriminant validity; see the sketch below.
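A minimal sketch of the idea with made-up scores: two balance measures (same construct) should correlate highly, while a balance measure and a vocabulary test (different constructs) should not:

```python
import numpy as np

balance_a  = np.array([42, 55, 38, 61, 47, 52, 35, 58])
balance_b  = np.array([40, 57, 36, 63, 45, 50, 37, 60])  # same construct
vocabulary = np.array([65, 65, 57, 58, 68, 51, 61, 64])  # different construct

r_convergent   = np.corrcoef(balance_a, balance_b)[0, 1]
r_discriminant = np.corrcoef(balance_a, vocabulary)[0, 1]
print(f"convergent r   = {r_convergent:.2f}")   # high (~0.98)
print(f"discriminant r = {r_discriminant:.2f}") # near zero
```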

Q

Four Types of Measurement Validity

A
  1. Face validity
  2. Content validity
  3. Criterion-related validity
    • Concurrent validity
    • Predictive validity
  4. Construct validity
Q

Systematic Errors

A

Systematic errors: predictable errors of measurement.

  • Consistently overestimate or underestimate the true score
  • Constant and biased
  • More of a problem for validity than for reliability

(A systematic error is a reliable error: for example, an uncalibrated scale is off by the same amount every time you measure. It causes problems with validity but not with reliability. See the sketch below.)
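A minimal sketch of the contrast, using a made-up true weight: a miscalibrated scale produces identical but biased readings (reliable, not valid), while a noisy scale scatters around the truth:

```python
import numpy as np

true_weight = 70.0  # kg

# Systematic error: an uncalibrated scale reads 2 kg heavy every time.
# The readings are perfectly consistent (reliable) but wrong (invalid).
systematic = np.full(5, true_weight + 2.0)

# Random error: a calibrated but noisy scale varies around the truth
rng = np.random.default_rng(0)
random_readings = true_weight + rng.normal(0, 1.5, size=5)

print("systematic:", systematic)                # identical, biased
print("random:    ", random_readings.round(1))  # scattered around 70
```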

Q

Predictive Validity

A
  • Establishes that the outcome of a test can be used to predict a future score or outcome
    • i.e., GPA used to predict success in PT school, or the Berg Balance Scale used to predict falls
  • The criterion and target tests are administered at different times

Target test = the new test that is so far untested.

Q

Responsiveness to Change

A
  • Responsiveness: ability of an instrument to detect minimal change over time
  • Used to assess the effectiveness of interventions
  • Minimal clinically important difference (MCID): smallest difference in a measured variable that signifies an important difference in a subject’s outcome
  • Statistical versus clinically meaningful change
Q

Intra-rater reliability

A

Intra-rater reliability: stability of data recorded by one individual over two or more trials.

  • Rater bias: when raters are influenced by their memory of the first score
  • Best to blind the tester

Reliability coefficient: intraclass correlation coefficient (ICC)

Q

Face Validity

A
  • Least rigorous form of validity
  • The instrument appears to test what it is supposed to test
    • i.e., ROM, strength, sensation, gait, balance
  • Considered subjective and scientifically weak
Q

Concurrent Validity

A
  • Establishes validity when two measures are taken at the same time, with one measure used as the gold standard
  • Both reflect the same incident of behavior
  • Commonly used with diagnostic or screening tests for determining the presence or absence of a disease
  • Also used when a new or untested measure may be more efficient than a more established method

Concurrent validity applies when the two measures (gold standard and comparison test) are taken at the same time while testing criterion-related validity.

Also used when a new or untested measure (the target test) may be more efficient than a more established method, e.g., IntegNeuro. A minimal sketch follows.
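A minimal sketch with made-up scores: the target test and the gold standard are measured at the same session, and a high correlation supports concurrent validity:

```python
import numpy as np

gold_standard = np.array([12, 25, 18, 30, 22, 15, 28, 20])
target_test   = np.array([14, 24, 19, 29, 21, 16, 27, 22])

r = np.corrcoef(gold_standard, target_test)[0, 1]
print(f"concurrent validity r = {r:.2f}")  # high (~0.99) supports validity
```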

Q

Measurement error

(include formula)

A

Relates to reliability.

Measurement error: the difference between observed and true scores.

  • Observed score - true score = error
  • Reliability is an estimate of how much of a measurement represents error and how much represents the true score

(The formula on the slide was observed score = true score - error, but she kept saying that measurement error is the difference between the observed and true scores, so the formula above is rearranged to match.)
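A tiny worked example of the relationship, with made-up numbers:

```python
true_score = 50.0  # the score under perfect, error-free conditions
observed = 47.0    # what we actually measured on this occasion

error = observed - true_score  # observed - true = error
print(f"measurement error = {error}")  # -3.0: an underestimate this time
```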

Q

Reliability

Validity

A

Reliability: how consistent and free from error is the instrument?

  • Reliability is an estimate of how much of a measurement represents error and how much represents the true score

Validity: does the test measure what it intends to measure?

Q

Reliability Coefficient (formula and interpretation)

A

Reliability coefficient = true score variance / (true score variance + error variance)

  • Ranges from 0.00 to 1.00
  • These cutoffs are somewhat arbitrary, but they are used in the literature; as researchers, we have to decide what level of reliability counts as reliable:
    • <.50 = poor reliability
    • .50-.75 = moderate reliability
    • >.75 = good reliability

Zero is the worst reliability (0% reliable → none of the variance is attributable to true differences).

One is the best reliability (100% reliable → all of the variance is attributed to true differences).
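A worked example of the formula, plugging in made-up variance components:

```python
# Reliability = true score variance / (true + error variance)
true_var = 9.0   # variance from real differences between subjects
error_var = 3.0  # variance from random error

reliability = true_var / (true_var + error_var)
print(f"reliability = {reliability:.2f}")  # 0.75: moderate/good border
```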

Q

Reliability Coefficient

A

Reliability is estimated based on the statistical concept of variance.

  • Variance is a measure of variability, or of differences among scores within a sample
  • Some variance is attributed to true differences among scores and some is attributed to random error
    • Example: if we all took a pop quiz today, there would be differences among our scores. The variance due to how much each of us slept last night would be random error variance; the variance reflecting what we actually remembered (rather than wrong answers caused by sleep deprivation) would be true-difference variance.

Reliability = how much of the total variance is attributed to true differences between scores. A simulation sketch follows.
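A minimal simulation of the pop-quiz example, with made-up parameters: true scores differ between students, random error (e.g., sleep) is added on top, and reliability is the share of total variance due to true differences:

```python
import numpy as np

rng = np.random.default_rng(1)
true_scores = rng.normal(75, 8, size=1000)  # real differences (var ~64)
error = rng.normal(0, 4, size=1000)         # random error (var ~16)
observed = true_scores + error

reliability = true_scores.var() / observed.var()
print(f"estimated reliability = {reliability:.2f}")  # ~64/80 = 0.80
```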

Q

Rater Reliability

A

Intra-rater reliability: stability of data recorded by one individual over two or more trials.

  • Rater bias: when raters are influenced by their memory of the first score
  • Best to blind the tester

Inter-rater reliability: variation between two or more raters who measure the same subject.

  • Best if all raters measure a response during one trial
  • Ensure blinding of the other assessors

Reliability coefficient: intraclass correlation coefficient (ICC)

Intra-rater → same person

Inter-rater → between more than one person

It must be the same subject being tested and the same setup.

Test-retest reliability looks more at the test (or the person taking the test), whereas intra-rater reliability is more about the person giving the test.

Rater bias: the rater sees the first score and tries to land on the same value the second time (overcome by blinding the tester to the outcome of the first test).

Q

Regression Towards The Mean

A
  • Observed scores move closer to the mean with repeated tests
  • A phenomenon that occurs more with outliers and increased random error
  • Example: extreme high and low scores on a pre-test, which may not indicate true knowledge, move closer to the class average on the post-test
  • More of a problem with less reliable tests; see the sketch below
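A minimal simulation, assuming made-up parameters: pre- and post-test share the same true score but have independent random error, and with no real change the extreme pre-test scorers still drift back toward the mean:

```python
import numpy as np

rng = np.random.default_rng(2)
true = rng.normal(70, 5, size=2000)
pre = true + rng.normal(0, 10, size=2000)   # noisy pre-test
post = true + rng.normal(0, 10, size=2000)  # noisy post-test, no real change

top = pre > np.percentile(pre, 90)  # extreme high scorers on the pre-test
print(f"pre mean (top 10%):  {pre[top].mean():.1f}")   # well above 70
print(f"post mean (top 10%): {post[top].mean():.1f}")  # closer to 70
```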
Q

MCID

A

Minimal clinically important difference (MCID): the smallest difference in a measured variable that signifies an important difference in a subject's outcome.

In power analyses, the MCID is increasingly used in place of a generic effect size, because we want to know whether a change would actually be helpful for our patient. A minimal sketch follows.
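A minimal sketch of checking a patient's change score against an MCID; the MCID value and scores below are made up for illustration:

```python
mcid = 5.0                        # hypothetical smallest change that matters
baseline, follow_up = 42.0, 48.5  # a patient's outcome scores

change = follow_up - baseline
print(f"change = {change:.1f}, clinically meaningful: {change >= mcid}")
```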

Q

Two Types of errors (not type I or type II):

A

Systematic errors: predictable errors of measurement.

  • Consistently overestimate or underestimate the true score
  • Constant and biased
  • More of a problem for validity than for reliability

Random errors: due to chance; they affect scores in unpredictable ways.

  • Decreasing random error increases reliability
  • Reliability focuses on the amount of random error in a measurement
  • Example: fatigue
Q

Content Validity

A
  • The adequacy with which the universe (theory) is sampled by a test/instrument
  • An instrument must cover all of the content and reflect the relative importance of each item
  • Commonly used with questionnaires and inventories
    • i.e., a test of gross motor skills should not contain items pertaining to language skills
  • The validity counterpart to internal consistency reliability
Q

What type of data is mean associated with?

A

Normally distributed data (parametric statistics appropriate).

If we have normally distributed data, we get a bell-shaped curve. If we have a bell-shaped curve, we can take the mean (the average of the curve).

If we do not have a bell-shaped curve, the data are not normally distributed and we cannot use the mean; we must take the median. A minimal sketch follows.
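A minimal sketch of why this matters, using made-up therapy-visit counts with one extreme outlier:

```python
import numpy as np

visits = np.array([3, 4, 4, 5, 5, 6, 6, 7, 40])  # one extreme outlier

print(f"mean   = {visits.mean():.1f}")      # ~8.9, pulled up by the outlier
print(f"median = {np.median(visits):.1f}")  # 5.0, resistant to the outlier
```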

Q

Construct Validity

A
  • The ability of an instrument to measure an abstract concept, or construct
  • Difficult to determine, since a construct is abstract
    • i.e., health, pain
  • Based on content validity
    • Must be able to define the content universe that represents the construct

Construct = abstract concept.

You have to have content validity to be able to determine construct validity.

Related to convergent and discriminant (divergent) validity, e.g., IntegNeuro.

Q

What type of data is the median normally associated with?

A

Non-normally distributed data (nonparametric statistics).

If we have normally distributed data, we get a bell-shaped curve. If we have a bell-shaped curve, we can take the mean (the average of the curve).

If we do not have a bell-shaped curve, the data are not normally distributed and we cannot use the mean; we must take the median.

If she says mean, we should infer that we are talking about normally distributed data (parametric statistics appropriate).

If she says median, we should infer that we are talking about non-normally distributed data (nonparametric statistics appropriate).