Lecture 2 Validity & Reliability (Catherine) Flashcards

To provide an overview of the content of Lecture 2

1
Q

Distinguish between precision, accuracy, reliability & validity in relation to measuring instruments

A

An instrument has:

  • Precision if it has fineness of discrimination
  • Accuracy if it gives the correct value and has no systematic bias
  • Reliability if the instrument has measurement stability, with no substantial random fluctuations
  • Validity if it measures what it purports to measure
2
Q

Name the key types of Reliability

A
  • Test-retest Reliability: Correlating pairs of scores on 2 different administrations of the same test
  • Internal Consistency Reliability: split-half testing, Cronbach's alpha (non-dichotomous items), Kuder-Richardson (dichotomous items)
  • Inter-scorer Reliability: The degree of agreement between scorers
3
Q

What are the key challenges of test reliability?

A
  • Stability over time?
  • Internal consistency
  • Test scores are made up of the true score plus error (see the equation below)
  • There is always variability in test scores as a result of error
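The "true score plus error" idea above is usually written as the classical test theory equation (a standard formulation, not spelled out in the lecture notes), with reliability defined as the proportion of observed-score variance that is true-score variance:

\[ X = T + E, \qquad r_{XX} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2} \]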
4
Q

What prevents scores being stable over time?

A

-Stability over time: the central problem is that the interpretation of individual scores changes when the test is administered multiple times

5
Q

What does internal consistency mean?

A

The extent to which a psychological test is homogeneous (measures a single construct) or heterogeneous (measures more than one construct)
- The DASS measures depression, anxiety, & stress and is therefore heterogeneous

6
Q

What are the types of error that are included in the final test score (Test scores are made up of the true score plus error)?

A

-Test Construction (item or content sampling)
-Test Administration (environment, test-taker & examiner related variables)
-Test Scoring & Interpretation (hand scoring or subjective judgements)

7
Q

What is the main problem with Test Construction?

A

-Systematic error: an ambiguously worded question could be interpreted differently by 2 people
-Errors in item or content sampling
-Random error: e.g. Catherine is a morning person and Donna is an afternoon person; if both sit the exam in the morning, Catherine has an advantage
NB: Can use alternate forms to identify this source of error, and internal consistency to identify fatigue as a source of error variance

8
Q

What are the main problems with Test Administration?

A

-Inconsistent environmental factors (e.g. air con vs no air con)
-Test-taker variables (individual differences not taken into account, such as age)
-Examiner-related error (fatigue, boredom, etc.)
NB: Can use test-retest to identify this source of error

9
Q

What are the main problems with Test Scoring & Interpretation?

A
  • Hand scoring is open to error
  • Subjective judgements
  • Computer aided scoring cannot be used for qualitative data
10
Q

List the different forms of Reliability Estimates

A
  • Test-Retest Reliability
  • Parallel Forms Reliability
  • Alternate Forms Reliability
  • Internal Consistency Reliability, using:
    - Split-Half Reliability
    - Cronbach's alpha
    - Kuder-Richardson
  • Inter-Scorer Reliability
11
Q

Which source of Error Variance does Test-Retest Reliability attempt to account for?

A

Test-Retest Reliability Testing attempts to account for Errors in Test Administration

12
Q

What are the important considerations to successfully apply Test-Retest Reliability Testing?

A
  • The test is taken twice and the results are correlated
  • It is important to have an appropriate amount of time between tests (this will vary depending on the type of test, e.g. the MSE needs 18 months)
  • Systematic changes should not affect the scores (e.g. everyone in a cold room)
  • Unpredictable changes will affect the correlation
  • A reliable test will be able to sustain greater levels of fluctuation
13
Q

What are the factors that affect Test-Retest Reliability?

A
  • individual differences
  • experience
  • practice effects
  • memory
  • fatigue
  • motivation
14
Q

Which source of Error Variance does Parallel Forms or Alternative Forms Reliability Tests attempt to account for?

A

Parallel Forms or Alternative Forms Reliability Tests attempt to account for errors in Test Construction

15
Q

When would a test administrator implement a Parallel Forms or Alternative Forms Reliability Test?

A

In a situation where it is not possible to conduct a Test-Retest Reliability test

16
Q

In What ways is a Parallel Forms or Alternative Forms Reliability Test similar to a test-retest reliability test?

A
  • In both cases the participant completes two tests
  • The aim of both is to minimise error variance

17
Q

What are Parallel Forms Reliability Tests?

A

Parallel forms of a test exist when, for each form of the test, the means and variances of observed test scores are equal

18
Q

What are Alternate Forms Reliability Tests?

A

Alternate forms are simply different forms of a test that have been constructed to be parallel. They are designed to be equivalent with regard to content and level of difficulty, but do not meet the same stringent criteria as parallel forms (so means & variances have not been made equivalent)

19
Q

What is the main draw back with Alternate Form Tests of Reliability?

A

Because the means and variances have not been made equivalent (as they have been in parallel forms), the test confounds become highly ambiguous: there are now two sources of error, time and content, whereas parallel forms have only time as a confound.

20
Q

What methods can be employed to achieve internal consistency reliability?

A

Split-half reliability testing can be employed to achieve internal consistency reliability

21
Q

What are the main considerations when implementing Split-half reliability testing?

A

-Ensure the split is made in a meaningful way, i.e. not first half vs last half of the test (fatigue effects); an odd-even split is better (see the sketch below)
-If it is a heterogeneous test, ensure this is also split in a meaningful way
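A minimal Python sketch (not from the lecture; the item responses are invented) showing an odd-even split-half check followed by the Spearman-Brown correction:

import numpy as np

# Made-up item-response matrix: rows = test-takers, columns = items
responses = np.array([
    [3, 4, 2, 5, 3, 4, 2, 4],
    [1, 2, 1, 2, 2, 1, 1, 2],
    [4, 5, 4, 4, 5, 5, 4, 5],
    [2, 3, 2, 3, 2, 3, 3, 2],
    [5, 4, 5, 5, 4, 5, 5, 4],
])

odd_half = responses[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7
even_half = responses[:, 1::2].sum(axis=1)  # items 2, 4, 6, 8

r_half = np.corrcoef(odd_half, even_half)[0, 1]   # split-half correlation
r_full = (2 * r_half) / (1 + r_half)              # Spearman-Brown corrected estimate
print(f"Half-test r = {r_half:.2f}, Spearman-Brown full-test estimate = {r_full:.2f}")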

22
Q

What statistical analysis does a test administrator employ to assess Split-Half reliability of a homogeneous test?

A

A test administrator can obtain a correlation coefficient for a homogeneous test using the Spearman-Brown formula (shown below)
-The Spearman-Brown formula in effect converts the split-half correlation into an estimate for the full-length test.
It cannot be used for a heterogeneous test!
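The standard Spearman-Brown correction (not written out in the lecture notes), where r_hh is the correlation between the two half-tests:

\[ r_{SB} = \frac{2\, r_{hh}}{1 + r_{hh}} \]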

23
Q

What statistical analysis does a test administrator employ to assess Split-Half reliability of a heterogeneous test?

A

A test administrator can test the internal consistency reliability of a heterogeneous split-half test using Cronbach's alpha
-Cronbach's alpha is a generalised reliability coefficient for scoring systems in which each item is graded.
(With the DASS we would need a Cronbach's alpha for each trait measured)
It can be used for either a homogeneous or a heterogeneous test, but NOT for dichotomous answers (yes/no; true/false) (see the formula below)
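The usual formula for Cronbach's alpha (standard formulation, not given explicitly in the notes), where k is the number of items, sigma_i^2 the variance of item i, and sigma_X^2 the variance of total test scores:

\[ \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right) \]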

24
Q

What statistical analysis does a test administrator employ to assess Split-Half reliability of a dichotomous test?

A

A test administrator can test the internal consistency reliability of any dichotomous split-half test using a Kuder-Richardson Formula
The Kuder-Richardson formula essentially provides an estimate of the average of all possible split-half coefficients for yes/no or true/false items (see the formula below)
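The KR-20 formula (standard form, included here for reference), where k is the number of items, p_i the proportion answering item i correctly (or "yes"), q_i = 1 - p_i, and sigma_X^2 the variance of total scores:

\[ KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right) \]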

25
Q

What are acceptable Reliabilities for Clinical and Research situations?

A

Acceptable reliability for clinical settings: r > 0.85
Acceptable reliability for research settings: r > ~0.70

26
Q

What are the Internal Consistency and Test-Retest Reliabilities of the WAIS and MMPI?

A

Internal Consistency of WAIS: r = 0.887
Internal Consistency of MMPI: r = 0.84
Test-Retest Reliabilities of the WAIS: r = 0.82
Test-Retest Reliabilities of the MMPI: r = 0.74
NB: The WAIS test-retest reliability is just outside acceptable limits.
The MMPI is susceptible to change over time, as it is a personality inventory used on clinical patients, who are more likely to change over time.

27
Q

What type of reliability would a test administrator be assessing if they utilised a Kuder-Richardson, Cronbach's Alpha or Spearman-Brown formula?

A

The test administrator would be assessing a test's internal consistency

28
Q

A Correlation coefficient can be used to check all other types of reliability except Internal Consistency. What are these types of reliability?

A
  • Test-Retest Reliability
  • Alternate Form Reliability
  • Inter-scorer Reliability
29
Q

What factors does a Test Administrator need to bear in mind when measuring reliability?

A
  • Is the test measuring state or trait? (trait is more enduring)
  • The range of possible responses (ideally 5-7 response options; a 0-10 scale is not ideal as people tend to cluster around the middle)
  • Speeded tests: towards the end of the test the test-taker may not have had time to attempt a number of items; this does not mean they would have answered incorrectly, only that they ran out of time
30
Q

There are seven methods utilised to improve reliability, what are they?

A
  • Quality of items (need to be clear, concise, homogeneous)
  • Ensure consistent testing conditions
  • Reduce Test-Retest time intervals
  • Longer assessments
  • Develop a robust scoring plan
  • Test items for reliability & adapt the measure
  • Ensure Validity
31
Q

There are 3 classes of Validity, what are they?

A
  • Internal Validity
  • External Validity
  • Test Validity
32
Q

What is Internal Validity interested in?

Relevant to Experimental Validity

A

Confidence in making causal statements about study outcomes

33
Q

What is External Validity interested in?

Relevant to Experimental Validity

A

Confidence you can generalise results to people outside of the study

34
Q

What is Test Validity interested in?

Relevant to this Unit!!!!

A

Confidence that what you are measuring truly represents what you think you are measuring

35
Q

What are the 3 forms of assessing test validity?

A
  1. Content Validity
  2. Criterion-Related Validity
  3. Construct Validity
36
Q

There are 3 traditional measures of test validity, name them

A
  1. Content Validity
  2. Criterion-Related Validity
  3. Construct Validity
37
Q

There are 3 methods used to assess these forms of validity (one for each), what are they?

A
  1. Scrutinise test contents
  2. Comparing Scores on this test to other tests
  3. Perform an analysis of how scores on this test relate to scores on other tests and theories
38
Q

There is another form of validity, Face Validity, what is it?

A

Face Validity relates to whether the test appears, to the person being assessed, to measure what it claims to measure

39
Q

Which is the most important form of validity?

A

Construct Validity

40
Q

What are the implications of low face validity?

A
  • The test indirectly measures some aspect not perceived by the test-taker (e.g. the MMPI asks about ice cream as part of a personality assessment)
  • may result in negative consequences such as poor test taker attitude, or disgruntlement
  • Some tests have low face validity and others have high face validity
41
Q

What does Content Validity assess?

A

Content Validity Scrutinises the test’s content

42
Q

What is Content Validity concerned with?

A

Content validity is concerned with how well each item on the test measures what it intends to measure

  • Tests should capture all aspects of the target behaviour
  • e.g. in HR, test items should directly relate to the job role we are hiring for
43
Q

How do we measure Content Validity?

A

-We use the Content Validity Ratio (CVR; Lawshe, 1975) (see the formula below)
-We ask N experts to rate each item as essential, useful (but not essential), or not necessary
-We remove items based on the level of panel agreement: if the observed agreement that an item is essential is more than 5% likely to have occurred by chance, the item should be removed
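Lawshe's CVR for a single item (standard formula, stated here for reference), where n_e is the number of panelists rating the item essential and N the total number of panelists:

\[ CVR = \frac{n_e - N/2}{N/2} \]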

44
Q

What does Criterion-Related Validity assess?

A

Criterion-Related Validity relates scores obtained on the current test to other test scores or other measures

45
Q

What is Criterion-Related Validity & what are the 2 varieties of Criterion-Related Validity?

A

Criterion-Related Validity is a judgement of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest.
The two varieties of Criterion-Related Validity are:
*Concurrent Validity
*Predictive Validity

46
Q

What is Criterion-Related Validity concerned with?

A

Criterion-Related Validity is interested in how well the test items reflect an individual's actual score on the criterion of interest

47
Q

What are the types of Criterion-Related Validity?

A

Concurrent Criterion-Related Validity
-The degree to which the score relates to the criterion measure at that time (measure a new test against a gold standard)

Predictive Criterion-Related Validity
-The degree to which the score relates to a criterion measure in the future (e.g. using regression to predict a person's future reading ability)

48
Q

What is Concurrent Criterion-Related Validity?

A

Concurrent Validity is an index of the degree to which a test score is related to some criterion measure obtained at the same time

49
Q

What are the important considerations when assessing Criterion-Related Validity?

A
Is the criterion:
-Relevant
-Valid & Reliable
-Uncontaminated
We need to ensure the test we are comparing against is relevant, valid, reliable & uncontaminated; hence we use gold-standard tests rather than just random other tests
50
Q

How does one measure Concurrent Validity?

A

By performing a correlation and comparing how well the outcome of the new test compares with the outcome of a well-known, reliable, well-validated test
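A minimal Python sketch of this comparison (the scores are invented example numbers, not from the lecture):

import numpy as np

new_test = np.array([12, 15, 9, 20, 17, 11, 14, 18])        # scores on the new measure
gold_standard = np.array([30, 38, 25, 47, 41, 28, 35, 44])  # scores on the established, validated measure

# The Pearson correlation between the two sets of scores serves as the concurrent validity coefficient
r = np.corrcoef(new_test, gold_standard)[0, 1]
print(f"Concurrent validity coefficient: r = {r:.2f}")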

51
Q

What is Predictive Criterion-Related Validity?

A

Predictive Validity is an index of the degree to which a test score predicts some criterion measure (in the future)

52
Q

How does one measure Predictive Validity?

A

By obtaining test scores now and the criterion measure at some future time (often using multiple predictors), then correlating them.
-The validity coefficient should be considered in the context of related issues, including
-incremental validity and expectancy data

53
Q

What is the validity coefficient?

A

The Validity coefficient is a correlation that provides a measure of the relationship between test scores and scores on the criterion measure.

54
Q

What is incremental validity?

A

Uses more than one predictor

  • Additional predictors used in ascertaining criterion-related predictive validity should possess incremental validity.
  • That is, the degree to which an additional predictor explains something about the criterion measure that is not already explained by predictors in use (see the sketch below).
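A hypothetical Python sketch of incremental validity as a gain in explained criterion variance (delta R^2) when a second predictor is added; all data are simulated for illustration only:

import numpy as np

rng = np.random.default_rng(0)
n = 100
predictor_a = rng.normal(size=n)                   # existing predictor
predictor_b = rng.normal(size=n)                   # candidate additional predictor
criterion = 0.6 * predictor_a + 0.3 * predictor_b + rng.normal(scale=0.5, size=n)

def r_squared(X, y):
    # R^2 from an ordinary least-squares fit with an intercept term
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_a = r_squared(predictor_a[:, None], criterion)                          # predictor A alone
r2_ab = r_squared(np.column_stack([predictor_a, predictor_b]), criterion)  # A plus B
print(f"Incremental validity (delta R^2) of predictor B: {r2_ab - r2_a:.3f}")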
55
Q

What is Expectancy Data?

A

Expectancy Data provides useful information to evaluate the criterion-related validity of a test.
Using a score obtained from one test or measure, expectancy tables illustrate the likelihood that the test-taker will score within some interval of scores (such as pass or fail).

56
Q

How does one create an Expectancy Table?

A

An expectancy table can be created from a scatterplot of test scores against criterion scores.
For example, an expectancy table might show the relationship between scores on high school exams and subsequent university grades.

57
Q

What is Construct Validity concerned with?

A

Construct Validity is concerned with how well inferences drawn from a test score relate to current theories or knowledge (i.e. constructs)

58
Q

What are the 6 sources of evidence for Construct Validity?

A
  1. Evidence of Homogeneity
  2. Evidence of Changes with Age
  3. Evidence of Distinct Groups
  4. Convergent Evidence
  5. Discriminant Evidence
  6. Factor Analysis
59
Q

What issues does one need to be aware of when considering Evidence of Homogeneity, one of the 6 sources of evidence for Construct Validity?

A
  • How uniform the test is for measuring a single concept
  • Correlate sub-sections with the whole test score
  • Item analysis
  • How important is homogeneity?
60
Q

What does Construct Validity Measure?

A

Construct Validity executes a comprehensive analysis of how scores on the test:

a. relate to other scores and measures
b. can be understood within some theoretical framework for understanding the construct that the test was designed to measure

61
Q

What are the 13 factors which can negatively affect validity?

A

-Unclear directions
-Ambiguity in question terminology
-Inadequate time limits
-Inappropriate level of difficulty
-Poorly constructed test items
-Test items are inappropriate for planned test outcomes
-Tests that are too short
-Improper arrangement of items
-Identifiable patterns of answers
-Administration and Scoring
-Nature of the Criterion
-Bias
-Fairness

62
Q

Bias is one of the 13 factors which can negatively affect validity, what specific bias/biases are important?

A
  • Test biased towards a certain population
  • Implies a systematic variation in results
  • Slope versus Intercept Bias
  • Rating error - overcome by ranking
  • Halo effect
63
Q

What is a Criterion, and what essential properties does a criterion require?

A

A Criterion is the standard against which a test or a test score is evaluated.
*The Criterion needs to be relevant, valid and uncontaminated.

64
Q

Define the characteristics of a criterion

A
  • An adequate criterion is relevant, i.e. it is pertinent or applicable to the matter in hand.
  • Evidence should exist that supports the validity of the criterion
  • A criterion should be uncontaminated, i.e. it should not be based, even in part, on predictor measures. If it is, the validation study cannot be taken seriously. There is no formal test for criterion contamination
65
Q

Fairness is one of the 13 factors which can negatively affect validity, what specific aspects of fairness are important?

A
  • Age, Culture, Gender
  • Adjustment to scores - is this fair?
  • Psychometric techniques for reducing adverse impact of unfairness to some groups
66
Q

What method for quantifying content validity was put forward by C.H. Lawshe in 1975?

A

The method developed by Lawshe gauged agreement among raters/judges regarding how essential a particular item is.

i.e. is the skill or knowledge measured by this item:
- essential
- useful (but not essential)
- unnecessary

67
Q

What provides a measure of Concurrent Validity?

A

If test scores are obtained about the same time that the criterion measures are obtained, measures of the relationship between the test scores & the criterion provide evidence of concurrent validity.

68
Q

What provides an indication of predictive validity?

A

Measures of the relationship between test scores & a criterion measure obtained at some future time provide an indication of the predictive validity of the test, that is, how accurately scores on the test predict some criterion measure

69
Q

What statistical evidence is used to make judgements of criterion-related validity (either concurrent or predictive)?

A

Two types of statistical evidence are used:

  • the validity coefficient and
  • expectancy data
70
Q

What does the validity coefficient provide?

A

The Validity coefficient is a correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure.
*The Pearson correlation coefficient (shown below) is typically used to determine the validity between the 2 measures.
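The standard Pearson correlation, included here for reference (x_i = test scores, y_i = criterion scores):

\[ r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} \]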

71
Q

What variables negatively influence the validity coefficient?

A

Restriction or inflation of range; a key issue is whether the range of scores employed is appropriate to the objective of the correlational analysis

72
Q

How high should a validity coefficient be for a user or a test developer to infer that the test is valid?

A

Cronbach & Gleser cautioned against establishing such a rule; the coefficient simply should be high enough to result in the identification & differentiation of test-takers with regard to the target ability.

73
Q

What is incremental validity?

A

Incremental validity is the degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use.
Each measure used as a predictor should have criterion-related predictive validity, possess incremental validity & only be included if they demonstrate something not covered by existing predictors

74
Q

What type of information is provided by expectancy data and expectancy tables?

A

Expectancy data provides information that can be used in evaluating the criterion-related validity of a test.
*Expectancy tables illustrate the likelihood that the test-taker will score within some interval of scores on a criterion measure (e.g. pass/fail)

75
Q

Name two renowned expectancy table developers

A
  • Taylor-Russell (1939, 1973, 1974) - seven steps to an expectancy table
  • Naylor-Shine Tables (1965)
76
Q

List the strengths and weaknesses of the Taylor-Russell expectancy tables

A
  • 7 step procedure was provided
  • The table can assist in judging the utility of a test by determining the increase over current procedures

Limitations:

  • The relationship between predictor and criterion must be linear
  • It is difficult to identify the cut off for successful vs unsuccessful using the table
77
Q

What are the strengths and limitations of the Naylor-Shine tables?

A
  • No need for a linear relationship, as it uses average criterion scores for comparison
  • Obtaining the difference between the means of the selected & unselected groups to derive an index of what the test is adding to already established procedures
  • Identifies the utility of a test by determining the increase in average score on some criterion measure
78
Q

What do the Taylor-Russell & Naylor-Shine tables have in common?

A

With both tables the validity coefficient used must be one obtained by Concurrent Validation procedures

79
Q

What is the most often-cited application of statistical decision theory in the field of psychological testing, and what are its 4 key points?

A

Cronbach & Gleser’s Psychological Tests & Personnel Decisions (1957, 1965)

  1. a classification of decision problems
  2. various selection strategies ranging from single-stage to sequential analyses
  3. a quantitative analysis of the relationship between test utility, the selection ratio, the cost of testing, & the expected value of outcomes, and
  4. a recommendation that in some instances job requirements be tailored to the applicant’s abilities instead of the other way around
80
Q

Define Construct Validity

A

Construct Validity is a judgment about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct.
*Constructs are unobservable, presupposed underlying traits that a test developer may invoke to describe test behaviour or criterion performance

81
Q

If a test is a valid measure of the construct (i.e. it has high construct validity) what results will the test developer observe?

A

High scorers and low scorers will behave as predicted by the theory

NB Construct validity has been viewed as the unifying concept for all validity evidence

82
Q

What are some reasons results might behave contrary to those predicted?

A
  • The test simply does not measure the construct
  • The theory is sound, but the statistical procedures or their execution was flawed

83
Q

List the 5 procedures which provide evidence for construct validity

A
  • The test is homogeneous, measuring a single construct
  • Test scores increase or decrease as a function of age, the passage of time, or an experimental manipulation as theoretically predicted
  • Test scores obtained after some event or time (i.e. post-test scores) differ from pre-test scores as theoretically predicted
  • Test scores obtained by people from distinct groups vary as predicted by the theory
  • Test scores correlate with scores on other tests in accordance with what would be predicted from a theory that covers the manifestation of the construct in question
84
Q

What is convergent evidence as it relates to construct validity?

A

Evidence for the construct validity of a particular test may converge from a number of sources, e.g. other tests designed to assess the same or similar construct. Thus, if scores on the test undergoing construct validation tend to correlate highly in the predicted direction with scores on older, more established, already validated tests, this is convergent evidence

85
Q

What is Discriminant Evidence in relation to construct validity?

A

Discriminant evidence (also called discriminant validity) is provided when a validity coefficient shows little (i.e. a statistically insignificant) relationship between test scores and variables with which scores on the test being construct-validated should NOT, theoretically, be correlated

86
Q

What Statistical method is employed to evidence convergent or discriminant construct validity?

A

Factor Analysis

87
Q

What are the key points of Lawshe’s rating process, which he termed Content Validity Ratio (CVR)?

A
  • If more than half the panelists indicate the item is essential, it has at least some content validity.
  • Lawshe recommended that if the amount of agreement observed is more than 5% likely to occur by chance, then the item should be eliminated
    1. Negative CVR: fewer than half the panelists indicate essential
    2. zero CVR: exactly half the panelists indicate essential
    3. Positive CVR: More than half, but not all, the panelists indicate essential (see the computation sketch below)
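A minimal Python sketch of the CVR values described above (the panel counts are invented for illustration):

def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    # CVR = (n_e - N/2) / (N/2): negative if fewer than half rate the item essential,
    # zero if exactly half, positive if more than half
    return (n_essential - n_panelists / 2) / (n_panelists / 2)

print(content_validity_ratio(9, 10))   #  0.8 -> strong agreement, positive CVR
print(content_validity_ratio(5, 10))   #  0.0 -> exactly half say essential
print(content_validity_ratio(3, 10))   # -0.4 -> fewer than half, negative CVR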
88
Q

What are unavoidable issues for all test developers?

A

That errors in measurement exist in all tests.

  • these errors affect both reliability and validity
  • the test developer's goal is to reduce / minimise error
89
Q

What is intercept bias?

A

If a test systematically under predicts or over predicts the performance of a particular group with respect to a criterion, then it exhibits intercept bias.
Intercept bias is a term derived from the point where the regression line intersects the Y-Axis

90
Q

What is slope bias?

A

If a test systematically yields significantly different validity coefficients for members of different groups, then it has a slope bias
Slope bias is named as the slope of one group’s regression line is different in a statistically significant way from the slope of another group’s regression line.
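A hypothetical Python sketch fitting a separate regression of the criterion on test scores for each group, illustrating the intercept and slope bias described in the two cards above; all numbers are invented:

import numpy as np

def fit_line(test_scores, criterion):
    # Return (slope, intercept) of the least-squares regression line
    slope, intercept = np.polyfit(test_scores, criterion, 1)
    return slope, intercept

# Made-up test and criterion scores for two groups
group_a_test = np.array([10, 12, 14, 16, 18, 20])
group_a_crit = np.array([22, 26, 30, 34, 38, 42])
group_b_test = np.array([10, 12, 14, 16, 18, 20])
group_b_crit = np.array([18, 21, 24, 27, 30, 33])

slope_a, intercept_a = fit_line(group_a_test, group_a_crit)
slope_b, intercept_b = fit_line(group_b_test, group_b_crit)
print(f"Group A: slope = {slope_a:.2f}, intercept = {intercept_a:.2f}")
print(f"Group B: slope = {slope_b:.2f}, intercept = {intercept_b:.2f}")
# Differing intercepts suggest the test under- or over-predicts one group (intercept bias);
# differing slopes suggest the validity coefficients differ between groups (slope bias).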

91
Q

What is a rating error and what are the types of rating errors?

A

A rating error is a judgment resulting from the intentional or unintentional misuse of a rating scale.

  • Leniency or generosity error (too generous)
  • Severity error (too harsh)
  • Central Tendency error (sticks to the middle)
  • Halo Effect (high ratings in all things due to the rater's failure to discriminate)
92
Q

What is the overall goal of test development?

A

To obtain consistent results that truly reflect the concepts we are trying to measure.

93
Q

Define fairness, as it applies to psychometric testing

A

Fairness, in a psychometric context is the extent to which a test is used in an impartial, just, and equitable way.

94
Q

Name another variable that can influence all aspects of test construction, including test validation

A

The influence of Culture extends to judgements concerning validity of tests and test items