Lecture 4.2 Reliability Flashcards

1
Q

Reliability

A

• The consistency with which a test measures what it purports to measure in any given set of circumstances

2
Q

True

A

True or False: A reliable test will result in the same score every time it is used to measure the same thing under the same conditions

3
Q

Reliability coefficient

A

An index of reliability that indicates the ratio between the true score variance on a test and the total variance (SD²)
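In symbols (standard psychometric notation, added here for clarity rather than taken from the lecture):

    r_xx = true score variance / total variance = σ²_T / σ²_X

so a coefficient near 1 means that most of the observed variance is true-score variance rather than error.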

4
Q

> .90

A

Reliability coefficient of _______ is excellent for research purposes, appropriate for individual assessment purposes

5
Q

> .80

A

Reliability coefficient of _______ is good for research purposes, marginal for individual assessment

6
Q

Reliability coefficient

A
  • higher coefficient values = higher reliability
  • > .60 is marginal for research purposes
  • > .70 is adequate for research purposes
7
Q

Classical Test Theory

A

Assumes that each person has an innate true score. It can be summed up with an equation:
X = T + E,
observed score is true score plus error
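Because true score and error are assumed to be uncorrelated in classical test theory, the variances add, which ties this card to the reliability coefficient defined earlier (standard result, stated here as a supplement):

    X = T + E
    σ²_X = σ²_T + σ²_E
    reliability = σ²_T / σ²_X = σ²_T / (σ²_T + σ²_E)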

8
Q

more reliable

A

higher proportion of true variance =

9
Q

less reliable

A

higher proportion of error variance =

10
Q

increase or decrease

A

error variance may ______________ or _________________ a test score by varying amounts, leading to lower reliability

11
Q

Systematic error and unsystematic

A

Two types of testing error

12
Q

Systematic error

A

Testing error that doesn’t affect reliability. Consistent and predictable (when you are aware of it), e.g. a leaking tyre

13
Q

Unsystematic error

A

Testing error that affects reliability. Inconsistent and unpredictable, e.g. an electrical problem

14
Q

Test construction

A

Sources of Error Variance T_______ C_______
The content covered by test items, the way questions are asked, and the response format all add to the error variance of a test

15
Q

Test administration

A

Sources of Error Variance T_______ A_______
• Test environment (including test materials), test-taker variables (e.g., alertness, wellbeing, mistakes) & administrator-related variables (e.g., presence or absence, demeanour, departure from procedure, unconscious cues, etc.)

16
Q

Test scoring & interpretation

A

Sources of Error Variance T_______ s_______ & i________
Human error - data entry, transcription, coding, calculation, timing, etc.
Level of objectivity/subjectivity

17
Q

Human fallibility

A

Sources of Error Variance h _______ f _________.
• Forgetting or misremembering
• Failing to notice or not being aware
• Not understanding or following instructions
• Under- and over-reporting
• Differences of opinion
• Lying or misleading

18
Q

Time and practice effects

A

Sources of Error Variance ti_________ and pr________ eff________.

19
Q

Domain Sampling Model

A

This model assumes that the items that have been selected for any one test are just a sample of items from an infinite domain of potential items. It concerns error that occurs in the development of a test.

20
Q

Domain Sampling Model

A

• Seeks to determine how precisely the test score assesses the domain from which the test draws a sample

21
Q

True score

A

The score you would get if you answered every item that could conceivably be included in the domain.

22
Q

Standard Error of Measurement (SEM)

A

• Measures the precision of an observed score & provides an estimate of the amount of error inherent in an observed score or measurement
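A commonly used formula (standard psychometrics; the card itself does not quote it) derives SEM from the test’s standard deviation and reliability coefficient:

    SEM = SD × √(1 − r_xx)

For a hypothetical test with SD = 15 and r_xx = .90, SEM = 15 × √0.10 ≈ 4.7, so an observed score of 100 has an approximate 95% confidence band of roughly 100 ± 9 (±1.96 × SEM).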

23
Q

Standard Error of Difference (SED)

A

Can be used to compare:
• an individual’s scores on two different tests
• two different people’s scores on the same test
• two different people’s scores on two different tests
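The usual formula (standard psychometrics, added for reference) combines the standard errors of measurement of the two scores being compared:

    SED = √(SEM₁² + SEM₂²)

A difference between two scores is generally only treated as meaningful when it is large relative to the SED.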

24
Q

Test-Retest Reliability

A
  • Calculated by correlating scores from the same people on two different administrations of the same test
  • Used for measuring characteristics that are thought to be stable (e.g. personality traits or intelligence)
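A minimal sketch of the calculation just described, assuming two hypothetical arrays of scores from the same people (names and numbers invented for illustration):

    import numpy as np

    # Hypothetical scores for the same five people on two administrations
    time1 = np.array([98, 105, 112, 91, 120])
    time2 = np.array([101, 103, 115, 94, 118])

    # The test-retest reliability estimate is the Pearson correlation
    # between the two administrations
    r = np.corrcoef(time1, time2)[0, 1]
    print(f"test-retest r = {r:.2f}")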
25
Q

Amount of time between administrations

Any interventions, treatment, or trauma taking place between test administrations

A

Test-retest reliability will be affected by

26
Q

Parallel & Alternate Forms Reliability

A

Different versions of a test, matched for content and difficulty

27
Q

Split-Half Reliability

A

Scores from one half of a test are correlated with the other half of the test, using equivalent halves
• Random, odds & evens, content & difficulty
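Because each half is only half the length of the full test, the half-test correlation is usually stepped up with the Spearman-Brown formula (standard practice; the card does not show the formula itself):

    r_full = 2 × r_half / (1 + r_half)

For example, a half-test correlation of .70 corresponds to an estimated full-test reliability of about .82.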

28
Q

Inter-Rater Reliability

A

The degree of agreement between two or more scorers. Disagreement between scorers is reduced by appropriate training.

29
Q

Test-retest

A

correlate scores from 2 administrations of the same test

30
Q

Parallel forms

A

correlate scores from 2 versions of the same test

31
Q

Split-half

A

correlate scores from 2 equivalent halves of the same test

32
Q

Internal consistency

A

correlate items within the same test
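The most widely reported index of internal consistency is Cronbach’s alpha (the card does not name a specific coefficient, so this is added for context):

    α = (k / (k − 1)) × (1 − Σσ²_item / σ²_total)

where k is the number of items, σ²_item is the variance of each item, and σ²_total is the variance of the total scores.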

33
Q

Inter-rater

A

correlate scores from 2 scorers for one test taker

34
Q

reliability coefficients

A

Indicates the ratio between the true score variance on a test and the total variance
Ranges from 0 to 1: the closer to 1, the higher the reliability

35
Q

Homogenous

A

__________________ test is unifactorial, so consists of items measuring a single trait or factor

36
Q

Heterogenous

A

________________ test is multifactorial, so measures more than one trait or factor

37
Q

static

A

a characteristic, trait, or ability that is presumed to be relatively unchanging

38
Q

dynamic

A

a characteristic, state, or ability that is presumed to be ever changing as a function of situational and cognitive experiences

39
Q

Restricted range or variance

A

The sampling procedure used to gather the test scores does not result in a full spread of scores (e.g., having only university students complete an IQ test)

40
Q

Inflated range or variance

A

When the sample includes people who are outside the range the test was designed for, so the spread of scores is inflated (e.g., adults completing a test designed for children)

41
Q

speed test

A

all items of equal difficulty, and time limited so that no-one is likely to be able to answer all items

42
Q

power test

A

time limit is long enough for all items to be attempted, but some items are so difficult that no-one is likely to get them all right

43
Q

Criterion-Referenced

A

Designed to provide an indication of where a test taker stands with respect to some criterion (i.e., pass/fail type tests)

44
Q

Validity

A

The extent to which evidence supports the meaning and use of a psychological test (or other assessment device)

45
Q

The validity coefficient

A

A correlation coefficient that provides a measure of the relationship between test scores and scores on the criterion measure

46
Q

Validity

A

How well a test or measurement tool measures what it purports to measure in a particular context

47
Q

Classic (trinitarian) Model

A

focuses on three categories of validity: content, criterion-related, and construct

48
Q

Content Validity

A

Type of validity - scrutinizing the test’s content

49
Q

Criterion-related validity

A

Type of validity - relating scores obtained on the test to other test scores or other measures

50
Q

Construct validity

A

Type of validity - ‘umbrella validity’; comprehensive analysis of how test scores relate to scores on other tests/measures & how test scores relate to the construct that the test was designed to measure

51
Q

Unitary Model of validity

A

_____________ view takes everything into account, from implications of test scores in terms of societal values to the consequences of use

52
Q

Test validation

A
  • The process of gathering and evaluating validity evidence.
  • Test developer is responsible for supplying validity info in the test manual and/or through a ‘test validation’ journal article
53
Q

Content Validity

A

• Describes a judgement of how adequately a test samples behaviour representative of the universe of behaviour that the test was designed to sample

54
Q

Face Validity

A

Type of content validity

A judgement concerning how relevant the test items appear to be to the test-taker

55
Q

Quantifying content validity

A

Important in employment settings, where tests are used to hire & promote
• Tests must be shown to include relevant items in terms of job skills required for the position
• Lawshe (1975): Is the skill or knowledge measured by this item 1) essential, 2) useful but not essential, or 3) not necessary to the performance of the job?

56
Q

Culture

A

C____________ has an impact on judgements concerning the validity of tests and test items

57
Q

Criterion-Related Validity

A

C __________ R________ V __________
A judgement of how adequately a test score can be used to infer an individual’s most probable standing on some measure of interest – the measure of interest being the criterion

58
Q

criterion

A

A _____________ is the standard against which a test or test score is evaluated; it can be almost anything

59
Q
  1. RELEVANT
  2. VALID
  3. UNCONTAMINATED
A

A criterion should be:

  1. R___________ – pertinent or applicable to the matter at hand
  2. V___________ for the purpose for which it is being used
  3. U____________ – not based on predictor measures
60
Q

Predictive Validity

A

P ______________ V ______________ is the degree to which a test score predicts a criterion measure at a future time

61
Q

Concurrent Validity

A

C___________ v_________ is the degree to which a test score is related to a criterion measure that is obtained at (about) the same time

62
Q

Incremental Validity

A

I___________ V__________
The degree to which an additional predictor explains something about the criterion measure that is not explained by predictors already in use
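A minimal sketch of how incremental validity is often examined in practice, assuming hypothetical data (x1 = existing predictor, x2 = new predictor, y = criterion); the quantity of interest is the gain in R² when the new predictor is added:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    # Hypothetical data: existing predictor x1, new predictor x2, criterion y
    n = 200
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)

    # R^2 using only the existing predictor
    r2_old = LinearRegression().fit(x1.reshape(-1, 1), y).score(x1.reshape(-1, 1), y)

    # R^2 after adding the new predictor
    X_both = np.column_stack([x1, x2])
    r2_new = LinearRegression().fit(X_both, y).score(X_both, y)

    print(f"incremental validity (gain in R^2): {r2_new - r2_old:.3f}")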

63
Q

False negatives

A

Test takers predicted not to show the characteristic, but who do

64
Q

False positives

A

Test takers predicted to show the characteristic, but who don’t

65
Q

Miss rate

A

M_____ r_______: the proportion of people incorrectly classified

66
Q

Hit rate

A

H________ r_______: the proportion of people correctly identified

67
Q

Base rate

A

B______ r________: the extent to which a particular trait, behaviour, characteristic or attribute exists in the population
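A small hypothetical example tying the last few cards together (numbers invented purely for illustration): suppose a screening test classifies 100 people, giving 40 true positives, 10 false positives, 15 false negatives and 35 true negatives. Then:

    hit rate  = (40 + 35) / 100 = .75   (proportion correctly classified)
    miss rate = (10 + 15) / 100 = .25   (proportion incorrectly classified)
    base rate = (40 + 15) / 100 = .55   (proportion who actually have the characteristic)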

68
Q

Construct validity

A

C_________ v___________
A judgement about the appropriateness of inferences drawn from test scores regarding individual standings on a variable called a construct

69
Q
Homogeneity of items
Changes with age
Pre-test to post-test changes
Group differences
Convergent evidence
Divergent evidence
Factor analysis
A
Evidence of construct validity
H____________ of items
Changes with a____
Pre-test to p____________ changes
G________ differences
C__________ evidence
D__________ evidence
F_________ analysis
70
Q

Evidence of homogeneity

A

E__________ of h___________: how uniform the test is in measuring a single concept

71
Q

Evidence of changes with age

A

Some constructs are expected to change with age, particularly during childhood/adolescence

72
Q

Evidence of pre-test/post-test changes

A

Evidence that scores change as the result of some experience between a pre-test and a post-test can be evidence of construct validity

73
Q

Evidence from distinct groups

A

Demonstrating that scores on the test vary in a predictable way as a function of membership in some group

74
Q

Convergent evidence

A

When scores on a new test are found to correlate highly in the predicted direction with scores on an older, more established and validated test designed to measure the same construct

75
Q

Discriminant evidence

A

Shown when test scores are found to have little or no relationship with test scores or variables for which theoretically there should be no relationship

76
Q

Factor Analysis

A

Can be used to determine both convergent and discriminant evidence of construct validity

77
Q

Confirmatory Factor Analysis

A

A factor structure is explicitly hypothesised and is tested for its fit with the observed covariance structure of the measured variables

78
Q

Exploratory Factor Analysis

A

Estimating or extracting factors, deciding how many factors to retain, rotating factors to an interpretable orientation
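A minimal exploratory factor analysis sketch, assuming a hypothetical item-response matrix and a recent scikit-learn version (which supports varimax rotation); it only illustrates the extract, retain, and rotate sequence the card describes:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(1)

    # Hypothetical data: 300 respondents answering 6 items driven by 2 latent factors
    latent = rng.normal(size=(300, 2))
    loadings = np.array([[0.8, 0.0], [0.7, 0.1], [0.9, 0.0],
                         [0.0, 0.8], [0.1, 0.7], [0.0, 0.9]])
    items = latent @ loadings.T + rng.normal(scale=0.5, size=(300, 6))

    # Extract two factors and rotate them to an interpretable orientation
    fa = FactorAnalysis(n_components=2, rotation="varimax")
    fa.fit(items)
    print(np.round(fa.components_.T, 2))  # estimated loadings: items x factors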