Module 2: Reliability Flashcards

1
Q

Reliability

A

+ the dependability or consistency of an instrument, or of the scores obtained by the same person when re-examined with the same test on different occasions or with different sets of equivalent items
+ freedom from errors of measurement
+ concerned with minimizing error
+ the true score itself cannot be observed directly

2
Q

If tests are reliable, are they automatically reliable in all contexts?

A

No. A test may be reliable in one context but unreliable in another.

3
Q

How can reliability be computed?

A

By estimating the range of possible random fluctuations that can be expected in an individual's score

4
Q

How many items should there be to have higher reliability?

A

The greater the number of items, the higher the reliability.

5
Q

What kind of sample should be used to obtain an observed score?

A

Only a representative sample should be used to obtain an observed score.

6
Q

Reliability Coefficient

A

index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
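
As a hedged illustration of this ratio (the variance components below are invented), reliability can be computed as true variance divided by total variance:

```python
# Reliability as the proportion of total variance that is true variance.
# The variance components are assumed values for illustration only.
true_variance = 80.0
error_variance = 20.0
total_variance = true_variance + error_variance

reliability = true_variance / total_variance
print(reliability)  # 0.8
```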

7
Q

Classical Test Theory (True Score Theory)

A

a score on an ability test is presumed to reflect not only the testtaker's true score on the ability being measured but also error

8
Q

Error

A

+ refers to the component of the observed test score that does not have to do with the testtaker’s ability
+ Errors of measurement are random

9
Q

What is the formula of the classical test theory?

A

X = T + E

X - observed score
T - true score
E - error

10
Q

How can the true score be computed?

A

Averaging all the observed scores obtained over a period of time yields a result closest to the true score.

11
Q

What is a factor that contributes to consistency?

A

stable attributes

12
Q

What are factors that contribute to inconsistency?

A

characteristics of the individual, test, or situation, which have nothing to do with the attribute being measured, but still affect the scores

13
Q

What are the goals of reliability?

A
  1. To estimate errors
  2. Devise techniques to improve testing and reduce errors
14
Q

Variance

A

useful in describing sources of test score variability

15
Q

What are the two types of variance?

A
  1. True Variance
  2. Error Variance
16
Q

True Variance

A

variance from true differences

17
Q

Error Variance

A

variance from irrelevant random sources

18
Q

Measurement Error

A

+ all of the factors associated with the process of measuring some variable, other than the variable being measured
+ difference between the observed score and the true score

19
Q

Positive Variance

A

can increase one’s score

20
Q

Negative Variance

A

can decrease one’s score

21
Q

What are the sources of error variance?

A
  1. Item Sampling/Content Sampling
  2. Test Administration
  3. Test Scoring and Interpretation
22
Q

Item Sampling/Content Sampling

A

+ refers to variation among items within a test as well as to variation among items between tests
+ the extent to which a testtaker’s score is affected by the content sampled on a test, and by the way the content is sampled, is a source of error variance

23
Q

Test Administration

A

testtaker’s motivation or attention, environment, etc.

24
Q

Test Scoring and Interpretation

A

scorer differences are a source of error variance; tests may employ objective-type items amenable to computer scoring of well-documented reliability to reduce it

25
Q

Random Error

A

source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (e.g., noise, temperature, weather)

26
Q

Systematic Error

A

+ source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
+ has a consistent effect on the true score
+ the SD does not change, but the mean does

27
Q

What is the relationship between reliability and variance?

A

+ Reliability refers to the proportion of total variance attributed to true variance
+ The greater the proportion of the total variance attributed to true variance, the more reliable the test

28
Q

What can error variance do to a test score?

A

Error variance may increase or decrease a test score by varying amounts; consequently, the consistency of the test score, and thus the reliability, can be affected

29
Q

True Score Formula

A

T' = Rxx(X - X̄) + X̄

wherein

T' - estimated true score
Rxx - reliability coefficient
X - obtained score
X̄ - mean score
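
A minimal worked sketch of this estimate in Python, with assumed values for the reliability coefficient, obtained score, and mean:

```python
# Estimated true score (Kelley's formula); all values are assumed.
r_xx = 0.80   # reliability coefficient (assumed)
x = 110       # obtained score (assumed)
mean = 100    # group mean (assumed)

true_score_estimate = r_xx * (x - mean) + mean
print(true_score_estimate)  # 108.0
```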

30
Q

What is an error in test-retest reliability?

A

time sampling

31
Q

Test-Retest Reliability

A

+ an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the test

32
Q

What is test-retest reliability appropriate for?

A

appropriate when evaluating the reliability of a test that purports to measure an enduring and stable attribute, such as a personality trait

33
Q

How is test-retest reliability established?

A

established by comparing the scores obtained from two successive measurements of the same individuals and calculating a correlation between the two sets of scores

34
Q

When does the reliability coefficient of test-retest reliability become insignificant?

A

the longer the interval between administrations, the greater the likelihood that the reliability coefficient will be insignificant

35
Q

Carryover Effects

A

occur when the test-retest interval is short and the second administration is influenced by the first because testtakers remember or have practiced items from the previous test, yielding an inflated correlation (overestimation of reliability)

36
Q

Practice Effect

A

scores on the second session are higher due to the testtakers’ experience of the first session of testing

37
Q

Test Sophistication

A

items are remembered by the testtakers, especially the difficult ones or the items that were highly confusing

38
Q

Test Wiseness

A

test-taking savvy might inflate the measured abilities of testtakers

39
Q

When does test-retest reliability have lower correlation?

A

test-retest with a longer interval might be affected by other extraneous factors, resulting in a low correlation

40
Q

What does low correlation in test-retest reliability mean?

A

lower correlation = poor reliability

41
Q

Mortality

A

problems from absences in the second session (simply remove the first-session tests of those who were absent)

42
Q

What does test-retest reliability measure?

A

coefficient of stability

43
Q

What are the statistical tools that should be used for test-retest reliability?

A

Pearson R, Spearman Rho
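
A minimal sketch of both computations using SciPy, with invented score pairs from two administrations of the same test:

```python
# Correlate scores from two administrations of the same test.
# The score lists are invented for illustration.
from scipy.stats import pearsonr, spearmanr

first = [12, 15, 9, 20, 17, 11, 14]    # first administration (assumed)
second = [13, 14, 10, 19, 18, 12, 15]  # retest scores (assumed)

r, _ = pearsonr(first, second)       # for interval/ratio data
rho, _ = spearmanr(first, second)    # for ordinal data
print(r, rho)
```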

44
Q

What are the errors in Parallel Forms/Alternate Forms Reliability?

A

Item sampling (immediate administration); item sampling plus changes over time (delayed administration)

45
Q

Parallel Forms/Alternate Forms Reliability

A

+ established when at least two different versions of the test yield almost the same scores
+ has the most universal applicability
+ true scores must be the same for two tests
+ means and the variances of the observed scores must be equal for the two forms

46
Q

Parallel Forms

A

for each form of the test, the means and the error variances are EQUAL; same items, different positioning/numbering

47
Q

Alternate Forms

A

simply a different version of a test that has been constructed so as to be parallel

48
Q

What is required of parallel forms/alternate forms reliability?

A

The test should contain the same number of items; the items should be expressed in the same form and should cover the same type of content; and the range and difficulty must also be equal

50
Q

What should be done if there is a test leakage during parallel/alternate forms reliability?

A

If there is a test leakage, use the form that has not been widely administered.

51
Q

Counterbalancing

A

technique to avoid carryover effects in parallel-forms administration by using a different sequence for each group (e.g., G1 listens to a song before counseling; G2 has counseling first, then listens to the song)

53
Q

When can the two different tests for parallel forms/alternate forms reliability be administered?

A

They can be administered on the same day or at different times.

54
Q

What is the most rigorous and burdensome form of reliability?

A

Parallel forms/alternate forms, because test developers must create two forms of the test.

55
Q

What is the main problem for parallel form/alternate form reliability?

A

There is a difference between the two tests

56
Q

What are the factors that may affect parallel form/alternate form reliability test scores?

A

It may be affected by motivation, fatigue, or intervening events.

58
Q

What are the statistical tools for parallel form/alternate form reliability?

A

Pearson R or Spearman Rho

60
Q

What is Internal Consistency also known as?

A

Inter-Item Reliability

61
Q

What is an error of Internal Consistency?

A

Item sampling; homogeneity of items

62
Q

Internal Consistency (Inter-Item Reliability)

A

+ used when tests are administered once
+ consistency among items within the test
+ measures the internal consistency of the test which is the degree to which each item measures the same construct
+ measurement for unstable traits

63
Q

When can a test be said to have good internal consistency?

A

If all items measure the same construct, then the test has good internal consistency

64
Q

What is internal consistency most useful for?

A

useful in assessing the homogeneity of a test

65
Q

Homogeneity

A

if a test contains items that measure a single trait (unifactorial)

66
Q

Heterogeneity

A

degree to which a test measures different factors (more than one factor/trait)

67
Q

When will a test have higher inter-item consistency?

A

more homogenous items = higher inter-item consistency

68
Q

What are the different statistical tools that may be used for computing Internal Consistency?

A

+ KR-20
+ KR-21
+ Cronbach’s Coefficient Alpha

69
Q

KR-20

A

used for inter-item consistency of dichotomous items (intelligence tests, personality tests with yes/no options, multiple choice); items have unequal variances and are dichotomously scored
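
A minimal sketch of the KR-20 computation, KR-20 = (k / (k - 1)) * (1 - Σpq / σ²), using an invented matrix of dichotomous responses:

```python
# KR-20 for dichotomously scored (0/1) items; the data are invented.
import numpy as np

responses = np.array([  # rows = examinees, columns = items (assumed)
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])
k = responses.shape[1]
p = responses.mean(axis=0)                     # proportion passing each item
q = 1 - p
total_var = responses.sum(axis=1).var(ddof=0)  # variance of total scores

kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(round(kr20, 2))  # 0.8
```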

70
Q

KR-21

A

used if all the items have the same degree of difficulty (e.g., speed tests); items have equal variances and are dichotomously scored

71
Q

Cronbach’s Coefficient Alpha

A

used when the two halves of the test have unequal variances and on tests containing non-dichotomous items
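
A minimal sketch of coefficient alpha, α = (k / (k - 1)) * (1 - Σσ²_item / σ²_total), with an invented matrix of Likert-type responses:

```python
# Cronbach's alpha for non-dichotomous items; the data are invented.
import numpy as np

scores = np.array([  # rows = examinees, columns = items (assumed)
    [4, 5, 4, 4],
    [3, 4, 3, 3],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [1, 2, 1, 1],
])
k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1).sum()  # sum of item variances
total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores

alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(round(alpha, 2))  # 0.99
```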

72
Q

Average Proportional Distance

A

measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores

73
Q

What is an error of Split-Half Reliability?

A

Item sampling; nature of the split

74
Q

Split-Half Reliability

A

obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered ONCE

75
Q

What is split-half reliability useful for?

A

it is useful when it is impractical or undesirable to assess reliability with two tests or to administer a test twice

76
Q

How can split-half reliability be done?

A

One cannot simply divide the items in the middle, because that might spuriously raise or lower the reliability coefficient; instead, randomly assign items to the halves, or assign odd-numbered items to one half and even-numbered items to the other

77
Q

What are the different statistical formulas that may be used for computing Split-Half Reliability?

A

+ Spearman-Brown Formula
+ Spearman-Brown Prophecy Formula
+ Rulon’s Formula

78
Q

Spearman-Brown Formula

A

allows a test developer or user to estimate internal consistency reliability from the correlation of two halves of a test, as if each half had been the length of the whole test, assuming the halves have equal variances
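
A hedged sketch of the formula, r_full = 2 * r_half / (1 + r_half), with an assumed half-test correlation:

```python
# Step a split-half correlation up to full-test length.
r_half = 0.70  # correlation between the two halves (assumed)

r_full = (2 * r_half) / (1 + r_half)
print(round(r_full, 3))  # 0.824
```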

79
Q

Spearman-Brown Prophecy Formula

A

estimates how many more items are needed in order to achieve the target reliability

80
Q

How is Spearman-Brown Prophecy Formula computed?

A

multiply the estimate by the original number of items
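
A minimal sketch under assumed values: compute the lengthening factor n = r_target * (1 - r_observed) / (r_observed * (1 - r_target)), then multiply n by the current number of items:

```python
# How much longer a test must be to reach a target reliability.
# The reliability values and current length are assumed.
r_observed = 0.70
r_target = 0.90
items_now = 20

n = (r_target * (1 - r_observed)) / (r_observed * (1 - r_target))
print(round(n, 2), round(n * items_now))  # 3.86 77
```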

81
Q

Rulon’s Formula

A

counterpart of the Spearman-Brown formula, based on the ratio of the variance of the differences between the odd and even splits to the variance of the total (combined odd-even) score

82
Q

What should the developer do if the split-half reliability is relatively low?

A

If the reliability of the original test is relatively low, then the developer could create new items, clarify the test instructions, or simplify the scoring rules

83
Q

What are the statistical tools that may be used to compute split-half reliability?

A

Pearson R or Spearman Rho

84
Q

What is the error of Inter-Scorer Reliability?

A

Scorer Differences

85
Q

Inter-Scorer Reliability

A

+ the degree of agreement or consistency between two or more scorers with regard to a particular measure
+ evaluated by calculating the percentage of times that two individuals assign the same scores to the performance of the examinees

86
Q

Variation of Inter-Scorer Reliability

A

a variation is to have two different examiners test the same client using the same test and then to determine how close their scores or ratings of the person are

87
Q

What is Inter-Scorer Reliability most used for?

A

used for coding behavioral observations/factors

88
Q

What are statistical measures that may be used for Inter-Scorer Reliability?

A

+ Fleiss Kappa
+ Cohen’s Kappa
+ Krippendorff’s Alpha

89
Q

Fleiss Kappa

A

determines the level of agreement between TWO or MORE raters when the method of assessment is measured on a CATEGORICAL SCALE

90
Q

Cohen’s Kappa

A

two raters only
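
A minimal sketch using scikit-learn's implementation, with invented categorical ratings from exactly two raters:

```python
# Cohen's kappa: chance-corrected agreement between two raters.
# The rating lists are invented for illustration.
from sklearn.metrics import cohen_kappa_score

rater_a = ["yes", "no", "yes", "yes", "no", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes"]

print(cohen_kappa_score(rater_a, rater_b))  # ~0.67
```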

91
Q

Krippendorff’s Alpha

A

two or more raters; based on observed disagreement corrected for the disagreement expected by chance

92
Q

Dynamic

A

trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experience

93
Q

Static

A

barely changing or relatively unchanging

94
Q

Restriction of Range or Restriction of Variance

A

if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower

95
Q

Power Tests

A

when the time limit is long enough to allow testtakers to attempt all items

96
Q

Speed Tests

A

generally contains items of a uniform level of difficulty, administered with a time limit

97
Q

What kind of reliability should be used for speed tests?

A

Reliability should be based on performance from two independent testing periods, using test-retest, alternate-forms, or split-half reliability

98
Q

Criterion-Referenced Tests

A

designed to provide an indication of where a testtaker stands with respect to some variable or criterion

99
Q

What will happen to the traditional measure of reliability when individual differences decrease?

A

As individual differences decrease, a traditional measure of reliability would also decrease, regardless of the stability of individual performance

100
Q

Classical Test Theory

A

+ states that everyone has a “true score” on a test
+ made up of “true score” and random error

101
Q

True Score

A

genuinely reflects an individual’s ability level as measured by a particular test

102
Q

Domain Sampling Theory

A

+ estimates the extent to which specific sources of variation under defined conditions are contributing to the test scores
+ considers problem created by using a limited number of items to represent a larger and more complicated construct
+ test reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample
+ Systematic Error

103
Q

Generalizability Theory

Domain Sampling Theory

A

+ based on the idea that a person’s test scores vary from testing to testing because of the variables in the testing situations
+ according to generalizability theory, given the exact same conditions of all the facets in the universe, the exact same test score should be obtained (universe score)

104
Q

Universe

A

the test situation

105
Q

Facet

A

number of items in the test, amount of review, and the purpose of test administration

106
Q

Decision Study

A

developers examine the usefulness of test scores in helping the test user make decisions

107
Q

Item Response Theory

A

+ the probability that a person with X ability will be able to perform at a level of Y in a test
+ a system of assumptions about measurement and the extent to which each item measures the trait

108
Q

What is the focus of Item Response Theory?

A

item difficulty

109
Q

What is Item Response Theory also known as?

A

Latent-Trait Theory

110
Q

Computer using IRT

A

+ The computer is used to focus on the range of item difficulty that helps assess an individual’s ability level
+ If you got several easy items correct, the computer will then move to more difficult items

111
Q

Difficulty

A

attribute of not being easily accomplished, solved, or comprehended

112
Q

Discrimination

A

degree to which an item differentiates among people with higher or lower levels of the trait, ability, etc.

113
Q

Dichotomous

A

can be answered with only one of two alternative responses

114
Q

Polytomous

A

3 or more alternative responses

115
Q

Standard Error of Measurement

A

+ provides a measure of the precision of an observed test score
+ index of the amount of inconsistency, or the amount of expected error, in an individual’s score
+ allows us to quantify the extent to which a test provides accurate scores
+ used to estimate or infer the extent to which an observed score deviates from a true score
+ also known as the Standard Error of a Score
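
A hedged sketch of the usual computation, SEM = SD * sqrt(1 - r_xx), with assumed values for the test SD and reliability:

```python
# Standard error of measurement from the test SD and reliability.
# Values are assumed (an IQ-style scale with SD = 15, r_xx = .91).
sd = 15.0
r_xx = 0.91

sem = sd * (1 - r_xx) ** 0.5
print(round(sem, 2))  # 4.5
```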

116
Q

What is the basic measure of error (SEM)?

A

Standard deviation of error

117
Q

What does the SEM provide?

A

provides an estimate of the amount of error inherent in an observed score or measurement

118
Q

What does it mean when a test has lower SEM?

A

Higher reliability

119
Q

What is SEM used for?

A

Used to estimate or infer the extent to which an observed score deviates from a true score

120
Q

Confidence Interval

Standard Error of Measurement

A

+ a range or band of test scores that is likely to contain true scores
+ tells us the likelihood that the true score falls within the specified range at the given confidence level
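
A minimal sketch of the band: observed score ± z * SEM, with assumed values for the observed score and SEM:

```python
# 95% confidence band around an observed score (values assumed).
observed = 110.0
sem = 4.5
z = 1.96  # z-value for 95% confidence

low, high = observed - z * sem, observed + z * sem
print(round(low, 2), round(high, 2))  # 101.18 118.82
```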

121
Q

What does it mean when the range is larger?

A

The larger the range, the higher the confidence

122
Q

Standard Error of the Difference

Standard Error of Measurement

A

can aid a test user in determining how large a difference should be before it is considered statistically significant
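
A hedged sketch of one common form, SED = sqrt(SEM1² + SEM2²), with assumed SEMs for the two scores being compared:

```python
# Standard error of the difference between two scores (SEMs assumed).
sem1, sem2 = 4.5, 3.0

sed = (sem1 ** 2 + sem2 ** 2) ** 0.5
print(round(sed, 2))  # 5.41
```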

123
Q

Standard Error of Estimate

Standard Error of Measurement

A

refers to the standard error of the difference between the predicted and observed values

124
Q

What can one do if the reliability is low?

A

If the reliability is low, you can increase the number of items or use factor analysis and item analysis to increase internal consistency

125
Q

Reliability Estimates

A

nature of the test will often determine the reliability metric

126
Q

Types of Reliability Estimates

A

a) Homogenous (unifactor) or heterogeneous (multifactor)
b) Dynamic (unstable) or static (stable)
c) Range of scores is restricted or not
d) Speed Test or Power Test
e) Criterion or non-Criterion

127
Q

Test Sensitivity

A

detects true positives

128
Q

Test Specificity

A

detects true negatives

129
Q

Base Rate

A

proportion of the population that actually possesses the characteristic of interest

130
Q

Selection ratio

A

no. of hired candidates compared to the no. of applicants

131
Q

Formula for Selection Ratio

A

number of hired candidates / total number of candidates

/ = divided by
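
A minimal sketch with assumed counts:

```python
# Selection ratio = hired / total applicants (counts assumed).
hired = 5
applicants = 50

print(hired / applicants)  # 0.1
```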

132
Q

Four Possible Hit and Miss Outcomes

A
  1. True Positives (Sensitivity)
  2. True Negatives (Specificity)
  3. False Positive (Type 1)
  4. False Negative (Type 2)
133
Q

True Positives (Sensitivity)

A

predict success that does occur

134
Q

True Negatives (Specificity)

A

predict failure that does occur

135
Q

False Positive (Type 1)

A

predicted success that does not occur

136
Q

False Negative (Type 2)

A

predicted failure, but success occurs

137
Q

Quartile 1

A

scored well, performed poorly

138
Q

Quartile 2

A

scored well, performed well

139
Q

Quartile 3

A

scored poorly, performed well

140
Q

Quartile 4

A

scored poorly, performed poorly