Module 2: Reliability Flashcards

1
Q

Reliability

A

+ dependability or consistency of the instrument or scores obtained by the same person when re-examined with the same test on different occasions, or with different sets of equivalent items
+ implies freedom from error
+ involves minimizing error
+ the true score itself can never be directly observed

2
Q

If tests are reliable, are they automatically reliable in all contexts?

A

No. A test may be reliable in one context but unreliable in another.

3
Q

How can reliability be computed?

A

By estimating the range of possible random fluctuations that can be expected in an individual’s score

4
Q

How many items should there be to have higher reliability?

A

The greater the number of items, the higher the reliability.

5
Q

What kind of sample should be used to obtain an observed score?

A

Use only a representative sample to obtain an observed score

6
Q

Reliability Coefficient

A

index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance
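Expressed as a formula (standard psychometric notation, not taken from the card itself): rxx = true score variance / total variance. For example, a test with a true score variance of 8 and a total variance of 10 would have rxx = 0.80.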

7
Q

Classical Test Theory (True Score Theory)

A

a score on an ability test is presumed to reflect not only the testtaker’s true score on the ability being measured but also error

8
Q

Error

A

+ refers to the component of the observed test score that does not have to do with the testtaker’s ability
+ Errors of measurement are random

9
Q

What is the formula of the classical test theory?

A

X = T + E

X - observed score
T - true score
E - error
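Illustration with hypothetical numbers: if a testtaker's true score is T = 50 and the random error on one occasion is E = +3, the observed score is X = 53; on another occasion, E = -2 gives X = 48.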

10
Q

How can the true score be computed?

A

When you average all the observed scores obtained over a period of time, the result is the closest approximation of the true score

11
Q

What is a factor that contributes to consistency?

A

stable attributes

12
Q

What are factors that contribute to inconsistency?

A

characteristics of the individual, test, or situation, which have nothing to do with the attribute being measured, but still affect the scores

13
Q

What are the goals of reliability?

A
  1. To estimate errors
  2. Devise techniques to improve testing and reduce errors
14
Q

Variance

A

useful in describing sources of test score variability

15
Q

What are the two types of variance?

A
  1. True Variance
  2. Error Variance
16
Q

True Variance

A

variance from true differences

17
Q

Error Variance

A

variance from irrelevant random sources

18
Q

Measurement Error

A

+ all of the factors associated with the process of measuring some variable, other than the variable being measured
+ difference between the observed score and the true score

19
Q

Positive Variance

A

can increase one’s score

20
Q

Negative Variance

A

can decrease one’s score

21
Q

What are the sources of error variance?

A
  1. Item Sampling/Content Sampling
  2. Test Administration
  3. Test Scoring and Interpretation
22
Q

Item Sampling/Content Sampling

A

+ refers to variation among items within a test as well as to variation among items between tests
+ the extent to which a testtaker’s score is affected by the content sampled on a test, and by the way the content is sampled, is a source of error variance

23
Q

Test Administration

A

testtaker’s motivation or attention, environment, etc.

24
Q

Test Scoring and Interpretation

A

scorers and scoring systems are potential sources of error variance; tests may employ objective-type items amenable to computer scoring of well-documented reliability to minimize this

25
Random Error
source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (e.g., noise, temperature, weather)
26
Systematic Error
+ source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured
+ has a consistent effect on the true score
+ the SD does not change, but the mean does
27
What is the relationship between reliability and variance?
+ Reliability refers to the proportion of total variance attributed to true variance
+ The greater the proportion of the total variance attributed to true variance, the more reliable the test
28
What can error variance do to a test score?
Error variance may increase or decrease a test score by varying amounts; as a result, the consistency of the test score, and thus its reliability, is affected
29
True Score Formula
Estimated true score = Rxx(x - x̄) + x̄
wherein: Rxx - correlation coefficient; x - obtained score; x̄ - mean score
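Worked example with hypothetical values: if Rxx = 0.90, x = 70, and x̄ = 60, then the estimated true score = 0.90(70 - 60) + 60 = 69.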
30
What is an error in test-retest reliability?
time sampling
31
Test-Retest Reliability
+ an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the test
32
What is test-retest reliability appropriate for?
appropriate when evaluating the reliability of a test that purports to measure an enduring and stable attribute such as a personality trait
33
How is test-retest reliability established?
established by comparing the scores obtained from two successive measurements of the same individuals and calculating a correlation between the two sets of scores
34
When does the reliability coefficient of test-retest reliability become insignificant?
the longer the interval between administrations, the greater the likelihood that the reliability coefficient will be insignificant
35
Carryover Effects
occur when the test-retest interval is short: the second test is influenced by the first because testtakers remember or have practiced the previous test, resulting in an inflated correlation (overestimation of reliability)
36
Practice Effect
scores on the second session are higher due to the testtakers' experience gained in the first session of testing
37
Test Sophistication
items are remembered by the testtakers, especially the difficult ones or the items that they found highly confusing
38
Test Wiseness
might inflate the apparent abilities of testtakers
39
When does test-retest reliability have lower correlation?
a test-retest with a longer interval might be affected by other extraneous factors, thus resulting in a lower correlation
40
What does low correlation in test-retest reliability mean?
lower correlation = poor reliability
41
Mortality
attrition: some testtakers are absent from the second session (simply remove the first-session tests of those who were absent)
42
What does test-retest reliability measure?
coefficient of stability
43
What are the statistical tools that should be used for test-retest reliability?
Pearson R, Spearman Rho
44
What are the errors in Parallel Forms/Alternate Forms Reliability?
Item sampling (immediate administration); item sampling combined with changes over time (delayed administration)
45
Parallel Forms/Alternate Forms Reliability
+ established when at least two different versions of the test yield almost the same scores
+ has the most universal applicability
+ the true scores must be the same for the two tests
+ the means and the variances of the observed scores must be equal for the two forms
46
Parallel Forms
for each form of the test, the means and the error variances are EQUAL; the same items presented in different positions/numbering
47
Alternate Forms
simply a different version of a test that has been constructed so as to be parallel
48
What is required of parallel forms/alternate forms reliability?
The test should contain the same number of items; the items should be expressed in the same form and should cover the same type of content; the range and level of difficulty must also be equal
50
What should be done if there is a test leakage during parallel/alternate forms reliability?
If there is a test leakage, use the foem that is not mostly administered.
51
Counterbalancing
technique to avoid carryover effects for parallel forms by using a different sequence for each group (e.g., Group 1 listens to a song before counseling; Group 2 has counseling first, then listens to the song)
53
When can the two different tests for parallel forms/alternate forms reliability be administered?
They can be administered on the same day or at different times.
54
What is the most rigorous and burdensome form of reliability?
Parallel forms/alternate forms, because test developers must create two forms of the test.
55
What is the main problem for parallel form/alternate form reliability?
There is a difference between the two tests
56
What are the factors that may affect parallel form/alternate form reliability test scores?
It may be affected by motivation, fatigue, or intervening events.
58
What are the statistical tools for parallel form/alternate form reliability?
Pearson R or Spearman Rho
60
What is Internal Consistency also known as?
Inter-Item Reliability
61
What is an error of Internal Consistency?
Item sampling; homogeneity of items
62
Internal Consistency (Inter-Item Reliability)
+ used when tests are administered once
+ consistency among items within the test
+ measures the internal consistency of the test, which is the degree to which each item measures the same construct
+ measurement for unstable traits
63
When can a test be said to have good internal consistency?
If all items measure the same construct, then the test has good internal consistency
64
What is internal consistency most useful for?
useful in assessing Homogeneity
65
Homogeneity
the degree to which a test contains items that measure a single trait (unifactorial)
66
Heterogeneity
degree to which a test measures different factors (more than one factor/trait)
67
When will a test have higher inter-item consistency?
more homogeneous items = higher inter-item consistency
68
What are the different statistical tools that may be used for computing Internal Consistency?
+ KR-20
+ KR-21
+ Cronbach's Coefficient Alpha
69
KR-20
used for the inter-item consistency of dichotomously scored items (e.g., intelligence tests, personality tests with yes/no options, multiple-choice items); for items with unequal variances
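Its usual form (standard notation, not from the card itself): KR-20 = [k / (k - 1)] × [1 - (Σpq / σ²)], where k is the number of items, p and q are the proportions of testtakers passing and failing each item, and σ² is the variance of total test scores.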
70
KR-21
used if all the items have the same degree of difficulty (e.g., speed tests); assumes equal item variances; dichotomously scored items
71
Cronbach's Coefficient Alpha
used when the two halves of the test have unequal variances and on tests containing non-dichotomous items
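A minimal computational sketch (not part of the original cards; the scores matrix and its values are hypothetical) showing how Cronbach's alpha follows from the item variances and the total-score variance:

    import numpy as np

    def cronbach_alpha(scores):
        # scores: respondents x items matrix of item scores
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]                         # number of items
        item_vars = scores.var(axis=0, ddof=1)      # variance of each item
        total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical data: 5 respondents answering 4 Likert-type items
    scores = [[4, 5, 4, 4],
              [2, 2, 3, 2],
              [5, 4, 5, 5],
              [3, 3, 2, 3],
              [4, 4, 4, 5]]
    print(round(cronbach_alpha(scores), 2))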
72
Average Proportional Distance
measure used to evaluate the internal consistency of a test that focuses on the degree of difference that exists between item scores
73
What is an error of Split-Half Reliability?
Item sampling; the nature of the split
74
Split-Half Reliability
obtained by correlating two pairs of scores obtained from equivalent halves of a single test administered ONCE
75
What is split-half reliability useful for?
it is useful when it is impractical or undesirable to assess reliability with two tests or to administer a test twice
76
How can split-half reliability be done?
One cannot simply divide the items in the middle because doing so might spuriously raise or lower the reliability coefficient; instead, randomly assign items to the halves, or assign odd-numbered items to one half and even-numbered items to the other half
77
What are the different statistical formulas that may be used for computing Split-Half Reliability?
+ Spearman-Brown Formula
+ Spearman-Brown Prophecy Formula
+ Rulon's Formula
78
Spearman-Brown Formula
allows a test developer or user to estimate internal consistency reliability from the correlation of two halves of a test, as if each half had been the length of the whole test, assuming the halves have equal variances
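In formula form (standard notation, not from the card itself): rSB = 2rhh / (1 + rhh), where rhh is the correlation between the two halves. For example, a half-test correlation of .70 corrects to 2(.70) / 1.70 ≈ .82 for the full-length test.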
79
Spearman-Brown Prophecy Formula
estimates how many more items are needed in order to achieve the target reliability
80
How is Spearman-Brown Prophecy Formula computed?
multiply the estimated factor by the original number of items
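Worked example with hypothetical figures: to raise an obtained reliability of .70 to a desired .90, the factor is n = [.90(1 - .70)] / [.70(1 - .90)] = .27 / .07 ≈ 3.9, so a 20-item test would need roughly 20 × 3.9 ≈ 77 items.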
81
Rulon's Formula
counterpart of the Spearman-Brown formula; based on the ratio of the variance of the differences between the odd and even splits to the variance of the total (combined odd-even) score
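In formula form (standard notation, not from the card itself): rtt = 1 - (σ²d / σ²x), where σ²d is the variance of the differences between the odd and even half-scores and σ²x is the variance of total scores.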
82
What should the developer do if the split-half reliability is relatively low?
If the reliability of the original test is relatively low, the developer could create new items, clarify the test instructions, or simplify the scoring rules
83
What are the statistical tools that may be used to compute split-half reliability?
Pearson R or Spearman Rho
84
What is the error of Inter-Scorer Reliability?
Scorer Differences
85
Inter-Scorer Reliability
+ the degree of agreement or consistency between two or more scorers with regard to a particular measure
+ evaluated by calculating the percentage of times that two individuals assign the same scores to the performance of the examinees
86
Variation of Inter-Scorer Reliability
a variation is to have two different examiners test the same client using the same test and then to determine how close their scores or ratings of the person are
87
What is Inter-Scorer Reliability most used for?
used for the coding of nonverbal behavior/factors
88
What are statistical measures that may be used for Inter-Scorer Reliability?
+ Fleiss Kappa
+ Cohen's Kappa
+ Krippendorff's Alpha
89
Fleiss Kappa
determines the level of agreement between TWO or MORE raters when the method of assessment is measured on a CATEGORICAL scale
90
Cohen's Kappa
two raters only
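A minimal sketch (not part of the original cards; the rater codes are hypothetical) of computing Cohen's kappa for two raters with scikit-learn:

    from sklearn.metrics import cohen_kappa_score

    # Hypothetical categorical codes assigned by two raters to the same six cases
    rater_1 = ["anxious", "calm", "anxious", "calm", "calm", "anxious"]
    rater_2 = ["anxious", "calm", "calm", "calm", "calm", "anxious"]

    # 1.0 = perfect agreement, 0 = chance-level agreement
    print(cohen_kappa_score(rater_1, rater_2))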
91
Krippendorff's Alpha
two or more raters; based on the observed disagreement corrected for the disagreement expected by chance
92
Dynamic
trait, state, or ability presumed to be ever-changing as a function of situational and cognitive experience
93
Static
barely changing or relatively unchanging
94
Restriction of Range or Restriction of Variance
if the variance of either variable in a correlational analysis is restricted by the sampling procedure used, then the resulting correlation coefficient tends to be lower
95
Power Tests
when the time limit is long enough to allow testtakers to attempt all items
96
Speed Tests
generally contains items of a uniform level of difficulty, administered with a time limit
97
What kind of reliability should be used for speed tests?
Reliability should be based on performance from two independent testing periods, using test-retest, alternate-forms, or split-half reliability
98
Criterion-Referenced Tests
designed to provide an indication of where a testtaker stands with respect to some variable or criterion
99
What will happen to the traditional measure of reliability when individual differences decrease?
As individual differences decrease, a traditional measure of reliability would also decrease, regardless of the stability of individual performance
100
Classical Test Theory
+ states that everyone has a "true score" on a test
+ a test score is made up of the "true score" plus random error
101
True Score
genuinely reflects an individual's ability level as measured by a particular test
102
Domain Sampling Theory
+ estimates the extent to which specific sources of variation under defined conditions are contributing to the test scores
+ considers the problem created by using a limited number of items to represent a larger and more complicated construct
+ test reliability is conceived of as an objective measure of how precisely the test score assesses the domain from which the test draws a sample
+ Systematic Error
103
Generalizability Theory | Domain Sampling Theory
+ based on the idea that a person's test scores vary from testing to testing because of the variables in the testing situations
+ according to generalizability theory, given the exact same conditions of all the facets in the universe, the exact same test score should be obtained (universe score)
104
Universe
the test situation
105
Facet
e.g., the number of items in the test, the amount of review, and the purpose of test administration
106
Decision Study
developers examine the usefulness of test scores in helping the test user make decisions
107
Item Response Theory
+ the probability that a person with X ability will be able to perform at a level of Y on a test
+ a system of assumptions about measurement, including the extent to which each item measures the trait
108
What is the focus of Item Response Theory?
item difficulty
109
What is Item Response Theory also known as?
Latent-Trait Theory
110
Computer using IRT
+ The computer is used to focus on the range of item difficulty that helps assess an individual's ability level
+ If you got several easy items correct, the computer will then move to more difficult items
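A toy sketch (not part of the original cards; the step size and response pattern are hypothetical) of the adaptive logic described above, where item difficulty moves up after a correct answer and down after an incorrect one:

    def next_difficulty(current, answered_correctly, step=0.5):
        # Move toward harder items when correct, easier items when incorrect
        return current + step if answered_correctly else current - step

    difficulty = 0.0                            # start at average difficulty
    for correct in [True, True, False, True]:   # hypothetical response pattern
        difficulty = next_difficulty(difficulty, correct)
        print(difficulty)                       # 0.5, 1.0, 0.5, 1.0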
111
Difficulty
attribute of not being easily accomplished, solved, or comprehended
112
Discrimination
degree to which an item differentiates among people with higher or lower levels of the trait, ability, etc.
113
Dichotomous
can be answered with only one of two alternative responses
114
Polytomous
3 or more alternative responses
115
Standard Error of Measurement
+ provides a measure of the precision of an observed test score
+ an index of the amount of inconsistency, or the amount of expected error, in an individual's score
+ allows us to quantify the extent to which a test provides accurate scores
+ used to estimate or infer the extent to which an observed score deviates from a true score
+ also known as the Standard Error of a Score
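In formula form (standard notation, not from the card itself): SEM = SD × √(1 - rxx). For example, with SD = 15 and rxx = .91, SEM = 15 × √.09 = 4.5.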
116
What is the basic measure of error (SEM)?
Standard deviation of error
117
What does the SEM provide?
provides an estimate of the amount of error inherent in an observed score or measurement
118
What does it mean when a test has lower SEM?
Higher reliability
119
What is SEM used for?
Used to estimate or infer the extent to which an observed score deviates from a true score
120
Confidence Interval | Standard Error of Measurement
+ a range or band of test scores that is likely to contain the true score
+ tells us the likelihood that the true score falls within the specified range at the given confidence level
121
What does it mean when the range is larger?
The larger the range, the higher the confidence that it contains the true score
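Worked example with hypothetical values: for an observed score of 100 and SEM = 4.5, the 95% confidence interval is 100 ± 1.96(4.5), or roughly 91 to 109.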
122
Standard Error of the Difference | Standard Error of Measurement
can aid a test user in determining how large a difference should be before it is considered statistically significant
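In formula form (standard notation, not from the card itself): SEdiff = √(SEM1² + SEM2²). With SEM = 4.5 on both tests, SEdiff ≈ 6.4, so two scores should differ by at least about 1.96 × 6.4 ≈ 12.5 points to be considered significant at the .05 level.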
123
Standard Error of Estimate | Standard Error of Measurement
refers to the standard error of the difference between the predicted and observed values
124
What can one do if the reliability is low?
If the reliability is low, you can increase the number of items or use factor analysis and item analysis to increase internal consistency
125
Reliability Estimates
nature of the test will often determine the reliability metric
126
Types of Reliability Estimates
a) Homogeneous (unifactor) or heterogeneous (multifactor)
b) Dynamic (unstable) or static (stable)
c) Range of scores restricted or not restricted
d) Speed test or power test
e) Criterion-referenced or non-criterion-referenced
127
Test Sensitivity
detects true positive
128
Test Specificity
detects true negative
129
Base Rate
proportion of the population that actually possess the characteristic of interest
130
Selection ratio
no. of hired candidates compared to the no. of applicants
131
Formula for Selection Ratio
Selection ratio = number of hired candidates / total number of candidates
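Worked example with hypothetical figures: if 10 candidates are hired out of 50 applicants, the selection ratio is 10 / 50 = 0.20.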
132
Four Possible Hit and Miss Outcomes
1. True Positives (Sensitivity)
2. True Negatives (Specificity)
3. False Positive (Type 1)
4. False Negative (Type 2)
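The four outcomes can be laid out as a 2 × 2 table (an illustrative arrangement, not from the card itself):

                         Performed well             Performed poorly
Predicted to succeed     True Positive              False Positive (Type 1)
Predicted to fail        False Negative (Type 2)    True Negative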
133
True Positives (Sensitivity)
predict success that does occur
134
True Negatives (Specificity)
predict failure that does occur
135
False Positive (Type 1)
predicted success, but it does not occur
136
False Negative (Type 2)
predicted failure, but success occurs
137
Quartile 1
scored well, performed poorly
138
Quartile 2
scored well, performed well
139
Quartile 3
scored poorly, performed well
140
Quartile 4
scored poorly, performed poorly