Reliability Flashcards

1
Q

OBSERVED SCORE

A

The actual score the test taker received on the test.

2
Q

TRUE SCORE

A

The true, 100% accurate reflection of the test taker's ability, skills, or knowledge (their score if the test/assessment were perfect and without error).

3
Q

The greater the amount of measurement error on test scores, the _______ the reliability/precision

A

Lower

4
Q

MEASUREMENT ERROR

A

Any FLUCTUATION in scores that results from factors related to the measurement process, but not related to what is being measured.

5
Q

RELIABILITY/PRECISION

A

The degree that a measurement’s test scores are dependable, consistent, and stable across different forms of the test, items of the test, and repeat administrations of the test.

6
Q

Why is it imperative to make sure test scores are reliable?

A

Because many important decisions are made about individuals based on test scores.

7
Q

Regarding reliability/precision, what is your job as a professional counselor?

A

To INTERPRET the reliability and DETERMINE the acceptable degree of reliability for the assessment being utilized.

8
Q

What is the formula for Observed Score?

A

Observed Score = True Score + Measurement Error

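The formula on this card can be illustrated with a small Python simulation (illustrative only; the true score, error spread, and seed are invented):

```python
import random

random.seed(42)

# Hypothetical example: simulate observed scores as true score plus random
# measurement error, per the formula Observed = True + Error.
true_score = 100          # the test taker's (unknowable) true score
n_administrations = 10000

observed_scores = []
for _ in range(n_administrations):
    error = random.gauss(0, 3)          # random measurement error, SD = 3
    observed_scores.append(true_score + error)

mean_observed = sum(observed_scores) / len(observed_scores)

# Over many administrations, random error averages out, so the mean
# observed score approaches the true score.
print(round(mean_observed, 2))
```

Because the error is random (mean zero), it inflates the spread of observed scores without shifting their long-run average away from the true score.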
9
Q

What are the sources of measurement error?

A
Time sampling error
Content-sampling error
Interrater differences 
Quality of test items
Test length
Test-taker variables
Test administration
10
Q

TIME SAMPLING ERROR

A

Fluctuation in test scores obtained from repeated testing of the same individual.

11
Q

What is the CARRYOVER EFFECT in time-sampling error?

A

Occurs when the interval between tests is too short and the first test-taking session influences the second; for example, test takers may remember their answers from the first test administration.

12
Q

What is the PRACTICE EFFECT in a time-sampling error?

A

When a test-taker’s skills have improved by having taken the test the first time (some skills improve with practice).

13
Q

What two issues are involved in time-sampling error when the interval between tests is too short?

A

Carryover effect

Practice effect

14
Q

What issues are involved in time-sampling error when the interval between tests is too long?

A

LEARNING, MATURATION (i.e., changes in the test takers themselves that occur over time), or other INTERVENING EXPERIENCES (e.g., treatment).

15
Q

What is the assumption about constructs in time-sampling error?

A

Constructs may vacillate over time.

16
Q

What types of constructs are not as prone to time-sampling errors?

A

Personality traits and abilities

17
Q

What types of constructs are prone to time-sampling errors?

A

Emotional states (depression, anxiety) and achievement.

18
Q

CONTENT SAMPLING ERROR

A

An instrument that does not include items that adequately represent the content domain; that is, error that results from selecting test items that do not adequately cover the content area the test is supposed to evaluate.

19
Q

What is considered the largest source of error in instrument scores?

A

Content-sampling error

20
Q

INTERRATER DIFFERENCES

A

This occurs when instrument scores rely heavily on the subjective judgment of raters. Different raters will rarely assign exactly the same scores or ratings to a given test performance, even when the scoring directions are specified, the test manual is explicit, and the raters are conscientious.

21
Q

When referring to sources of measurement error, what does QUALITY OF TEST ITEMS refer to?

A

How well the test items are constructed (clear and focused vs. vague and ambiguous).

22
Q

What does TEST LENGTH refer to when thinking of sources of measurement errors?

A

As the number of items on a test increases, the more accurately the test represents the content domain being measured. The greater the number of items, the greater the reliability/precision.

23
Q

What TEST-TAKER VARIABLES can be sources of error variance in reliability/precision?

A

A test-taker’s motivation, fatigue, illness, physical discomfort, or mood can all affect the test-taker’s performance on a test and affect the reliability/precision of an assessment.

24
Q

How is TEST ADMINISTRATION a source of measurement error?

A

Factors such as an examiner not following the specified administration instructions, room temperature, lighting, noise, and critical incidents during test administration can all cause measurement error.

25
Q

What are the major methods of estimating reliability?

A
  • TEST-RETEST
  • ALTERNATE FORMS (simultaneous administration, delayed administration)
  • INTERNAL CONSISTENCY (Split-Half, KR Formulas, Coefficient Alpha)
  • INTERRATER
26
Q

What is a RELIABILITY COEFFICIENT?

A

1) The proportion of test-score variance that reflects real differences among test takers rather than random error.
2) A reliability coefficient always pertains to a GROUP of test scores, not individual scores.
3) The methods most often used to estimate reliability/precision report a reliability coefficient (r).

27
Q

The closer reliability coefficients are to zero, the more that test scores represent _____ ______, not ______ _______ __________.

A

random error, test-taker performance

28
Q

What is the formula to determine the percentage of error?

A

Error = 1 - r (reliability coefficient)

29
Q

If the reliability coefficient is 0.85, what is the error?

A

1 - 0.85 = 0.15. Answer: the error is 15%, meaning 15% of the score variance is attributed to random error.

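The error formula from the two cards above can be sketched as a tiny Python helper (a sketch; the function name is mine, not from the source):

```python
def error_proportion(r):
    """Proportion of score variance attributed to random error,
    given a reliability coefficient r (Error = 1 - r)."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability coefficient must be between 0 and 1")
    return 1.0 - r

# Worked example from the card: r = 0.85 -> 15% random error.
print(f"{error_proportion(0.85):.0%}")   # -> 15%
```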
30
Q

What is the oldest and most commonly used method to estimate reliability/precision?

A

Test-retest method.

31
Q

What is the test-retest method?

A

The same test is given twice, with a time interval between the two administrations.

32
Q

What is the term for correlating the scores of tests administered on two separate occasions and what does it reflect?

A

It is called the coefficient of stability.

It reflects the stability of the test scores over time.

33
Q

Because test-retest reliability estimates error related to time sampling, the _____ ______ between the two test administrations must be specified since it will affect the _____ of the test scores.

A

time interval; stability

34
Q

When is the test-retest method most useful?

A

When measuring variables (constructs) that do not change over time, such as traits, abilities, and characteristics.

35
Q

When is the test-retest method inappropriate?

A

When measuring variables (constructs) that are transient and constantly changing, such as someone’s mood.

36
Q

What are the two things to consider when evaluating test-retest reliability reported in an instrument manual?

A

1) The length of the time interval between test administrations.
2) The type of construct (variable) being tested.

37
Q

What is another name for ALTERNATE FORMS reliability?

A

Parallel forms reliability

38
Q

What does ALTERNATE FORMS RELIABILITY determine?

A

It helps us determine if two equivalent forms of the same test are really equivalent. In other words, it tests whether different tests with similar items measure the same content, knowledge, or skill.

39
Q

In alternate forms reliability, the tests must be different but cover the same content domain. What else must the two tests have in common?

A

The two tests should

  • have the same number of items,
  • use the same type of format,
  • use the same directions for administering, scoring, and interpreting the test.
40
Q

What are the two procedures for establishing alternate forms reliability?

A

Simultaneous administration

Delayed administration

41
Q

What is the procedure for simultaneous administration in alternate forms reliability testing?

A

The two forms of the test are given simultaneously to the same group of people on the same day.

42
Q

What is the procedure for delayed administration in alternate forms reliability testing?

A

Giving the two forms of the test on two different occasions.

43
Q

What is the procedure for delayed administration in alternative forms reliability testing?

A

Giving the two forms of the test on two different occasions.

44
Q

What coefficient does alternate forms reliability based on simultaneous administration provide?

A

It provides a coefficient of equivalence because simultaneous administration detects errors related to content sampling.

45
Q

What coefficients do alternate forms reliability based on delayed administration provide?

A

Delayed administration provides a coefficient of equivalence and a coefficient of stability because it detects errors related to content and time-sampling.

46
Q

Why is alternate forms reliability rarely used?

A

Because very few tests have alternate forms. The process of developing an equivalent test that mirrors the other but is still distinct is time consuming, so most test developers do not pursue this option.

47
Q

What does INTERNAL CONSISTENCY RELIABILITY evaluate?

A

The interrelatedness of items within an instrument; that is, the extent to which the items on the test measure the same ability or trait.

48
Q

What does high internal consistency mean?

A

The test items are homogeneous, which increases confidence that the items assess a single construct.

49
Q

Why are internal consistency estimates appealing to test designers/publishers?

A

They require only a single test and a single test administration to estimate reliability.

51
Q

What are the three typical ways of computing internal consistency reliability coefficients?

A
  • Split-Half Reliability
  • Kuder-Richardson Formulas
  • Coefficient Alpha
52
Q

What is the procedure for split-half reliability?

A

A test is divided into two comparable halves, and both halves are given during one testing session. The results on one half of the test are then correlated with the results on the other half.

53
Q

What is the resulting coefficient in split-half reliability?

A

Coefficient of equivalence. It detects errors related to content sampling. Its value indicates how consistently the items on the test measure the same construct.

54
Q

What is the Spearman-Brown prophecy formula?

A

When a test is split into two halves, this formula estimates what the reliability coefficient would be if each half had been the length of the whole test.

55
Q

What are other names for the Kuder-Richardson formulas?

A

KR20 and KR21

56
Q

When are the KR 20 and KR 21 calculations used?

A

Used on tests that have dichotomous items (answers that are scored right or wrong, with 0 indicating an incorrect answer and 1 indicating a correct answer). Reliability can be assessed without splitting the test in half.

57
Q

What is the coefficient alpha?

A

Used when items in the test are not dichotomous. For dichotomous items, it is equivalent to KR 20.
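Coefficient alpha can be computed directly from its standard definition. The sketch below is illustrative, with invented 0/1 data (for which alpha is equivalent to KR 20):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / total-score variance).
    item_scores: one row of item scores per test taker."""
    k = len(item_scores[0])                       # number of items
    totals = [sum(row) for row in item_scores]    # total score per person
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    return (k / (k - 1)) * (1 - sum(item_vars) / pvariance(totals))

# Hypothetical dichotomous (0/1) data for 5 test takers on a 4-item test.
data = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 1, 0, 0],
]
alpha = cronbach_alpha(data)
print(round(alpha, 3))   # -> 0.79
```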

58
Q

What is the other name for coefficient alpha?

A

Cronbach’s alpha

59
Q

What is more commonly used, KR 20 or Cronbach’s alpha? Why?

A

Cronbach’s alpha, because the results are just as good as KR 20 and it is not limited to dichotomous tests.

60
Q

What measure has become primary in assessing reliability/precision?

A

Cronbach’s alpha.

61
Q

What is INTERRATER RELIABILITY?

A

The extent to which two or more raters agree.

62
Q

What is the basic method for assessing level of agreement between two or more observers?

A

Correlating the scores obtained independently by two or more raters.

63
Q

What does interrater reliability reflect? What does it not reflect?

A

Reflects - interrater agreement

Does not reflect - content sampling error or time-sampling error.

64
Q

If a test is designed to be given more than once, which reliability estimation method would be chosen, and why?

A

Test-retest or alternate forms reliability with delayed administration because both are sensitive to time-sampling errors.

65
Q

If a test involves two or more raters, which reliability estimation method would be chosen?

A

Interrater reliability.

66
Q

What methods would be used to compute internal consistency reliability in a test with heterogeneous content?

A

1) Split-half method: the test would be divided into two equivalent halves, each consisting of constructs A and B, and the halves correlated.
2) KR 20 and coefficient alpha may be used if the differing constructs are placed in homogeneous subgroups. Each subgroup would be tested with KR 20 or coefficient alpha to calculate the reliability/precision of internal consistency.

67
Q

What reliability coefficient values are considered very high, high, acceptable, moderate/acceptable, and low/unacceptable?

A
There is no set threshold, but according to Sheperis et al. (2020) the following are the thresholds:
(A) >.90  Very high
(B) .80-.89  High
(C) .70-.79  Acceptable
(D) .60-.69  Moderate/Acceptable
(F) <.60  Low/Unacceptable
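The thresholds above can be turned into a small lookup helper (a sketch; the boundary handling, e.g. whether .90 itself counts as "Very high", is my assumption, not from the source):

```python
def interpret_reliability(r):
    """Map a reliability coefficient to the interpretive categories
    attributed to Sheperis et al. (2020) on this card."""
    if r >= 0.90:
        return "Very high"
    if r >= 0.80:
        return "High"
    if r >= 0.70:
        return "Acceptable"
    if r >= 0.60:
        return "Moderate/Acceptable"
    return "Low/Unacceptable"

print(interpret_reliability(0.85))   # -> High
```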
68
Q

What is the standard error of measurement (SEM) used for?

A

It is a simple measure of how much an individual’s test score would fluctuate (due to test error) if they took the test repeatedly. It is an estimate of the accuracy of an individual’s observed score relative to the true score, had the individual been tested an infinite number of times.

69
Q

What is the standard error of measurement (SEM)?

A

SEM is the measure of the spread of scores obtained by a SINGLE INDIVIDUAL if the individual was tested multiple times.

70
Q

What is the difference between the SEM and standard deviation?

A

Standard deviation is the spread of scores obtained by a GROUP OF TEST TAKERS on a SINGLE TEST. SEM is the measure of the spread of scores obtained by a SINGLE INDIVIDUAL if the individual was tested MULTIPLE times.

71
Q

What is the SEM used for?

A

The SEM is used to create confidence intervals around specific observed scores, which can guide score interpretations.

72
Q

SEM formula

A

SEM = SD × √(1 − r), where SD is the standard deviation of the test scores and r is the reliability coefficient (see page 144 of the textbook).
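Assuming the standard SEM formula, SEM = SD × √(1 − r), a minimal sketch (the example values SD = 15 and r = 0.94 are invented):

```python
import math

def standard_error_of_measurement(sd, r):
    """Standard SEM formula: SEM = SD * sqrt(1 - r), where SD is the
    standard deviation of the test scores and r is the reliability
    coefficient."""
    return sd * math.sqrt(1 - r)

# Example: a test with SD = 15 and reliability r = 0.94.
sem = standard_error_of_measurement(15, 0.94)
print(round(sem, 2))   # -> 3.67
```

Note how higher reliability shrinks the SEM: at r = 1.0 the SEM would be zero, because a perfectly reliable test has no random error.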

73
Q

What does a confidence interval tell us?

A

The upper and lower limits within which a person’s true score is likely to fall.

74
Q

In assessment, what confidence intervals are we interested in?

A

68%, 95%, and 99.5%

75
Q

What z scores are associated with the confidence levels?

A

68% is associated with a z score of 1.00
95% is associated with a z score of 1.96
99.5% is associated with a z score of 2.58

76
Q

Compute the confidence intervals for an individual’s score of 100 on a test with an SEM of 3.67.

A

See page 145 of the textbook
Answers: 68% probability that the individual’s true score falls between about 96 and 104 (100 ± 1.00 × 3.67)
95% probability that the individual’s true score falls between about 93 and 107 (100 ± 1.96 × 3.67)
99.5% probability that the individual’s true score falls between about 91 and 109 (100 ± 2.58 × 3.67)
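The interval arithmetic can be sketched as (a sketch; the function name is mine):

```python
def confidence_interval(observed, sem, z):
    """Confidence interval around an observed score: observed ± z * SEM."""
    half_width = z * sem
    return observed - half_width, observed + half_width

# Worked example from the card: observed score 100, SEM 3.67.
observed, sem = 100, 3.67
for label, z in [("68%", 1.00), ("95%", 1.96), ("99.5%", 2.58)]:
    lo, hi = confidence_interval(observed, sem, z)
    print(f"{label}: {lo:.1f} to {hi:.1f}")
```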

77
Q

How do researchers decrease the error and increase the reliability/precision of test scores?

A
  1. Increase the number of items on the test (reduces content-sampling error).
  2. Write understandable, unambiguous test items.
  3. Use selected-response items (multiple choice) rather than constructed-response items (essays).
  4. Make sure items are not too difficult or too easy.
  5. Have clearly stated administration and scoring procedures.
  6. Require training before individuals administer, grade, or interpret a test.