Chapter 5: Reliability Flashcards

Q

this term refers to the consistency of a
measurement

It indicates whether a test yields
stable and consistent results over time and across different contexts

A

reliability

Q

what is reliability determined by?

A

Reliability is determined by the proportion of total variance in test scores that can be attributed to true variance (i.e., actual differences in the ability being measured). The higher this proportion, the more reliable the test.
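As a quick numeric sketch of this ratio (the variance figures are invented for illustration):

```python
# Reliability as the proportion of total variance that is true variance.
# The variance figures below are hypothetical.
true_variance = 8.0   # variance from real differences in ability
error_variance = 2.0  # variance from random, irrelevant factors

total_variance = true_variance + error_variance
reliability = true_variance / total_variance
print(reliability)  # 0.8 -> 80% of score variance reflects true differences
```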

Q

this type of variance represents actual
differences in ability.

For example, differences in math skills among friends taking a test reflect ______ ______

A

true variance

Q

this term refers to an index that quantifies reliability, expressed as the ratio of true score variance to total variance.

A

reliability coefficient

Q

this type of variance represents variability
due to irrelevant or random factors, such as distractions or fatigue, affecting scores even if the true ability is constant

A

Error Variance

Q

this term encompasses all factors
affecting the measurement process that are not related to the variable being assessed.

A

Measurement Error

Q

this type of measurement error is caused by unpredictable fluctuations, resulting in inconsistencies without a discernible pattern

This type of error can lead to score variability without biasing the results
systematically.

A

Random Error

Q

this type of measurement error refers to a consistent error that may affect scores in a predictable manner, making it possible to identify and correct

This type of error does not compromise the consistency of scores

A

Systematic Error

Q

this source of error variance arises during test construction from differences among test items, both within a single test and across different tests

A

Item Sampling/Content Sampling

Q

what are the three main sources of error variance?

A
  1. Test Construction
  2. Test Administration
  3. Test Scoring and Interpretation
Q

this source of error variance refers to how the conditions under which the test is administered can impact performance.

A

Test Environment

Q

this source of error variance refers to how subjectivity in scoring, technical issues, or glitches can lead to inconsistent results.

A

Scorers and Scoring Systems

Q

this source of error variance refers to how factors such as emotional distress, physical discomfort, lack of sleep, or the influence of drugs and medication can affect the test-taker’s performance and focus.

A

Test-Taker Variables

Q

this source of error variance refers to how the physical appearance and demeanor of the examiner, their presence or absence, and their level of professionalism can influence the test-taker’s experience.

A

Examiner-Related Variables

Q

this source of error variance refers to the extent to which the sample population accurately represents the broader population.

Discrepancies in demographics, political affiliation, or other relevant factors can lead to biased results.

A

Sampling Error

Q

this type of reliability estimate measures the consistency of results between different versions of a test designed to assess the same construct

it assesses how well two different forms of a test yield similar scores when administered to the same individuals.

A

Parallel-Forms and Alternate-Forms Reliability Estimates

Q

this source of error refers to how issues such as ambiguous wording in questionnaires or biased items can skew results, favoring one response or candidate over another

A

Methodological Error

Q

this type of reliability estimate is defined as a method for estimating the
reliability of a measuring instrument by
administering the same test to the same group at two different points in time.

A

Test-Retest Reliability Estimate

Q

when do we apply test-retest reliability estimate?

A

it is best used for measuring stable
traits (e.g., personality) rather than fluctuating characteristics.

Q

this term refers to different versions of a test that are designed to be equivalent but may not meet the strict criteria of parallel forms

A

Alternate Forms

Q

this term refers to two forms of a test that are statistically equivalent in terms of means and variances

A

Parallel Forms

Q

what is the index for parallel and alternate-forms reliability?

A

coefficient of equivalence

Q

what is the method for test-retest reliability?

A

correlate scores from the same individuals across two test administrations.
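A minimal sketch of that method, with invented scores for five test-takers:

```python
# Test-retest reliability: correlate the same people's scores from two
# administrations of the same test. All scores below are hypothetical.
import statistics

def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

time1 = [10, 12, 14, 18, 20]  # first administration
time2 = [11, 13, 13, 17, 21]  # second administration, same individuals

coefficient_of_stability = pearson(time1, time2)
print(round(coefficient_of_stability, 3))  # close to 1 -> stable scores
```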

Q

what is the index for test-retest reliability?

A

coefficient of stability

Q

what are the reliability estimates categorized as external consistency estimates?

A

test-retest reliability
parallel-forms and alternate-forms reliability estimates

Q

what are the reliability estimates categorized as internal consistency estimates?

A

internal consistency reliability
split-half reliability

Q

this type of reliability can be assessed without creating alternate forms or re-administering tests

This involves evaluating the consistency of test items.

A

Internal Consistency Reliability

Q

this type of reliability estimate is obtained by correlating scores from two equivalent halves of a single test.

A

Split-Half Reliability

Q

what are the acceptable splitting methods for split-half reliability?

A
  1. Randomly assigning items to
    halves
  2. Using odd and even item
    assignments (odd-even
    reliability).
  3. Dividing by content to ensure
    both halves measure equivalent
    constructs
17
Q

what are the steps in conducting split-half reliability?

A
  1. Divide the test into two equivalent halves.
  2. Calculate the Pearson correlation between the two halves
  3. Adjust the reliability using the Spearman-Brown formula.
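The three steps can be sketched as follows, using an odd-even split (the item responses are invented):

```python
# Split-half reliability: odd-even split, Pearson r, Spearman-Brown correction.
# The 0/1 item responses below are hypothetical.
import statistics

data = [  # rows = test-takers, columns = 6 items (1 = correct)
    [1, 0, 1, 0, 0, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
]

# Step 1: divide the test into two halves (odd vs. even items)
odd  = [sum(row[0::2]) for row in data]
even = [sum(row[1::2]) for row in data]

# Step 2: Pearson correlation between the half scores
def pearson(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

r_half = pearson(odd, even)

# Step 3: Spearman-Brown adjustment to full test length
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))
```

Note that the corrected `r_full` is higher than `r_half`: halving a test shortens it, and the Spearman-Brown formula estimates what the reliability would be at the full length.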
Q

what are the methods for estimating internal consistency reliability?

A
  1. KR-20
  2. KR-21
  3. Cronbach’s Alpha
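As an illustration of these estimates, Cronbach’s alpha can be computed directly from an item-score matrix; with 0/1 (dichotomous) items the same computation reduces to KR-20. The responses below are invented:

```python
# Cronbach's alpha from an item-by-person score matrix; for 0/1 items
# this equals KR-20. All responses below are hypothetical.
import statistics

data = [  # rows = test-takers, columns = items (1 = correct)
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]
k = len(data[0])                     # number of items
totals = [sum(row) for row in data]  # total score per person

def pvar(xs):  # population variance, as in the classical formulas
    m = statistics.mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

item_vars = [pvar([row[j] for row in data]) for j in range(k)]
alpha = (k / (k - 1)) * (1 - sum(item_vars) / pvar(totals))
print(round(alpha, 3))
```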
Q

when is KR20 appropriate?

A

when it is a true dichotomous test (true or false)

Q

when is KR21 appropriate?

A

when it is an artificial dichotomy (multiple choice, binary scoring)

Q

this reliability estimate measures the degree of difference between item scores rather than their similarity

it is calculated from the absolute differences between item scores and is less affected by the number of items on a test

A

Average Proportional Distance

Q

when is Cronbach’s alpha appropriate?

A

when items are non-dichotomous, i.e., scored on more than two values (e.g., Likert-scale items)

Q

this reliability estimate measures the correlation among all items on a scale from a single test administration,
assessing the homogeneity of the test

A

Inter-Item consistency

Q

when is inter-scorer reliability used?

A

Frequently used in coding nonverbal behavior.

For example, a researcher may create a checklist of behaviors (like looking downward or moving slowly) to quantify
aspects of nonverbal cues indicating depressed mood.

Q

this term refers to the degree of agreement or consistency between two or more scorers regarding a particular measure

A

Inter-Scorer Reliability

Q

how is inter-scorer reliability calculated?

A

it is calculated using a correlation coefficient, referred to as the coefficient of inter-scorer reliability

Q

what number should the coefficient be for it to be considered as highly reliable?

Considered excellent (grade A); crucial for high-stakes decisions.

Q

what number should the coefficient be for it to be considered as moderately reliable?

Q

what number should the coefficient be for it to be considered as having a low reliability?

Weak, indicating potential issues with the
test’s effectiveness

A

0.65-0.70s

Q

what number should the coefficient be for it to be considered as having an unacceptable reliability?

A

below 0.50

Q

when is a test considered homogenous?

A

A test is considered homogeneous if it is
functionally uniform, measuring a single
factor (e.g., one ability or trait).

Q

these tests are designed to indicate how a test-taker performs relative to a specific
criterion or standard (e.g., educational or
vocational objectives).

these tests focus on measuring whether test-takers meet predetermined criteria rather than comparing their scores to those of others

A

criterion-referenced tests

19
Q

when is a test considered heterogeneous?

A

A test is heterogeneous if it measures
multiple factors or traits.

In such cases, internal consistency
estimates may be lower, whereas test-retest reliability might provide a more appropriate measure of reliability

Q

what reliability estimates are appropriate for static characteristics?

A

test-retest or alternate-forms methods

Q

these tests consist of items of uniform (typically low) difficulty, such that with generous time limits all test-takers could answer every item correctly; in practice, the time limit is set so that few, if any, test-takers finish the entire test

A

Speed Tests

Q

these tests are administered with a time limit that allows test-takers to attempt all items

they contain difficult items, with the expectation that no test-taker can achieve a perfect score

A

Power Tests

Q

this theory is also known as the true score model

it is the most widely used model of measurement in psychology due to its simplicity relative to more complex models

A

Classical Test Theory

Q

this term represents the value that genuinely reflects an individual’s ability or trait level as measured by a particular test.

This value is highly dependent on the specific test used

A

True Score

Q

this theory posits that a test’s reliability is
determined by how accurately the test score reflects the domain from which it samples

A

Domain Sampling Theory

Q

this theory suggests that test scores can vary across different testing situations due to various situational factors

A

Generalizability Theory

Q

this term refers to the complete
range of items that could measure a specific behavior, viewed as a hypothetical construct.

A

Domain of Behavior

Q

this type of study assesses how well scores from a specific test can be generalized across different contexts.

Coefficients of generalizability represent the influence of particular facets on test scores

A

Generalizability Study

Q

this type of study evaluates the utility of test scores in assisting users in making informed decisions.

A

Decision Study

Q

this theory is an alternative to CTT that models the probability of an individual with a certain level of ability performing at a specific level

Often referred to as latent-trait theory because it measures constructs that are not directly observable (latent).

A

Item Response Theory (IRT)

Q

what does discrimination mean for IRT?

A

discrimination measures the extent to which an item can differentiate between individuals with higher or lower levels of the trait or ability being assessed.

Q

what are the key concepts in IRT?

A

Difficulty and Discrimination

Q

what does difficulty mean for IRT?

A

The attribute of an item indicating
how challenging it is to accomplish, solve, or comprehend.
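As a sketch of how difficulty and discrimination enter an IRT model, the two-parameter logistic (2PL) model gives the probability of a correct response; the parameter values below are invented:

```python
# Two-parameter logistic (2PL) IRT model: probability that a test-taker
# with ability theta answers an item correctly, given the item's
# discrimination a and difficulty b. Parameter values are hypothetical.
import math

def p_correct(theta, a, b):
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# When ability equals difficulty (theta == b), the probability is 0.5.
print(p_correct(0.0, a=1.5, b=0.0))  # 0.5

# Higher discrimination (a) separates test-takers near b more sharply.
print(p_correct(1.0, a=1.5, b=0.0) > p_correct(1.0, a=0.5, b=0.0))  # True
```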

Q

what are the two types of test items?

A
  1. Dichotomous Test Items
  2. Polytomous Test Items
Q

this type of test item has two possible responses, such as true/false or yes/no

A

Dichotomous Test Items

Q

this type of test item has three or more possible responses, where only one response is correct or aligned with the targeted trait or construct

A

Polytomous Test Items

Q

this term refers to the range or
band of test scores that is likely
to contain the true score

A

confidence interval

Q

this statistical tool is used to estimate or infer the extent to which an observed score deviates from a true score

A

standard error of measurement
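A standard formula for this statistic is SEM = SD × √(1 − r), where SD is the test’s standard deviation and r its reliability coefficient. A sketch with invented values, which also shows the confidence-interval use from the earlier card:

```python
# Standard error of measurement: SEM = SD * sqrt(1 - reliability).
# The SD and reliability values below are hypothetical.
import math

sd = 15.0           # standard deviation of test scores
reliability = 0.91  # reliability coefficient of the test

sem = sd * math.sqrt(1 - reliability)

# 95% confidence interval: the band likely to contain the true score
observed = 100.0
ci_95 = (observed - 1.96 * sem, observed + 1.96 * sem)
print(round(sem, 2), ci_95)
```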

Q

this statistical measure can aid a test
user in determining how large a difference should be before it is
considered statistically significant

A

standard error of the difference
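This is often computed as SED = SD × √(2 − r₁ − r₂), which equals √(SEM₁² + SEM₂²) when both scores are on the same scale; a sketch with invented values:

```python
# Standard error of the difference between two scores, assuming both
# tests are on the same score scale. Values below are hypothetical.
import math

sd = 15.0             # shared standard deviation of the two tests
r1, r2 = 0.90, 0.84   # reliability coefficients of the two tests

# SED = SD * sqrt(2 - r1 - r2), equivalent to sqrt(SEM1**2 + SEM2**2)
sed = sd * math.sqrt(2 - r1 - r2)

# A difference of roughly 1.96 * SED or more is significant at p < .05
min_significant_diff = 1.96 * sed
print(round(sed, 2), round(min_significant_diff, 2))
```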

Q

how does restricted variance influence correlation coefficients?

A

it leads to lower correlation coefficients since the diversity of scores is limited

Q

how does inflated variance influence correlation coefficients?

A

it results in higher correlation coefficients due to a broader spread of scores.