Al psychometrics notes Flashcards

1
Q

What are the 2 processes that are part of test standardization?

A
  1. Uniform administration and scoring procedures
  2. Development of test norms

2
Q

What does reliability refer to?

A

A test’s consistency

3
Q

What does reliability provide no information on?

A

What is being measured

4
Q

What does classical test theory propound?

A

That an obtained test score (X) is composed of two additive and independent components:

  1. True score (T): actual status on the attribute
  2. Error (E): random
5
Q

What is the ideal (but unobtainable) formula for reliability?

A

True variance/observed variance
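A minimal simulation of these two cards (all numbers are invented): build observed scores as X = T + E and check that the variance ratio behaves as the theory says. In real testing T is unobservable, which is why this formula is unobtainable in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

true_score = rng.normal(100, 15, n)  # T: each examinee's actual status on the attribute
error = rng.normal(0, 5, n)          # E: random error, independent of T
observed = true_score + error        # X = T + E (classical test theory)

# Ideal reliability: true variance / observed variance. Computable here
# only because the simulation lets us see T directly.
reliability = true_score.var() / observed.var()
print(f"true variance / observed variance = {reliability:.3f}")  # ~0.90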

6
Q

What do reliability estimates assume?

A
  1. Variability that is consistent is true variance
  2. Variability that is inconsistent is random error

7
Q

What is the range of a reliability coefficient?

A

0.0-1.0

8
Q

What does a reliability coefficient of 0.0 indicate?

A

That all variability obtained in a test’s scores is attributable to measurement error

9
Q

What does a reliability coefficient of 1.0 indicate?

A

That all variability obtained in a test’s scores reflects true score variability

10
Q

What is the difference between the reliability coefficient and other correlation coefficients?

A

It is never squared; it is interpreted directly as a proportion of variance

11
Q

What does the reliability coefficient estimate?

A

The proportion of variability in obtained test scores that reflects true scores

12
Q

What are the 5 main types of reliability?

A
  1. Test-retest reliability
  2. Alternate-forms reliability
  3. Split-half reliability
  4. Coefficient Alpha
  5. Inter-rater reliability
13
Q

What is test-retest reliability?

A

The test is given to the same group twice, and then the two sets of scores are correlated
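A sketch of the computation with made-up scores; the same correlation computed on two equivalent forms (rather than two administrations) would give the coefficient of equivalence.

```python
import numpy as np

# Hypothetical scores for the same 8 examinees tested twice
time1 = np.array([12, 15, 9, 20, 14, 18, 11, 16])
time2 = np.array([13, 14, 10, 19, 15, 17, 10, 18])

# Pearson r between the two administrations = coefficient of stability
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability = {r:.2f}")
```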

14
Q

What is the coefficient given from test-retest reliability?

A

A coefficient of stability (tests the degree of stability over time)

15
Q

What is the source of measurement error in test-retest reliability?

A

Time sampling error

Random factors that vary between the two test administrations, e.g., fluctuations in the examinees (such as anxiety)

16
Q

What kind of tests is test-retest reliability most suitable for?

A

Aptitude tests - a stable characteristic

17
Q

What kind of tests is test-retest reliability least suitable for?

A

Tests of mood - fluctuates over time

18
Q

What do you do in alternate-forms reliability? What does it indicate?

A

Two equivalent forms of a test are administered to the same group, and then the two sets of scores are correlated

It indicates the consistency of responding to different item samples

19
Q

What is the coefficient derived from alternate forms reliability?

A

Coefficient of equivalence

20
Q

In alternate-forms reliability, when the forms are administered at different times the test also measures consistency over time - what is the reliability coefficient derived?

A

Coefficient of equivalence and stability

21
Q

What kind of error is associated with alternate-forms reliability?

A

Content sampling

The interaction between examinees’ knowledge and the different content assessed by the items in each form, e.g., Form A matches one examinee’s knowledge better than Form B

22
Q

Alternate-form reliability is a rigorous form of reliability, but what is the problem with it?

A

It is difficult to develop truly equivalent forms

23
Q

When is alternate-form reliability inappropriate?

A

When the attribute is likely to fluctuate over time

24
Q

In what two ways are split-half reliability and coefficient alpha similar?

A
  1. Both involve administering a test once to a single group
  2. Both yield a reliability coefficient called a “coefficient of internal consistency”

25
Q

How is split-half reliability conducted?

A

The test is split into halves so that each examinee has two scores. The scores on the two halves are then correlated.

26
Q

What is a problem with split-half reliability?

A

It yields a coefficient based on only half the test’s length (remember that reliability decreases as test length decreases).

27
Q

A problem with split-half reliability is that it is derived from only half a test - what does this mean?

A

It underestimates true reliability

28
Q

Split-half reliability underestimates true reliability, how is this corrected?

A

using the Spearman-Brown prophecy formula
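A sketch with simulated item data (nothing here comes from the notes): correlate odd- and even-item half scores, then apply the Spearman-Brown prophecy formula, corrected r = 2r / (1 + r), to project the coefficient back to full test length.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated 0/1 responses: 200 examinees x 20 items, driven by a common ability
ability = rng.normal(0, 1, (200, 1))
items = (ability + rng.normal(0, 1, (200, 20)) > 0).astype(int)

odd = items[:, 0::2].sum(axis=1)    # score on odd-numbered items
even = items[:, 1::2].sum(axis=1)   # score on even-numbered items

r_half = np.corrcoef(odd, even)[0, 1]   # half-length reliability
r_full = 2 * r_half / (1 + r_half)      # Spearman-Brown correction
print(f"split-half r = {r_half:.2f}, corrected = {r_full:.2f}")
```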

29
Q

What is the full name for the coefficient alpha?

A

Cronbach’s coefficient alpha

30
Q

How is the Cronbach’s coefficient alpha derived?

A

The test is administered to one group of examinees at a single time point. The formula then determines the average inter-item consistency, which is equivalent to the average of the reliability coefficients from all possible split-halves of the test.
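A sketch of the standard alpha computation, α = k/(k − 1) × (1 − Σ item variances / total-score variance), on invented Likert-style data:

```python
import numpy as np

def cronbach_alpha(items):
    """items: examinees-by-items score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(2)
latent = rng.normal(3, 1, (50, 1))              # 50 examinees' latent levels
responses = np.clip(np.rint(latent + rng.normal(0, 0.7, (50, 5))), 1, 5)

print(f"coefficient alpha = {cronbach_alpha(responses):.2f}")
```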

31
Q

The coefficient alpha is conservative, and consequently can be considered a X of the test’s reliability

A

lower bound estimate

32
Q

Split-half reliability and coefficient alpha are both measures of what?

A

Internal consistency

33
Q

What is an error source for internal consistency?

A

Content sampling

34
Q

How does split-half reliability contain content sampling error?

A

Because of differences between the content of the test halves - items in one half may better fit some examinees’ knowledge than items in the other half

35
Q

How does coefficient alpha contain content sampling error?

A

Because of differences between individual test items

36
Q

What is the specific term for the type of content sampling error found in the coefficient alpha?

A

heterogeneity of the content domain

i.e., the greater the heterogeneity of the content, the lower the inter-item correlations and the lower the coefficient alpha

37
Q

When is inter-rater reliability used?

A

In situations where the test scores involve a rater’s judgement (e.g., essay tests or projective tests)

38
Q

What is the method for determining inter-rater reliability?

A

Compute a correlation coefficient (e.g., a kappa coefficient) or determine the % agreement between raters

39
Q

What is wrong with the method of determining the % agreement between raters in inter-rater reliability?

A

It can lead to erroneous conclusions because it doesn’t take into account the level of chance agreement, which is especially high when the behavior has a high rate of occurrence.

40
Q

How does Cohen’s kappa adjust % agreement?

A

By removing the effects of chance
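A sketch of that adjustment with made-up ratings: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed % agreement and p_e the agreement expected by chance. Note how high p_e gets when both raters say "yes" most of the time - the high-rate-of-occurrence problem from the previous card.

```python
import numpy as np

# Hypothetical yes(1)/no(0) ratings of 10 behaviors by two raters
rater1 = np.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])
rater2 = np.array([1, 1, 0, 1, 0, 1, 0, 1, 1, 1])

p_o = (rater1 == rater2).mean()   # observed % agreement

# Chance agreement: P(both say yes) + P(both say no)
p_e = rater1.mean() * rater2.mean() + (1 - rater1.mean()) * (1 - rater2.mean())

kappa = (p_o - p_e) / (1 - p_e)
print(f"agreement = {p_o:.2f}, chance = {p_e:.2f}, kappa = {kappa:.2f}")
```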

41
Q

What is the term for a type of error that artificially inflates a measure of inter-rater reliability?

A

consensual observer drift

42
Q

Consensual observer drift is one kind of error associated with inter-rater reliability - what is another?

A

Rater factors, such as lack of motivation or rater biases

43
Q

How can observer drift be eliminated?

A

Having raters work independently

44
Q

When raters are told that ratings are checked, what happens?

A

Accuracy improves

45
Q

What three factors affect the reliability coefficient?

A
  1. Test length (longer is better)
  2. Range of test scores (the larger the range, the higher the reliability coefficient)
  3. Guessing
46
Q

How does guessing affect the reliability coefficient?

A

As the probability of correctly guessing answers increases, the reliability coefficient decreases

47
Q

When is a test reliable?

A

When it has small measurement error

48
Q

Does the WAIS-IV have good reliability?

A

Yes - both internal (split-half) and temporal

49
Q

What does SEM index?

A

SEM indexes the amount of error that can be expected in obtained scores due to test unreliability

Also: the SEM is the SD of the distribution of scores an examinee would obtain across an infinite number of test administrations

50
Q

How do you calculate a 99% CI?

A

Obtained score ± (2.58 × SEM)

51
Q

How do you calculate a 68% CI?

A

Obtained score ± (1 × SEM)
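A worked sketch with invented numbers. These cards do not give a formula for the SEM itself; the standard one is SEM = SD × √(1 − r_xx). The two CI multipliers above are then applied around an obtained score:

```python
import math

sd = 15        # test SD (hypothetical)
r_xx = 0.91    # reliability coefficient (hypothetical)
score = 108    # an examinee's obtained score (hypothetical)

sem = sd * math.sqrt(1 - r_xx)   # standard error of measurement = 15 * 0.3 = 4.5

ci_68 = (score - 1.0 * sem, score + 1.0 * sem)     # 68% CI: +/- 1 SEM
ci_99 = (score - 2.58 * sem, score + 2.58 * sem)   # 99% CI: +/- 2.58 SEM

print(f"SEM = {sem:.2f}")
print(f"68% CI: {ci_68[0]:.1f} to {ci_68[1]:.1f}")
print(f"99% CI: {ci_99[0]:.1f} to {ci_99[1]:.1f}")
```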

52
Q

What is test validity?

A

The extent to which a test measures what it is supposed to measure. How successful a test is for its intended use.

53
Q

What are the 3 main types of validity?

A
  1. Content validity
  2. Construct validity
  3. Criterion-related validity
54
Q

What is content validity?

A

The extent to which a test measures familiarity with a particular content or behavior domain

Used in achievement tests

55
Q

What is construct validity?

A

The extent to which an examinee possesses a particular hypothetical trait (e.g., aggressiveness or intelligence)

Validity in measurement of a hypothetical construct that cannot be measured directly

56
Q

What is criterion-related validity?

A

The extent to which a measure is related to an outcome (e.g., GRE scores to graduate grades)

57
Q

What results in poor content validity?

A

inadequate sampling

58
Q

What three things indicate adequate content validity?

A
  1. Strong internal consistency
  2. Correlations with other tests of the same domain
  3. Sensitivity to manipulations that increase the familiarity with the domain
59
Q

What is construct validity trying to do?

A

provide evidence that the test measures the construct that it is supposed to measure

60
Q

Construct validity may entail 5 different things, what are they?

A
  1. Internal consistency
  2. Group differences
  3. Research on the construct
  4. Convergent and discriminant validity
  5. Factorial validity
61
Q

What is internal consistency?

A

Whether all of a test’s items are measuring the same thing

62
Q

How can group differences be used to determine construct validity?

A

Do scores distinguish between people with different levels of the construct?

63
Q

How can research on the construct be used to determine construct validity?

A

Do test scores change following manipulation of the construct, as predicted by the theory?

64
Q

How can convergent and discriminant validity be used to determine construct validity?

A

You should find a high correlation with measures of the same trait and a low correlation with measures of unrelated traits

65
Q

How can factorial validity be used to determine construct validity?

A

Does the test have the predicted factorial composition?

66
Q

What is the multitrait-multimethod matrix?

A

An approach to examining construct validity. It organizes convergent and discriminant validity evidence for comparison of how a measure relates to other measures.

67
Q

monotrait-heteromethod coefficients

A

Indicate the correlation between different measures of the same trait

(When large, they provide evidence of convergent validity)

68
Q

heterotrait-monomethod coefficients

A

Show the correlation between different traits measured by the same method

(When small, this indicates that the test has discriminant validity)

69
Q

heterotrait-heteromethod coefficients

A

Correlation between different traits measured by different methods

(provide evidence of discriminant validity when small)
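To illustrate the three coefficient types, here is a hypothetical matrix for two traits (A, B) each measured by two methods (1, 2); every value is invented. Reliabilities sit on the diagonal in parentheses.

```
        A1      B1      A2      B2
A1    (.90)
B1     .15    (.88)
A2     .65     .12    (.91)
B2     .10     .60     .14    (.87)
```

The monotrait-heteromethod values (.65, .60) are large, supporting convergent validity; the heterotrait values are small, supporting discriminant validity (heterotrait-monomethod: .15, .14; heterotrait-heteromethod: .12, .10).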

70
Q

What is factor analysis used for?

A

Evaluating construct validity

71
Q

Criterion-related validity

A

Measures how well scores on one measure (the predictor) predict status on an outcome (the criterion)

Assessed by correlating the scores of a sample of individuals on the predictor with their status on the criterion

72
Q

When the criterion-related validity coefficient is large…

A

this confirms that the predictor has criterion-related validity

73
Q

What are the two types of criterion-related validity?

A
  1. Predictive validity
  2. Concurrent validity

74
Q

Predictive validity is a subtype of criterion-related validity - describe it.

A

The test is used to predict future performance on the criterion

75
Q

Concurrent validity is a subtype of criterion-related validity - describe it.

A

Estimates the current status on the criterion

76
Q

What kinds of validity are associated with incremental validity?

A

concurrent and predictive

77
Q

formula for incremental validity?

A

incremental validity = positive hit rate - base rate

78
Q

formula for positive hit rate?

A

true positives/total positives
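A worked sketch combining the last two cards (all counts are invented):

```python
# Hypothetical outcomes for 100 applicants screened with a test
true_positives = 30    # flagged by the test AND successful on the criterion
false_positives = 10   # flagged by the test but unsuccessful
total_successful = 50  # applicants who would succeed regardless of the test
n = 100

positive_hit_rate = true_positives / (true_positives + false_positives)  # 0.75
base_rate = total_successful / n                                         # 0.50

# Gain in decision accuracy over using the base rate alone
incremental_validity = positive_hit_rate - base_rate
print(f"incremental validity = {incremental_validity:.2f}")  # 0.25
```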