Lecture 2 Flashcards

1
Q

What are reliability and validity in tests?

A

Reliability: The test measures one and only one thing (precision)
Validity: The test measures what it’s supposed to measure

2
Q

What are the test standards?

A

Recommendations for using and interpreting test scores, developed and distributed by:

  • American Psychological Association (APA)
  • American Educational Research Association (AERA)
  • National Council on Measurement in Education (NCME)
3
Q

What is the Test Standards definition of “validity”?

A

The degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests

4
Q

Why are the Test Standards important?

A

They act as a framework:
  • Represent the current consensus (and therefore the current operational guidelines)
  • Present alternative viewpoints, arguments, and propositions
  • Provide psychometric models for evaluating validity, reliability, and bias (including generalisability theory)

5
Q

What is the criterion view of validity?

A

Validity of a test: How well the test predicts the outcome it was designed to predict
  • Validity as an absolute and static property of a test
  • Validity = correlation with a criterion (e.g. intelligence with school grades)
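Under the criterion view, the validity coefficient is just the Pearson correlation between test scores and the criterion. A minimal sketch (all scores below are made-up illustrative data, not from the lecture):

```python
import numpy as np

# Made-up data: an intelligence-test score and a school-grade criterion
# for the same 8 test-takers (illustrative only)
test_scores = np.array([95, 110, 102, 120, 88, 130, 105, 98])
grades = np.array([2.9, 3.4, 3.1, 3.8, 2.5, 3.9, 3.2, 2.8])

# Under the criterion view, "validity" is simply this correlation
r = np.corrcoef(test_scores, grades)[0, 1]
print(f"criterion validity coefficient r = {r:.2f}")
```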

6
Q

What are the problems with the criterion view of validity?

A

  • Not always one obvious criterion available (no pure measures of the attribute)
  • Some tests are used for different purposes, in different groups (e.g. due to language, age)
  • Validity is dependent on:
      ◦ Test purpose and use
      ◦ Characteristics of the test-takers

7
Q

What are the three key components of the tripartite view of validity?

A
  • Criterion validity (correlations with a criterion)
  • Content validity
  • Construct validity
8
Q

What is criterion validity?

A

Correlation of test scores with a criterion; two forms:
  • Concurrent: criterion measured at the same time as the test is administered
  • Predictive: criterion measured at some time after the test is administered

9
Q

What is content validity?

A
  • The theoretical framework about what the test should measure
  • Content of test is both relevant to domain and representative of domain
10
Q

What is construct validity?

A

  • Convergent: concepts that are theoretically related demonstrate empirical relationships
  • Discriminant: concepts that are theoretically UNrelated show no empirical relationships
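Convergent and discriminant evidence can be read straight off correlations. A small sketch with made-up scores (illustrative only): vocabulary and reading comprehension should converge, while reaction time should be unrelated to either:

```python
import numpy as np

# Made-up scores for 8 people (illustrative only):
# vocab and reading both tap verbal ability (theoretically related);
# reaction_time is theoretically unrelated to verbal ability.
vocab = np.array([12, 18, 15, 20, 10, 22, 16, 14])
reading = np.array([30, 41, 36, 45, 28, 48, 38, 33])
reaction_time = np.array([250, 310, 270, 240, 300, 260, 320, 280])

r_convergent = np.corrcoef(vocab, reading)[0, 1]          # should be high
r_discriminant = np.corrcoef(vocab, reaction_time)[0, 1]  # should be near zero
print(f"convergent r = {r_convergent:.2f}, discriminant r = {r_discriminant:.2f}")
```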

11
Q

What are the problems with the tripartite view?

A

  • Too much emphasis on different forms of validity
      ◦ A test can have “convergent validity” but not “predictive validity” – what does this mean?
      ◦ The distinction between convergent and concurrent is not always clear
          – e.g. the correlation between a vocabulary test and an English-language grade
  • Overemphasis on correlations as proof
  • No explicit mention of test use and consequences

12
Q

What is ‘validity’ in the 1999/2014 Test Standards?

A

Validity is a property of the interpretation of test scores, not the test scores themselves

13
Q

According to the Test Standards, what are the sources of evidence that an interpretation of test scores is valid?

A
  1. The content of the test
  2. The response processes captured by the test
  3. The internal structure of the test
  4. The relationship of the test to other variables
  5. The intended and unintended consequences of testing
14
Q

What is the test content evidence for validity?

A

  • Relevance
  • Representativeness (items must be representative of, and relate to, the important and critical parts of the domain)

15
Q

What are the response processes as evidence for validity?

A

  • Evidence should show that the test does measure the specific process it’s meant to capture
  • Think-aloud protocols, eye-tracking, computer models, susceptibility to manipulation, coaching, etc. (in line with theory)

16
Q

What is the internal structure as evidence for validity?

A

  • The number of sub-components discovered empirically equals the number of sub-components theoretically expected
      ◦ Number of elements found = number of elements expected
      ◦ e.g. the 6 facets of Conscientiousness in the NEO-PI-R personality model
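One simple way to check internal structure is to count the dimensions in the items’ correlation matrix and compare with theory. A rough numpy sketch using the Kaiser (eigenvalue > 1) criterion on a made-up correlation matrix (illustrative only; real analyses would use factor analysis):

```python
import numpy as np

# Made-up correlation matrix for 4 items (illustrative only):
# items 1-2 and items 3-4 are written to tap two distinct sub-components.
R = np.array([
    [1.0, 0.8, 0.1, 0.1],
    [0.8, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
])

# Crude dimensionality check: count eigenvalues of R greater than 1
# (the Kaiser criterion); compare with the theoretically expected number.
eigenvalues = np.linalg.eigvalsh(R)
n_found = int((eigenvalues > 1).sum())
print(f"components found: {n_found} (expected: 2)")
```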

17
Q

What is the relationship to other variables as evidence of validity?

A

  • Convergent and discriminant evidence (as in the tripartite model)
  • Test–criterion relationships (as in the tripartite model)
      ◦ Suitability and technical quality of the criteria (relation to test scores)
  • Validity generalisation
      ◦ Replication in different situations
          – Different POPULATIONS (e.g. different countries, states, sectors)
          – Different CONDITIONS (e.g. proctored/unproctored, timed/untimed)
      ◦ Replication for different purposes
          – Job performance vs. academic achievement
          – Performance on different types of jobs

18
Q

What are the intended and unintended consequences of testing as evidence for validity?

A

  • Consider the consequences of testing (which can be unforeseen by the test developer and test user)
  • EXAMPLE: Test scores (e.g. NAPLAN) intended by the developer to identify progress and to target policy towards areas of need
      ◦ ACTUAL CONSEQUENCES: league tables, high-SES flight from low-NAPLAN schools
  • EXAMPLE: Test scores measuring Attribute A result in different hiring rates for members of different groups. For the test to show evidence of validity:
      ◦ The difference must be solely due to an unequal distribution of Attribute A (the relevant attribute)
          – Differences are not due to construct-irrelevant variance/test bias
          – e.g. a spatial-skill test for pilot selection (men on average have better spatial skills than women)
      ◦ Attribute A must be important for the job