Lecture 3 Flashcards

1
Q

What are the four factors affecting reliability?

A
  1. People taking the test
  2. Item characteristics
  3. Test characteristics
  4. Method used to estimate reliability
2
Q

Name the two item characteristics that affect internal-consistency reliability

A
  • Correlation between items
  • Number of items

3
Q

What item characteristics must a test have to be reliable?

A

♣ Many items that are only weakly correlated; OR

♣ A few items that are strongly related
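
As a rough illustration of this trade-off, the standardised coefficient alpha can be computed from the number of items and the mean inter-item correlation. The function name `standardized_alpha` and the example values below are assumptions chosen for illustration; they do not come from the lecture.

```python
def standardized_alpha(n_items: int, mean_r: float) -> float:
    """Standardised coefficient alpha from the item count and the
    mean inter-item correlation."""
    return (n_items * mean_r) / (1 + (n_items - 1) * mean_r)


# Hypothetical tests: many weakly related items vs. a few strongly related items.
long_weak = standardized_alpha(n_items=40, mean_r=0.10)
short_strong = standardized_alpha(n_items=5, mean_r=0.50)
print(round(long_weak, 2), round(short_strong, 2))
```

Both hypothetical tests come out above .80, which is the point of the card: either route can produce an internally consistent test.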

4
Q

Why is validity dependent on reliability?

A
  • CONCEPTUALLY: A test must measure SOMETHING consistently before it can accurately measure WHAT YOU WANT TO MEASURE
  • MATHEMATICALLY: The observed correlation between two variables cannot exceed the square root of the product of their reliabilities
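
The mathematical point can be sketched with the classical test theory bound, under which an observed correlation cannot exceed the square root of the product of the two reliabilities. The function name and the reliability values are illustrative assumptions.

```python
import math


def max_observed_correlation(rel_x: float, rel_y: float) -> float:
    """Upper bound on the observed correlation between two measures,
    given their reliabilities (classical test theory)."""
    return math.sqrt(rel_x * rel_y)


# Hypothetical example: a test with reliability .81 and a criterion with reliability .64.
ceiling = max_observed_correlation(0.81, 0.64)
print(round(ceiling, 2))
```

With these assumed values the ceiling is .72, so even a perfectly valid test could not correlate more strongly than that with the criterion.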
5
Q

How can test items increase the test’s reliability?

A

Reliability increases as the number of items increases (assuming all additional items are of the same quality as the existing items)
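
One common way to quantify this is the Spearman-Brown prophecy formula, which assumes the added items are parallel to the existing ones (in line with the "same quality" condition above). The sketch below is illustrative; the function name and the starting reliability of .60 are assumptions.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by `length_factor`
    (2.0 means twice as many items), assuming parallel items."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability .60, doubled and then tripled in length.
for factor in (1, 2, 3):
    print(factor, round(spearman_brown(0.60, factor), 2))
```

The gains shrink with each increase in length: doubling the assumed test moves it from .60 to .75, while tripling it only reaches about .82.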

6
Q

Even though adding more items increases a test’s reliability, why can’t test length be increased indefinitely?

A

o Participant boredom
o Participant exhaustion
o Loss of motivation as test continues
o Rules used in practice: make the test as short as possible within acceptable reliability heuristics
♣ ~.90 to .95 for high-stakes individual applications
♣ ~.70 for research purposes
♣ ~.60 if you have multiple indicators and can model statistical error

7
Q

What are the practical problems with short tests?

A

o Item exposure (for high-stakes tests)
♣ For high-stakes applications (e.g. selection), you don’t want people to have seen the test items before
♣ Tests will lose their reliability if people have seen them before
o Ability to adequately sample the domain of interest
♣ Bandwidth (content coverage) vs. fidelity (reliability)
♣ Breadth of the construct measured

8
Q

What is Adaptive Testing?

A

o The test is adapted to the person’s level of ability (qualitatively analysed)
o A test-taker’s previous responses determine which items they see next
o Used in major test batteries
o Not computerised

9
Q

What is Computerised Adaptive Testing (CAT)?

A

o A computerised algorithm automatically selects further items according to a decision rule
o Can select either blocks of items or single items
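
The idea of a decision rule that selects single items can be sketched as follows. This is a deliberately simplified step rule, not how operational CAT systems work (those typically rely on item response theory); the function `run_cat`, the difficulty scale and the simulated test-taker are all assumptions for illustration.

```python
def run_cat(item_difficulties, answers_correctly, n_items=5, start=0.0, step=1.0):
    """Administer up to `n_items` items, each time choosing the unused item
    whose difficulty is closest to the current ability estimate."""
    estimate = start
    remaining = list(item_difficulties)
    administered = []
    for _ in range(min(n_items, len(remaining))):
        # Decision rule: pick the item that best matches the current estimate.
        item = min(remaining, key=lambda d: abs(d - estimate))
        remaining.remove(item)
        administered.append(item)
        # Move the estimate up after a correct answer, down after an error.
        if answers_correctly(item):
            estimate += step
        else:
            estimate -= step
        step /= 2  # smaller adjustments as more responses come in
    return estimate, administered


# Hypothetical item pool and a simulated test-taker who passes items up to difficulty 1.2.
pool = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
estimate, used = run_cat(pool, lambda difficulty: difficulty <= 1.2, n_items=3)
print(estimate, used)
```

Each response changes which item is shown next, so two test-takers of different ability see different items from the same pool.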

10
Q

What are the advantages of Computerised Adaptive Testing (CAT)?

A

o Tests are much shorter but just as reliable
♣ Good match between person level and item level
• Increased variability maximises reliability
♣ Fewer problems with fatigue, motivation, boredom
♣ Economic bottom line
• Fewer testing sites booked for fewer hours, fewer test proctors, etc.
o Easier to maintain test security
♣ A different set of items is given to different people
o Motivation factors:
♣ Very able people do not get a large number of items that are too easy
♣ Low-ability people do not fail item after item

11
Q

What are the disadvantages of Computerised Adaptive Testing (CAT)?

A

o Substantial preparation and outlay needed
♣ Development of VERY LARGE ITEM POOL
♣ Analyses of VERY LARGE ITEM POOL to determine the difficulty of each item
♣ Automated programming of algorithm/decision rule
o Requires computerised administration

12
Q

When is Computerised Adaptive Testing (CAT) most appropriate to use?

A

Most appropriate for large-scale testing where test security is an issue

13
Q

What is the difference between “typical” performance and “maximum” performance?

A

o Typical performance = what you generally would do
♣ Usually measured by rating scales (self-reports)

o Maximum performance = the best you can do
♣ Usually measured by ability scales

14
Q

Describe what “maximum performance” consists of

A
♣ Aware of performance appraisal
♣ Willing to perform at his/her best
♣ Explicit standards
♣ Effort has to be exerted
♣ Role of motivation
15
Q

What is a “vignette”?

A

Vignettes describe hypothetical people or scenarios, which respondents are then asked to rate
o The average vignette rating is subtracted from the person’s own self-rating
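
The subtraction described above can be sketched in a few lines. The function name, the 1 to 5 rating scale and the example ratings are assumptions for illustration only.

```python
def vignette_adjusted_score(self_rating: float, vignette_ratings: list[float]) -> float:
    """Self-rating minus the respondent's average rating of the vignettes."""
    return self_rating - sum(vignette_ratings) / len(vignette_ratings)


# Two hypothetical respondents give the same self-rating of 4 on a 1-5 scale,
# but use the scale differently when rating the same three vignettes.
lenient_rater = vignette_adjusted_score(4, [5, 4, 3])
strict_rater = vignette_adjusted_score(4, [2, 3, 1])
print(lenient_rater, strict_rater)
```

The identical raw self-ratings end up as different adjusted scores (0.0 and 2.0 here), because the adjustment takes each person's use of the response scale into account.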

16
Q

What are “culture-level variables”?

A

Culturally based variations (e.g. modesty bias, self-representation bias) that cause people to differ in the way they use response scales
o Cannot compare raw scores across cultures/different groups

17
Q

Why are anchoring vignettes used?

A

Used in large-scale international tests to correct for error arising from response biases