Week 2 - Reliability Flashcards
Most common misconception about reliability and validity
They are not properties of a test itself, but of the test used in a particular situation or for a particular purpose.
Define reliability
The consistency with which a test measures what it purports to measure in any given set of circumstances. If something is reliable, it can be depended on.
Does the test produce consistent responses?
What is social desirability bias?
A form of method variance, common in psychological tests of personality, that arises when people answer questions in a way that presents them in a favourable light (or avoids an unfavourable one).
The domain-sampling model
The test or assessment device draws from a larger set of items to give a score; the score is therefore an estimate.
If all possible questions had been asked, we would have the true position.
Thus, reliability becomes a question of sampling test items from a domain of all possible items.
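A minimal simulation of the idea (assumptions: Python/NumPy, an invented 1,000-item domain): the more items a test samples from the domain, the closer its score sits to the "true position".

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical domain: a person's "true" probability of answering
# each of 1,000 possible items correctly.
domain = rng.uniform(0.2, 0.9, size=1000)
true_position = domain.mean()          # score if every item were asked

# A real test samples only some items, so its score is an estimate.
for n_items in (10, 50, 200):
    sample = rng.choice(domain, size=n_items, replace=False)
    print(f"{n_items:>3} items: estimate = {sample.mean():.3f} "
          f"(true position = {true_position:.3f})")
```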
Standard error of measurement
an expression of the precision of an individual test score as an estimate of the trait it purports to measure
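The card doesn't give the formula, but the standard one is SEM = SD × √(1 − reliability). A tiny sketch with illustrative numbers:

```python
import math

sd_x = 15.0       # standard deviation of test scores (e.g. an IQ-style scale)
r_xx = 0.90       # reliability coefficient of the test

# Standard error of measurement: SEM = SD * sqrt(1 - reliability)
sem = sd_x * math.sqrt(1 - r_xx)

# Rough 95% band around an observed score of 100
score = 100
print(f"SEM = {sem:.2f}")
print(f"95% band: {score - 1.96 * sem:.1f} to {score + 1.96 * sem:.1f}")
```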
Reliability coefficient
an index - usually Pearson's r - of the ratio of true-score variance to observed-score variance in a test given in a set of circumstances.
The proportion of observed-score variance that is due to true-score variance. A reliability of 0.5 is about the minimum acceptable level (see Nunnally's rule of thumb below).
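A quick simulation (assumptions: NumPy, made-up variances) showing the two readings agree: the true/observed variance ratio matches the Pearson r between two parallel forms of the test.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

true = rng.normal(0, 1, n)            # true scores, variance 1
err1 = rng.normal(0, 0.5, n)          # independent error on form 1
err2 = rng.normal(0, 0.5, n)          # independent error on form 2
obs1, obs2 = true + err1, true + err2

ratio = true.var() / obs1.var()        # true-score / observed-score variance
r = np.corrcoef(obs1, obs2)[0, 1]      # parallel-forms correlation

print(f"true/observed variance ratio: {ratio:.3f}")   # ~0.80
print(f"Pearson r between forms:      {r:.3f}")       # ~0.80
```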
What’s the oldest way to calculate reliability of a test?
Comparing two forms of the same test: see if the different items agree in the scores they yield.
- Minimises practice effects.
- However, if the forms lead to different scores, then one (but we don't know which one) cannot be depended on.
What is Split-half reliability?
Split the test in half and compare scores, e.g. scores on odd-numbered items compared with scores on even-numbered items. Correlating the scores from the two halves (with a high enough sample of participants) gives an estimate of reliability.
(When you have larger samples, just use the whole test.)
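A minimal sketch (assumptions: NumPy, simulated item responses). One point not on the card: each half is only half as long as the real test, so the raw half-test correlation understates full-test reliability; the Spearman-Brown correction is the standard adjustment.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: 200 participants x 20 items (1 = correct, 0 = wrong)
ability = rng.normal(0, 1, 200)
items = ((ability[:, None] + rng.normal(0, 1, (200, 20))) > 0).astype(float)

odd_half = items[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction for the full-length test
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.3f}, Spearman-Brown corrected = {r_full:.3f}")
```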
Pros and cons of split half reliability?
- With the odd-even method, fatigue effects are the same for both halves of the test.
- Not recommended for speeded tests (those with a time limit).
- The odd-even split is also arbitrary, and different ways of splitting have different pros and cons.
How to work out Cronbach’s alpha?
Split the test into subtests of one item each, correlate every subtest with every other subtest, and base the reliability estimate on the average inter-item correlation (adjusted for the number of items).
i.e. ‘internal consistency’ of a test.
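The card's description is the intuition; the usual computational formula (standard psychometrics, not on the card) works from item and total-score variances: alpha = k/(k−1) × (1 − sum of item variances / total-score variance). A minimal NumPy sketch with simulated data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: participants x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(3)
ability = rng.normal(0, 1, 300)
# 10 items that all partly reflect the same trait, plus noise
items = ability[:, None] + rng.normal(0, 1, (300, 10))

print(f"alpha = {cronbach_alpha(items):.3f}")
```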
Cons of Cronbach’s alpha? (4)
- Tests with high internal consistency can just have items with similar content.
- Although faithfully sampling a domain, the domain might be trivial.
- High internal consistency does not mean the test is measuring the intended thing. The items might be interrelated but not homogeneous/unidimensional.
- If there are multiple factors (traits) underlying performance on a test, alpha can overestimate the reliability of the factor thought to underlie the test.
So Confirmatory Factor Analysis might be better.
What is test-retest reliability?
the estimate of reliability obtained by correlating scores on the test obtained on two or more occasions of testing. The stronger the correlation, the more reliable the test.
Important when retesting patients to see whether they are getting worse, etc. If scores on the test drift over time, a more reliable test is needed.
What is Generalisability theory?
Cronbach (1972): in obtaining scores from a test, the user seeks to generalise beyond the particular score to some wider universe of behaviour. Users must SPECIFY the desired range of conditions over which this is to hold.
What is inter-rater reliability? What is the best method of obtaining it?
Correlate scores across different judges or raters; reliability here is consistency between raters.
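The card doesn't name a best method; the simplest sketch is a plain Pearson correlation between two raters (made-up ratings below), though for more than two raters or categorical judgements an intraclass correlation or Cohen's kappa is often preferred.

```python
import numpy as np

# Hypothetical: two raters score the same 8 essays out of 10
rater_a = np.array([7, 5, 9, 4, 6, 8, 3, 7])
rater_b = np.array([6, 5, 9, 5, 7, 8, 2, 6])

# Inter-rater reliability as the correlation between the raters' scores
r = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"inter-rater r = {r:.3f}")
```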
How reliable does a test need to be?
VERY, if the test has serious consequences for the individual. But if it’s still being developed, a lower level of reliability will suffice.
Nunnally's (1967) rule of thumb: 0.5 or better for a test developer, 0.7 or better for research, and better than 0.9 for individual assessment.