Week 5-Reliability and Validity Flashcards
What are the 2 general dimensions considered when evaluating new measures?
- Reliability
- Validity
Define reliability
Reliability refers to the consistency of a measure: a reliable measure produces the same scores when the same thing is measured repeatedly.
Reliability is most commonly quantified with correlation coefficients, though several different methods are available.
What 3 types of reliability do psychologists consider?
- Over time (test-retest reliability)
- Across items (internal consistency)
- Across different researchers (inter-rater reliability)
What is Test-Retest Reliability?
-When researchers measure a construct that they assume to be consistent across time, the scores they obtain should also be consistent across time. For example, an intelligent person should score highly on an IQ test today and score similarly highly if tested again a month later.
-Test-retest reliability is the extent to which this is actually the case (does the tool give the same measurement each time it is administered to an individual?).
-Assessing test-retest reliability requires using the measure on a group of people at one time and using it again on the same group of people at a later time, and then looking at the test-retest correlation between the two sets of scores.
-This is typically done by computing Pearson’s r.
-In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.
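As a sketch of how a test-retest correlation might be computed in practice (the scores and the pure-Python Pearson's r helper below are illustrative, not from any real dataset):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical IQ scores for five people, tested twice a month apart
time1 = [100, 110, 95, 120, 105]
time2 = [102, 108, 96, 118, 107]

r = pearson_r(time1, time2)
print(round(r, 3))  # → 0.987, above the +.80 rule of thumb
```

In a real analysis this would usually be done with `scipy.stats.pearsonr`, which also returns a p-value; the helper above just makes the computation explicit.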
What is the problem with Test-Retest Reliability?
For many tools, the second administration is not effectively under the same conditions as the first: practice effects, memory of earlier answers, or genuine change in the person between sessions can all affect the second set of scores.
What is Intraclass correlations (ICC)?
Intraclass correlations (ICCs) assess the absolute agreement between sets of measurements (e.g., between two administrations of a test, or between raters), not just how strongly they correlate.
ICC values:
- < 0.50 indicate poor reliability
- 0.50 to 0.75 indicate moderate reliability
- 0.75 to 0.90 indicate good reliability
- > 0.90 indicate excellent reliability
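The ICC can be computed from a two-way ANOVA decomposition of the scores. A minimal sketch of one common form, ICC(2,1) (two-way random effects, absolute agreement, single measurement), on made-up test-retest data; the data and the implementation are illustrative:

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.
    `data` is a list of rows (subjects), each a list of k measurements."""
    n = len(data)      # number of subjects
    k = len(data[0])   # measurements per subject (occasions or raters)
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((data[i][j] - grand) ** 2
                   for i in range(n) for j in range(k))
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # mean square: subjects
    msc = ss_cols / (k - 1)              # mean square: occasions/raters
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Hypothetical IQ scores: each row is one person at time 1 and time 2
scores = [[100, 102], [110, 108], [95, 96], [120, 118], [105, 107]]
print(round(icc_2_1(scores), 3))  # → 0.978, "excellent" by the thresholds above
```

Libraries such as pingouin provide several ICC variants; which variant is appropriate depends on the study design.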
When do high test-retest correlations make sense?
-High test-retest correlations make sense when the construct being measured is assumed to be consistent over time, which is the case for constructs such as intelligence and self-esteem.
-However, some constructs are not assumed to be stable over time. For example, as mood changes over time, a measure of mood that produced a low test-retest correlation over a period of a month would not be an issue.
What is Internal Consistency?
-Internal consistency is the consistency of people’s responses across the items on a multiple-item measure.
-In general, all the items on such measures are supposed to reflect the same underlying construct.
-Thus, people’s scores on a set of items should be correlated with each other.
-For example, on the Rosenberg Self-Esteem Scale, people who agree that they are satisfied with themselves should also agree that they have a positive attitude toward themselves (i.e., scores should be consistently high across the items).
Internal Consistency: What is the Split-Half Method?
-This method involves splitting the items on a questionnaire into two halves with each half measuring the same elements but in slightly different ways.
-For example, the items could be split into two sets such as the first and second halves of the items or the even- and odd-numbered items.
-Then a score is computed for each set of items and the relationship between the two sets of scores is examined.
-If a scale is very reliable, a person’s score on one half of the scale should be the same as (or similar to) their score on the other half. Thus, across several participants, scores from the two halves of the questionnaire should correlate very highly.
Internal Consistency: What is the correlation between the two halves in the Split-Half Method?
-The correlation between the two halves is the statistic computed in the split-half method, with larger correlations being a sign of reliability.
-A split-half correlation of +.80 or greater is generally considered good internal consistency.
-The problem with this method is that there are several ways a set of items can be split into two halves, so the result can be a product of the particular split chosen (i.e., different splits can give different reliability estimates).
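A sketch of the split-half method on hypothetical questionnaire responses, splitting into odd- and even-numbered items, scoring each half, and correlating the two half-scores (the data and the Pearson helper are illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: each row is one participant's answers to 6 items (1-5)
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
]

odd  = [sum(row[0::2]) for row in responses]  # items 1, 3, 5
even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6

print(round(pearson_r(odd, even), 3))  # → 0.98
```

Running this with a different split (e.g., first half vs. second half) would generally give a somewhat different correlation, which is exactly the limitation noted above.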
Internal consistency: What is Cronbach’s α?
■ The most common measure of internal consistency is a statistic called Cronbach’s α.
■ Cronbach’s alpha refers to how closely related a set of items are as a group.
■ Extent to which different items on the same test (or the same subscale on a larger test) correlate with each other.
■ Alpha coefficient ranges from 0 to 1: the higher the score, the more reliable the scale is.
■ A value of +.70 or greater is generally taken to indicate good internal consistency (Kline, 1999)
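Cronbach's α can be computed from the item variances and the variance of the total scores, using the standard formula α = k/(k−1) · (1 − Σ item variances / total-score variance). A minimal sketch on made-up responses (the data are illustrative):

```python
def cronbach_alpha(responses):
    """Cronbach's alpha. `responses` is a list of participants,
    each a list of k item scores."""
    k = len(responses[0])

    def variance(xs):
        # Sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    totals = [sum(row) for row in responses]
    return k / (k - 1) * (1 - sum(item_vars) / variance(totals))

# Hypothetical data: five participants, six items scored 1-5
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
]

print(round(cronbach_alpha(responses), 3))  # → 0.977
```

Unlike the split-half method, α does not depend on any particular split; it can be read as the average of all possible split-half estimates.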
What are the threshold categories for reliability?
- 1: perfect reliability
- ≥ 0.9: excellent reliability
- ≥ 0.8 and < 0.9: good reliability
- ≥ 0.7 and < 0.8: acceptable reliability
- ≥ 0.6 and < 0.7: questionable reliability
- ≥ 0.5 and < 0.6: poor reliability
- < 0.5: unacceptable reliability
- 0: no reliability
What are the problems with Cronbach’s alpha?
-It is a lower-bound estimate: it gives the lowest estimate of reliability (i.e., it is pessimistic).
-It assumes tau-equivalence, i.e., the same true score for all test items (all items have the same factor or component loadings); this is unlikely in practice, and violating the assumption can reduce alpha estimates by up to 11%.
-More items produce a higher alpha, so a long questionnaire can appear reliable when it is not (i.e., a false positive).
What is inter-rater reliability?
■ Inter-rater reliability is the extent to which different observers are consistent in their judgments, e.g., observers coding aggressive acts in Bandura’s Bobo Doll Study.
■ Inter-rater reliability is assessed using Cronbach’s α or ICCs when the judgments are quantitative, or Cohen’s κ when the judgments are categorical, e.g., behaviour categorised as good or bad.
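A sketch of Cohen's κ for two observers' categorical judgments, using the standard formula κ = (pₒ − pₑ)/(1 − pₑ), where pₒ is the observed agreement and pₑ is the agreement expected by chance (the codings below are made up):

```python
def cohen_kappa(rater1, rater2):
    """Cohen's kappa for two raters' categorical judgments."""
    n = len(rater1)
    categories = set(rater1) | set(rater2)
    # Observed agreement: proportion of items the raters coded identically
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Chance agreement: from each rater's marginal category proportions
    p_e = sum((rater1.count(c) / n) * (rater2.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical codings of 10 behaviours by two observers
r1 = ["good", "good", "bad", "good", "bad",
      "bad", "good", "good", "bad", "good"]
r2 = ["good", "good", "bad", "bad", "bad",
      "bad", "good", "good", "bad", "good"]

print(round(cohen_kappa(r1, r2), 3))  # → 0.8
```

Note that κ corrects raw percentage agreement (here 90%) for the agreement the two raters would reach by chance alone, which is why it comes out lower than 0.9.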
What is Validity?
■ Validity is the extent to which the scores from a measure represent the variable they are intended to measure.
■ Essentially, validity is concerned with whether a measure does what it is supposed to do.
■ It therefore represents the truthfulness of a measure.
■ There are three basic kinds:
1. Face validity
2. Content validity
3. Criterion validity