Reliability and coefficient alpha Flashcards

1
Q

What is reliability?

A

Reliability is the consistency or reproducibility of test scores.
- It is also a measure of the extent of error present in a test.

2
Q

Do we assume there is always some error in measurement?

A

Yes, and that error is assumed to be random.

3
Q

What do we assume leads to differences in a person’s score across administrations of a test?

A

Measurement error. A person’s true score is assumed not to change each time they take a test, so differences in observed scores are attributed to error.

4
Q

What do we expect the distribution of scores to be for a test?

A

A normal distribution.

5
Q

What four assumptions underlie classical test theory?

A
  1. Each person has a true score that we could obtain if there were no measurement error.
  2. There is measurement error, but this error is random.
  3. The true score of an individual doesn’t change with repeated applications of the same test, even though their observed score does.
  4. The distribution of random errors (and thus of observed test scores) will be the same for all people.
6
Q

What is the domain sampling model?

A
  • It is another central concept of classical test theory.
  • When we construct a test, we cannot ask every possible question about the construct, so we use only a sample of test items.
  • Using fewer items can introduce error, so we need to determine whether the test items adequately sample the domain (construct).
7
Q

What is the point of reliability analysis?

A

Reliability analysis is conducted to ascertain how much error we would make by using a score from a shorter test as an estimate of someone’s true ability.

8
Q

What are three things to note regarding reliability analysis?

A
  • Reliability = variance of the true score / variance of the observed score.
  • Observed test scores should be correlated with the true score.
    - As the sample of items gets larger, the estimate becomes more accurate.
  • Reliability would be easy to work out if we had the true score; in practice, the true score is unknowable and must be estimated.
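In standard classical test theory notation (the symbols are conventional, not from the slides), this ratio is:

$$r_{XX'} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}$$

For example, if the true-score variance is 8 and the error variance is 2, the observed-score variance is 10 and the reliability is 8/10 = 0.80.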
9
Q

What can affect reliability measurements?

A
  • Different ways of measuring reliability are sensitive to different sources of measurement error.
  • We therefore consider the various sources of measurement error when choosing a method.
10
Q

What is “Standard Error of Measurement”?

A
  • We can work out how much measurement error we have by working out how much, on average, an observed score on our test differs from the true score.
  • A person’s observed score differs from their true score, and the true score is unknowable. But we can calculate the range in which a person’s true score should fall by calculating the Standard Error of Measurement.
11
Q

What is the formula for standard error of measurement?

A

SEM = SD × √(1 − r)
- SD is the standard deviation of the test scores.
- r is the reliability of the test.
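A minimal sketch of this calculation in Python (the function name and example values are illustrative, not from the slides):

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - r), where r is the test's reliability."""
    return sd * math.sqrt(1 - reliability)

# Example: a test with SD = 15 and reliability r = 0.91
print(round(standard_error_of_measurement(15, 0.91), 2))  # 4.5
```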

12
Q

What do we do once we know the SEM?

A
  • We can use it to create confidence intervals.
  • The z-score for a 95% confidence interval = 1.96
  • Lower bound = x − (1.96 × SEM)
  • Upper bound = x + (1.96 × SEM)
    where x is the person’s score on the test.
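Continuing the sketch above (the 1.96 z-value is from the card; the example score and SEM are illustrative):

```python
def confidence_interval(score: float, sem: float, z: float = 1.96):
    """95% confidence interval around an observed score."""
    return (score - z * sem, score + z * sem)

# Example: observed score x = 100 with SEM = 4.5
lower, upper = confidence_interval(100, 4.5)
print(round(lower, 1), round(upper, 1))  # 91.2 108.8
```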
13
Q

What are the different types of reliability?

A
  1. Test-retest reliability
  2. Parallel forms reliability
  3. Internal consistency (split-half reliability, Kuder-Richardson 20 reliability, coefficient/Cronbach’s alpha)
  4. Inter-rater reliability
14
Q

What is test-retest reliability?

A
  • The simplest way to establish reliability is to administer the test or scale to a sample on two different occasions. If the scale is reliable, the scores at the test and retest administrations should be strongly correlated.
  • The correlation between the two scores is also known as the coefficient of stability.
  • The source of error measured is time sampling.
15
Q

What are the issues with test-retest reliability?

A
  • What is the optimal length of time that should elapse between administrations? If it is too soon, participants may recall their answers from the first administration. If too long, extraneous events may influence scores on the scale.
  • There are issues with using it to measure more transient constructs, such as mood.
  • What if some event happens between the first and second administrations?
16
Q

What is parallel forms reliability?

A

Parallel (alternate) forms reliability requires the construction of two equivalent versions of the same test, with items that are closely matched. The two forms are then administered to the same set of people, either at different times or at the same time.
- The correlation between the two forms is known as the coefficient of equivalence.
- The source of error measured is item sampling.

17
Q

How do you change the form of a test for parallel forms reliability?

A
  • Questions or response alternatives are reworded.
  • The order of items is changed (this reduces practice effects).
18
Q

What are the issues with parallel forms reliability?

A
  • A problem with the alternate-forms method is that it is both difficult and expensive to produce alternate forms that are both independent and sufficiently similar.
  • It is difficult to generate a big enough item pool.
19
Q

What is inter-rater reliability?

A
  • Measures how consistently two or more raters/judges agree when rating something.
  • Using multiple raters can improve measurement reliability.
  • Could be assessed by correlating raters’ scores.
  • Simple correlation, however, does not account for the number of times raters agree by chance.
20
Q

What are the 2 different calculations used for inter-rater reliability?

A
  1. Cohen’s kappa is used when there are 2 raters or judges.
  2. Fleiss’ kappa is used when there are more than 2 raters or judges.
    - Kappa ranges from 1 (perfect agreement) to −1, with 0 indicating chance-level agreement.
    - >0.75: excellent agreement
    - 0.50–0.75: satisfactory agreement
    - <0.40: poor agreement
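A minimal sketch of Cohen’s kappa for two raters, using the standard formula κ = (p_o − p_e) / (1 − p_e) (the formula is standard but not shown on the slides; the function and data are illustrative):

```python
from collections import Counter

def cohens_kappa(rater1: list, rater2: list) -> float:
    """Chance-corrected agreement between two raters."""
    n = len(rater1)
    # Observed proportion of agreement
    p_o = sum(a == b for a, b in zip(rater1, rater2)) / n
    # Agreement expected by chance, from each rater's marginal frequencies
    c1, c2 = Counter(rater1), Counter(rater2)
    p_e = sum(c1[cat] * c2[cat] for cat in c1) / n**2
    return (p_o - p_e) / (1 - p_e)

# Example: two raters classify 10 cases as "y"/"n"
r1 = ["y", "y", "n", "y", "n", "y", "y", "n", "n", "y"]
r2 = ["y", "n", "n", "y", "n", "y", "y", "n", "y", "y"]
print(round(cohens_kappa(r1, r2), 2))  # 0.58
```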
21
Q

What is internal consistency reliability?

A
  • It asks: “do the different items within one test all measure the same thing to the same extent?”
  • Are items within a single test highly correlated?
  • The source of error measured is the internal consistency of a single test administered on one occasion.
22
Q

What are the three different ways of measuring internal consistency?

A
  1. Split-half reliability
  2. Coefficient alpha
  3. KR-20, which is a special case of coefficient alpha for the dichotomous (e.g. right/wrong) item format.
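For reference, the standard form of KR-20 (not written out on the slides), where $k$ is the number of items, $p_i$ is the proportion passing item $i$, $q_i = 1 - p_i$, and $\sigma^2_X$ is the variance of total scores:

$$KR\text{-}20 = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma^2_X}\right)$$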
23
Q

What is split half-reliability?

A
  • A test is split in half, each half is scored separately, and the total scores for the two halves are correlated to determine whether they yield similar measures.
  • A major advantage is that we only need one test administration.
  • A major challenge is dividing the test into equivalent halves.
  • The result is adjusted using the Spearman-Brown formula.
24
Q

What are two of the major issues with split-half reliability?

A
  1. The fewer items we have, the lower our reliability (recall the domain sampling model).
    - Therefore, each half of the split test will have reduced reliability compared to the full test.
  2. Dividing the test into equivalent halves: the correlation will change for each different split. Ideally the halves should be equivalent.
25
Q

What is the solution to the first issue with split-half reliability?

A
  • The Spearman-Brown formula.
  • It gives a more accurate reflection of internal consistency because it adjusts for the number of items.
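The standard Spearman-Brown correction steps the half-test correlation up to an estimate of full-test reliability. A minimal Python sketch (the formula is standard; the example value is illustrative):

```python
def spearman_brown(r: float, k: float = 2.0) -> float:
    """Predicted reliability when a test is lengthened by a factor of k.

    For split-half reliability, k = 2: the correlation r between the
    two halves is stepped up to the reliability of the full-length test.
    """
    return (k * r) / (1 + (k - 1) * r)

# Example: the two halves correlate at r = 0.70
print(round(spearman_brown(0.70), 2))  # 0.82
```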
26
Q

What is Cronbach’s alpha?

A
  • Cronbach’s alpha estimates the consistency of responses to different scale items.
  • It reflects the error associated with each test item, as well as the error associated with how well the test items fit together (internal consistency).
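A minimal sketch of coefficient alpha computed from a persons-by-items score matrix (NumPy-based; the data are illustrative, not from the slides). With dichotomous 0/1 items, the same calculation reduces to KR-20:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for a (persons x items) score matrix.

    alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
    """
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Example: 5 people answering 4 Likert-type items
data = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
])
print(round(cronbach_alpha(data), 2))  # 0.96
```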
27
Q

How does the number of items in a test influence Cronbach’s alpha?

A
  • There is a positive, non-linear relationship between the number of items and reliability, as the sketch below illustrates.
  • Internal consistency reliability increases rapidly from 2 to 10 items.
  • It increases steadily from 11 to 30 items.
  • The increase tapers off after about 40 items.
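One way to see this pattern is through the standard relationship α = k·r̄ / (1 + (k − 1)·r̄), where r̄ is the average inter-item correlation (the value r̄ = 0.25 below is illustrative):

```python
def alpha_for_k_items(k: int, mean_r: float = 0.25) -> float:
    """Alpha for k items sharing a fixed average inter-item correlation."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

for k in (2, 10, 30, 40, 80):
    print(k, round(alpha_for_k_items(k), 2))
# 2 -> 0.4, 10 -> 0.77, 30 -> 0.91, 40 -> 0.93, 80 -> 0.96:
# rapid early gains that taper off as items are added
```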
28
Q

What are the different levels at which we interpret Cronbach’s alpha?

A
  • 0.00 = no consistency in measurement
  • 1.00 = perfect consistency in measurement
  • 0.70 = acceptable value for exploratory research
  • 0.80 = acceptable level for basic research
  • 0.90 = acceptable level for applied scenarios
29
Q

What factors affect Cronbach’s alpha?

A
  1. Multidimensionality: Cronbach’s alpha is better suited to unidimensional data.
  2. Bad test items: the greater the number of bad items, the more negatively alpha is affected.
  3. The number of test items: more items increase reliability, but only up to a point.
30
Q

Recap of reliability

A

Look at the slides.

31
Q

What is coefficient alpha commonly described as in the literature?

A
  1. The mean of all split-half reliabilities.
  2. The lower bound of the reliability of a test.
  3. A measure of first-factor saturation.
  4. Equal to reliability under conditions of essential tau-equivalence.
  5. A more general version of the Kuder-Richardson (KR) coefficient of equivalence.
    - Not all of these descriptions are accurate; descriptions 1 and 3, at least, are wrong.
32
Q

What do we know about covariance and split-half reliability?

A
  • When we work out split-half reliability, we are taking into account the correlation between two groups of items.
  • If we are splitting a test in every possible way, knowing all the covariances between items is useful. This will tell us how much an item in one group will correlate with an item in another group.
33
Q

What is important to consider when splitting a test for split-half reliability?

A
  • We need to know how the items fit together.
34
Q

What is first factor saturation?

A
  • It is the extent to which one factor is present in a test, i.e. the extent to which a test is made up of one factor (as opposed to two or three).
  • A factor is a dimension of a test.
35
Q

What else do I need to know about factors?

A
  • Factors are test dimensions, or aspects that are measured by a test.
  • Some tests measure one thing, e.g. the BDI (Beck Depression Inventory). Some measure more than one thing, e.g. the Big Five personality test.
36
Q

Why do people have this misconception about alpha?

A
  • There is confusion between the terms internal consistency and homogeneity.
37
Q

What is a misconception people have about the relationship between cronbach’s alpha and unidimensionality?

A
  • There is a misconception that alpha measures the extent to which one factor is present in a test, i.e. people think that the higher the alpha, the more likely it is that the test is made up of a single factor.
38
Q

What is the difference between internal consistency and homogeneity?

A

Internal consistency
- This is how inter-related the items are.
- The more the items are inter-related, the more internally consistent the test is.
Homogeneity
- Refers to unidimensionality (a test having only one factor).
- Something that is homogeneous is made up of only one thing, e.g. if everyone in the class were female, the class would have homogeneity of gender.

39
Q

Why do people confuse internal consistency with homogeneity/unidimensionality?

A
  • Alpha measures how well items fit together in terms of how much covariance (shared variance) the items have.
  • According to common sense, the more covariance the items have, the more they should fit together to make up one thing, i.e. they should measure only one factor. A high level of alpha should therefore mean that your test is measuring one factor.
40
Q

What is the problem with the common sense view?

A
  • Internal consistency and unidimensionality are not actually the same thing.
  • Research shows that tests can have several different factors but still have high levels of alpha. E.g. the WAIS-III comprises two factors (VIQ and PIQ) but still has a high level of alpha. Another example is the Resilience Scale of Wagnild and Young (1993): it also has two factors (self-motivation and optimism), yet its overall alpha is 0.90.
41
Q

Why do these multidimensional tests have high levels of alpha?

A
  • This is because they measure things that are related to each other, as well as to some overall construct.
  • The WAIS measures VIQ and PIQ, both types of intelligence, which the overall scale measures.
  • The Resilience Scale measures self-motivation and optimism, which are related to each other and both influence resilience.
  • So these tests have high levels of alpha because they are largely internally consistent, meaning that they measure things that are related to each other, so the items share a lot of covariance.
42
Q

Summary of alpha and one-factor confusion

A
  • Alpha is sometimes said to be a measure of the extent to which a test measures one factor.
  • This is because alpha is a measure of internal consistency, and the more internally consistent the test, the more covariance the items share.
  • The more covariance the items share, the more they are supposedly part of the same factor. But we know that this is not always the case.
  • However, tests that measure only one factor often do have high levels of alpha, as they are very internally consistent.
  • If you are going to construct a test with more than one factor, you need to make sure that the factors are related to each other as well as to the overall thing you are trying to measure.