Week 3: Flashcards

Reliability and Validity

1
Q

Variance Model (CTT)

A

Observed variance = True Variance + Error Variance

2
Q

Reliability Defined (Variance)

A

Proportion of observed score variance attributable to true score variance
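
Worked example (hypothetical figures): if true score variance = 75 and error variance = 25, observed variance = 100, so reliability = 75 / 100 = .75 (the same .75 coefficient discussed in the next card).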

3
Q

Reliability Coefficient

A

Tells us what proportion of the observed variance is non-error
- A coefficient of .75 indicates that 75% of the variance in test scores for the group is due to true differences and 25% of the variance in test scores is due to error.

4
Q

Test-Retest Reliability

A

Correlation between the scores obtained by the same persons on an identical test administered on two separate occasions

Shows the extent to which scores on a test can be generalised over different occasions

5
Q

What is the Inter-Test Interval

A

Time between test administrations

6
Q

Error Sources Test-Retest Addresses and Does Not Address

A

Only appropriate for stable characteristics

Addresses test-taker variables
- e.g., fatigue

Influenced by test administration errors (e.g., weather) and scoring + interpretation errors.

7
Q

Test-Retest Limitations

A

Content Sampling Error

Nuisance to obtain data (requires two test administrations)

Practice Effect

8
Q

Alternate Form Reliability

A

Use of 2 separate forms of the test - (Similar items, time limit, content specifications etc.)

Correlation between scores obtained on the two test forms represents the reliability coefficient

9
Q

Error Sources of Alternate Forms

A

Unsystematic error depends on the inter-test interval:

- Administered immediately in succession
--> Addresses content sampling

- A few days to weeks apart
--> Addresses content sampling and test-taker variables (e.g., fatigue)

Both are subject to test administration, scoring + interpretation errors.

10
Q

Limitations of Alternate Forms

A

Most tests do not have an alternate form

11
Q

Inter-Scorer Reliability

A

Degree of agreement or consistency between 2 or more scorers/raters

Reliability is determined by the correlation between different raters' scores for the same persons

12
Q

Error Sources & Limitations of Inter-Scorer Reliability

A

Addresses errors from scoring + interpretation.

No info on any other sources of error

13
Q

How is Internal Consistency calculated

A

Reliability is determined by examining the relationships among the items on one test at a single point in time

Are the items in a measure internally consistent with each other?
- Degree to which items are related to each other

14
Q

What is the Split-Half method?

A

Involves correlating one half of a test with the other half
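
A common way to form the halves (not specified in this card) is an odd-even split: correlate the total score on the odd-numbered items with the total score on the even-numbered items, then correct the result with the Spearman-Brown formula (next card).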

15
Q

What is the Spearman-Brown Correction?

A

Allows the estimation of the reliability of the whole test from the correlation between the two half-tests (obtained from the split-half method)
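
The standard Spearman-Brown formula (not shown in this card) is: rwhole = 2 × rhalf / (1 + rhalf). Worked example with a hypothetical half-test correlation of .70: rwhole = (2 × .70) / (1 + .70) = 1.40 / 1.70 ≈ .82.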

16
Q

What is the Kuder-Richardson Method?

A

KR-20 and KR-21
Used for dichotomous items (true/false, right/wrong)
KR-20 gives a coefficient for any test which is equal to the average of all possible split-half coefficients
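
The standard KR-20 formula (not given in this card) is: KR-20 = (k / (k − 1)) × (1 − Σpq / σ²), where k = number of items, p = proportion answering each item correctly, q = 1 − p, and σ² = variance of total test scores.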

17
Q

What is Cronbach’s Alpha

A

Coefficient Alpha (rα) is a more general model of KR‐20 that does not require dichotomously scored items (e.g., agree/not sure/disagree)

When items are scored dichotomously, rα = rKR20

The most popular coefficient for reporting internal consistency
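
The standard alpha formula (not given in this card) is: α = (k / (k − 1)) × (1 − Σσi² / σx²), where k = number of items, σi² = variance of each item, and σx² = variance of total test scores. With dichotomously scored items, Σσi² reduces to Σpq, giving KR-20.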

18
Q

Error Sources for Internal Consistency

A

Addresses unreliability due to content sampling

Subject to changes in test administration, test-taker variables, scoring + interpretation

19
Q

Limitations of Internal Consistency

A

A test developed to have high internal consistency by having items with highly similar content
→ Content sampling may be so constricted as to be trivial

- Inappropriate for some speed tests, such as tests of clerical speed or reading rate

20
Q

Interpreting Reliability Coefficients

A

.90’s = high reliability
.80’s = moderate to high reliability
.70’s = low to moderate reliability
.60’s = unacceptably low reliability

21
Q

How to improve reliability

A
1. Increase the number of items
2. Discard items with low reliability
3. Estimate what the correlation would be without measurement error
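
Point 3 usually refers to the correction for attenuation: rtrue = rxy / √(rxx × ryy). Hypothetical worked example: observed correlation rxy = .42, reliabilities rxx = .80 and ryy = .70, so rtrue = .42 / √(.56) ≈ .42 / .75 ≈ .56.
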
22
Q

Standard Error of Measurement vs Reliability Coefficient

A

Reliability coefficient is used to make judgements about the overall value of a particular test

SEM is used to make judgements about individual scores obtained with the test

Does the individual’s observed score on a test provide a good indication of their true score?

CTT: OS = TS + E

SEM is used to see how big or small E is

23
Q

What is Standard Error of Measurement

A

The SEM indicates the precision of our estimate of an individual’s true score.

Assuming a normal distribution of errors, the SEM acts as the SD of an individual's observed scores around their true score

Lower SEM = Higher Reliability
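
Worked example using the formula from a later card (SEM = SD × √(1 − rxx)) with hypothetical values: SD = 15 and rxx = .96 give SEM = 15 × √(.04) = 15 × .20 = 3 (the SEM used in the confidence interval card below).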

24
Q

What are the % score thresholds in relation to SEM & SD?

A

68% of scores fall within ±1 SEM/SD

95% of scores fall within ±1.96 SEM/SD

99% of scores fall within ±2.58 SEM/SD

25
Q

Confidence Intervals in relation to SEM

A

If a test-taker scores 105 on our intelligence test and the test has a SEM of 3, then we can be 68% confident that:
- The true score falls within 105 ± 1 SEM
- The true score falls within 105 ± 3
- The true score falls between 102 and 108
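
Extending the same hypothetical example to a 95% confidence interval: 105 ± 1.96 × 3 = 105 ± 5.88, so the true score falls roughly between 99 and 111.
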
26
Q

What is Test Validity?

A

The extent to which the test measures what it claims to measure.

27
Q

What is Content-Related Validity?

A

The relationship between the content of a test and some well-defined domain of knowledge or behaviour.

Whether test items provide a representative sample of all possible items in the relevant domain.

28
Q

What is the Content Validity Ratio and its use?

A

A statistical measure used to determine whether an item should be included to increase content validity.

Content Validity Ratio (CVR) = (ne − N/2) / (N/2)

Where:
ne = number of panellists indicating "essential"
N = total number of panellists

The more panellists rating an item essential, the higher the CVR: values near +1 indicate a good item, values of 0 or below a poor one.
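
Worked example with hypothetical numbers: N = 10 panellists and ne = 8 rate the item essential, so CVR = (8 − 10/2) / (10/2) = (8 − 5) / 5 = .60.
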
29
Q

What areas is content validity used for?

A

Employment tests and achievement tests.

In other areas, content validity is often a poor indicator of test validity.

30
Q

What is Criterion-Related Validity and what are its types?

A

Assesses how well a test or measure correlates with a specific, external criterion or outcome. A relationship is established between scores on the test and some criterion external to the test.

1. Predictive Validity - where test scores are used to predict status on some future criterion measure of relevance
2. Concurrent Validity - where test scores are related to some criterion measure obtained at the same point in time

31
Q

What is the validity coefficient?

A

Simply a correlation coefficient - typically the Pearson correlation coefficient between test scores and the criterion measure.

32
Q

What is the Standard Error of Estimate?

A

It provides the margin of error to be expected in the predicted criterion score based on test scores.

The standard error of estimate (SEest or SEY') is expressed as:

SEY' = SDy × √(1 − rxy²)

Where:
SDy = the SD of criterion scores
rxy = the validity coefficient
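
Worked example with hypothetical values: SDy = 10 and rxy = .60 give SEY' = 10 × √(1 − .36) = 10 × .80 = 8 points of expected prediction error.
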
33
Q

Standard Error of Measurement vs Standard Error of Estimate

A

SEM
- Margin of measurement error caused by unreliability of the test
- Error in an individual's test scores
- SEM = SD × √(1 − rxx)

SEest
- Margin of prediction error caused by imperfect validity of the test
- Error of prediction around the regression line
- SEY' = SDy × √(1 − rxy²)

34
Q

What is Construct Validity?

A

Refers to any evidence which supports the proposition that the test measures its intended construct.
--> Subsumes all types of validity (incl. content and criterion)

35
Q

What are Methods for Construct Validity?

A

- Analyse whether items or subtests are homogeneous and measure a single construct
- Study whether developmental changes in scores are consistent with theory
- Correlate the test with related constructs (i.e., convergent validity) and unrelated constructs (i.e., discriminant validity)
--> Convergent: the scale correlates with other scales that measure the same or a similar construct
--> Divergent/Discriminant: the scale does not correlate with measures of unrelated domains
- Evaluate whether group differences in scores are theoretically consistent
- Factor analysis of test scores to examine the internal structure of the test
- Analyse whether test scores allow proper classification of examinees
- Evaluate whether intervention effects on scores are consistent with theory
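
Hypothetical illustration of the convergent/discriminant point: a new anxiety scale that correlates .65 with an established anxiety measure (convergent evidence) but only .10 with an unrelated measure such as vocabulary (discriminant evidence) supports construct validity.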