Test Worthiness, Pt I Flashcards

1
Q

Test Worthiness

Four Cornerstones

A

Validity

Reliability

Cross-Cultural Fairness

Practicality

2
Q

Test Worthiness

Correlation Coefficient

Correlation

A

Statistical expression of the Relationship between two sets of scores (or variables)

3
Q

Test Worthiness

Correlation Coefficient

Positive Correlation

A

Increase in one variable accompanied by an increase in the other variable
“Direct” relationship

4
Q

Test Worthiness

Correlation Coefficient

Negative Correlation

A

Increase in one variable accompanied by a decrease in the other variable

“Inverse” relationship
5
Q

Test Worthiness

Correlation Coefficient

Correlation coefficient (r)

A

A number between -1 and +1 that indicates Direction and Strength of the relationship

As “r” approaches +1, strength increases in a direct and positive way

As “r” approaches -1, strength increases in an inverse and negative way

As “r” approaches 0, the relationship is weak or nonexistent (at zero)
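As a concrete illustration, a minimal pure-Python Pearson r; the score lists are made-up example data, not from the course:

```python
# Pearson correlation coefficient r between two sets of scores.
# test_a and test_b are hypothetical example data.
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

test_a = [2, 4, 6, 8, 10]
test_b = [1, 3, 5, 7, 9]          # rises in lockstep with test_a
print(pearson_r(test_a, test_b))  # ~1.0: a perfect direct relationship
```

Reversing one list (e.g., correlating test_a with test_b reversed) flips the sign toward -1, the perfect inverse relationship.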

6
Q

Test Worthiness

Correlation, cont’d

The closer to 1 or -1 the stronger the correlation

A

Graph, Class 1, slide 5

+1.0 is a PERFECT positive correlation
-1.0 is a PERFECT negative correlation

7
Q

Reliability

A

Accuracy or Consistency of test scores

Would one score the same if they took the test over, and over, and over again?

8
Q

Classical Test Theory

A

Assumes a priori that any measurement of a human personality characteristic will be inaccurate to some degree

Charles Spearman (1904)

Observation = True Score + Error
X = T + E
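The X = T + E model can be illustrated with a small simulation; the true score of 100 and the error SD of 5 are assumptions made for this sketch:

```python
# Classical test theory sketch: each observed score X is a fixed
# true score T plus random error E. T = 100 and error SD = 5 are
# illustrative assumptions, not values from the lecture.
import random

random.seed(0)
T = 100                                        # the (unknowable) true score
errors = [random.gauss(0, 5) for _ in range(10_000)]
observations = [T + e for e in errors]         # X = T + E on each "retest"

mean_X = sum(observations) / len(observations)
print(round(mean_X, 1))  # hovers near 100: random errors average toward zero
```

This is the sense in which error is "random" in classical theory: any single X is off by an unknown amount, but over many measurements the errors cancel.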

9
Q

Sources of Measurement Error

A

Item Selection

Test Administration

Test Scoring

Systematic and unsystematic measurement error

10
Q

Systematic and Random Error

Systematic Error

A

Impact All People Who Complete an Instrument (such as misspelled words or poorly conceived sampling of behavior represented by the instrument’s questions).

11
Q

Systematic and Random Error

Unsystematic Errors

A

Involve factors that affect Individual Expression of a Trait

12
Q

Item Response Theory

Item Response Function

A

Relationship between Latent Trait and Probability of Correct Response

Usual standard score range -3 to +3

Item difficulty parameter

Item discrimination parameter
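A sketch of the item response function under the common two-parameter logistic (2PL) model, which uses exactly the difficulty and discrimination parameters listed above; the numeric values are illustrative, not from the slides:

```python
# 2PL item response function: probability of a correct response
# given latent trait theta, item difficulty b, discrimination a.
import math

def irf(theta, b, a=1.0):
    return 1 / (1 + math.exp(-a * (theta - b)))

print(irf(theta=0.0, b=0.0))         # 0.5: ability equals difficulty
print(irf(theta=2.0, b=0.0))         # ~0.88: higher ability, same item
print(irf(theta=0.0, b=0.0, a=2.5))  # still 0.5; a only steepens the curve
```

Raising b shifts the curve right (a harder item), while raising a makes the curve steeper around theta = b (a more discriminating item).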

13
Q

Item Response Theory

Invariance in IRT

A

Individual trait level can be estimated from any set of items

IRFs do not depend on the population of examinees

14
Q

Rasch Scale

A

Based on Item Response Theory, the relationship between the Test Taker’s Probability of success on an item and the latent trait (e.g., the ability)

Test taker’s ability vs. item difficulty (both will vary)

The items are used to define the measure’s scale
Goal: Person’s ability - Item difficulty
Test-taker receives multiple items that match their ability
Protects against the ceiling effect

See graph on Class 1, slide 18
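The graph itself is not reproduced in the deck, but the relationship it depicts has a standard textbook form: under the Rasch model, the probability of success depends only on the difference between ability θ and item difficulty b.

```latex
% Rasch (one-parameter logistic) model
P(\text{correct} \mid \theta, b) = \frac{e^{\theta - b}}{1 + e^{\theta - b}}
```

When ability exactly equals difficulty (θ = b), the probability of success is .5, which is why the scale tries to deliver items matched to the test taker's ability.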

15
Q

Rasch Scale & Discriminatory Power

A

When an item measures a construct (has a good fit), the levels of the item will co-vary with the trait

When an item does not measure a construct, the levels of the item will not co-vary with the trait

16
Q

Four ways to determine Reliability

A
Internal Consistency
   A. Split-half or Odd-Even
   B. Coefficient Alpha
   C. Kuder-Richardson
Test-Retest
Alternate, Parallel, or Equivalent Forms
Inter-rater reliability
17
Q

Internal Consistency

A

Reliability within the test, rather than using multiple administrations

18
Q

Internal Consistency

3 Types

A

Split-Half or Odd-Even

Cronbach’s Coefficient Alpha

Kuder-Richardson

19
Q

Internal Consistency

Split-Half or Odd-Even Reliability

A

Correlate one half of the test with the other half for all who took the test

The correlation = the split half reliability estimate

The Spearman-Brown formula corrects for the shortened test length created by splitting (shorter tests are less reliable)
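The whole procedure can be sketched end to end; the 5-examinee, 6-item score matrix and the pearson_r helper are hypothetical illustrations:

```python
# Split-half (odd-even) reliability with the Spearman-Brown correction.
# The data matrix is hypothetical example data.
import statistics

def pearson_r(x, y):
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denom = (sum((a - mx) ** 2 for a in x)
             * sum((b - my) ** 2 for b in y)) ** 0.5
    return cov / denom

items = [                  # rows = examinees, columns = 6 scored items
    [1, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 1, 1, 1, 1],
]
odd  = [sum(row[0::2]) for row in items]   # half-test: items 1, 3, 5
even = [sum(row[1::2]) for row in items]   # half-test: items 2, 4, 6

r_half = pearson_r(odd, even)              # split-half reliability estimate
r_full = (2 * r_half) / (1 + r_half)       # Spearman-Brown corrected
print(round(r_full, 2))                    # 0.94 for this sample
```

Note that r_full is always at least as large as r_half: the correction compensates for each half being only half as long as the real test.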

20
Q

Internal Consistency

Spearman-Brown Formula

A

See Class 1, Slide 23
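The slide itself is not reproduced in the deck, but the standard Spearman-Brown prophecy formula (a textbook result, not taken from the slide) for a test lengthened by a factor of n, given current reliability r, is:

```latex
% Spearman-Brown prophecy formula
r_{nn} = \frac{n \, r}{1 + (n - 1) \, r}
```

For the split-half case, n = 2, giving the familiar correction r = 2r_hh / (1 + r_hh), where r_hh is the correlation between the two half-tests.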

21
Q

Internal Consistency

Cronbach’s Coefficient Alpha

A

Developed by Lee Cronbach in 1951

A formula for estimating the mean of all possible Split-Half Coefficients using items that have Three or more response possibilities or anchor definitions

**Report reliability coefficient for total and/or each scale or subtest

22
Q

Basics of Cronbach’s Coefficient Alpha

A

Cronbach’s alpha reliability coefficient normally ranges between 0 and 1

The closer the alpha coefficient is to 1.0, the greater the internal consistency of the scale items

Standardized Item Alpha: Alpha coefficient when all scale items have been standardized (made into z scores).

    This coefficient is used only when the individual scale items are not scaled the same
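As a concrete sketch, alpha computed directly from its standard formula; the 5×4 Likert-item matrix is hypothetical:

```python
# Cronbach's coefficient alpha:
#   alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores))
# The score matrix is hypothetical example data.
import statistics

scores = [             # rows = examinees, columns = 4 Likert-type items
    [4, 5, 4, 4],
    [2, 3, 2, 3],
    [5, 5, 5, 4],
    [3, 3, 2, 2],
    [4, 4, 5, 5],
]
k = len(scores[0])
item_vars = [statistics.pvariance(col) for col in zip(*scores)]
total_var = statistics.pvariance([sum(row) for row in scores])

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))  # 0.93: high internal consistency for these items
```

When items co-vary strongly, total-score variance is large relative to the summed item variances, which is exactly what pushes alpha toward 1.0.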
23
Q

Internal Consistency

Kuder-Richardson

A

(KR-20) (KR-21)

Variation on alpha formula used with dichotomous data

An estimate of the mean of all possible split-half coefficients
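A minimal sketch of the KR-20 computation; the 0/1 response matrix is hypothetical:

```python
# KR-20: the coefficient-alpha variant for dichotomous (0/1) items.
#   KR-20 = k/(k-1) * (1 - sum(p*q) / variance(total scores))
# The response matrix is hypothetical example data.
responses = [          # rows = examinees, columns = 5 right/wrong items
    [1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0],
]
k = len(responses[0])
n = len(responses)

pq = []                # p = proportion passing the item, q = 1 - p
for col in zip(*responses):
    p = sum(col) / n
    pq.append(p * (1 - p))

totals = [sum(row) for row in responses]
mean_t = sum(totals) / n
var_t = sum((t - mean_t) ** 2 for t in totals) / n

kr20 = (k / (k - 1)) * (1 - sum(pq) / var_t)
print(round(kr20, 2))  # 0.65 for this small sample
```

For 0/1 items, the item variance is exactly p*q, which is why KR-20 is just coefficient alpha specialized to dichotomous data.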

24
Q

Test-Retest Reliability

A

Give the same test Two or More Times to the Same Group of People then correlate the scores.

25
Q

Alternate, Parallel or Equivalent Forms of Reliability

A

Have Two or More forms or versions of the same test

Administer the two forms to the respondents, typically in counterbalanced order (e.g., group 1 gets version A first and group 2 gets version B first)

Correlate scores on first form with scores on second form

26
Q

Inter-Rater Reliability

A

The degree of agreement between two or more separate raters

Qualitative applications
Consensus coding

27
Q

Standardized Scores

A

A collection of variations on standard scores devised by test specialists

They eliminate fractions and negative signs by producing values other than zero for the mean and 1.00 for the SD of the transformed scores

Important Point: we can transform any distribution to a preferred scale with predetermined mean and SD

28
Q

T-Score

A

Has a mean of 50 and a SD of 10

Common with personality tests

See pg 53, para 6 for formula
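The transformation itself is simple enough to sketch; the raw-score mean of 100 and SD of 15 are hypothetical norm values, not from the text:

```python
# T-score: rescale a raw score to a mean of 50 and an SD of 10.
#   T = 50 + 10 * z, where z = (raw - mean) / SD
def t_score(raw, mean, sd):
    z = (raw - mean) / sd
    return 50 + 10 * z

print(t_score(raw=100, mean=100, sd=15))  # 50.0: exactly average
print(t_score(raw=130, mean=100, sd=15))  # 70.0: two SDs above the mean
```

Because T = 50 + 10z, T-scores eliminate the negatives and fractions of z scores while preserving each examinee's relative standing.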

29
Q

Age Norm

A

Depicts the level of test performance for each separate age group in the normative sample

Purpose is to facilitate same-aged comparisons

30
Q

Grade Norms

A

Depicts the level of test performance for each separate grade in the normative sample

Rarely used with ability tests

31
Q

Local norms

A

Derived from representative local examinees, as opposed to a national sample

32
Q

Subgroup Norms

A

Consist of the scores obtained from an identified subgroup as opposed to a diversified national sample

33
Q

Expectancy Table

A

Portrays the established relationship between test scores and expected outcome on a relevant task.

Useful with predictor tests used to forecast well-defined criteria

Always based on the previous predictor and criterion results for large samples of examinees…so, if conditions or policies change, an expectancy table can become obsolete or misleading

34
Q

Criterion-Referenced Tests

A

Are used to compare examinees’ accomplishments to a predefined performance standard

The focus is on what the test taker can do rather than on comparisons to the performance levels of others

Identify an examinee’s relative mastery (or nonmastery) of specific, predetermined competencies

Content of test is selected on the basis of its relevance in the curriculum

Best suited to the testing of basic academic skills in educational settings

35
Q

Norm-Referenced Tests

A

Purpose is to classify examinees, from low to high, across a continuum of ability or achievement

Uses a representative sample of individuals (norm group or standardization sample) as its interpretive framework

Items are chosen so that they provide maximal discrimination among respondents along the dimension being measured

36
Q

Characteristics of Criterion and Norm-Referenced Tests

A

Pg 57

37
Q

Reliability

A

Refers to the attribute of consistency in measurement

Best viewed as a continuum ranging from minimal consistency of measurement to near-perfect repeatability of results

38
Q

Classical Theory of Measurement

A

The idea that test scores result from the influence of two factors:

~Factors that contribute to consistency. These consist entirely of the stable attributes of the individual, which the examiner is trying to measure. (This is a desirable factor because it represents the true amount of the attribute in question, while the second factor represents the unavoidable nuisance of error factors that contribute to inaccuracies in measurement)

~Factors that contribute to inconsistency. These include characteristics of the individual, test, or situation that have nothing to do with the attribute being measured, but that nonetheless affect test scores.

The true score is never known! We can obtain a probability that the true score resides within a certain interval, and we can also derive a best estimate of the true score.

39
Q

Sources of Measurement Error

A

Item Selection
Test Administration
Test Scoring
Systematic Errors of Measurement

40
Q

Unsystematic Measurement Error

A

Their effects are unpredictable and inconsistent

41
Q

Systematic Measurement Error

A

Arises when, unknown to the test developer, a test consistently measures something other than the trait for which it was intended

This is a problem for test validity

Results in inaccuracies of measurement

42
Q

Measurement Error and Reliability

A

ME reduces reliability or repeatability of psychological test results

43
Q

Main Features of Classical Theory

A

~Measurement errors are random

~Mean Error of measurement = 0

~True scores and errors are uncorrelated: rTe = 0

~Errors on different tests are uncorrelated: r12=0

44
Q

Implications for Reliability and Measurement

A

TBD

45
Q

The Reliability Coefficient (rXX)

A

The ratio of true score variance to the total variance of test scores (pg 61)
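In symbols (following the X = T + E notation used earlier, with variances in place of scores):

```latex
% Reliability coefficient: the proportion of total score variance
% that is true-score variance.
r_{XX} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

When measurement error is zero, r_XX = 1 (perfect reliability); as error variance grows, r_XX shrinks toward 0.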