Week 3: Flashcards

Reliability and Validity

1
Q

Variance Model (CTT)

A

Observed variance = True Variance + Error Variance

2
Q

Reliability Defined (Variance)

A

Proportion of observed score variance attributable to true score variance
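
Worked example (hypothetical figures): if true score variance = 75 and error variance = 25, observed variance = 100, so reliability = 75 / 100 = .75 (the same .75 coefficient discussed in the next card).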

3
Q

Reliability Coefficient

A

Tells us what proportion of the observed variance is non-error
- A coefficient of .75 indicates that 75% of the variance in test scores for the group is due to true differences and 25% of the variance in test scores is due to error.

4
Q

Test-Retest Reliability

A

Correlation between the scores obtained by the same persons on an identical test administered on two separate occasions

Shows the extent to which scores on a test can be generalised over different occasions

5
Q

What is the Inter-Test Interval

A

Time between test administrations

6
Q

Error Sources Test-Retest Addresses and Does Not Address

A

Only appropriate for stable characteristics

Addresses test-taker variables
- e.g., fatigue

Influenced by test administration errors (e.g., weather) and scoring + interpretation errors.

7
Q

Test-Retest Limitations

A

Content Sampling Error

Nuisance to obtain data (requires two test administrations)

Practice Effect

8
Q

Alternate Form Reliability

A

Use of 2 separate forms of the test - (Similar items, time limit, content specifications etc.)

Correlation between scores obtained on the two test forms represents the reliability coefficient

9
Q

Error Sources of Alternate Forms

A

Unsystematic error depends on the inter-test interval:

- Administered immediately in succession
--> Addresses content sampling

- A few days to weeks apart
--> Addresses content sampling and test-taker variables (e.g., fatigue)

Both are subject to test administration, scoring + interpretation errors.

10
Q

Limitations of Alternate Forms

A

Most tests do not have an alternate form

11
Q

Inter-Scorer Reliability

A

Degree of agreement or consistency between 2 or more scorers/raters

Reliability is determined by the correlation between different raters' scores for the same persons

12
Q

Error Sources & Limitations of Inter-Scorer Reliability

A

Addresses errors from scoring + interpretation.

No info on any other sources of error

13
Q

How is Internal Consistency calculated

A

Reliability is determined by examining the relationships among the items on one test at a single point in time

Are the items in a measure internally consistent with each other?
- Degree to which items are related to each other

14
Q

What is the Split-Half method?

A

Involves correlating one half of a test with the other half
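
A common way to form the halves (not specified in this card) is an odd-even split: correlate the total score on the odd-numbered items with the total score on the even-numbered items, then correct the result with the Spearman-Brown formula (next card).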

15
Q

What is the Spearman-Brown Correction?

A

Allows the estimation of the reliability of the whole test from the correlation between the two half-tests (obtained from the split-half method)
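
The standard Spearman-Brown formula (not shown in this card) is: rwhole = 2 × rhalf / (1 + rhalf). Worked example with a hypothetical half-test correlation of .70: rwhole = (2 × .70) / (1 + .70) = 1.40 / 1.70 ≈ .82.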

16
Q

What is the Kuder-Richardson Method?

A

KR-20 and KR-21
Used for dichotomous items (true/false, right/wrong)
KR-20 gives a coefficient for any test which is equal to the average of all possible split-half coefficients
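
The standard KR-20 formula (not given in this card) is: KR-20 = (k / (k − 1)) × (1 − Σpq / σ²), where k = number of items, p = proportion answering each item correctly, q = 1 − p, and σ² = variance of total test scores.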

17
Q

What is Cronbach’s Alpha

A

Coefficient Alpha (rα) is a more general model of KR‐20 that does not require dichotomously scored items (e.g., agree/not sure/disagree)

When items are scored dichotomously, rα = rKR20

The most popular coefficient for reporting internal consistency
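
The standard alpha formula (not given in this card) is: α = (k / (k − 1)) × (1 − Σσi² / σx²), where k = number of items, σi² = variance of each item, and σx² = variance of total test scores. With dichotomously scored items, Σσi² reduces to Σpq, giving KR-20.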

18
Q

Error Sources for Internal Consistency

A

Addresses unreliability due to content sampling

Subject to changes in test administration, test-taker variables, scoring + interpretation

19
Q

Limitations of Internal Consistency

A

A test developed to have high internal consistency by having items with highly similar content
→ Content sampling may be so constricted as to be trivial

- Inappropriate for some speed tests, such as tests of clerical speed or reading rate

20
Q

Interpreting Reliability Coefficients

A

.90’s = high reliability
.80’s = moderate to high reliability
.70’s = low to moderate reliability
.60’s = unacceptably low reliability

21
Q

How to improve reliability

A
1. Increase the number of items
2. Discard items with low reliability
3. Estimate what the correlation would be without measurement error
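
Point 3 usually refers to the correction for attenuation: rtrue = rxy / √(rxx × ryy). Hypothetical worked example: observed correlation rxy = .42, reliabilities rxx = .80 and ryy = .70, so rtrue = .42 / √(.56) ≈ .42 / .75 ≈ .56.
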
22
Q

Standard Error of Measurement vs Reliability Coefficient

A

Reliability coefficient is used to make judgements about the overall value of a particular test

SEM is used to make judgements about individual scores obtained with the test

Does the individual’s observed score on a test provide a good indication of their true score?

CTT: OS = TS + E

SEM is used to see how big or small E is

23
Q

What is Standard Error of Measurement

A

The SEM indicates the precision of our estimate of an individual’s true score.

Assuming a normal distribution of errors, the SEM acts as the SD of an individual's observed scores around their true score

Lower SEM = Higher Reliability
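
Worked example using the formula from a later card (SEM = SD × √(1 − rxx)) with hypothetical values: SD = 15 and rxx = .96 give SEM = 15 × √(.04) = 15 × .20 = 3 (the SEM used in the confidence interval card below).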

24
Q

What are the % score thresholds in relation to SEM & SD?

A

68% of scores fall within ±1 SEM/SD

95% of scores fall within ±1.96 SEM/SD

99% of scores fall within ±2.58 SEM/SD

25
Q

Confidence Intervals in relation to SEM

A

If a test-taker scores 105 on our intelligence test and the test has a SEM of 3, then we can be 68% confident that:
- The true score falls within 105 ± 1 SEM
- The true score falls within 105 ± 3
- The true score falls between 102 and 108
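
Extending the same hypothetical example to a 95% confidence interval: 105 ± 1.96 × 3 = 105 ± 5.88, so the true score falls roughly between 99 and 111.
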
26
Q

What is Test Validity?

A

The extent to which the test measures what it claims to measure.

27
Q

What is Content-Related Validity?

A

The relationship between the content of a test and some well-defined domain of knowledge or behaviour.

Whether test items provide a representative sample of all possible items in the relevant domain.

28
Q

What is the Content Validity Ratio and its use?

A

A statistical measure used to determine whether an item should be included to increase content validity.

Content Validity Ratio (CVR) = (ne − N/2) / (N/2)

Where:
ne = number of panellists indicating "essential"
N = total number of panellists

The more panellists rating an item essential, the higher the CVR: values near +1 indicate a good item, values of 0 or below a poor one.
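
Worked example with hypothetical numbers: N = 10 panellists and ne = 8 rate the item essential, so CVR = (8 − 10/2) / (10/2) = (8 − 5) / 5 = .60.
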
29
Q

What areas is content validity used for?

A

Employment tests and achievement tests.

In other areas, content validity is often a poor indicator of test validity.

30
Q

What is Criterion-Related Validity and what are its types?

A

Assesses how well a test or measure correlates with a specific, external criterion or outcome. A relationship is established between scores on the test and some criterion external to the test.

1. Predictive Validity - where test scores are used to predict status on some future criterion measure of relevance
2. Concurrent Validity - where test scores are related to some criterion measure obtained at the same point in time

31
Q

What is the validity coefficient?

A

Simply a correlation coefficient - typically the Pearson correlation coefficient between test scores and the criterion measure.

32
Q

What is the Standard Error of Estimate?

A

It provides the margin of error to be expected in the predicted criterion score based on test scores.

The standard error of estimate (SEest or SEY') is expressed as:

SEY' = SDy × √(1 − rxy²)

Where:
SDy = the SD of criterion scores
rxy = the validity coefficient
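
Worked example with hypothetical values: SDy = 10 and rxy = .60 give SEY' = 10 × √(1 − .36) = 10 × .80 = 8 points of expected prediction error.
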
33
Q

Standard Error of Measurement vs Standard Error of Estimate

A

SEM
- Margin of measurement error caused by unreliability of the test
- Error in an individual's test scores
- SEM = SD × √(1 − rxx)

SEest
- Margin of prediction error caused by imperfect validity of the test
- Error of prediction around the regression line
- SEY' = SDy × √(1 − rxy²)

34
Q

What is Construct Validity?

A

Refers to any evidence which supports the proposition that the test measures its intended construct.
--> Subsumes all types of validity (incl. content and criterion)

35
Q

What are Methods for Construct Validity?

A

- Analyse whether items or subtests are homogeneous and measure a single construct
- Study whether developmental changes in scores are consistent with theory
- Correlate the test with related constructs (i.e., convergent validity) and unrelated constructs (i.e., discriminant validity)
--> Convergent: the scale correlates with other scales that measure the same or a similar construct
--> Divergent/Discriminant: the scale does not correlate with measures of unrelated domains
- Evaluate whether group differences in scores are theoretically consistent
- Factor analysis of test scores to examine the internal structure of the test
- Analyse whether test scores allow proper classification of examinees
- Evaluate whether intervention effects on scores are consistent with theory
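
Hypothetical illustration of the convergent/discriminant point: a new anxiety scale that correlates .65 with an established anxiety measure (convergent evidence) but only .10 with an unrelated measure such as vocabulary (discriminant evidence) supports construct validity.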