Week 11 and 12: Reliability and Validity Flashcards
Define reliability
Consistency in measurement
List 3 ways that consistency of scores can be checked when re-examining the same people
- with the same test on different occasions
- with a different set of items measuring the same thing
- under different conditions of testing
What is standard error of measurement?
An estimate of the amount of error usually attached to an examinee’s obtained score
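As a rough illustration (all numbers below are made up), the SEM is commonly computed from a test's standard deviation and its reliability coefficient:

```python
import math

# Hypothetical values: test standard deviation and reliability coefficient
sd = 15.0           # e.g. an IQ-style scale with SD = 15
reliability = 0.90  # e.g. a Cronbach's alpha or test-retest r of .90

# SEM = SD * sqrt(1 - reliability)
sem = sd * math.sqrt(1 - reliability)
print(round(sem, 2))  # ~4.74
```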
What is a confidence interval?
A range of values within which we can be confident, at a chosen level (e.g. 95%), that the true value lies, such as the population mean or an examinee's true score
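Continuing the sketch above, a confidence interval around a hypothetical obtained score can be built from the SEM:

```python
import math

obtained = 110              # hypothetical obtained score for one examinee
sd, reliability = 15.0, 0.90
sem = sd * math.sqrt(1 - reliability)

z = 1.96                    # 95% confidence level
lower, upper = obtained - z * sem, obtained + z * sem
print(f"95% CI: {lower:.1f} to {upper:.1f}")  # ~100.7 to 119.3
```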
What are some sources of random error?
- test construction
- test administration
- test scoring and interpretation
List the ways of testing reliability
- Cronbach's alpha
- test-retest
- split-half
- item-total correlations
How big should a reliability coefficient be?
Above .8, preferably .9
What does Cronbach's alpha measure?
The internal consistency of a test, based on the correlations between test items; it is equivalent to the average of all possible split-half correlations
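A minimal sketch of the usual variance-based formula for alpha, using made-up item responses:

```python
import numpy as np

# Made-up data: rows = respondents, columns = items on a 5-item scale
scores = np.array([
    [3, 4, 3, 5, 4],
    [2, 2, 3, 2, 3],
    [4, 5, 4, 4, 5],
    [1, 2, 1, 2, 2],
    [3, 3, 4, 3, 3],
])

k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1)      # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores

# alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 3))
```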
What is split-half reliability?
Taking half the items and seeing how they correlate with the other half
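A rough sketch of a split-half estimate, with the Spearman-Brown correction commonly applied to estimate full-length reliability (made-up data):

```python
import numpy as np

# Made-up data: rows = respondents, columns = items on a 6-item scale
scores = np.array([
    [3, 4, 3, 5, 4, 4],
    [2, 2, 3, 2, 3, 2],
    [4, 5, 4, 4, 5, 5],
    [1, 2, 1, 2, 2, 1],
    [3, 3, 4, 3, 3, 4],
])

odd_half = scores[:, ::2].sum(axis=1)    # items 1, 3, 5
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6

r_half = np.corrcoef(odd_half, even_half)[0, 1]
# Spearman-Brown correction estimates reliability of the full-length test
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 3), round(r_full, 3))
```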
What are item-total correlations
Correlating each item with the total score on the rest of the scale (the scale with that item excluded)
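A small illustrative sketch of corrected item-total correlations (each item against the sum of the remaining items), with made-up data:

```python
import numpy as np

# Made-up data: rows = respondents, columns = items
scores = np.array([
    [3, 4, 3, 5],
    [2, 2, 3, 2],
    [4, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
])

for i in range(scores.shape[1]):
    item = scores[:, i]
    rest = scores.sum(axis=1) - item  # total of all other items
    r = np.corrcoef(item, rest)[0, 1]
    print(f"item {i + 1}: corrected item-total r = {r:.2f}")
```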
What is test-retest reliability
- correlation between two testing intervals
- stability over time
- uses Pearson’s r
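A minimal sketch of a test-retest calculation with hypothetical time 1 and time 2 scores for the same people:

```python
import numpy as np

# Hypothetical scores for the same six people at time 1 and time 2
time1 = np.array([12, 18, 25, 9, 20, 15])
time2 = np.array([14, 17, 24, 11, 19, 16])

r = np.corrcoef(time1, time2)[0, 1]  # Pearson's r = test-retest reliability
print(round(r, 3))
```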
What are some problems with test-retest reliability
- affected by factors associated with how the test is administered on each occasion
- carryover effects: examinees may remember their answers, and practice can improve scores
- should only be used for constructs expected to be stable over time
Internal consistency
The correlations between different items on the same test, or with the entire test
Kuder-Richardson reliability and coefficient alpha
- based on the intercorrelations among all comparable parts of the test
Kuder-Richardson formula 20 (KR-20)
- calculated from the proportion of people who pass and fail each item and the variance of the total test scores
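A rough sketch of the KR-20 calculation with made-up dichotomous (pass/fail) responses:

```python
import numpy as np

# Made-up pass/fail (1/0) responses: rows = examinees, columns = items
responses = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
])

k = responses.shape[1]
p = responses.mean(axis=0)  # proportion passing each item
q = 1 - p                   # proportion failing each item
total_var = responses.sum(axis=1).var(ddof=1)

# KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores)
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(round(kr20, 3))
```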
Inter-rater reliability
- agreement between multiple raters
- measured using a kappa statistic
Kappa statistic
Measures inter-rater agreement for qualitative (categorical) items
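A minimal sketch of Cohen's kappa for two hypothetical raters coding the same ten cases:

```python
import numpy as np

# Hypothetical categorical codes from two raters for the same 10 cases
rater1 = np.array(["yes", "yes", "no", "no", "yes", "no", "yes", "no", "no", "yes"])
rater2 = np.array(["yes", "no",  "no", "no", "yes", "no", "yes", "yes", "no", "yes"])

observed_agreement = np.mean(rater1 == rater2)

# Expected chance agreement, from each rater's marginal proportions
categories = np.unique(np.concatenate([rater1, rater2]))
expected_agreement = sum(
    np.mean(rater1 == c) * np.mean(rater2 == c) for c in categories
)

# kappa = (observed - expected) / (1 - expected)
kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)
print(round(kappa, 3))  # 0.6 for these made-up codes
```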
Parallel-forms reliability
Equivalent forms of the same test are administered to the same group
Types of reliability
- inter-rater
- test-retest
- split-half
- parallel forms
Validity
The extent to which a test measures what it is supposed to measure
What are the three types of validity
- content
- criterion related
- construct
Content validity
Degree to which the content (items) represents the behaviour/characteristics associated with that trait
What are the two types of criterion validity
Predictive and concurrent
What is criterion validity
The relationship between test scores and some type of criterion or outcome, such as ratings, classifications or other test scores
Concurrent validity
Refers to whether the test scores are related to some CURRENTLY AVAILABLE criterion measure
Predictive validity
The correlation between a test and criterion obtained at a FUTURE time e.g. ATAR scores predicting success at uni
Validity coefficient
Correlation between test scores and some criterion
What are the two types of construct validity?
Convergent and discriminant
Construct validity
The extent to which a test measures a psychological construct or trait
Convergent validity
Convergent validity takes two measures that are supposed to be measuring the same construct and shows that they are related.
Discriminant validity
Discriminant validity shows that two measures that are not supposed to be related are in fact, unrelated.
List the types of reliability
- test-retest
- internal consistency
- interrater
In test-retest reliability, what are some sources that might affect a result?
- time
- place
- mood
- temperature
- noise
What are some core issues with content validity?
- the appropriateness of the questions and domain relevance
- comprehensiveness
- level of mastery assessed
What are some procedures to ensure content validity?
- specialist panels to map content domain
- accurate test specifications
- communication of validation procedures in test manual
What are some applications of content validity?
- achievement and occupational tests
- usually not appropriate for personality or aptitude tests
What is standard error?
The standard deviation of the sampling distribution of a statistic, such as the sample mean (estimated as the sample standard deviation divided by √n)
Do we want a small or large SEM?
Small, because a larger SEM reflects lower reliability and produces wider confidence intervals
Which confidence level is most common?
z = 1.96 (95%)
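A small sketch of a 95% confidence interval for a sample mean, using the standard error (sample SD divided by √n) and z = 1.96, with made-up scores:

```python
import numpy as np

# Hypothetical sample of scores
sample = np.array([22, 25, 19, 30, 27, 24, 21, 26, 23, 28])

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))  # standard error of the mean

z = 1.96  # 95% confidence level
print(f"95% CI: {mean - z * se:.2f} to {mean + z * se:.2f}")
```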
Why are confidence intervals better than p-values
- the p < .05 cut-off is an arbitrary threshold
- p-values are heavily influenced by sample size (very large samples can make trivial effects significant)
- p-values can miss small effects that recur consistently, whereas confidence intervals show the size and precision of an effect
When is a confidence interval result significant
When the confidence interval does not include 0 (for a difference or correlation)
Why are effect sizes beneficial?
They address statistically significant effects that don't mean much in real life, e.g. does someone scoring .5 higher on a depression scale really have a worse time?
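As one common effect size, Cohen's d can be sketched with hypothetical group scores:

```python
import numpy as np

# Hypothetical depression scores for two groups
group_a = np.array([14, 16, 15, 18, 17, 15])
group_b = np.array([13, 15, 14, 16, 15, 14])

# Cohen's d: mean difference divided by the pooled standard deviation
n1, n2 = len(group_a), len(group_b)
pooled_sd = np.sqrt(
    ((n1 - 1) * group_a.var(ddof=1) + (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2)
)
d = (group_a.mean() - group_b.mean()) / pooled_sd
print(round(d, 2))
```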
How can test scoring and interpretation be a source of random error (reliability)?
Because responses to projective tests vary widely and require judgement to score, there is a large role for inter-rater disagreement, e.g. the TAT and the Rorschach
What is the domain sampling model?
Test items represent a sample of all possible items
What is the reliability ratio
The ratio of true-score variance to observed-score variance; in the domain sampling model, the true score is the score on the "long test" made up of all possible items
How many items should you have for optimal reliability
10
List some examples of concurrent validity
- depression scale and clinical interview
- 2 measures at a similar time
- IQ and exam scores
To be concurrently valid, what kind of assessment should the measure be correlated with?
The gold standard
What kind of test do you use for predictive validity
Multivariate ANOVA
What test do you use for convergent validity
Factor analysis
The lower the reliability, the…
Higher the error in the test
The larger the standard error of measurement, the…
Less precise the measurements and the wider the confidence intervals