Reliability & Validity Flashcards
Reliability
- Are the results consistent?
- Provides an estimate of the proportion of unsystematic error; need to know the degree of error to determine reliability
Validity
- Does it measure what it says it measures?
- Overall eval of evidence and degree of trustworthiness
- Determine if enough support exists to use the test in a certain way
Classical Test Theory
- Observed score = T + E
- T is the true score if the test is completely free from error
- E is the error
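The T + E decomposition can be sketched with a small simulation; the true scores, error sizes, and sample size below are hypothetical, chosen only to show that observed-score variance equals true-score variance plus error variance:

```python
import random
import statistics

random.seed(0)

# Hypothetical example: each examinee has a fixed true score T;
# every administration adds random unsystematic error E.
true_scores = [random.gauss(50, 10) for _ in range(1000)]
observed = [t + random.gauss(0, 5) for t in true_scores]  # X = T + E

var_true = statistics.pvariance(true_scores)
var_obs = statistics.pvariance(observed)

# Under CTT, Var(X) = Var(T) + Var(E), so reliability = Var(T) / Var(X);
# here that is roughly 100 / (100 + 25) = 0.8
reliability = var_true / var_obs
```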
Unsystematic Error
- Random errors: mood, health, fatigue
- Administration differences
- Scoring differences
- Random guessing
Systematic Error
Constant errors that occur every time the test is administered, such as a typo in an item
Reliability Related to Validity
- High validity can occur only if high reliability exists
- High validity cannot occur with low reliability
- High reliability does not guarantee high validity
Correlation Related to Reliability
- Correlation: Statistical technique used to examine consistency
- Reliability is often based on consistency between two sets of scores
Positive Correlation
As one increases, so does the other
Negative Correlation
As one increases, the other decreases
Correlation Coefficient (Pearson-Product Moment)
- Correlation coefficient: numerical indicator of the relationship between two sets of data
- PPM correlation coefficient - most common
- -1 to +1: closer to absolute value 1=stronger relationship
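A minimal sketch of the Pearson product-moment coefficient; the two score lists are hypothetical, standing in for two sets of test scores:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical first and second administrations of the same test
first = [70, 75, 80, 85, 90]
second = [72, 74, 79, 88, 91]
r = pearson_r(first, second)  # close to +1: strong positive relationship
```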
Test-Retest
- Give same test twice to same group
- Correlation between first and second administration (typically 2-6 weeks apart)
- Possible influences: length of gap (shorter gap —> higher correlation), changes in administration, interventions, practice effects
- Ex: skills-based test
Alternate Forms
- Very difficult
- Correlation of scores from two equivalent forms of a test
- Measures stability (over time) and equivalence (construct similarity)
- Use samples of different items from the same domain
Internal Consistency
- One administration
- One form of instrument
- Divides instrument and correlates the scores from the different portions
Split-Half Reliability
- Given once then split in half to determine reliability
- Need to divide instrument into equivalent halves, like even and odd
- Problem: dividing instrument in half makes number of items smaller —> smaller correlation
A first-half v. second-half split doesn’t work if the test increases in difficulty; the Spearman-Brown formula is the usual correction for the smaller-correlation problem
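A minimal sketch of an odd/even split with the standard Spearman-Brown correction; the 0/1 response matrix is hypothetical:

```python
def pearson_r(xs, ys):
    """Pearson correlation between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical 0/1 item responses: six examinees on a 6-item test
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

# Odd/even split: total each half per examinee, then correlate the halves
odd = [sum(row[0::2]) for row in responses]
even = [sum(row[1::2]) for row in responses]
r_half = pearson_r(odd, even)

# Spearman-Brown correction estimates reliability at full test length,
# offsetting the smaller correlation from halving the number of items
r_full = (2 * r_half) / (1 + r_half)
```

Note that `r_full` is always at least as large as `r_half`, reflecting that longer tests tend to be more reliable.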
Kuder-Richardson
- KR-20: heterogeneous items
- KR-21: homogeneous items - single construct (cannot be used if items differ in difficulty)
- Lower reliability coefficient than split-half
- Purpose: Estimate the average of all split-half reliabilities from all ways of splitting the instrument
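The KR-20 formula can be sketched directly; the 0/1 response matrix is hypothetical:

```python
import statistics

def kr20(responses):
    """KR-20 reliability estimate for dichotomous (0/1) item responses."""
    k = len(responses[0])                     # number of items
    n = len(responses)                        # number of examinees
    totals = [sum(row) for row in responses]  # total score per examinee
    var_total = statistics.pvariance(totals)
    # sum of p*q over items: p = proportion correct, q = 1 - p
    ps = [sum(row[i] for row in responses) / n for i in range(k)]
    pq = sum(p * (1 - p) for p in ps)
    return (k / (k - 1)) * (1 - pq / var_total)

# Hypothetical 0/1 responses: six examinees, six items
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
r = kr20(responses)
```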
Coefficient Alpha
- Used for non-dichotomous scoring
- Ex: Likert scales
- Cronbach’s alpha
- Takes into account variance of each item
- Conservative estimate of reliability
- Most common
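A minimal sketch of the standard coefficient-alpha formula for non-dichotomous data; the 1-5 Likert responses are hypothetical:

```python
import statistics

def cronbach_alpha(responses):
    """Cronbach's alpha for examinee rows of numeric item scores."""
    k = len(responses[0])
    # variance of each item across respondents (alpha accounts for these)
    item_vars = [statistics.pvariance([row[i] for row in responses])
                 for i in range(k)]
    totals = [sum(row) for row in responses]
    return (k / (k - 1)) * (1 - sum(item_vars) / statistics.pvariance(totals))

# Hypothetical 1-5 Likert responses: five respondents, four items
likert = [
    [4, 5, 4, 5],
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 3, 4],
]
alpha = cronbach_alpha(likert)
```

With dichotomous (0/1) data, this formula reduces to KR-20.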
Standard Error of Measurement (SEM)
- Provides estimate of range of scores if someone were to take instrument repeatedly
- Based on idea that if someone takes test multiple times, scores would fall into a normal distribution
SEM v. SD
- SD is spread of scores between students
- SEM is spread of scores for one student
- Uses same estimations
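The SEM combines the two ideas above: it is the group SD scaled down by the test's reliability, SEM = SD * sqrt(1 - r). The SD, reliability, and observed score below are hypothetical:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical instrument: SD = 15 across students, reliability = 0.91
s = sem(15, 0.91)            # 15 * sqrt(0.09) = 4.5

# ~68% of one student's repeated scores would fall within 1 SEM;
# band around a hypothetical observed score of 100:
band = (100 - s, 100 + s)
```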
Content-Related Validity
- Test items measure the objectives they are supposed to measure
- Focus on how content was determined
- May be based on test creator’s own analysis of topic or expert analysis
- How well do test items reflect the domain of material being tested
Criterion-Related Validity
- Test scores related to specific criterion/variable
- Sources of criterion scores: academic achievement, level of education, performance in specialized training, job performance, psychiatric diagnosis, ratings by supervisors, correlations with previously available tests
Concurrent Validity (Criterion-Related)
- Scores on test and criterion measure are collected at same point
- Ex: achievement, certification
- Scores typically higher than predictive validity coefficients
- Require reliable and bias-free measures
Predictive Validity (Criterion-Related)
- Test is administered first and scores on criterion measure are collected at a later time
- Ex: SAT, college GPA
- Require reliable and bias-free measures
Construct Validity
- What do scores on this test mean or signify
- Construct: Grouping of variables that make up observed behavior patterns
- Ex: Self-efficacy, personality
- Measured by correlation of 2 scores or factor analysis
- Often seen in psych tests
Convergent v. Discriminant (Construct Validity)
- Convergent: Positive correlation with other tests measuring the same/similar construct
- Discriminant: Low or no correlation with tests measuring different constructs
Threats to Construct Validity
- Too many variables
- Under-representation: failing to measure parts of the construct
- Extra questions
- Items are too similar
Overall Threats to Validity
- History: outside events during course of test
- Maturation: natural development with age
- Testing: repeat testing; changes due to practice
- Instrumentation: changes in measurement procedures
- Statistical regression: regression to mean after extreme score first time
- Interaction: any combo of 2
- Mortality: drop out
- Collection of subjects: bias of collecting subjects and assigning to groups
Face Validity
- Not a legitimate form of validity evidence
- Based on appearance of the measure and its test items
Types of Evidence
- Test content
- Response processes
- Internal structure
- Relations to other variables
- Consequences of testing
Item Analysis
- Examine and eval each item in the test —> get rid of items that don’t work
- Done during instrument development or revision
Item Difficulty
- Index reflecting proportion of people getting item correct
- 0.0= no one got it correct
- 1.0= everyone got it correct
- 0.5= ideal for differentiation
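The difficulty index is just the proportion correct; the 0/1 responses below are hypothetical:

```python
def item_difficulty(item_responses):
    """Proportion of examinees answering the item correctly (p)."""
    return sum(item_responses) / len(item_responses)

# Hypothetical 0/1 responses from ten examinees on one item
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
p = item_difficulty(responses)   # 0.6, near the ideal 0.5 for differentiation
```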
Item Discrimination
- Degree to which item correctly differentiates among test takers
- Extreme group method: 2 groups - high scores, low scores (works with normal distribution)
- Correlational method: performance of test v. item
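The extreme group method can be sketched as the difference in proportion correct between the top- and bottom-scoring groups; the group data below are hypothetical:

```python
def discrimination_index(upper_group, lower_group):
    """Extreme-group discrimination: p(upper) - p(lower) for one item."""
    p_upper = sum(upper_group) / len(upper_group)
    p_lower = sum(lower_group) / len(lower_group)
    return p_upper - p_lower

# Hypothetical 0/1 responses on one item from the highest- and
# lowest-scoring examinees (e.g., top and bottom 27% of total scores)
upper = [1, 1, 1, 0, 1]
lower = [0, 1, 0, 0, 0]
d = discrimination_index(upper, lower)   # 0.8 - 0.2 = 0.6
```

A positive index means high scorers got the item right more often than low scorers, i.e., the item differentiates correctly.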
Item Response Theory (IRT)
- Focus on each item; considers the mathematical relationship between ability and item performance
- 2 major assumptions: unidimensionality, local independence
- Most common in testing where there is a right/wrong answer v. preference
- Models student ability using each question instead of aggregate score
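As one IRT illustration (the simple one-parameter Rasch model, not necessarily the model a given test uses), the probability of a correct response depends only on the gap between the examinee's ability and the item's difficulty:

```python
import math

def rasch_p(ability, difficulty):
    """Rasch (1PL) IRT model: P(correct) given ability and item difficulty."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

# When ability equals item difficulty, P(correct) is exactly 0.5
p_equal = rasch_p(0.0, 0.0)

# Higher ability on the same item -> higher probability of a correct answer
p_high = rasch_p(2.0, 0.0)
```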
Unidimensionality
Each item measures one ability or trait
Local Independence
A response to one item is unrelated to responses on other items, once ability is accounted for
Selecting Tests
- Determine what info is needed
- Search assessment resources
- Eval possible instruments
Administering Tests
- Pre-testing procedures
- Administration
- Scoring: by hand, computer, Internet
Communicating Results
- Simple language
- Individual v. Group
- Written v. Oral
- Communicate test’s strengths and limitations
- Know the manual
- Describe results v. just reporting scores
- Use various results
- Involve client
- Encourage asking questions
- Relate test to a goal
Problems with Reporting Results
- Acceptance
- Readiness of client
- Negative results
- Flat profiles: results don’t differentiate anything
- Motivation and attitude
Communicating Test Results for Parents
- Identifying information
- Reason for referral
- Background info
- Test results and interpretation
- Diagnostic impressions and summary