Ch3 - Test Score Interpretation Flashcards
Raw score
A number that summarizes an aspect of a person’s performance on a test
• No meaning by itself - it's impossible to interpret a score without a frame of reference (is a high score a good or bad result?) - and even then we can be misled
Norms
test performance of 1+ reference groups
○ Norm-referenced test interpretation uses standards based on the performance of specific groups
○ Useful to compare individuals with one another
Normative sample
the groups we use to establish norms
• Performance criteria
○ Criterion-referenced interpretation: makes use of procedures designed to assess whether and to what extent the desired performance criteria have been met
Norm-Referenced Test Interpretation
The score is used to place the test taker's performance within a pre-existing distribution of scores and compare it with the performance of the reference group
Developmental norms
Ordinal Scales Based on Behavioural Sequences
• The sequence of development can be used as an ordinal scale
• Frame comes from observing/noting uniformities in the order/timing of behavioural attainments across many individuals
Ex:
• Provence Birth-to-Three Developmental Profile: Example of developmental norm using ordinal scale
○ Information about the timing with which a child attains developmental milestones in relation to their age, across 8 domains and various age categories
○ Scores are added to create a performance age, compared with the chronological age
Theory-Based Ordinal Scales
The ordinal scales are based on factors other than age
Example: Ordinal Scales of Psych Development
○ Based on Piaget’s delineation of the order in which cognitive competencies are acquired during infancy / childhood
age equivalent scores (AKA test ages or test-age equivalents)
○ A way of comparing the test taker’s performance on a test with the average performance of the normative age group with which it corresponds
§ Ex: a child's raw score = the average raw score of 9-year-olds in the normative group
○ Problematic because development varies within age groups
○ Has LOTS of limitations - little used in psych for that reason
How does it work?
• Ex: a test with items ranging from easy to harder
○ The same test is administered to children in a range of grades (grade 2 to 6)
○ Expectation: younger kids will get less far than older ones
○ *ONLY the means are recorded, not the SDs
§ Does not take the range of the grade distributions into account - a major flaw
○ The means increase for each grade
○ In other words, all those who score 15 have a grade-equivalent score of 2.0 (because that is the mean for grade 2 students at the start of the year)
○ Grade-equivalent scores between the grade means are established through interpolation (as computed in the sketch below)
§ Between raw scores 15 and 25, there are 10 raw-score points
§ Between grades 2.0 and 3.0, there are 10 grade units
§ If someone scores 17, their grade-equivalent score will be 2.2
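A minimal sketch of that interpolation step, in Python (the grade means of 15 and 25 are the illustrative numbers from the example, not real norms):

```python
def grade_equivalent(raw, lower_mean, upper_mean, lower_grade, upper_grade):
    """Linearly interpolate a grade-equivalent score between two grade means."""
    fraction = (raw - lower_mean) / (upper_mean - lower_mean)
    return lower_grade + fraction * (upper_grade - lower_grade)

# Grade 2 mean = 15, grade 3 mean = 25 (from the example above)
print(grade_equivalent(17, 15, 25, 2.0, 3.0))  # -> 2.2
```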
Grade Equivalent Scores
Another way of interpreting developmental norms - made possible by the uniformity of the school curriculum
derived by locating the performance of test takers within the norms of the students at each grade level in the standardization sample
○ Ex: a child scores at the 5th-grade level in English (which does NOT mean that he knows 5th-grade English) and at the 3rd-grade level in maths
• Can also be misleading:
○ Curricula still vary
○ The advance expected between grades varies
○ Not all children will attain their grade scores, and that's okay
Within-Group norms
Compare one’s score to the performance of one or more reference groups
The Normative Sample Requirements
• Should be representative of the kinds of individuals for whom the tests are intended
• Needs to be sufficiently large, to ensure the stability of the values obtained
○ Tests that require specialized samples may have smaller samples
• Needs to be recent
Standardization sample
group on whom the test is originally standardized in terms of administration /scoring procedures, and establishment of norms
Reference group
Any group of people against which test scores are compared
Subgroup Norms
A large sample can be further divided into smaller subgroups (age, gender, etc) for which norms can be established
Local Norms
• Reference groups drawn from a specific geographic/institutional setting
Convenience Norms
• Norms based on people who were available at the time of testing
Percentile score (disadvantages)
relative position of a test-taker compared to the reference group
• Advantages:
○ Most test takers understand them easily
○ Raw scores can easily be compared with percentile ranks
• Disadvantages:
○ In a normative sample there is a lowest and a highest score, which can be called the 0th and 100th percentiles - but those limits cannot be pinned down when we interpret the scores of a larger population
○ Scores cluster in the middle of the distribution and spread out at the extremes, so equal percentile differences do not represent equal raw-score differences
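A small demonstration of that clustering effect, assuming normally distributed scores with a mean of 100 and an SD of 15 (a made-up scale): the same 5-point raw-score gain covers roughly ten times more percentile points near the mean than in the upper tail.

```python
from statistics import NormalDist

scores = NormalDist(mu=100, sigma=15)

# The same 5-point raw-score difference, near the mean vs. in the tail:
print(round((scores.cdf(105) - scores.cdf(100)) * 100, 1))  # -> 13.1 percentile points
print(round((scores.cdf(135) - scores.cdf(130)) * 100, 1))  # -> 1.3 percentile points
```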
Test Ceiling and Test Floor
- Test ceiling: highest score attainable on an already standardized test - someone reaching it means that the test might be too easy (insufficient ceiling)
- Test floor: if a person fails all the items or scores lower than anyone in the normative sample, the test might be too hard (insufficient floor)
Linear transformation
changes the units in which scores are expressed while leaving the interrelationships among them unaltered
○ The shape of a linearly derived scale score distribution is the same as that of the original score distribution
1. Convert raw scores into z scores: a z score indicates the relative position of a score within a distribution
*The value of a z score represents the original score's distance from the mean in SD units, as sketched below
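A minimal sketch of that conversion (the sample values are made up):

```python
def z_score(raw, mean, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (raw - mean) / sd

print(z_score(65, 50, 10))  # a raw score of 65 on a mean-50, SD-10 scale -> +1.5
```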
Additional Systems for Deriving Standard Scores (besides z)
• Z scores are usually further transformed because they include +/- signs and decimals
Some score formats became associated with specific tests, though their parameters (mean and SD) were chosen arbitrarily
ex: T scores: many personality inventories (MMPI and others)
CEEB: used for the SATs and GREs
Wechsler scale subtest scores: all subtests of Wechsler scales and others
Wechsler scale deviation IQs: summary scores of all Wechsler scales and other tests
Otis-Lennon School Ability Indices: Used in the Otis Group Intelligence Scale
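A sketch of how a z score is rescaled into these systems (new score = mean + z x SD); the means and SDs below are the conventional values for each format:

```python
# Conventional (mean, SD) pairs for common standard-score systems
SYSTEMS = {
    "T score": (50, 10),
    "CEEB": (500, 100),
    "Wechsler subtest": (10, 3),
    "Deviation IQ": (100, 15),
    "Otis-Lennon SAI": (100, 16),
}

def rescale(z, mean, sd):
    """Express a z score in another standard-score system."""
    return mean + z * sd

for name, (mean, sd) in SYSTEMS.items():
    print(f"z = +1.00 -> {name}: {rescale(1.0, mean, sd):g}")
```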
What difference does it make if the SD is 12 or 15?
• Ex: 2 tests have a mean of 100 and SD of respectively 12 and 15
○ Score of 112 on test 1 (SD = 12) = Z score of +1.00 (84th percentile)
○ Score of 112 on test 2 (SD = 15) = Z score of +0.80 (79th percentile)
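A quick check of that arithmetic, using Python's built-in normal distribution in place of the printed Table of Areas:

```python
from statistics import NormalDist

def percentile_from_z(z):
    """Percent of the normal curve falling at or below a given z score."""
    return NormalDist().cdf(z) * 100

print(round(percentile_from_z((112 - 100) / 12)))  # SD = 12 -> z = +1.00 -> 84
print(round(percentile_from_z((112 - 100) / 15)))  # SD = 15 -> z = +0.80 -> 79
```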
Deviation IQs
- 1st introduced by Wechsler for the WAIS
- Different from original ratio IQs
- Now simply called IQ
- Derived by converting raw scores into Wechsler scale subtest scores, adding them, and locating their sum in a normative table
Nonlinear Transformations
• Those that convert a raw score distribution into a distribution that has a different shape than the original
ex:
• Transforming normally distributed raw scores into percentile rank scores - nonlinear conversion
○ Transforming raw into z
○ Locate Z in the Table of Areas of the Normal Curve (Appendix C)
○ Derive the proportion/% of the area of the normal curve that is under that point
ex2:
• Normalized standard scores - another type of nonlinear conversion
○ Used when a score distribution approximates but does not quite match the normal distribution
○ Find the % of persons in the reference sample that fall at or below each raw score (Cumulative Percent column)
○ % are converted into proportions
○ Proportions are located in the Table of Areas of the Normal Curve
○ Obtain the Z scores corresponding
○ *The rescaling is the SAME as for linearly derived scores, BUT they should be labelled as normalized standard scores to indicate that they come from distributions that were not normal
• Can then be transformed into other scores using the same procedure as for linear conversions
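A minimal sketch of those steps (the cumulative percent used here is invented):

```python
from statistics import NormalDist

def normalized_z(cumulative_percent):
    """z score whose area under the normal curve matches the observed
    cumulative percent (% of the sample at or below the raw score)."""
    return NormalDist().inv_cdf(cumulative_percent / 100)

z = normalized_z(69)       # a raw score at the 69th cumulative percent
print(round(z, 2))         # -> ~0.5
print(round(50 + z * 10))  # rescaled as a normalized T score -> 55
```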
Stanines
- Transforms all the scores into digits from 1 to 9
- Reduces the time/effort needed to enter scores into a computer
- Use nonlinear conversion of raw scores
- Mean = 5, SD = 2
- Loss of precision
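A sketch using the conventional stanine bands, which split the distribution into 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent of cases from stanine 1 to 9:

```python
# Cumulative upper bounds (in percentile ranks) for stanines 1 through 8;
# anything above the last bound falls in stanine 9.
CUM_BOUNDS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine(percentile_rank):
    """Map a percentile rank (0-100) onto a stanine from 1 to 9."""
    for value, bound in enumerate(CUM_BOUNDS, start=1):
        if percentile_rank <= bound:
            return value
    return 9

print(stanine(50))  # -> 5 (the middle stanine)
print(stanine(95))  # -> 8
```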
Why can't two norm-referenced scores be compared?
- Norm-referenced scores can’t be compared unless they come from the same normative distribution
- Even when the tests, the norms, and the scale units are the same, test scores don’t necessarily have the same meaning
Equating Procedures
Comparing scores of individuals/groups across time or in various psychological functions against a uniform norm
Ex: comparing college admission test scores over time
Saves money and time on standardization procedures
Goal: make scores from different tests more comparable
Alternate forms
Creating alternate forms that are alike in the content they cover but vary in their specific items
Useful when someone has to take the same test on separate occasions
Practice effects (score increases attributable to practice) still occur, but to a lesser degree
Parallel forms
equated in content coverage, procedures AND some statistical characteristics (raw score means and SD, indexes of variability/reliability)
Anchor tests
when one part of a test (a set of items) is the same in 2 different tests, so both tests are comparable even though their normative samples might not be the same at all. The purpose of the anchor test is to provide a baseline for an equating analysis between different forms of a test
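A sketch of one classical approach, linear (mean-sigma) equating, which places a form X score on the form Y scale by matching z scores; the anchor-score distributions are invented, and real equating designs are considerably more involved:

```python
from statistics import mean, stdev

def linear_equate(x, scores_on_x, scores_on_y):
    """Place a form X score on the form Y scale by setting
    (x - mean_x) / sd_x = (y - mean_y) / sd_y and solving for y."""
    mx, sx = mean(scores_on_x), stdev(scores_on_x)
    my, sy = mean(scores_on_y), stdev(scores_on_y)
    return my + sy * (x - mx) / sx

# Invented anchor-item score distributions from the two forms:
anchor_on_x = [10, 12, 14, 16, 18]
anchor_on_y = [12, 15, 18, 21, 24]
print(linear_equate(15, anchor_on_x, anchor_on_y))  # -> 19.5
```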
Fixed reference groups
Anchor tests embedded in each successive form of a test to provide a linkage to one or more earlier forms of the same test
○ SATs: best example of fixed reference groups use
§ Until 1995, the reference group was the test takers of 1941: mean of 500, SD of 100
§ Then the scale was recentered on a more recent reference group
Simultaneous norming (AKA co-norming)
norming 2+ tests on the same sample, makes for easier comparison of the performances
Absolute standard (in criterion-referenced analysis)
a. Score for each respondent is compared with an absolute standard, which is:
i. External to the test
ii. Established by content experts of that particular area
iii. Some type of threshold / minimum score that the examinee has to meet or exceed
1) A pass-fail system (must be above xyz)
These tests are typically used to establish mastery in someone who already has some level of skill
Ex: licensing exams to do a certain profession
What is mastery?
The minimum level of skill required to say that a person has basic competence in the area
What is that threshold score for pass-fail?
Ex: driver’s license exam
2 parts - theoretical (threshold might be 20/25 questions, for ex) and skill (threshold might be something like 75% of maneuvers, for ex)
What constitutes basic mastery, what should the cutoff be?
Usually established by experts (ex: in public safety, transportation, etc)
What is wrong with grade equivalents (or age equivalent scores, or grade age equivalent)
• Grade equivalents are:
○ Simple to understand
○ Parent friendly
• What’s wrong? - 2 reasons
1. It relies on interpolation to assign most GE scores, not actual data
2. The SDs of the score distributions are ignored; the GEs are based on means only
This is a problem because GEs do not mean the same thing as children move into higher grades
The SD increases as grades get higher (grade 10 students show a wider spread than grade 2 students) - young children don't know that much yet, so their ceilings are limited, but with maturation we see more individual differences - so the meaning of a GE unit changes as children age
The units are only at an ordinal level of measurement - another problem
Therefore, not good for research purposes
What’s wrong with percentiles?
Percentiles are:
• Simple
• Descriptive - meaning is understood easily, gives us some info
What’s wrong?
• Almost never suitable for statistical analysis as test scores
• Percentile units are NOT equal, they are only at the ordinal level of measurement - units are not constant
• Original raw score would be better, or another type of score
Item Response Theory (IRT) (AKA Latent Trait Models)
• Procedures that replace the older equating procedures above (fixed reference, anchor tests, alternate and parallel forms)
• Latent trait: the models seek to measure the unobservable qualities (latent traits) underlying behaviour
• IRT applies models to item-level data, not whole-test data
○ *Can produce item parameter estimates that are invariant across populations
• Can be used to:
1. Estimate the probability that people with specified levels of the ability/trait in question will answer an item correctly or in a certain way (see the sketch below)
2. Estimate the trait levels needed to have a specified probability of responding in a certain way
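As an illustration, a sketch of one common IRT model, the two-parameter logistic (2PL), which gives the probability of a correct answer as a function of trait level and two item parameters:

```python
import math

def p_correct(theta, a, b):
    """2PL model: probability that a person at trait level theta answers
    an item with discrimination a and difficulty b correctly."""
    return 1 / (1 + math.exp(-a * (theta - b)))

print(round(p_correct(theta=0.0, a=1.0, b=0.0), 2))  # -> 0.5
print(round(p_correct(theta=1.0, a=1.0, b=0.0), 2))  # -> 0.73
```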
Computerized Adaptive Testing (CAT) + advantages/disadvantages
Analyzing the test taker's ability as they respond to items, and selecting the next items to present based on those results
• Shortens test length
• Reduces the frustration test takers experience when a test is not matched to their ability level
• Problems with security, cost, inability to change answers
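A toy sketch of the selection idea: pick the unanswered item whose difficulty sits closest to the current ability estimate (real CAT systems use information-maximizing rules and re-estimate ability after every response):

```python
def next_item(theta_estimate, remaining_items):
    """Select the unanswered item whose difficulty (b) is nearest
    to the current ability estimate (a deliberately simple rule)."""
    return min(remaining_items, key=lambda item: abs(item["b"] - theta_estimate))

items = [{"id": 1, "b": -1.0}, {"id": 2, "b": 0.0}, {"id": 3, "b": 1.5}]
print(next_item(0.3, items))  # -> {'id': 2, 'b': 0.0}
```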
Why do we conduct test revisions, and what are they used for?
- A test’s name may not always indicate the test’s content
- When a test is revised, an edition number can be added, or its name can change
- Giving the two versions of a test to the same group and comparing results indicate if the versions are interchangeable
- Major revisions require re-standardization
The Flynn Effect
Increase in the level of performance required to obtain the same score across 2 different versions of a test (meaning the test is getting harder, to adjust for the population's improving performance)
○ Does not mean that the people are becoming more intelligent - other factors may influence this
○ Creates debate: execution of convicts on the verge of mental retardation
Criterion-Referenced Test Interpretation (2 types of standards for those tests)
When a person’s performance has to be determined to have reached a certain level or not
• Performance will be compared to pre-established criteria, and not the performance of others
• Criterion: may refer to either knowledge of a specific domain or competence
• Often, but not always, uses cutoff scores or score ranges
2 underlying sets of standards for those tests: 1. The amount of knowledge of a domain 2. The level of competence in a skill
The criteria for competency or knowledge can be quantitative (a certain %) or more qualitative, or even on an all-or-none basis
What type of test are school exams considered to be? Define this type
Content- or domain-referenced tests
There needs to be a clearly defined field of content from which to assess knowledge
The selection of items and the definition of that field should be chosen by experts
Requires a table of specifications: with cells that state the number of items/tasks to be included in the test for each learning objective
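As an illustration only (the objectives, cognitive levels, and item counts are all invented), a table of specifications can be sketched as a grid of item counts:

```python
# Rows are learning objectives, columns are cognitive levels,
# and each cell gives the number of items to include (invented numbers).
table_of_specifications = {
    "Fractions": {"Recall": 4, "Application": 6},
    "Decimals": {"Recall": 3, "Application": 5},
    "Percentages": {"Recall": 3, "Application": 4},
}

total_items = sum(sum(row.values()) for row in table_of_specifications.values())
print(total_items)  # -> 25
```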
Define performance assessment
What is the scoring/evaluation like?
• Assess competence in tasks that are more realistic/complex/time-consuming than in content or domain-referenced tests
• Assessing performance through displays of behaviours (work samples, etc)
○ Criterion = quality of the performance itself or of its product
○ Evaluation and Scoring in the Assessment of Performance
§ Relies more on subjective judgement than assessments of competence do
§ Can also be objective (when quality = speed, or similar)
§ Most assessments involve:
□ Identifying/describing qualitative criteria for evaluating performance
□ Developing a method for applying the criteria (rating scales, scoring rubrics)
Define mastery testing (+ expectancy tables/charts)
When a test score is used to predict the future performance of the individual on a certain criterion
• Expectancy tables: show the distribution of test scores for one or more groups of individuals, cross-tabulated against their criterion performance
• Expectancy charts: used when criterion performance in a job/program/else can be classified as either successful or unsuccessful
○ Present the distribution of scores along with the % of people at each score interval who succeeded/failed in terms of the criterion
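A minimal sketch of building an expectancy chart from (score, succeeded) pairs; the data and the 10-point score intervals are invented:

```python
from collections import defaultdict

def expectancy_chart(records, interval=10):
    """Percent succeeding at each score interval, from (score, succeeded) pairs."""
    buckets = defaultdict(lambda: [0, 0])  # interval start -> [successes, total]
    for score, succeeded in records:
        start = (score // interval) * interval
        buckets[start][0] += int(succeeded)
        buckets[start][1] += 1
    return {f"{s}-{s + interval - 1}": round(100 * wins / total)
            for s, (wins, total) in sorted(buckets.items())}

# Invented (test score, success on the job criterion) pairs:
data = [(63, False), (72, True), (75, False), (81, True), (84, True), (88, True)]
print(expectancy_chart(data))  # -> {'60-69': 0, '70-79': 50, '80-89': 100}
```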
Name the 2 fundamental differences between norm-referenced test interpretation and criterion-referenced interpretation
- In norm-referenced-testing, the primary objective is to make distinctions among individuals/groups in terms of the ability/trait assessed
- In criterion-referenced testing, the primary objective is to evaluate a person/group’s degree of competence or mastery of a skill or knowledge domain in terms of a preestablished standard of performance
Sometimes the same instrument can be used for both - but one type of interpretation usually takes precedence, because the two approaches require the tests to be constructed differently
Criterion-Referenced Test Interpretation in Clinical Assessment
• Term not used for personality assessments, since those can’t be assessed with criteria
• Cut-off scores can be used to establish if clinical criteria have been met for some disorders
○ Same use of criterion-referenced interpretation as when test scores are used to place someone in an educational/employment setting
○ Ex: Beck depression inventory
Which methods are best suited for tests whose scores can be interpreted with normative AND criterion-referenced bases? Why?
Item Response Theory methods
○ Why - their goal is to estimate a test taker’s position on a latent trait or ability dimension
Which default of norm-referenced testing contributes to lowering standards?
○ No matter how poorly a student population scores, half of them will be above average
What is the issue that equating tries to resolve?
When 2 different tests are administered to the same person
• Interpreting and comparing the scores from those 2 tests = a problem
Can we compare the scores of 2 different tests together?
Depends if the normative samples of each test are comparable
Describe co-norming
2 separate tests whose normative samples overlap
ex: SB5 and BG VM II - the normative samples overlapped by about 75-80%