ch.4 for unit 3 Flashcards by Elise V.

Reliability and validity / interrater reliability

reliability = consistency: a reliable test measures the same thing in the same way over time, even if what it measures is wrong (a test can be reliable without being valid).
validity = A test is considered valid for a particular purpose if it does, in fact, measure what it purports to measure.
interrater reliability = the degree to which different raters give similar evaluations of the same behavior or trait.
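A minimal Python sketch of how interrater agreement might be quantified, assuming two raters assign categorical codes to the same cases. Percent agreement is the simplest index; Cohen’s kappa, a common chance-corrected index that this card does not name, is included as an illustration. The function names and ratings are hypothetical.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and paired case by case")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)  # observed agreement
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Expected chance agreement: for each category, multiply the two raters'
    # marginal proportions, then sum over all categories.
    categories = set(counts_a) | set(counts_b)
    p_e = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if p_e == 1:
        # Both raters used one identical category for every case; kappa is
        # undefined here, so report perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings of whether a behavior occurred in four observations
rater_a = ["yes", "yes", "no", "no"]
rater_b = ["yes", "no", "no", "no"]
print(percent_agreement(rater_a, rater_b))  # 0.75
print(cohens_kappa(rater_a, rater_b))       # 0.5
```

Kappa is lower than raw agreement here because half of the agreement would be expected by chance alone, given how often each rater said "no".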

features of good test

reliable
valid
A good test is one that trained examiners can administer, score, and interpret with a minimum of difficulty.
A good test is a useful test, one that yields actionable results that will ultimately benefit individual testtakers or society at large

3 guidelines for assessment in child custody decisions

(1) the assessment of parenting capacity, (2) the assessment of psychological and developmental needs of the child, and (3) the assessment of the goodness of fit between the parent’s capacity and the child’s needs.

normative data

Norms provide a standard with which the results of measurement can be compared.

norm-referenced testing and assessment

method of evaluation and a way of deriving meaning from test scores by evaluating an individual testtaker’s score and comparing it to scores of a group of testtakers. In this approach, the meaning of an individual test score is understood relative to other scores on the same test. A common goal of norm-referenced tests is to yield information on a testtaker’s standing or ranking relative to some comparison group of testtakers.
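A minimal Python sketch of a norm-referenced reading of one score, assuming the comparison group's raw scores are available as a list. It reports the testtaker’s standing as a z-score (distance from the group mean in standard deviation units) and a rank within the group. The function name and data are hypothetical.

```python
from statistics import mean, pstdev

def norm_referenced_standing(score, comparison_scores):
    """Interpret one score relative to a comparison group of testtakers.

    Returns (z, rank): z is the distance from the group mean in standard
    deviation units; rank is 1 + the number of group members who scored higher.
    """
    m = mean(comparison_scores)
    sd = pstdev(comparison_scores)  # population SD of the comparison group
    z = (score - m) / sd
    rank = 1 + sum(s > score for s in comparison_scores)
    return z, rank

# Hypothetical comparison group
group = [10, 12, 14, 16, 18]
z, rank = norm_referenced_standing(16, group)
print(round(z, 3), rank)  # 0.707 2
```

The raw score of 16 means nothing by itself here; it only acquires meaning as "about 0.7 SD above the group mean, second highest".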

norms / normative sample

The test performance data of a particular group of testtakers that are designed for use as a reference when evaluating or interpreting individual test scores.
normative sample = A normative sample is that group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual testtakers.

norming

refers to the process of deriving norms. Norming may be modified to describe a particular type of norm derivation. For example, race norming is the controversial practice of norming on the basis of race or ethnic background.

user norms or program norms

consist of descriptive statistics based on a group of testtakers in a given period of time rather than norms obtained by formal sampling methods.

test standardization

The process of administering a test to a representative sample of testtakers for the purpose of establishing norms. A test is said to be standardized when it has clearly specified procedures for administration and scoring, typically including normative data.

standard

As a noun in the context of testing and assessment, standard appears in the title of that well-known manual setting forth ideals of professional behavior against which any practitioner’s behavior can be judged: The Standards for Educational and Psychological Testing, usually referred to simply as the Standards.

standard as an adjective

As an adjective, standard often refers to what is usual, generally accepted, or commonly employed. One may speak, for example, of the standard way of conducting a particular measurement procedure, especially as a means of contrasting it to some newer or experimental measurement procedure.

why are test standardization and test norming used interchangeably?

Besides describing a test with clearly specified procedures for administration and scoring, standardization more typically refers to that part of the test development process during which norms are developed. For this reason, the terms test standardization and test norming have been used interchangeably by many test professionals.

Standard error of measurement

A statistic used to estimate the extent to which an observed score deviates from a true score
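A minimal Python sketch, assuming the classical test theory formula SEM = SD × √(1 − r), where r is the test’s reliability coefficient (the card does not give the formula). The function names and the SD, reliability, and observed score are hypothetical; the band uses z = 1.96 for roughly 95% confidence.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r_xx), where r_xx is the test's reliability coefficient."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1 - reliability)

def score_band(observed, sd, reliability, z=1.96):
    """Band of observed +/- z * SEM (z = 1.96 gives roughly 95% confidence)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test: SD = 15, reliability = .91, observed score = 100
print(round(standard_error_of_measurement(15, 0.91), 2))  # 4.5
low, high = score_band(100, 15, 0.91)
print(round(low, 2), round(high, 2))  # 91.18 108.82
```

The more reliable the test, the smaller the SEM and the narrower the band in which the true score is likely to lie.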

Standard error of estimate

standard error of the estimate (SEE) measures the accuracy of predictions made by a regression model, indicating how much the dependent variable differs from the regression model’s predictions
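A minimal Python sketch for simple linear regression with one predictor, assuming ordinary least squares and the usual formula SEE = √(Σ(y − ŷ)² / (n − 2)). The function name and data are hypothetical.

```python
import math

def standard_error_of_estimate(x, y):
    """Fit y = a + b*x by ordinary least squares and return (a, b, see).

    see = sqrt(sum((y - y_hat)**2) / (n - 2)); the n - 2 reflects the two
    parameters (intercept and slope) estimated from the data.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    see = math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))
    return a, b, see

# Hypothetical predictor (x) and criterion (y) scores
a, b, see = standard_error_of_estimate([1, 2, 3], [1, 3, 2])
print(a, b, round(see, 4))  # 1.0 0.5 1.2247
```

If every point fell exactly on the regression line, the SEE would be 0; larger values mean less accurate predictions.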

Standard error of the mean

A measure of sampling error: it estimates how much the mean of a sample is expected to differ from the population mean.
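A minimal Python sketch, assuming the standard formula SE = s / √n with s the sample (n − 1) standard deviation. The function name and the sample are hypothetical.

```python
import math
from statistics import stdev

def standard_error_of_mean(sample):
    """SE of the mean = s / sqrt(n), where s is the sample (n - 1) standard deviation."""
    if len(sample) < 2:
        raise ValueError("need at least 2 observations")
    return stdev(sample) / math.sqrt(len(sample))

# Hypothetical sample of 8 scores
print(round(standard_error_of_mean([2, 4, 4, 4, 5, 5, 7, 9]), 3))  # 0.756
```

Because n sits under a square root, quadrupling the sample size roughly halves the standard error of the mean.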

Standard error of the difference

A statistic used to estimate how large a difference between two scores should be before the difference is considered statistically significant
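A minimal Python sketch, assuming both scores are reported on the same scale with the same SD and using the classical formula SE_diff = SD × √(2 − r1 − r2), which equals √(SEM1² + SEM2²). The function names, SD, reliabilities, and scores are hypothetical; z = 1.96 corresponds to a two-tailed .05 level.

```python
import math

def standard_error_of_difference(sd, r1, r2):
    """SE_diff = SD * sqrt(2 - r1 - r2) for two scores reported on the same scale.

    This equals sqrt(SEM1**2 + SEM2**2), where SEM_i = SD * sqrt(1 - r_i).
    """
    return sd * math.sqrt(2 - r1 - r2)

def difference_is_significant(score1, score2, sd, r1, r2, z=1.96):
    """True if the scores differ by more than z * SE_diff (z = 1.96: two-tailed .05 level)."""
    return abs(score1 - score2) > z * standard_error_of_difference(sd, r1, r2)

# Hypothetical: SD = 10, reliabilities .90 and .80
print(round(standard_error_of_difference(10, 0.90, 0.80), 2))  # 5.48
print(difference_is_significant(60, 48, 10, 0.90, 0.80))       # True
print(difference_is_significant(60, 52, 10, 0.90, 0.80))       # False
```

With these values a difference must exceed about 10.7 points before it is treated as more than measurement error.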

sampling / stratified sampling / stratified random sampling

sampling = The process of selecting the portion of the universe deemed to be representative of the whole population
stratified sampling = In a stratified sample, researchers divide a population into homogeneous subpopulations called strata (the plural of stratum) based on specific characteristics (e.g., race, gender identity, location, etc.). Every member of the population studied should be in exactly one stratum.

Each stratum is then sampled using another probability sampling method, such as cluster sampling or simple random sampling, allowing researchers to estimate statistical measures for each sub-population.

Researchers rely on stratified sampling when a population’s characteristics are diverse and they want to ensure that every characteristic is properly represented in the sample.

stratified random sampling = If such sampling were random (that is, if every member of the population had the same chance of being included in the sample), then the procedure would be termed stratified random sampling.
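Stratified random sampling can be sketched in Python as follows, assuming each member of the population belongs to exactly one stratum and the same fraction is drawn from every stratum by simple random sampling (proportional allocation). The function name, the urban/rural strata, and the 10% fraction are hypothetical.

```python
import random
from collections import defaultdict

def stratified_random_sample(population, stratum_of, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum.

    population: list of members; stratum_of: function returning a member's
    stratum (each member belongs to exactly one); fraction: share drawn per stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # proportional allocation, rounded
        sample.extend(rng.sample(members, k))  # equal chance within the stratum
    return sample

# Hypothetical population: 60 urban and 40 rural children
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_random_sample(population, lambda m: m[0], 0.10, seed=1)
print(sum(m[0] == "urban" for m in sample), sum(m[0] == "rural" for m in sample))  # 6 4
```

The sample keeps the population’s 60/40 urban/rural split, which a single simple random sample of 10 would not guarantee.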

purposive sampling and incidental sampling.

If we arbitrarily select some sample because we believe it to be representative of the population, then we have selected what is referred to as a purposive sample.
An incidental sample or convenience sample is one that is convenient or available for use. You may have been a party to incidental sampling if you have ever been placed in a subject pool for experimentation with introductory psychology students.

when can normative sample and standardization sample be used interchangeably?

When the people in the normative sample are the same people on whom the test was standardized, the phrases normative sample and standardization sample are often used interchangeably.

Percentile norms

raw data from a test’s standardization sample converted to percentile form.
Thus, the 15th percentile is the score at or below which 15% of the scores in the distribution fall. The 99th percentile is the score at or below which 99% of the scores in the distribution fall.
a percentile is an expression of the percentage of people whose score on a test or measure falls below a particular raw score.
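A minimal Python sketch of converting a standardization sample’s raw scores to percentile form, following the "falls below" definition on this card. Note that the "at or below" wording in the line above would give higher values for tied scores; conventions differ. The function names and scores are hypothetical.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of scores in the normative sample that fall below raw_score."""
    below = sum(s < raw_score for s in norm_scores)
    return 100 * below / len(norm_scores)

def percentile_norm_table(norm_scores):
    """Convert raw scores from a standardization sample to percentile form."""
    return {score: percentile_rank(score, norm_scores) for score in sorted(set(norm_scores))}

# Hypothetical standardization sample of 10 raw scores
norm = [10, 12, 12, 15, 18, 20, 20, 20, 25, 30]
print(percentile_norm_table(norm))
# {10: 0.0, 12: 10.0, 15: 30.0, 18: 40.0, 20: 50.0, 25: 80.0, 30: 90.0}
```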

Percentage correct

refers to the distribution of raw scores—more specifically, to the number of items that were answered correctly multiplied by 100 and divided by the total number of items.
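The arithmetic on this card can be sketched in Python as follows; the function name and item counts are hypothetical. Unlike a percentile, percentage correct depends only on the testtaker’s own answers, not on how anyone else scored.

```python
def percentage_correct(num_correct, total_items):
    """Number of items answered correctly, multiplied by 100, divided by total items."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return num_correct * 100 / total_items

print(percentage_correct(45, 60))  # 75.0
```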

age-equivalent scores / age norms

Also called age-equivalent scores, age norms indicate the average performance of different samples of testtakers who were at various ages at the time the test was administered. If the measurement under consideration is height in inches, for example, then we know that scores (heights) for children will gradually increase at various rates as a function of age up to the middle to late teens.

grade norms

are developed by administering the test to representative samples of children over a range of consecutive grade levels (such as first through sixth grades). Next, the mean or median score for children at each grade level is calculated.
One drawback of grade norms is that they are useful only with respect to years and months of schooling completed. They have little or no applicability to children who are not yet in school or to children who are out of school. Further, they are not typically designed for use with adults who have returned to school.
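The grade-norm procedure above can be sketched in Python, assuming the scores arrive as (grade level, score) pairs from representative samples at each grade. This version uses the median (the card allows mean or median); the function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import median

def grade_norms(records):
    """Return {grade_level: median score} from (grade_level, score) pairs.

    records is assumed to come from representative samples at consecutive grades.
    """
    by_grade = defaultdict(list)
    for grade, score in records:
        by_grade[grade].append(score)
    return {grade: median(scores) for grade, scores in sorted(by_grade.items())}

# Hypothetical scores for grades 1 and 2
records = [(1, 10), (1, 14), (1, 12), (2, 20), (2, 18)]
print(grade_norms(records))  # {1: 12, 2: 19.0}
```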

developmental norms

a term applied broadly to norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life.

national norms

are derived from a normative sample that was nationally representative of the population at the time the norming study was conducted.

national anchor norms

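An equivalency table for scores on two different tests; national anchor norms provide the tool for comparing scores on one test with scores on another. Just as an anchor provides some stability to a vessel, so national anchor norms provide some stability to test scores by anchoring them to other test scores.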

equipercentile method

the equivalency of scores on different tests is calculated with reference to corresponding percentile scores. Thus, if the 96th percentile corresponds to a score of 69 on the BRT and if the 96th percentile corresponds to a score of 14 on the XYZ, then we can say that a BRT score of 69 is equivalent to an XYZ score of 14.
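A simplified Python sketch of the equipercentile idea, assuming both score distributions come from the same (or equivalent) group of testtakers and using "percentage at or below" as the percentile. It returns the test-B score whose percentile is closest to the test-A score’s percentile; operational equating typically also smooths and interpolates, which this sketch omits. BRT and XYZ are the card’s hypothetical tests; the score lists and function names are also hypothetical.

```python
def percent_at_or_below(score, scores):
    """Percentage of scores in a distribution at or below the given score."""
    return 100 * sum(s <= score for s in scores) / len(scores)

def equipercentile_equivalent(score_a, scores_a, scores_b):
    """Return the test-B score whose percentile rank is closest to score_a's on test A.

    Assumes scores_a and scores_b come from the same (or equivalent) testtakers.
    Ties go to the lower test-B score.
    """
    target = percent_at_or_below(score_a, scores_a)
    candidates = sorted(set(scores_b))
    return min(candidates, key=lambda s: abs(percent_at_or_below(s, scores_b) - target))

# Hypothetical score distributions on two reading tests
brt = [50, 55, 60, 65, 69]
xyz = [6, 8, 10, 12, 14]
print(equipercentile_equivalent(69, brt, xyz))  # 14
print(equipercentile_equivalent(60, brt, xyz))  # 10
```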

subgroup norms

In psychological testing, subgroup norms involve analyzing the performance of specific groups within a larger sample, such as age or gender groups, to understand differences and provide more tailored interpretations of test results. Segmenting a normative sample by its selection criteria yields these more narrowly defined subgroup norms. For example, suppose the criteria used in selecting children for inclusion in the XYZ Reading Test normative sample were age, educational level, socioeconomic level, geographic region, community type, and handedness (whether the child was right-handed or left-handed); norms could then be developed for any of these subgroups.
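A minimal Python sketch of subgroup norms, assuming normative records arrive as (subgroup, score) pairs and a score is interpreted as the percentage of the testtaker’s own subgroup scoring below it. The handedness labels follow the card’s example; the function names and scores are hypothetical.

```python
from collections import defaultdict

def subgroup_norms(records):
    """Group (subgroup, score) pairs into {subgroup: sorted scores}."""
    groups = defaultdict(list)
    for subgroup, score in records:
        groups[subgroup].append(score)
    return {g: sorted(scores) for g, scores in groups.items()}

def subgroup_percentile(score, subgroup, norms):
    """Percentage of the testtaker's own subgroup scoring below the given score."""
    scores = norms[subgroup]
    return 100 * sum(s < score for s in scores) / len(scores)

# Hypothetical normative records segmented by handedness
norms = subgroup_norms([("right", 10), ("right", 20), ("right", 30), ("right", 40),
                        ("left", 5), ("left", 15)])
print(subgroup_percentile(25, "right", norms))  # 50.0
print(subgroup_percentile(25, "left", norms))   # 100.0
```

The same raw score of 25 carries a different meaning depending on which subgroup norms it is compared against.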

local norms

provide normative information with respect to the local population’s performance on some test.

fixed reference group scoring system.

Here, the distribution of scores obtained on the test from one group of testtakers—referred to as the fixed reference group—is used as the basis for the calculation of test scores for future administrations of the test.
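A minimal Python sketch of a fixed reference group scoring system, assuming a linear conversion: each raw score is expressed as a z-score within the frozen reference group and then placed on a reporting scale. The 500/100 scale, the class name, and the reference scores are hypothetical; real testing programs may use other transformations.

```python
from statistics import mean, pstdev

class FixedReferenceScale:
    """Convert raw scores to a reporting scale anchored to one fixed reference group.

    The reference group's mean and SD are computed once and then frozen, so later
    administrations are scored against that same group rather than against
    whoever happened to take the test in a given year.
    """

    def __init__(self, reference_scores, scale_mean=500, scale_sd=100):
        self.ref_mean = mean(reference_scores)
        self.ref_sd = pstdev(reference_scores)
        self.scale_mean = scale_mean
        self.scale_sd = scale_sd

    def scale_score(self, raw):
        z = (raw - self.ref_mean) / self.ref_sd  # standing in the fixed reference group
        return self.scale_mean + self.scale_sd * z

# Hypothetical reference group from the first administration
scale = FixedReferenceScale([40, 50, 60])
print(scale.scale_score(50))            # 500.0
print(round(scale.scale_score(60), 1))  # 622.5
```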

anchoring/ national anchor norms

Suppose we select a reading test designed for use in grades 3 to 6, which, for the purposes of this hypothetical example, we call the Best Reading Test (BRT). Suppose further that we want to compare findings obtained on another national reading test designed for use with grades 3 to 6, the hypothetical XYZ Reading Test, with the BRT. An equivalency table for scores on the two tests, or national anchor norms, could provide the tool for such a comparison. Just as an anchor provides some stability to a vessel, so national anchor norms provide some stability to test scores by anchoring them to other test scores.

Criterion-referenced testing and assessment

may be defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard. Example: to be eligible for a high-school diploma, students must demonstrate at least a sixth-grade reading level.

domain- or content-referenced testing and assessment

other names for criterion-referenced testing and assessment, used because the focus in the criterion-referenced approach is on how scores relate to a particular content area or domain.

difference between norm-referenced and criterion-referenced approaches to assessment

The difference is the area of focus regarding test results. In norm-referenced interpretations of test data, a usual area of focus is how an individual performed relative to other people who took the test. In criterion-referenced interpretations of test data, a usual area of focus is the testtaker’s performance: what the testtaker can or cannot do; what the testtaker has or has not learned; whether the testtaker does or does not meet specified criteria for inclusion in some group, access to certain privileges, and so forth. Because criterion-referenced tests are frequently used to gauge achievement or mastery, they are sometimes referred to as mastery tests.
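A minimal Python sketch contrasting the two interpretations of one score, assuming a hypothetical comparison group and a hypothetical mastery cutoff of 75. The function names are illustrative only.

```python
def norm_referenced_interpretation(score, comparison_scores):
    """Norm-referenced: percentage of other testtakers scoring below this score."""
    return 100 * sum(s < score for s in comparison_scores) / len(comparison_scores)

def criterion_referenced_interpretation(score, criterion):
    """Criterion-referenced: does the score meet the set standard?"""
    return score >= criterion

# Hypothetical: the same score of 70, a comparison group, and a mastery cutoff of 75
group = [55, 60, 65, 80, 90]
print(norm_referenced_interpretation(70, group))    # 60.0
print(criterion_referenced_interpretation(70, 75))  # False
```

The same score looks respectable relative to peers (above 60% of the group) yet fails to demonstrate mastery against the fixed criterion.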

how are national anchor norms created?

Using the equipercentile method, the equivalency of scores on different tests is calculated with reference to corresponding percentile scores. Anchoring a new test’s norms to those of a previously established test makes scores on the two tests comparable; on its own, this does not make the new test more valid or reliable.