Ch. 4 for Unit 3 Flashcards
Reliability and validity/ interrater reliability
reliability = Consistency of measurement: a reliable test yields the same result over time, even if that result is wrong.
validity = A test is considered valid for a particular purpose if it does, in fact, measure what it purports to measure.
interrater reliability = The degree to which different raters give similar evaluations of a behavior or trait.
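A minimal sketch of one simple way to quantify interrater reliability: the proportion of cases on which two raters agree. The function name and the sample ratings are made up for illustration, and percent agreement is only one of several possible indices.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of cases")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical ratings of the same five behaviors by two raters.
ratings_a = ["high", "low", "low", "high", "medium"]
ratings_b = ["high", "low", "medium", "high", "medium"]
print(percent_agreement(ratings_a, ratings_b))  # 4 of 5 cases match
```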
features of a good test
reliable
valid
A good test is one that trained examiners can administer, score, and interpret with a minimum of difficulty.
A good test is a useful test, one that yields actionable results that will ultimately benefit individual testtakers or society at large.
3 guidelines of assessment for assessing child custody decisions
(1) the assessment of parenting capacity, (2) the assessment of psychological and developmental needs of the child, and (3) the assessment of the goodness of fit between the parent’s capacity and the child’s needs.
normative data
Norms provide a standard with which the results of measurement can be compared.
norm-referenced testing and assessment
method of evaluation and a way of deriving meaning from test scores by evaluating an individual testtaker’s score and comparing it to scores of a group of testtakers. In this approach, the meaning of an individual test score is understood relative to other scores on the same test. A common goal of norm-referenced tests is to yield information on a testtaker’s standing or ranking relative to some comparison group of testtakers.
norms / normative sample
norms = The test performance data of a particular group of testtakers that are designed for use as a reference when evaluating or interpreting individual test scores.
normative sample = The group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual testtakers.
norming
Refers to the process of deriving norms. Norming may be modified to describe a particular type of norm derivation. For example, race norming is the controversial practice of norming on the basis of race or ethnic background.
user norms or program norms
Consist of descriptive statistics based on a group of testtakers in a given period of time rather than norms obtained by formal sampling methods.
test standardization
The process of administering a test to a representative sample of testtakers for the purpose of establishing norms. A test is said to be standardized when it has clearly specified procedures for administration and scoring, typically including normative data.
standard
As a noun in the context of testing and assessment, standard appears in the title of the well-known manual that sets forth ideals of professional behavior against which any practitioner’s behavior can be judged: The Standards for Educational and Psychological Testing, usually referred to simply as the Standards.
standard as an adjective
As an adjective, standard often refers to what is usual, generally accepted, or commonly employed. One may speak, for example, of the standard way of conducting a particular measurement procedure, especially as a means of contrasting it to some newer or experimental measurement procedure.
why are test standardization and test norming used interchangeably?
A typical use of the term standardization is reserved for the part of the test development process during which norms are developed. For this reason, many test professionals use the terms test standardization and test norming interchangeably.
Standard error of measurement
A statistic used to estimate the extent to which an observed score deviates from a true score
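The card gives no formula, so the sketch below assumes the commonly taught one: SEM = SD * sqrt(1 - r), where SD is the standard deviation of test scores and r is the test's reliability coefficient. The function name and the example values are illustrative only.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability); assumes reliability lies in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical test with SD = 15 and reliability = .91.
sem = standard_error_of_measurement(sd=15, reliability=0.91)
print(round(sem, 2))  # about 4.5 score points
```

The higher the reliability, the smaller the SEM, so an observed score is expected to sit closer to the true score.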
Standard error of estimate
The standard error of the estimate (SEE) measures the accuracy of predictions made by a regression model, indicating how much the observed values of the dependent variable typically differ from the model’s predictions.
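As a rough illustration, the SEE can be computed as the square root of the sum of squared residuals divided by the degrees of freedom (n minus the number of predictors minus 1). The function name and the data are assumptions for this sketch; the predicted values are taken as given rather than fitted here.

```python
import math


def standard_error_of_estimate(observed, predicted, n_predictors=1):
    """sqrt(sum of squared residuals / (n - n_predictors - 1))."""
    n = len(observed)
    df = n - n_predictors - 1
    if n != len(predicted) or df <= 0:
        raise ValueError("need matching lengths and positive degrees of freedom")
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(ss_residual / df)


# Hypothetical observed scores and the predictions of a one-predictor model.
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
see = standard_error_of_estimate(observed, predicted)
print(round(see, 3))
```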
Standard error of the mean
A measure of sampling error: it estimates how far a sample mean is likely to fall from the population mean.
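A minimal sketch, assuming the usual formula of sample standard deviation divided by the square root of the sample size. The function name and scores are made up for illustration.

```python
import math
import statistics


def standard_error_of_mean(scores):
    """Sample SD (n - 1 denominator) divided by sqrt(n)."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    return statistics.stdev(scores) / math.sqrt(len(scores))


# Hypothetical sample of five test scores.
sample_scores = [10, 12, 14, 16, 18]
se_mean = standard_error_of_mean(sample_scores)
print(round(se_mean, 3))
```

Larger samples shrink this value, which is why means from large normative samples are treated as more stable.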
Standard error of the difference
A statistic used to estimate how large a difference between two scores should be before the difference is considered statistically significant
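One common way to compute it, assumed here because the card states no formula, combines the standard errors of measurement of the two scores: sqrt(SEM1^2 + SEM2^2). The function name and values are hypothetical.

```python
import math


def standard_error_of_difference(sem_1, sem_2):
    """sqrt(SEM_1^2 + SEM_2^2) for two scores on the same scale."""
    return math.sqrt(sem_1 ** 2 + sem_2 ** 2)


# Hypothetical SEMs for two tests reported on the same score scale.
se_diff = standard_error_of_difference(3.0, 4.0)
print(se_diff)  # 5.0
```

Because both measurement errors contribute, the result is larger than either SEM alone.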
sampling / stratified sampling / stratified-random sampling
sampling= The process of selecting the portion of the universe deemed to be representative of the whole population
stratified sampling = In a stratified sample, researchers divide a population into homogeneous subpopulations called strata (the plural of stratum) based on specific characteristics (e.g., race, gender identity, location, etc.). Every member of the population studied should be in exactly one stratum.
Each stratum is then sampled using another probability sampling method, such as cluster sampling or simple random sampling, allowing researchers to estimate statistical measures for each sub-population.
Researchers rely on stratified sampling when a population’s characteristics are diverse and they want to ensure that every characteristic is properly represented in the sample.
stratified-random sampling = If such sampling were random (that is, if every member of the population had the same chance of being included in the sample), then the procedure would be termed stratified-random sampling.
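The procedure described on this card can be sketched as follows: split the population into strata, then draw a simple random sample from each stratum. Proportional allocation, the function name, and the urban/rural data are all assumptions made for this illustration.

```python
import random
from collections import defaultdict


def stratified_random_sample(population, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    strata = defaultdict(list)
    for member in population:
        # Each member lands in exactly one stratum.
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 urban and 40 rural testtakers.
population = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
sample = stratified_random_sample(population, stratum_of=lambda m: m[0], fraction=0.1)
print(len(sample))  # 6 urban + 4 rural
```

The sample keeps the 60/40 split of the population, which is the point of stratifying.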
purposive sampling and incidental sampling
If we arbitrarily select some sample because we believe it to be representative of the population, then we have selected what is referred to as a purposive sample.
An incidental sample or convenience sample is one that is convenient or available for use. You may have been a party to incidental sampling if you have ever been placed in a subject pool for experimentation with introductory psychology students.
when can normative sample and standardization sample be used interchangeably?
When the people in the normative sample are the same people on whom the test was standardized, the phrases normative sample and standardization sample are often used interchangeably.
Percentile norms
Raw data from a test’s standardization sample converted to percentile form.
Thus, the 15th percentile is the score at or below which 15% of the scores in the distribution fall. The 99th percentile is the score at or below which 99% of the scores in the distribution fall.
A percentile is an expression of the percentage of people whose score on a test or measure falls below a particular raw score.
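A small sketch of converting a raw score to a percentile against a set of standardization-sample scores. The card uses both "below" and "at or below" wordings, so the hypothetical function below supports either through an `inclusive` flag; the scores are invented.

```python
def percentile_rank(raw_score, norm_scores, inclusive=False):
    """Percentage of norm-group scores below (or at or below) raw_score."""
    if not norm_scores:
        raise ValueError("norm_scores must not be empty")
    if inclusive:
        count = sum(s <= raw_score for s in norm_scores)
    else:
        count = sum(s < raw_score for s in norm_scores)
    return 100.0 * count / len(norm_scores)


# Hypothetical standardization-sample raw scores.
norm_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(80, norm_scores))                  # 50.0 (scores below 80)
print(percentile_rank(80, norm_scores, inclusive=True))  # 60.0 (scores at or below 80)
```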
Percentage correct
Refers to the distribution of raw scores; more specifically, to the number of items answered correctly, multiplied by 100 and divided by the total number of items.
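The arithmetic on this card can be written out directly. The function name and the 42-of-50 example are made up for illustration.

```python
def percentage_correct(items_correct, total_items):
    """Items answered correctly, times 100, divided by total items."""
    if total_items <= 0 or not 0 <= items_correct <= total_items:
        raise ValueError("need 0 <= items_correct <= total_items and total_items > 0")
    return items_correct * 100 / total_items


# Hypothetical testtaker who answered 42 of 50 items correctly.
print(percentage_correct(42, 50))  # 84.0
```

Unlike a percentile, this value depends only on the testtaker's own answers, not on how anyone else scored.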
age-equivalent scores / age norms
Also known as age-equivalent scores, age norms indicate the average performance of different samples of testtakers who were at various ages at the time the test was administered. If the measurement under consideration is height in inches, for example, then we know that scores (heights) for children will gradually increase at various rates as a function of age up to the middle to late teens.
grade norms
Grade norms are developed by administering the test to representative samples of children over a range of consecutive grade levels (such as first through sixth grades). Next, the mean or median score for children at each grade level is calculated.
drawback = One drawback of grade norms is that they are useful only with respect to years and months of schooling completed. They have little or no applicability to children who are not yet in school or to children who are out of school. Further, they are not typically designed for use with adults who have returned to school.
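The two steps described for building grade norms (test each grade's sample, then take the mean or median per grade) could look like this. The function name, the grade levels, and the scores are invented for illustration.

```python
import statistics


def grade_norms(scores_by_grade, use_median=False):
    """Return the mean (or median) score for each grade level."""
    summarize = statistics.median if use_median else statistics.mean
    return {grade: summarize(scores) for grade, scores in sorted(scores_by_grade.items())}


# Hypothetical raw scores from samples of first and second graders.
scores_by_grade = {1: [10, 12, 14], 2: [15, 18, 21]}
norms = grade_norms(scores_by_grade)
print(norms)
```

A child's score can then be compared with these per-grade averages to find the grade level it most closely matches.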
developmental norms
A term applied broadly to norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life.