Chapter 4: Of Tests and Testing Flashcards
Assumptions about Psychological Testing and Assessment
Psychological Traits and States Exist
Psychological Traits and States can be Quantified and Measured
Test-Related Behavior Predicts Non-Test-Related Behavior
Tests and Other Measurement Techniques Have Strengths and Weaknesses
Various Sources of Error are Part of the Assessment Process
Testing and Assessment Can be Conducted in a Fair and Unbiased Manner
Testing and Assessment can Benefit Society
Any distinguishable, relatively enduring way in which one individual varies from another
Distinguish one person from another but are relatively less enduring
Psychological Trait
Examples are traits that relate to intelligence, specific intellectual abilities, cognitive style, adjustment, interests, attitudes, sexual orientation and preferences, psychopathology, personality in general and specific personality traits
Referring to an absence of primacy of male or female characteristic
Freed from constraints of gender-dependent social expectations
New Age
Refers to a particular nonmainstream orientation to spirituality and health
An informed, scientific concept developed or constructed to describe or explain behavior; cannot be seen, heard or touched but existence can be inferred from overt behavior
Overt Behavior
Refers to an observable action or the product of an observable action, including test- or assessment-related responses
Relatively Enduring
Reminder that a trait is not expected to be manifested in behavior 100% of the time; important to be aware of the context or situation in which a particular behavior is displayed
Definition of Trait and State
Refer to a way in which one individual varies from another;
Reference Group
Can greatly influence one’s conclusions or judgments
Weighing a Comparative Value of a Test’s Items
Comes about as a result of a complex interplay among many factors, including technical considerations, the way a construct has been defined for the purposes of the test, and the value society (and the test developer) attaches to the behaviors evaluated
Test Score
Presumed to represent the strength of the targeted ability or trait or state and is frequently based on cumulative scoring
Domain Sampling
Refer to either a sample of behaviors from all possible behaviors that could conceivably be indicative of a particular construct or a sample of test items from all possible items that could conceivably be used to measure a particular construct
Forensic Matters
Psychological tests may be used to postdict behavior
To aid in the understanding of behavior that has already taken place
Competent Test Users
Understand how a test was developed, the circumstances under which it is appropriate to administer the test, how the test should be administered and to whom, how the test results should be interpreted; understand and appreciate the limitations of the tests they use as well as how those limitations might be compensated for by data from other sources
Error in Assessment
Something that is more than expected; actually a component of the measurement process; refers to a long-standing assumption that factors other than what a test attempts to measure will influence performance on the test
Error Variance
The component of a test score attributable to sources other than the trait or ability measured
Sources of Error Variance
Assessees themselves, assessors, and measuring instruments
Classical or True Score Theory of Measurement
Each testtaker has a true score on a test that would be obtained but for the random action of measurement error
Characteristics of a Good Test
Reliability and Validity
Involves consistency of the measuring tool: the precision with which the test measures and the extent to which error is present in measurements; the perfectly reliable measuring tool consistently measures in the same way; it yields the same numerical measurement every time it measures the same thing under the same conditions
It measures what it’s supposed to measure; focuses on items that collectively make up the test;
Provide a standard with which the results of a measurement can be compared
Norm-Referenced Testing and Assessment
A method of evaluation and a way of deriving meaning from test scores by evaluating an individual testtaker’s score and comparing it to scores of a group of testtakers; common goal is to yield information on a testtaker’s standing or ranking relative to some comparison group of testtakers
Refers to behavior that is usual, average, normal, standard, expected, or typical
Norms in Psychometric Context
Test performance data of a particular group of testtakers that are designed for use as a reference when evaluating or interpreting individual test scores
Normative sample
Group of people whose performance on a particular test is analyzed for reference in evaluating the performance of individual testtakers
Refers to the process of deriving norms; may be modified to describe a particular type of norm derivation
User/Program Norms
Consist of descriptive statistics based on a group of testtakers in a given period of time rather than norms obtained by formal sampling methods
Test Standardization/Standardization
Process of administering a test to a representative sample of testtakers for the purpose of establishing norms
Standardized Test
Has clearly specified procedures for administration and scoring, typically includes normative data
Targeting some defined group as the population for which the test is designed
Types of Norms
Age Norms, Grade Norms, National Norms, National Anchor Norms, Local Norms, Norms from a Fixed Reference Group, Subgroup Norms, Percentile Norms
Expression of the percentage of people whose score on a test or measure falls below a particular raw score; popular way of organizing all test-related data, including standardization sample data
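The definition above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `percentile_rank` and the sample scores are made up for the example, and it uses the strict "falls below" reading given on the card (some texts also count half of the tied scores).

```python
def percentile_rank(scores, raw_score):
    """Percentage of scores in the sample that fall below raw_score."""
    below = sum(1 for s in scores if s < raw_score)
    return 100.0 * below / len(scores)


# Hypothetical standardization sample of ten raw scores.
sample = [12, 15, 15, 18, 20, 22, 25, 27, 30, 34]

# Six of the ten scores fall below a raw score of 25.
print(percentile_rank(sample, 25))  # 60.0
```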
Percentage Correct
Refers to the distribution of raw scores; more specifically, to the number of items that were answered correctly, multiplied by 100 and divided by the total number of items
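As a quick arithmetic check, the calculation described above can be written as a one-line function. The name `percentage_correct` and the numbers are illustrative assumptions. Unlike a percentile, this value depends only on the testtaker's own answers, not on how other people scored.

```python
def percentage_correct(num_correct, num_items):
    """Items answered correctly, multiplied by 100, divided by total items."""
    return num_correct * 100 / num_items


# A hypothetical testtaker answers 42 of 50 items correctly.
print(percentage_correct(42, 50))  # 84.0
```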
Age-Equivalent Scores/Age Norms
Indicate the average performance of different samples of testtakers who were at various ages at the time the test was administered
Grade Norms
Developed by administering the test to representative samples of children over a range of consecutive grade levels; the mean or median for each level is calculated; do not provide information as to the content or type of items that a student could or could not answer correctly
Developmental Norms
A term applied broadly to norms developed on the basis of any trait, ability, skill, or other characteristic that is presumed to develop, deteriorate, or otherwise be affected by chronological age, school grade, or stage of life
National Norms
Derived from a normative sample that was nationally representative of the population at the time the norming study was conducted; may be obtained by testing large numbers of people representative of different variables of interest such as age, gender, racial/ethnic background, socioeconomic strata, geographical location, and different types of communities within the various parts of the country
National Anchor Norms
Provide some stability to test scores by anchoring them to other test scores
Equipercentile Method
Method by which equivalency tables or national anchor norms are established; begins with the computation of percentile norms for each of the tests to be compared
Subgroup Norms
Segmentation of a normative sample by any of the criteria initially used in selecting subjects for the sample
Local Norms
Provide normative information with respect to the local population’s performance on some test
Fixed Reference Group Scoring System
Distribution of scores obtained on the test from one group of testtakers (fixed reference group) is used as the basis for the calculation of test scores for future administrations of the test
A procedure that permits the conversion of raw scores on the new version of the test into fixed reference group scores
Norm-Referenced Scores
Approach to evaluation which seeks to derive meaning from a test score by evaluating the test score in relation to other scores on the same test
Standard on which a judgment or decision may be based
Criterion-Referenced Testing and Assessment
Defined as a method of evaluation and a way of deriving meaning from test scores by evaluating an individual’s score with reference to a set standard; focus is on how scores relate to a particular content area or domain
Correlation Coefficient
Number that provides us with an index of the strength of the relationship between two things;
Expression of the degree and direction of correspondence between two things; does not illustrate a causal relationship but there is an implication of prediction; if we know that there is a high correlation between X and Y, then we should be able to predict with various degrees
Coefficient of Correlation (r)
Expresses a linear relationship between two (and only two) variables, usually continuous in nature; reflects the degree of concomitant variation between variable X and variable Y; numerical index that expresses this relationship; tells us the extent to which X and Y are correlated; interpreted by sign and magnitude
Positive Correlation
Negative Correlation
Magnitude of Correlation Coefficient
Judged by its absolute value; the coefficient can range from as low as -1 to as high as +1; a value of -1 or +1 would mean that the correlation is perfect, without error in the statistical sense
Positive Correlation
When two variables simultaneously increase or simultaneously decrease
Negative (Inverse) Correlation
When one variable increases while the other variable decreases
Zero Correlation
No relationship exists between the two variables
Pearson Correlation Coefficient/Pearson Product-Moment Coefficient of Correlation/Pearson r
Devised by Karl Pearson; r can be the statistical tool of choice when the relationship between the variables is linear and when the two variables being correlated are continuous (that is, they can take any value); formula takes into account the relative position of each test score or measurement with respect to the mean of the distribution
Pearson r Computation
If the negative standard score values for measurements of X always corresponded with negative standard score values for Y scores, the resulting r would be positive (multiplying two negative values will result in a positive number); if positive standard score values on X always corresponded with negative standard score values for Y and vice versa, then an inverse relationship would exist and so a negative correlation would result; should only be used when the relationship between the variables is linear
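The sign logic described above can be seen directly by computing r as the mean of the products of paired standard scores. The sketch below is one common way to write that computation, not necessarily the formula used in the chapter; the function names and data are illustrative, and it assumes population standard deviations (dividing by N) and that neither variable is constant.

```python
from math import sqrt


def z_scores(values):
    """Convert raw values to standard scores using the population SD."""
    n = len(values)
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r(x, y):
    """Mean of the products of paired standard scores."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(x)


x = [1, 2, 3, 4, 5]
y_same_direction = [2, 4, 6, 8, 10]      # low X pairs with low Y
y_opposite_direction = [10, 8, 6, 4, 2]  # low X pairs with high Y

print(pearson_r(x, y_same_direction))      # close to +1
print(pearson_r(x, y_opposite_direction))  # close to -1
```

In the first pairing every product of z-scores is zero or positive, so the mean is positive; in the second every product is zero or negative, so the mean is negative.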
Zero or Near-Zero Correlation
Could result when some products are positive and some are negative
What to do with Pearson r
Ask: Is this number statistically significant given the size and nature of the sample?
Ask: Could this result have occurred by chance?
Significance at the .01 level tells you, with reference to these data, that a correlation such as this could have been expected to occur merely by chance only one time or less in a hundred if X and Y are not correlated in the population
Significance Levels
.05 provides the basis for concluding that a correlation does indeed exist; means that the result could have been expected to occur by chance alone five times or less in a hundred
Coefficient of Determination (r²)
Indication of how much variance is shared by the X- and the Y-variables; the remaining variance (1 - r²) could presumably be accounted for by chance, error, or otherwise unmeasured or unexplainable factors
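A short worked example may help here. The value r = 0.50 is chosen only for illustration, and the function name is an assumption of this sketch.

```python
def coefficient_of_determination(r):
    """Proportion of variance shared by X and Y: r squared."""
    return r ** 2


r = 0.50
shared = coefficient_of_determination(r)  # variance shared by X and Y
unshared = 1 - shared                     # chance, error, unmeasured factors

print(shared)    # 0.25
print(unshared)  # 0.75
```

Under these numbers, a correlation of .50 corresponds to only 25% shared variance, which is why a moderate-looking r can leave most of the variance unaccounted for.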
Describes a deviation about a mean of a distribution
Individual deviations about the mean of a distribution ; first moments of the distribution
Moments Squared
Second moments of the distribution
Moments Cubed
Third moments of the distribution
Spearman’s Rho/Rank-Order Correlation Coefficient/Rank-Difference Correlation Coefficient
Developed by Charles Spearman; frequently used when sample size is small (fewer than 30 pairs of measurements) and especially when both sets of measurements are in ordinal (or rank-order) form; special tables are used to determine if an obtained rho coefficient is or is not significant
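For rank-order data, rho is often computed from the differences between paired ranks. The sketch below uses the familiar rank-difference formula, rho = 1 - 6Σd² / (n(n² - 1)), which assumes there are no tied ranks; the function name and the two sets of ranks are hypothetical.

```python
def spearman_rho(rank_x, rank_y):
    """Rank-difference correlation for paired ranks with no ties."""
    n = len(rank_x)
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - (6 * sum_d_squared) / (n * (n ** 2 - 1))


# Two hypothetical judges rank the same five candidates.
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

# Sum of squared rank differences is 4, so rho = 1 - 24/120.
print(spearman_rho(judge_a, judge_b))
```

Whether an obtained rho is significant for a given number of pairs would still be checked against the special tables mentioned above.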
Graphic Representations of Correlation
Bivariate Distribution
Scatter Diagram
Simple graphing of the coordinate points for values of the X-variable (horizontal axis) and the Y-variable (vertical axis); provides a quick indication of the direction and magnitude of the relationship, if any, between the two variables
Direction of the Curve
Helps distinguish positive from negative correlations
Degree to Which the Points form a Straight Line
Helps estimate the strength or magnitude of the correlation
Eyeball gauge of how curved a graph is
Extremely atypical point located at a relatively long distance (an outlying distance) from the rest of the coordinate points in a scatterplot; stimulates interpreters of test data to speculate about the reason for the atypical score; can provide a hint of some deficiency in the testing or scoring procedures
The analysis of relationships among variables for the purpose of understanding how one variable may predict another
Simple Regression
Involves one independent variable (X), referred to as the predictor variable; and one dependent variable (Y), referred to as the outcome variable
Regression Line
Line of Best Fit; the straight line that comes closest to the greatest number of points on the scatterplot of X and Y
Regression coefficients
b = slope of the line; a = intercept (constant which indicates where the line crosses the Y-axis)
Standard Error of the Estimate
Error in the prediction of Y from X; the higher the correlation between X and Y, the greater the accuracy of the prediction and the smaller the standard error of the estimate
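The regression coefficients and the standard error of the estimate can be illustrated with a small least-squares sketch. Everything named here (`fit_line`, `standard_error_of_estimate`, the data) is an assumption made for the example, and the standard error uses the n - 2 denominator, which is one common convention; textbooks differ on this detail.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares intercept a and slope b for predicting Y from X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sum_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = sum_xy / sum_xx        # slope of the line
    a = mean_y - b * mean_x    # where the line crosses the Y-axis
    return a, b


def standard_error_of_estimate(x, y, a, b):
    """Typical size of the error when predicting Y from X with the line."""
    n = len(x)
    residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return sqrt(residual_ss / (n - 2))


# Hypothetical predictor (X) and outcome (Y) scores.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
print(a, b)  # roughly a = 2.2, b = 0.6
print(standard_error_of_estimate(x, y, a, b))
```

When the points fall exactly on a line, the residuals are all zero and the standard error of the estimate is zero, which matches the statement that stronger correlations give more accurate predictions.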
Multiple Regression
Takes into account the intercorrelations among all the variables involved; correlation between each of the predictor scores and what is being predicted is reflected in the weight given to each predictor; predictors that correlate highly with the predicted variable are generally given more weight
Analysis of data from several studies; refers to a family of techniques used to statistically combine information across studies to produce single estimates of the statistics being studied; more weight can be given to studies that have larger numbers of subjects
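The idea of giving more weight to larger studies can be sketched as a sample-size-weighted mean. This is only one simple weighting scheme, shown for illustration; other schemes, such as inverse-variance weighting, are also widely used. The function name, effect values, and sample sizes are hypothetical.

```python
def weighted_mean_effect(effects, sample_sizes):
    """Combine study estimates, weighting each by its number of subjects."""
    total_n = sum(sample_sizes)
    weighted_sum = sum(e * n for e, n in zip(effects, sample_sizes))
    return weighted_sum / total_n


# Two hypothetical studies reporting the same statistic.
effects = [0.2, 0.4]
sample_sizes = [100, 300]

# The larger study pulls the combined estimate toward 0.4.
print(weighted_mean_effect(effects, sample_sizes))  # about 0.35
```

An unweighted average of the two studies would be 0.30; weighting by sample size moves the single estimate closer to the result of the larger study.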