Assessment and Statistics Flashcards
All key terms from the Assessment in Counselling textbook.
Accessibility
The notion that all examinees should have an unobstructed opportunity to demonstrate their capabilities on the construct(s) being assessed.
Accommodations
An action taken in response to an individual with a disability in which there is a departure from the standard testing protocol in an attempt to adjust for the disability.
Acculturation
A process of change and adaptation that individuals undergo as a result of contact with another culture.
Achievement test
An assessment in which the person has “achieved” knowledge, information, or skills through instruction, training, or experience. Achievement tests measure acquired knowledge and do not make any predictions about the future.
Adaptation
A change in an instrument's original design or administration intended to increase its accessibility for particular individuals (e.g., those with visual impairments or limited English proficiency).
Affective instrument
An instrument that assesses interest, attitudes, values, motives, temperaments, and the noncognitive aspects of personality.
Age or grade equivalent scores
Scores that compare an individual with other individuals of the same age or grade, calculated by item response theory or by using a norm-referenced approach.
Alternate or parallel forms
Two forms of an instrument that can be correlated, resulting in an estimate of reliability.
Analogue observation
In this type of observation, the counselor creates a simulated environment that is reflective of the client’s natural environment.
Appraisal
Another term for assessment.
Aptitude test
A test that provides a prediction about the individual’s future performance or ability to learn based on his or her performance on the test. Aptitude tests often predict either future academic or vocational/ career performance.
Assessment
A procedure for gathering client information that is used to facilitate clinical decisions, provide clients with information, or for evaluative purposes.
Authentic Assessment
Performance assessments that involve the performance of “real” or authentic applications rather than proxies or estimators of actual learning.
Behavioral Assessment
An assessment method in which the focus is typically on observing and recording the precise behaviors of the client.
Test Bias
A term that refers to the degree to which construct-irrelevant factors systematically affect a specific group’s performance.
Cluster Sampling
A technique that involves using existing units or clusters rather than selecting individuals.
Coaching
Longer-term training or practice on questions that are the same as, or similar to, the items on a test.
Coefficient of Determination
This statistic estimates the percent of shared variance between two sets of variables that have been correlated. The coefficient of determination (r2) is calculated by squaring the correlation coefficient.
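As a quick Python illustration (the correlation value here is hypothetical):

```python
# Suppose an aptitude test correlates r = .70 with later job performance (hypothetical).
r = 0.70
r_squared = r ** 2   # 0.49: the two variables share 49% of their variance
```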
Computer Adaptive Testing (CAT)
A testing format in which the computer adapts each subsequent question based on the examinee’s previous responses.
Concurrent Validity
A type of validation evidence in which there is no delay between the time the instrument is administered and the time the criterion information is gathered.
Conditional Standard Errors of Measurement
Type of standard error of measurement that takes into account the different score levels.
Construct Underrepresentation
The degree to which the instrument is unable to capture significant aspects of the construct.
Construct Irrelevance
The degree to which scores or results are affected by factors that are extraneous to the instrument’s intended purpose.
Construct Validity
One of the three traditional forms of validity that is broader than either content or criterion-related validity. Many experts in assessment now argue that evidence of construct validity, which includes the other traditional forms of validity, applies in all types of psychological and educational assessment. This type of validation involves the gradual accumulation of evidence. Evidence of construct validity is concerned with the extent to which the instrument measures some psychological trait or construct and how the results can be interpreted.
Content-Related Validity
One of the three traditional forms of validity in which the focus was on whether the instrument’s content adequately represented the domain being assessed. Evidence of content-related validity is often reflected in the steps the authors used in developing the instrument.
Convergent Evidence
Validation evidence that indicates the measure is positively related to other measures of the same construct.
Correlation Coefficient
A statistic that provides an indication of the degree to which two sets of scores are related. A correlation coefficient (r) can range from -1.00 to +1.00 and, thus, provides an indicator of both the strength and direction of the relationship. A correlation of +1.00 represents a perfect positive relationship; a correlation of -1.00 represents a perfect negative or inverse relationship. A correlation coefficient of .00 indicates the absence of a relationship.
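A minimal Python sketch of computing r from raw scores; the two score lists are hypothetical:

```python
def pearson_r(x, y):
    """Pearson correlation coefficient (r) between two sets of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores from two instruments given to the same five clients
test_a = [50, 60, 70, 80, 90]
test_b = [55, 65, 68, 82, 88]
r = pearson_r(test_a, test_b)  # a strong positive relationship, close to +1.00
```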
Correlation Method
A statistical tool often used in providing validation evidence related to an instrument’s relationship with other variables.
Criterion-Referenced Instrument
Instruments designed to compare an individual’s performance to a stated criterion or standard. Often criterion-referenced instruments provide information on specific knowledge or skills and on whether the individual has “mastered” that knowledge or skill. The focus is on what the person knows rather than how he or she compares to other people.
Criterion-Related Validity
One of the three traditional types of validity in which the focus is the extent to which the instrument confirms (concurrent validity) or predicts to (predictive validity) a criterion measure.
Cronbach’s Alpha or Coefficient Alpha
One of the methods of estimating reliability through examination of the internal consistency of the instrument. This method is appropriate when the instrument is not dichotomously scored, such as an instrument that uses a Likert scale.
Decision Theory
A method that examines the relationship between an instrument and a criterion or predictor variable, which usually involves an expectancy table. Expectancy tables frequently are used to determine cutoff scores or to provide clients with information regarding the probability of a certain performance on the criterion that is based on scores on the assessment.
Differential Item Functioning (DIF)
A set of statistical methods for investigating item bias that examines differences in performance among individuals who are equal in ability but are from different groups (e.g., different ethnic groups).
Discriminant Evidence
Validation evidence that indicates the measure is not related to measures of different psychological constructs.
Domain Sampling Theory
Another term for generalizability theory.
Duty to Warn
The requirement or permission for mental health practitioners to disclose information about a client when that client is going to harm someone else.
Event Recording
One of the methods used in behavioral assessment, where the counselor records the number of times a target behavior or behaviors occur during a specified time period.
Expectancy Table
A method of providing validity evidence that involves charting performance on the criterion based on the instrument’s score. It is often used to predict who would be expected to fall in a certain criterion category (e.g., who is likely to succeed in graduate school) and to determine cutoff scores.
Factor Analysis
A term that covers various statistical techniques that are used to study the patterns of relationship among variables with the goal of explaining the common underlying dimensions (factors). In assessment, factor analysis is often used to examine if the intended internal structure of an instrument is reflected mathematically. For example, a researcher would analyze whether all items on each subscale “load” with the other items on the appropriate subscale (factor) and not with another factor.
False Negative
In decision theory, a term used to describe when the assessment procedure is incorrect in predicting a negative outcome on the criterion.
False Positive
In decision theory, a term used to describe when the assessment procedure is incorrect in predicting a positive outcome on the criterion.
Formative Evaluation
A continuous or intermediate evaluation typically performed to examine the counseling services process.
Frequency Distribution
A chart that summarizes the scores on an instrument and the frequency or number of people receiving that score. Scores are often grouped into intervals to provide an easy-to-understand chart that summarizes overall performance.
Frequency Polygon
A graphic representation of the frequency of scores. The number or frequency of individuals receiving a score or falling within an interval of scores is plotted with points that are connected by straight lines.
General Ability Test
Another term for intelligence test.
Generalizability Theory
An alternative model to the true score model of reliability. The focus of this theory is on estimating the extent to which specific sources of variation under defined conditions influence scores on an instrument.
Grade Equivalent Norms
Norms that are typically used in achievement tests and provide scores in terms of grade equivalents. In some instruments, grade equivalent scores are not validated on each specific grade but are extrapolated scores based on group performance at each grade level.
High-Stakes Testing
A type of testing in which the outcome has significant consequences (e.g., a high school graduation examination).
Histogram
A graphic representation of the frequency of scores in which columns are utilized.
Hypomanic Episode
Characterized by symptoms similar to those of a manic episode, except that it lasts at least four consecutive days rather than a week.
Individualized Education Plan (IEP)
An educational plan that is developed for each student who is receiving special education and related services. The plan is developed by a team of educators and the child’s parents or guardians.
Instrument
An assessment tool that typically is not related to grading. In this book, instruments include tests, scales, checklists, and inventories.
Intelligence Tests
Instruments that are designed to measure the mental capabilities of an individual. These assessments are also referred to as general ability tests.
Intercepts
In regression, the constant in the regression equation; graphically, the point where the regression line crosses the y-axis.
Interrater Reliability
A measure used to examine how consistently different raters evaluate the answers to the items on the instrument.
Interval Recording
An assessment method that focuses on whether specific behavior(s) occur within a certain interval of time. It is also referred to as time sampling, interval sampling, or interval time sampling.
Interval Scale
A type of measurement scale in which the units are in equal intervals. Many of the statistics used to evaluate an instrument’s psychometric qualities require an interval scale.
Item Analysis
An analysis that focuses on examining and evaluating each item within an assessment.
Item Difficulty
An item analysis method in which the difficulty of individual items is determined. The most common item difficulty index (p) is the percentage of people who get the item correct.
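The item difficulty index p can be computed directly from a matrix of scored responses; a minimal Python sketch with hypothetical data:

```python
# Hypothetical item responses: each row is one examinee, 1 = correct, 0 = incorrect
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
]
n_people = len(responses)
n_items = len(responses[0])
# p for each item: the proportion of examinees answering it correctly
p_values = [sum(person[i] for person in responses) / n_people
            for i in range(n_items)]
# Higher p means an easier item; here the last item was answered correctly by everyone.
```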
Item Discrimination
A form of item analysis that examines the degree to which an individual item discriminates on some criterion. For example, in achievement testing, item discrimination would indicate whether the item discriminates between people who know the information and people who do not.
Item Response Theory (IRT)
A measurement approach in which the focus is on each item and on establishing items that measure the individual’s ability or level of a latent trait. This approach involves examining the item characteristic function and the calibration of each individual item.
Kuder-Richardson Formulas
Two formulas (KR 20 and KR 21) that were developed to estimate reliability. Both of these methods are measures of internal consistency. KR 20 has been shown to approximate the average of all possible split-half coefficients. KR 21 is easier to compute, but the items on the instrument must be homogeneous.
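KR 20 can be computed from a matrix of dichotomously scored (0/1) responses; a minimal Python sketch with hypothetical data, using the population variance of total scores:

```python
def kr20(responses):
    """KR 20 internal-consistency estimate for dichotomous (0/1) items."""
    k = len(responses[0])          # number of items
    n = len(responses)             # number of examinees
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    pq = sum(pi * (1 - pi) for pi in p)          # sum of item variances
    totals = [sum(row) for row in responses]     # each examinee's total score
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n   # population variance
    return (k / (k - 1)) * (1 - pq / var_t)

# Hypothetical response matrix: rows are examinees, columns are items
data = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]]
reliability_est = kr20(data)   # 0.80 for this data set
```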
Latent Trait Theory
Another term for item response theory.
Major Depressive Episode
A period in which the client is in a depressed mood or has lost interest or pleasure in nearly all activities for at least two weeks.
Manic Episode
A distinct period of abnormally elevated, expansive, or irritable moods and abnormally increased goal-directed activity or energy that lasts at least one week and is present most of the day. It is characterized by a feeling of becoming so driven that it causes marked impairments in occupational functioning, social activities, or relationships.
Mean
The arithmetic average of the scores. It is calculated by adding scores together and dividing by the number in the group.
Median
The middle score, with 50% of the scores falling below it and 50% of the scores falling above it.
Mental Disorder
A syndrome characterized by clinically significant disturbance in an individual’s cognition, emotion regulation, or behavior that reflects a dysfunction in the psychological, biological, or developmental processes underlying mental functioning.
Mental Measurement Yearbooks (MMY)
A series of yearbooks that contain critiques of many of the commercially available psychological, educational, and career instruments.
Mental Status Examination
An examination used to describe a client’s level of functioning and self-presentation. It is generally conducted during the initial session or intake interview and is a statement of how a person appears, functions, and behaves during the initial session.
Mode
The most frequent score in a distribution.
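The three measures of central tendency defined in the cards above (mean, median, and mode) are available in Python's standard library; the score list is hypothetical:

```python
from statistics import mean, median, mode

scores = [70, 75, 80, 80, 85, 90, 100]  # hypothetical test scores
m = mean(scores)     # arithmetic average of the scores
md = median(scores)  # middle score: 80 (the 4th of 7 ordered scores)
mo = mode(scores)    # most frequent score: 80 (appears twice)
```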
Modification
Changes to an instrument that influence the construct being measured; therefore, the scores do not retain the same meaning as those of the original instrument.
Multitrait-Multimethod Matrix
A matrix that includes information on correlations between the measure and traits that it should be related to and traits that it should not theoretically be related to. The matrix also includes correlations between the measure of interest and other same-method measures and measures that use different assessment methods.
Narrative Recordings
Recordings that typically do not involve any quantitative recording procedures. They can be completed by the counselor, parents, other family members, teachers, or the client.
Naturalistic Observation
A type of observation in which the counselor observes the client in a typical environment and the counselor does not manipulate any aspect of the environment during the observation.
Negatively Skewed Distribution
A type of distribution in which the majority of scores are on the higher end of the distribution (tail to the left).
Neuropsychological Assessment
An assessment of cognitive impairments, specifically the behavioral effects of possible brain injury or damage.
Nominal Scale
A scale of measurement characterized by assigning numbers to name or representing mutually exclusive groups (e.g., 1 = male, 2 = female).
Normal Curve
A bell-shaped, symmetrical, and unimodal curve. The majority of cases are concentrated close to the mean, with 68% of the individual scores falling between one standard deviation below the mean and one standard deviation above the mean.
Normal Distribution
A distribution of scores with certain specific characteristics (e.g., approximately 68% of the sample falls between one standard deviation below the mean and one standard deviation above the mean).
Norm-Referenced Instruments
Instruments in which the interpretation of performance is based on the comparison of an individual’s performance with that of a specified group of people.
Observation
The most common method counselors use to assess personality, in which they observe clients from the first meeting and begin to make clinical judgments based on those initial observations.
Ordinal Scale
Type of measurement scale in which the degree of magnitude is indicated by the rank ordering of the data.
Panic Attack
An abrupt surge of fear or intense discomfort that reaches its peak quickly, and clients experience symptoms such as palpitations, sweating, trembling, choking, chest pain, nausea, and fear of losing control.
Percentile Rank or Percentile Scores
A ranking that provides an indication of the percent of scores that fall at or below a given score. For example: “Mary’s percentile of 68 means that if there were 100 people who had taken this instrument, 68 of them would have a score at or below Mary’s.”
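The "at or below" counting behind a percentile rank can be sketched in a few lines of Python; the norm group here is hypothetical:

```python
def percentile_rank(norm_scores, score):
    """Percent of norm-group scores falling at or below the given score."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)

# Hypothetical norm group of 100 examinees scoring 1 through 100
norm_group = list(range(1, 101))
rank = percentile_rank(norm_group, 68)  # 68.0: at or above 68% of the group
```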
Performance Assessments
An alternate method of assessing individuals, other than through multiple-choice types of items, in which the focus is on evaluating the performance of tasks or activities.
Performance Tests
Tests that require the manipulation of objects with minimal verbal influences.
Positively Skewed Distribution
A type of distribution in which the majority of scores are at the lower end of the range of scores. (Tail to the right).
Predictive Validity
A type of validation evidence in which there is a delay between the time the instrument is administered and the time the criterion information is gathered.
Projective Techniques
A type of personality assessment that provides the client with a relatively ambiguous stimulus, thus encouraging a nonstructured response. The assumption underlying these techniques is that the individual will project his or her personality into the response. The interpretation of projective techniques is subjective and requires extensive training in the technique.
Psychological Report
A summary of a client’s assessment results that is often geared toward other professionals. Frequently written by a psychologist, a typical report includes background information, behavioral observations, test results and interpretations, recommendations, and a summary.
Psychological Test
An objective and standardized measure of a sample of behavior.
Psychological Interview
A detailed interview that gathers background information and information about the client’s current psychological and social situation.
Qualitative Data
Descriptive information sought from the evaluation study, with the intent to produce “rich” interpretative data.
Quantitative Data
Information that is more numerical in nature where the intent is to quantify the results.
Randomized Clinical Trials Design
The gold standard in intervention research. Adopted from the medical model, where patients are randomly assigned to receive the medication or the placebo, in counseling evaluation studies, clients are randomly assigned to either the intervention group or the placebo/control group.
Ratings Recording
A category of behavioral assessment in which rating scales are completed, often by the counselor or some other observer (e.g., parents or teachers).
Ratio Scale
A scale of measurement that has both interval data and a meaningful zero (e.g., weight, height). Because ratio scales have a meaningful zero, ratio interpretations can be made.
Raw Scores
Raw scores are the unadjusted scores on an instrument before they are transformed into standard scores. An example of a raw score is the number of answers an individual gets correct on an achievement test.
Reactivity
The possible changes that may occur in clients’ behavior, thoughts, or performance as a result of being observed, assessed, or evaluated.
Regression
A commonly used statistical technique in which the researcher examines whether independent variables predict a criterion or dependent variable. Regression is used to determine if there is a linear relationship among the variables or a line of best fit.
Regression Equation
An equation that describes the linear relationship between the predictor variable(s) and the criterion variable. These equations are often used to determine if it is possible to predict the criterion based on the instrument’s scores.
Reliability
Concerns the degree to which a measure or a score is free of random error. In classical test theory, it is the ratio of true variance to observed variance.
Reliability Generalization
A meta-analytic method that combines estimates of reliability across studies in order to calculate an estimate based on multiple indicators of reliability.
Response-to-intervention
The replacement for the discrepancy approach to diagnosing a learning disability, the focus of which is to use data (e.g., achievement tests, classroom activities) to identify students at risk for poor learning outcomes.
Score
A number or letter that is the product of a client taking an assessment. A score cannot be interpreted without additional information about the assessment.
Self-Monitoring
The practice of observing and recording one’s own behavior.
Semistructured Interview
An interview that is a combination of a structured and unstructured format in which there is a set of established questions and the clinician can also ask additional questions for elaboration or to gather additional information.
Sequential Processing
The use of mental abilities to arrange stimuli in sequential, or serial, order so that the information can be processed.
Simple Random Sample
A type of sample in which every individual in the population has an equal chance of being selected.
Simultaneous Processing
The use of mental abilities to integrate information in a unified manner, with the individual integrating fragments of information in order to comprehend the whole.
Skewed
The distribution is not symmetrical and the majority of people either scored in the low range or the high range, as compared with a normal distribution in which the majority scored in the middle.
Skewed Distributions
Distributions in which the majority of scores are either high or low. Skewed distributions are asymmetrical, and the mean, mode, and median are different. In positively skewed distributions, the majority of scores are on the lower end of the distribution; in negatively skewed distributions, the majority of scores are on the upper end of the distribution.
Slope Bias
A term referring to a situation in which a test yields significantly different validity coefficients for different groups, resulting in different regression lines.
Spearman-Brown Formula
A formula for correcting a split-half reliability coefficient that estimates what the coefficient would be if the original number of items were used.
Spearman’s Model
A two-factor theory of intelligence that postulates everyone has a general ability factor influencing their performance on intellectual tasks, and also specific factors correlated to g that influence performance in specific areas.
Split-half Reliability
One of the internal consistency measures of reliability in which the instrument is administered once and then split into two halves. The scores on the two halves are then correlated to provide an estimate of reliability. Often the split-half reliability coefficients are corrected using the Spearman-Brown formula. This formula adjusts the coefficient for using only half of the total number of items to provide an estimate of what the correlation coefficient would be if the original number of items was used.
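The Spearman-Brown correction for a split-half coefficient is r_full = 2r / (1 + r); a minimal Python sketch with a hypothetical half-test correlation:

```python
def spearman_brown(r_half):
    """Correct a split-half correlation to estimate full-length reliability."""
    return (2 * r_half) / (1 + r_half)

# Hypothetical correlation between the two halves of an instrument
r_half = 0.60
r_full = spearman_brown(r_half)  # about 0.75: the full test is more reliable
```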
Standard Deviation
The most common statistic used to describe the variability of a set of measurements. It is the square root of the variance.
Standard Error of Difference
A measure used by a counselor to examine the difference between two scores and determine if there is a significant difference.
Standard Error of Estimate
A numerical result that indicates the margin of expected error in the individual’s predicted criterion score as a result of imperfect validity.
Standard Error of Measurement
A statistic that indicates the amount of variation to expect in an individual’s scores if he or she took the instrument repeated times. Counselors can use the standard error of measurement to determine the range within which an individual’s true score is expected to fall 68%, 95%, or 99.5% of the time.
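The standard error of measurement is commonly computed as SD x sqrt(1 - reliability); a minimal Python sketch with hypothetical values:

```python
import math

# Hypothetical instrument: standard deviation 15, reliability estimate .91
sd = 15
reliability = 0.91
sem = sd * math.sqrt(1 - reliability)          # about 4.5

observed = 100
band_68 = (observed - sem, observed + sem)     # contains the true score ~68% of the time
band_95 = (observed - 1.96 * sem, observed + 1.96 * sem)
```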
Stratified Sample
A type of sample in which individuals are selected for the norming group based on certain demographic characteristics.
Structured Interview
An interview that is conducted using a predetermined set of questions that is asked in the same manner and sequence for every client.
Structured Personality Instruments
Formalized assessments in which clients respond to a fixed set of questions or items.
Summative Evaluation
A cumulative evaluation of services that are typically completed at the endpoint of the service. These types of evaluation are designed to provide an overall indication of the effectiveness of the services.
Target Behavior
The behavior that is being assessed in behavior assessment.
Test
An individual instrument in which the focus is on evaluation.
Testing
A process of giving clients tests and/or instruments.
Test-Retest Reliability
A method in which the reliability coefficient is obtained by correlating a group’s performance on the first administration of an instrument with the same group’s performance on a second administration of that same instrument.
Test Sophistication
A term applied to an individual’s level of knowledge in test-taking skills. It is not related to knowledge of the content but rather to the format of the tests and the skills required for maneuvering through that format.
Universal Design
An approach to instrument design with the goal of maximizing accessibility for all intended examinees.
Unstructured Interview
An interview in which the clinician gears the questions toward each individual client and there is no established set of questions.
Validity Coefficient
The correlation between scores on an instrument and the criterion measure.
Validity Generalization
Term applied to findings indicating that the validity of cognitive ability tests can be generalized and that cognitive ability is highly related to job performance.
Variance
The average of the squared deviation from the mean. It is a measure of variability and its square root is the standard deviation of the set of measurements.
Z Scores
A standard score that always has a mean of 0 and a standard deviation of 1.
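Converting raw scores to z scores, z = (raw - mean) / SD, can be sketched in a few lines of Python; the raw scores are hypothetical:

```python
import math

raw = [60, 70, 80, 90, 100]    # hypothetical raw scores
m = sum(raw) / len(raw)
sd = math.sqrt(sum((s - m) ** 2 for s in raw) / len(raw))  # population SD
z = [(s - m) / sd for s in raw]
# The resulting z distribution has a mean of 0 and a standard deviation of 1.
```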