PSYC 549 - Psychometrics Flashcards
Achievement Test
A test that is designed to measure an individual’s level of knowledge in a particular area; generally used in schools and educational settings. Unlike an aptitude test, which measures a person’s ability to learn something, an achievement test focuses specifically on how much a person knows about a specific topic. It measures an individual’s previous learning. Example: The comps exam is an achievement test, as it is designed to measure how thoroughly clinical counseling students have learned the information in the ten core classes of the program.
Aptitude test
Any assessment instrument designed to measure the potential for acquiring knowledge or skill. Aptitude tests are thought of as providing a basis for predictions about an individual’s future success, particularly in educational or occupational situations. In contrast, achievement tests are considered to reflect the amount of learning already obtained.
Often used for measuring high school students’ potential for college. Aptitude tests are prone to bias. Example: The SAT is an aptitude test designed to predict a student’s potential success in college. There is reason to doubt the predictive validity of the SAT.
Construct
Part of: psychometrics and psychological testing. What: Constructs are developed to measure complex, abstract concepts that can only be observed indirectly. A construct is based on a characteristic that is not directly observable: an internal event or process that must be inferred from external behavior. Constructs may be derived from theory, research, or observation. Example: The counselor administered a paper-and-pencil assessment measure that solicited responses related to fidgeting, excessive worrying, and difficulty concentrating, all representing the construct of anxiety. Anxiety can be measured indirectly by assessing the prevalence of these behaviors.
Correlation v. causation
Correlation refers to a relationship between variables, but does not imply causation (i.e., that one variable causes change in the other). A mediating or confounding variable may explain the relationship, or the relationship may be bidirectional (in which case both variables are causal). Causality can only be determined under experimental conditions: an experiment requires random assignment of participants and manipulation of at least one independent variable. Example: An observational study of third graders found a positive correlation between students who ate breakfast and test scores. The researchers cannot conclude whether eating breakfast causes students to test better, whether students with higher test scores are more likely to eat breakfast, or whether some other variable contributes to the relationship.
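A minimal sketch of what the researchers in the example could compute, assuming Python with NumPy; the breakfast and score numbers are fabricated for illustration:

```python
# A correlation can be computed from observational data, but nothing in the
# arithmetic distinguishes cause from effect.
import numpy as np

breakfast_days = np.array([0, 1, 2, 3, 4, 5])       # days/week each student eats breakfast
test_scores = np.array([61, 65, 70, 72, 78, 83])    # hypothetical test scores

r = np.corrcoef(breakfast_days, test_scores)[0, 1]  # Pearson r
print(f"Pearson r = {r:.2f}")
# A high positive r describes the relationship only; it cannot say whether
# breakfast raises scores, high scorers eat breakfast, or a third variable drives both.
```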
Criterion-referenced scoring/tests
Part of: psychometrics and testing What: Criterion-referenced tests evaluate a specific skill/ability/task that the test taker must demonstrate. Scores are compared to a pre-set criterion score (a benchmark), not to a norm or to other individuals’ scores. This approach does not take into account group differences or cultural biases. Example: The No Child Left Behind Act required state testing of students to use a criterion-referenced scoring model.
Tom is bringing his son into therapy because he isn’t doing well in school despite having tested with a very high IQ. The therapist explains that most tests in school are criterion-referenced, where the student must know a certain amount of information relative to a benchmark, whereas IQ tests are norm-referenced. They develop strategies for his son to succeed in learning the information in the classroom.
Criterion-related validity
Part of: psychometrics What: Extent to which a test corresponds with a particular criterion (the standard against which the test is compared). Typically used when the objective is to predict future performance on an unknown criterion. Predictive criterion validity refers to a test or measure that predicts future performance/success in relation to a particular criterion (e.g., SAT -> success in college). Concurrent criterion validity refers to a criterion measure taken at the same time as the test (e.g., a written driving test alongside a road test). Example: Sally is creating an assessment for depression and wants to make sure that her test has concurrent criterion validity. She administers the assessment to a group of individuals while at the same time administering the BDI. If her test scores are similar to those on the BDI, her test has concurrent validity.
Validity (types of)
What: A psychometric property. Validity is the extent to which an assessment or study measures the construct it is intended to measure; a validity coefficient of 0.3-0.4 is considered adequate. Types of validity:
Face validity: based on logical analysis rather than statistical analysis, face validity is the appearance that a test measures what it purports to at a surface level.
Content validity: evidence that the content of a test adequately represents the conceptual domain it is designed to cover; test items are a fair sample of the total potential content and relevant to the construct being tested. Based on logical analysis rather than statistical analysis.
Criterion validity: extent to which a test corresponds with a particular criterion against which it is compared; indicated by high correlations between the test and a well-defined measure. Two forms: concurrent and predictive. For example, you might examine a suicide risk scale against suicide rates/attempts, or a driving skills test against the number of infractions on the road.
Construct validity: the degree to which the test measures the construct or trait it intends to measure. This may be demonstrated with convergent evidence (a high correlation between two or more tests that purport to assess the same construct) or discriminant evidence (low correlations between tests of unrelated constructs; that is, they discriminate between two qualities that are not related to each other).
Incremental Validity - subjective
* Measure of unique information gained through using a test
* How much does information from test add to what is already known?
* How well does it improve the accuracy of decisions?
* Based on logical analysis vs. statistical analysis
* Example issue: personality tests are poor predictors of job performance; do they add anything “above and beyond” other measures?
* A test lacks incremental validity if it measures something we don’t need; more information isn’t always better
Example: See essay
Standard deviation
Part of: statistics and measurement techniques What: the average amount that scores differ from the mean score of a distribution. Found by taking the square root of the variance, which is the average squared deviation around the mean. Gives an approximation of how much a typical score is above or below the average score. Always a non-negative number; an SD of 0 occurs only theoretically, since there is always some variability. In general, a smaller SD means scores are closer to the mean; a larger SD means a larger distribution/spread. Example: When my daughter was being evaluated for a developmental delay, she was assessed using a developmental inventory. In order to qualify for early intervention services, she would need to score 2 standard deviations below the mean, indicating a 40% developmental delay.
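A minimal sketch in Python of the variance-to-SD relationship described above; the scores are hypothetical:

```python
# Variance is the average squared deviation around the mean;
# the standard deviation is its square root.
scores = [85, 90, 95, 100, 105, 110, 115]

mean = sum(scores) / len(scores)                               # 100.0
variance = sum((x - mean) ** 2 for x in scores) / len(scores)  # 100.0
sd = variance ** 0.5                                           # 10.0

print(mean, variance, sd)
# A score of 70 would sit (70 - 100) / 10 = -3 SD below the mean.
```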
Norm referenced scoring/tests
Part of: psychometrics and assessment What: A norm-referenced test evaluates a test taker’s performance against a standardized sample; typically used for the purpose of making comparisons with a larger group. Norms should be current, relevant, and representative of the group to which the individual is being compared. Can be problematic when tests are not normed with a culturally diverse population. Example: IQ testing is an example of norm-referenced scoring/testing because an individual’s score is always interpreted in terms of typical performance/results compared to other people in their population.
Assessment interview (psychometrics)
Part of: psychometrics and clinical practice What: An initial interview conducted for the purpose of gathering information to answer a referral question such as, “Does this child have Autism Spectrum Disorder?” An assessment interview differs from a clinical interview in that once the referral question is answered, the therapist will likely refer the client elsewhere. In a clinical interview, the counselor gathers information about the patient and begins to form a conceptualization of their case and presenting problems in order to establish the therapeutic relationship and treatment. An assessment interview is more likely to include standardized tests (intelligence, personality, paper-pencil) and to draw on multiple sources.
- May be structured, in which case a certain order of questions and format is strictly adhered to, or unstructured, in which the interviewer is free to follow their own course of questioning.
- Structured interviews are generally more reliable and valid, but lack the freedom of unstructured interviews to pursue a topic of interest or follow an instinct.
EXAMPLE: Lillian’s classroom behavior leads her teacher and school counselor to believe she may have ADHD. The counselor refers Lillian to a child psychologist who specializes in ADHD. The psychologist conducts an assessment interview including a variety of tests and interviews with the teacher, parents, and school counselor to determine if an ADHD diagnosis is appropriate.
Normal curve
Part of: statistics and research What: A normal curve is the bell-shaped curve created by a normal distribution of a population, with symmetry around the central tendencies. Random sampling tends to produce a normal curve. Most statistical procedures in psychology assume normally distributed scores. Parametric statistics are based on normal distributions and have stronger statistical power. EXAMPLE: A researcher examining whether IQ level correlated with depression obtained a large, random sample. Most scores fell in the middle of the bell-shaped curve for IQ, showing a normal distribution and allowing her to continue with parametric statistics.
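A minimal sketch of the areas under a normal curve, assuming Python with SciPy available; IQ is conventionally scaled to a mean of 100 and an SD of 15:

```python
# The 68-95 rule: roughly 68% of scores fall within 1 SD of the mean,
# and roughly 95% within 2 SD, on any normal curve.
from scipy.stats import norm

iq = norm(loc=100, scale=15)

within_1_sd = iq.cdf(115) - iq.cdf(85)   # ~0.68
within_2_sd = iq.cdf(130) - iq.cdf(70)   # ~0.95
print(f"{within_1_sd:.3f} within 1 SD, {within_2_sd:.3f} within 2 SD")
```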
Variance
Part of: statistics and data analysis What: variance is a measure of the variation of, or differences among, subjects in a distribution across the measure X. Variance arises from natural, random differences among subjects or from environmental variations, measurement error, or researcher error. To calculate variance, average the squared deviations around the mean. A smaller variance means less spread of scores; a larger variance means more spread-out scores. EXAMPLE: A random sample study was conducted on a group of individuals to test for anxiety. The spread of scores, or variance, was large, meaning that the high and low scores were further apart.
Standard scores
Part of: statistics and assessment What: Standard scores are raw scores converted to z-scores with a fixed mean and SD. Raw scores are converted into standard scores to make objective comparisons across measures; the mean z-score is always 0 and the SD is always 1, so a z-score of +1 is the same as one SD above the mean. The z-score is positive if someone scores higher than the mean and negative if they score lower. EXAMPLE: Two clients are being treated with CBT for depression. The counselor wants to compare the baseline “severity” of depression. The clients took different measures of depression, the BDI and the PHQ. The counselor converts their scores to standard scores, or z-scores, to compare the two.
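A minimal sketch in Python of the conversion in the example above; the BDI and PHQ means and SDs below are placeholders, not published norms:

```python
# z = (raw - mean) / SD puts scores from different measures on a common scale.
def z_score(raw: float, mean: float, sd: float) -> float:
    return (raw - mean) / sd

client_a = z_score(raw=28, mean=20, sd=8)   # BDI: 1.0 SD above the mean
client_b = z_score(raw=12, mean=15, sd=6)   # PHQ: 0.5 SD below the mean

print(client_a, client_b)  # 1.0, -0.5 -> client A's baseline is relatively more severe
```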
Standard error of measurement
Part of: statistical analysis What: A common tool in psychological research and standardized testing which provides an estimate of how much an individual’s score would be expected to change on re-testing with the same or an equivalent form of the test; that is, how much test scores are spread around the true score. If a person could take the test an infinite number of times, the average of those scores would be an estimate of their true ability/knowledge (the true score, T). The standard deviation of all those scores is the SEM; the smaller the SEM, the more precise the measurement capacity of the instrument. The SEM creates a confidence band within which a person’s true score would be expected to fall. It has an inverse relationship with the reliability coefficient (high SEM = low reliability). EXAMPLE: A researcher develops a test to measure depression, then administers it to a sample. They want to analyze the data that they gathered using statistics. They calculate the SEM and it turns out to be low, which indicates that the measurement is fairly precise. They then decide to carry out further statistical analysis.
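A minimal sketch in Python using the standard formula SEM = SD * sqrt(1 - r), where r is the reliability coefficient; the SD and reliability values here are hypothetical:

```python
# The formula makes the inverse relationship visible: as r rises, SEM shrinks.
import math

sd = 15
reliability = 0.91
sem = sd * math.sqrt(1 - reliability)   # 15 * sqrt(0.09) = 4.5

observed = 108
low, high = observed - 1.96 * sem, observed + 1.96 * sem
print(f"SEM = {sem:.1f}; 95% confidence band: {low:.1f} to {high:.1f}")
```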
Reliability (types of)
Part of: psychometrics and research design What: Reliability refers to the consistency, or repeatability, of test results. Reliability is a foundational characteristic of “psychometric soundness.” Types of reliability: Inter-Rater Reliability examines the degree of consistency between different raters’ scorings, particularly in behavioral observation studies. An ethogram helps researchers create operational definitions to increase inter-rater reliability. - Quantified as the correlation or agreement between raters’ scores (e.g., the kappa statistic)
Test-Retest Reliability refers to the consistency of a measure when the same test is administered to the same person at different points in time. - Only works for stable traits - The interval between measurements must be considered: shorter intervals -> higher carryover - Be careful of developmental milestones. Not often used.
Parallel Forms Reliability compares scores on two different measures of the same quality. Scores are converted to standard scores for comparison. Rigorous - requires more testing, but avoids carryover effects.
Internal Consistency Reliability examines the consistency of items within a single test (subcomponents instead of different tests) - Done via split-half, KR-20, and Cronbach’s alpha - Split-half is when the test is split in half and the correlation between the two halves is examined (see the sketch after the example below). Used most often.
EXAMPLE: While developing a new version of an IQ test, researchers gave the test to the same group of subjects at several different times to evaluate the instrument’s test-retest reliability.
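A minimal sketch in Python (assuming NumPy) of the split-half procedure named above, with the Spearman-Brown correction; the 0/1 item responses are fabricated:

```python
import numpy as np

# rows = 6 test takers, columns = 6 items scored 0/1 (fabricated responses)
items = np.array([
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
])

odd_half = items[:, 0::2].sum(axis=1)    # total score on items 1, 3, 5
even_half = items[:, 1::2].sum(axis=1)   # total score on items 2, 4, 6
r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown corrects for each half being only half the test's length
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, corrected reliability = {r_full:.2f}")
```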