PSYC 549 - Psychometrics Flashcards
Achievement Test
A test that is designed to measure an individual’s level of knowledge in a particular area; generally used in schools and educational settings. Unlike an aptitude test, which measures a person’s ability to learn something, an achievement test focuses specifically on how much a person knows about a specific topic. It measures an individual’s previous learning.
Example: The comps exam is an achievement test, as it is designed to measure how thoroughly clinical counseling students have learned the information in the ten core classes of the program.
Aptitude test
Measures a person’s potential to learn or acquire specific skills. Often used for measuring high school students’ potential for college. Aptitude tests are prone to cultural and socioeconomic bias.
Example: The SAT is an aptitude test designed to predict a student’s potential success in college. There is reason to doubt the predictive validity of the SAT.
Construct
Part of: psychometrics and psychological testing.
What: Constructs are developed to measure complex, abstract concepts that are indirectly observed. They are based on a characteristic which is not directly observable, and is an internal event or process that must be inferred from external behavior. Constructs may be derived from theory, research, or observation.
Example: The counselor administered a paper and pencil assessment measure that solicited responses related to fidgeting, excessive worrying, difficulty concentrating - all representing the construct of anxiety. Anxiety can be measured indirectly by assessing the prevalence of these bxs.
Correlation v. causation
Correlation refers to a relationship between variables, but it does not imply causation (i.e., that one variable causes change in the other). A third variable (a confound) may account for the relationship, or the relationship may be bidirectional (in which case each variable causally influences the other). Causality can only be determined under experimental conditions. An experiment requires random assignment of participants and manipulation of at least one independent variable.
Example: An observational study of third graders found a positive correlation between students who ate breakfast and test scores. The researchers cannot conclude whether eating breakfast causes students to test better, whether students with higher test scores are more likely to eat breakfast, or whether there is some other variable contributing to the relationship.
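A minimal simulation makes the third-variable problem concrete. The sketch below uses invented data (the variable names and effect sizes are assumptions for illustration, not findings from the breakfast study): two variables that share a common cause, but have no causal link to each other, still correlate positively.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical confound: e.g., a stable household routine that
# influences both breakfast habits and test performance.
routine = rng.normal(0, 1, n)

# Neither variable causes the other; both depend on the confound.
breakfast_freq = routine + rng.normal(0, 1, n)
test_score = routine + rng.normal(0, 1, n)

r = np.corrcoef(breakfast_freq, test_score)[0, 1]
print(f"Pearson r = {r:.2f}")  # positive r despite no causal link
```

Without random assignment, a correlation like this cannot distinguish among the causal explanations above.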
Criterion-referenced scoring/tests
Part of: psychometrics and testing
What: Criterion-referenced tests evaluate a specific skill/ability/task that the test taker must demonstrate. Scores are compared to a pre-set criterion score, not compared to a norm or other scores.
Example: The No Child Left Behind Act required state testing of students to use a criterion-referenced scoring model. Students were expected to meet certain benchmark scores rather than being evaluated against their peers.
Criterion-related validity
Part of: psychometrics
What: Extent to which a test corresponds with a particular criterion (the standard against which the test is compared). Typically used when the objective is to predict future performance on a criterion that cannot yet be measured.
Predictive criterion validity refers to a test or measure that predicts future performance/success in relation to a particular criterion (SAT -> success in college).
Concurrent criterion validity refers to the correlation between a test and a criterion measured at the same time (e.g., a written driving test and a road test taken the same day).
Example: Applicants to a software company are given a test of job aptitude. After six months working at the company, job performance is evaluated. The two sets of scores are compared to assess the predictive criterion validity of the job aptitude test.
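As a sketch of how a predictive validity coefficient is computed (the numbers below are invented for illustration), the test scores are simply correlated with the criterion scores collected later:

```python
import numpy as np

# Hypothetical data: aptitude scores at hiring, and supervisor
# performance ratings collected six months later (the criterion).
aptitude = np.array([72, 85, 90, 65, 78, 88, 70, 95, 60, 82])
performance = np.array([3.1, 4.0, 4.2, 2.8, 3.5, 4.1, 3.0, 4.6, 2.5, 3.8])

# The validity coefficient is the test-criterion correlation.
r = np.corrcoef(aptitude, performance)[0, 1]
print(f"Predictive validity coefficient: r = {r:.2f}")
```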
Validity (types of)
What: A psychometric property. Validity is the extent to which an assessment or study measures the construct it is intended to measure; a validity coefficient of .30-.40 is generally considered adequate.
Types of validity:
Face validity: based on logical analysis rather than statistical analysis, face validity is the appearance, at a surface level, that a test measures what it purports to measure.
Content validity: evidence that the content of a test adequately represents the conceptual domain it is designed to cover; test items are a fair sample of the total potential content and relevant to construct being tested. Based on logical analysis v. statistical analysis.
Criterion validity: extent to which a test corresponds with a particular criterion against which it is compared; indicated by high correlations between a test and a well-defined criterion measure. For example, you might examine a suicide risk scale against suicide rates/attempts, or a driving skill test against the number of infractions on the road.
Construct validity: the degree to which the test measures the construct or trait it intends to measure. This may be demonstrated with convergent evidence (a high correlation between two or more tests that purport to assess the same construct) or discriminant evidence (low correlations between tests of unrelated constructs; that is, the tests discriminate between two qualities that are not related to each other).
Example: A business psychologist has developed a new IQ test that requires only 5 minutes per subject, as compared to 90 minutes for the test acknowledged as the gold standard. He administers both tests to a sample population and compares the scores. The correlation between the two sets of scores is high, providing convergent evidence that the new test has construct validity.
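One way to interpret the size of a validity coefficient is to square it: $r^2$ gives the proportion of criterion variance the test accounts for. A quick worked example:

$$r = 0.40 \;\Rightarrow\; r^2 = 0.16$$

So an “adequate” coefficient of .30-.40 explains only about 9-16% of the variance in the criterion, which is why validity evidence is gathered from multiple sources.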
Standard deviation
Part of: statistics and measurement techniques
What: the average amount that scores differ from the mean score of a distribution. Found by taking the square root of the variance, which is the average squared deviation around the mean. Gives an approximation of how much a typical score is above or below the average score.
Always non-negative; an SD of 0 (every score identical) occurs only in theory, because real data always show some variability.
In general, a smaller SD means scores cluster closer to the mean; a larger SD means a greater spread of scores.
Example: When my daughter was being evaluated for a developmental delay, she was assessed using the Battelle Developmental Inventory-2 (BDI-2). In order to qualify for early intervention services, she would need to score 2 standard deviations below the mean, indicating a 40% developmental delay.
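A minimal worked example, using the population formula (a sample SD would divide by $N - 1$ instead):

$$s = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N}}$$

For the scores 2, 4, 6, 8: the mean is 5, the squared deviations are 9, 1, 1, 9, the variance is $20/4 = 5$, and the SD is $\sqrt{5} \approx 2.24$.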
Norm referenced scoring/tests
Part of: psychometrics and assessment
What: A norm referenced test evaluates a test taker’s performance against a standardized sample; typically used for the purpose of making comparisons with a larger group.
Norms should be current, relevant, and representative of the group to which the individual is being compared. Can be problematic when tests are not normed with a culturally diverse population.
Example: The child psychologist tested the adolescent’s IQ and discovered that the child’s IQ was 165, placing him above the 99th percentile, more than 3 SDs above the mean on the normal curve, because IQ is normally distributed. IQ testing is an example of norm referenced scoring/testing because an individual’s score is always interpreted relative to the typical performance of the norm group.
Assessment interview
Part of: psychometrics and clinical practice
What: An initial interview conducted for the purpose of gathering information to answer a referral question such as, “Does this child have Autism Spectrum Disorder?”
An assessment interview differs from a clinical interview in that once the referral question is answered, the therapist will likely refer the client elsewhere. In a clinical interview, the counselor gathers information about the patient and begins to form a conceptualization of the case and presenting problems in order to establish the therapeutic relationship and plan treatment. An assessment interview is more likely to include standardized tests (intelligence, personality, paper-and-pencil) and to draw on multiple informants.
- May be structured, in which case a certain order of questions and format is strictly adhered to, or may be unstructured, in which the interviewer is free to follow their own course of questioning.
- Structured interviews are generally more reliable and valid, but lack the freedom of unstructured interviews to pursue a topic of interest or follow an instinct.
EXAMPLE: Lillian’s classroom behavior leads her teacher and school counselor to believe she may have ADHD. The counselor refers Lillian to a child psychologist who specializes in ADHD. The psychologist conducts an assessment interview, including a variety of tests and interviews with the teacher, parents, and school counselor, to determine whether an ADHD diagnosis is appropriate.
Normal curve
Part of: statistics and research
What: A normal curve is the bell-shaped curve created by a normal distribution; it is symmetric about its center, where the mean, median, and mode coincide. Many psychological traits measured in large random samples are approximately normally distributed, and sampling distributions of means tend toward normality as samples grow. Most statistical procedures in psychology assume normally distributed scores; parametric statistics are based on normal distributions.
EXAMPLE: The child psychologist tested the adolescent’s IQ and discovered that the child’s IQ was 165, placing him in the 99th percentile, more than 3 standard deviations above the mean on the normal curve because IQ is normally distributed.
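The standard areas under the normal curve (the empirical rule) follow directly from the definition:

$$P(\mu - \sigma < X < \mu + \sigma) \approx 68\%, \quad P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 95\%, \quad P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 99.7\%$$

Applied to IQ (mean 100, SD 15): roughly 68% of people score between 85 and 115, 95% between 70 and 130, and 99.7% between 55 and 145, so an IQ of 165 lies in the extreme upper tail.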
Variance
Part of: statistics and data analysis
What: Variance is a measure of the variation among subjects in a distribution on a measure X. Variance arises from natural, random differences among subjects, as well as from environmental variation, measurement error, or researcher error.
To calculate variance, square each score’s deviation from the mean and average the squared deviations. The deviations must be squared because the sum of raw deviations around the mean always equals zero.
EXAMPLE: A clinical psychologist is doing research on a new tx for substance use disorders. She conducts an experiment in which she compares the tx group to a group that received the gold standard tx and to a control group. At first glance it looks like the level of symptomatic reduction is the same in the new tx and gold standard tx groups, but upon further inspection the psychologist notes that the new tx group has a large amount of variance. That is, some people saw significant sx reduction and others saw very minimal change. She needs to investigate this further. What is it that makes the new tx beneficial for some?
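A quick worked example showing why the deviations must be squared. For the scores 2, 4, 6, 8 (mean 5):

$$\sum(x_i - \bar{x}) = (-3) + (-1) + 1 + 3 = 0, \qquad \sum(x_i - \bar{x})^2 = 9 + 1 + 1 + 9 = 20$$

giving a variance of $\sigma^2 = 20/4 = 5$.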
Standard scores
Part of: statistics and assessment
What: Standard scores are raw scores converted to a scale with a fixed mean and SD so that objective comparisons can be made across different measures. The most common standard score is the z-score, which always has a mean of 0 and an SD of 1.
EXAMPLE: Two clients are being treated with CBT for depression. The counselor wants to compare the baseline “severity” of depression. The clients took different measures of depression, the BDI and the QIDS (Quick Inventory of Depressive Symptomatology). The counselor converts their scores to standard scores, or z-scores, to compare the two.
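The conversion itself is one line of arithmetic. Using hypothetical numbers for the BDI (a raw score of 30 against a sample mean of 20 and an SD of 8; illustrative values, not actual BDI norms):

$$z = \frac{x - \bar{x}}{s} = \frac{30 - 20}{8} = 1.25$$

If the other client’s QIDS z-score were, say, 0.50, the counselor could conclude that the first client’s baseline depression is more severe relative to each measure’s norms, even though the raw scores are on different scales.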
Standard error of measurement
Part of: psychometrics and statistics
What: An index of the amount of error in a test or measure. The standard error of measurement is the standard deviation of the distribution of scores that would be expected if the same person took the same (or an equivalent) test repeatedly; it is essentially an estimate of how much an individual’s score would be expected to change on re-testing. A person’s “true score” is expected to fall somewhere within the confidence band created by the standard error of measurement.
Example: An IQ test has a mean of 100 for a particular sample and a standard deviation of 14. A school counselor wants to know the range of the true score for someone whose score was 106. Calculating the standard error of measurement, which also requires the test’s reliability coefficient, will provide the range within which that student’s “true score” is expected to fall.
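The SEM is computed from the test’s SD and its reliability coefficient $r_{xx}$. Continuing the example with SD = 14 and an assumed reliability of .90 (the reliability value is an assumption added for illustration):

$$SEM = s\sqrt{1 - r_{xx}} = 14\sqrt{1 - 0.90} \approx 4.43$$

A 68% confidence band is the observed score ±1 SEM (about 101.6 to 110.4 for a score of 106); a 95% band is ±1.96 SEM (about 97.3 to 114.7).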
Reliability (types of)
Part of: psychometrics and research design
What: Reliability refers to the accuracy, dependability, consistency, or repeatability of test results. Reliability is a foundational characteristic of “psychometric soundness.”
Types of reliability:
Inter-Rater Reliability examines the degree of consistency between different raters’ scorings, particularly in behavioral observation studies. An ethogram helps researchers create operational definitions of behaviors, which increases inter-rater reliability.
- Quantified by agreement between raters’ scores (e.g., Cohen’s Kappa statistic for categorical ratings, or a correlation for continuous scores)
Test-Retest Reliability refers to the consistency of a measure when the same test is administered to the same person at different points in time.
- Only works for stable traits
- The interval between measurements must be considered: shorter intervals -> stronger carryover effects
- Be careful of developmental milestones
Parallel Forms Reliability compares scores on two equivalent forms of the same test
Internal Consistency Reliability examines the consistency of items within a test
- Done via split-half, KR-20, and Cronbach’s Alpha (see the sketch after this card)
- Split-half reliability splits the test in half and examines the correlation between the two halves (often corrected upward with the Spearman-Brown formula)
EXAMPLE: While developing a new version of an IQ test, researchers gave the test to the same group of subjects at several different times to evaluate the instrument’s test-retest reliability.
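As a sketch of how internal consistency is computed, the snippet below calculates Cronbach’s Alpha from a small hypothetical item-response matrix (the scores are invented for illustration):

```python
import numpy as np

# Hypothetical responses: 6 test takers x 4 items, each rated 1-5.
scores = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
    [2, 3, 2, 2],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```

A higher alpha means the items covary strongly, i.e., they appear to measure the same underlying construct.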