PSYC 549 - Psychometrics Flashcards

1
Q

Achievement Test

A

A test that is designed to measure an individual’s level of knowledge in a particular area; generally used in schools and educational settings. Unlike an aptitude test, which measures a person’s ability to learn something, an achievement test focuses specifically on how much a person knows about a specific topic. It measures an individual’s previous learning. Example: The comps exam is an achievement test, as it is designed to measure how thoroughly clinical counseling students have learned the information in the ten core classes of the program.

2
Q

Aptitude test

A

Any assessment instrument designed to measure potential for acquiring knowledge or skill. Aptitude tests are thought of as providing a basis for making predictions about an individual’s future success, particularly in an educational or occupational situation. In contrast, achievement tests are considered to reflect the amount of learning already obtained.
Often used for measuring high school students’ potential for college; aptitude tests are prone to bias. Example: The SAT is an aptitude test designed to predict a student’s potential success in college. There is reason to doubt the predictive validity of the SAT.

3
Q

Construct

A

Part of: psychometrics and psychological testing. What: A construct is a complex, abstract concept that cannot be observed directly; it is an internal event or process that must be inferred from external behavior. Tests are developed to measure constructs indirectly. Constructs may be derived from theory, research, or observation. Example: The counselor administered a paper-and-pencil assessment measure that solicited responses related to fidgeting, excessive worrying, and difficulty concentrating, all representing the construct of anxiety. Anxiety can be measured indirectly by assessing the prevalence of these behaviors.

4
Q

Correlation v. causation

A

Correlation refers to a relationship between variables, but does not imply causation (i.e., that one variable causes change in the other). A third variable may explain the relationship, or the relationship may be bidirectional (in which case both directions would be causal). Causality can only be determined under experimental conditions; an experiment requires random assignment of participants and manipulation of at least one independent variable. Example: An observational study of third graders found a positive correlation between eating breakfast and test scores. The researchers cannot conclude whether eating breakfast causes students to test better, whether students with higher test scores are more likely to eat breakfast, or whether some other variable contributes to the relationship.
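A correlation like the one in the breakfast example can be computed directly. The sketch below uses made-up numbers (the breakfast study above reports no data) and a hand-rolled Pearson coefficient; note that a strong r says nothing about which variable, if either, is causal.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: breakfasts eaten per week and test scores.
breakfast = [0, 1, 2, 3, 4, 5, 6, 7]
scores = [61, 64, 63, 70, 72, 74, 78, 80]
r = pearson_r(breakfast, scores)
# r is strongly positive, but the number alone cannot tell us WHY
# the two variables move together.
```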

5
Q

Criterion-referenced scoring/tests

A

Part of: psychometrics and testing What: Criterion-referenced tests evaluate a specific skill/ability/task that the test taker must demonstrate. Scores are compared to a pre-set criterion score (a benchmark score), not to a norm or to other individuals’ scores. Does not take into account group differences or cultural biases. Example: The No Child Left Behind Act required state testing of students to use a criterion-referenced scoring model.
Tom is bringing his son into therapy because he isn’t doing well in school even though he has tested with a very high IQ. The therapist explains that most tests in school are criterion-referenced, where the student must know a certain amount of information based on a benchmark, whereas an IQ test is norm-referenced. They develop strategies for his son to succeed in learning the information in the classroom.

6
Q

Criterion-related validity

A

Part of: psychometrics What: The extent to which a test corresponds with a particular criterion (the standard against which the test is compared). Typically used when the objective is to predict future performance on an unknown criterion. Predictive criterion validity refers to a test or measure that predicts future performance/success in relation to a particular criterion (SAT -> success in college). Concurrent criterion validity refers to a concurrent measure taken at the same time as the test (e.g., a road test and a written driver’s test). Example: Sally is creating an assessment for depression and wants to make sure that her test has concurrent criterion validity. She administers the assessment to a group of individuals while at the same time administering the BDI. If her test scores are similar to those on the BDI, her test has concurrent validity.

7
Q

Validity (types of)

A

What: A psychometric property. Validity is the extent to which an assessment or study measures the construct it is intended to measure; a validity coefficient of 0.3-0.4 is considered adequate. Types of validity:
* Face validity: based on logical rather than statistical analysis, face validity is the appearance, at a surface level, that a test measures what it purports to.
* Content validity: evidence that the content of a test adequately represents the conceptual domain it is designed to cover; test items are a fair sample of the total potential content and relevant to the construct being tested. Based on logical rather than statistical analysis.
* Criterion validity: the extent to which a test corresponds with a particular criterion against which it is compared; indicated by high correlations between the test and a well-defined measure. Includes concurrent and predictive validity. For example, you might examine a suicide risk scale against suicide rates/attempts, or a driving skills test against the number of infractions on the road.
* Construct validity: the degree to which the test measures the construct or trait it intends to measure. This may be demonstrated with convergent evidence (a high correlation between two or more tests that purport to assess the same construct) or discriminant evidence (low correlations between tests of unrelated constructs; that is, they discriminate between two qualities that are not related to each other).
* Incremental validity: a measure of the unique information gained through using a test, judged by logical rather than statistical analysis. How much does information from the test add to what is already known? How well does it improve the accuracy of decisions? Personality tests, for example, are poor predictors of job performance: do they add anything “above and beyond” other measures? A test lacks incremental validity if it measures something we don’t need; more information isn’t always better.

Example: See essay

8
Q

Standard deviation

A

Part of: statistics and measurement techniques What: The average amount that scores differ from the mean score of a distribution. Found by taking the square root of the variance, which is the average squared deviation around the mean. Gives an approximation of how much a typical score is above or below the average score. Always a positive number; a standard deviation of 0 occurs only theoretically, since there is always variability. In general, a smaller SD means scores are closer to the mean; a larger SD means a larger spread. Example: When my daughter was being evaluated for a developmental delay, she was assessed using a developmental inventory. In order to qualify for early intervention services, she would need to score 2 standard deviations below the mean, indicating a 40% developmental delay.
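The "square root of the average squared deviation" definition can be worked through on a small set of made-up scores (the inventory in the example publishes its own norms, which are not used here):

```python
import math

def variance(scores):
    """Population variance: average squared deviation around the mean."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** 2 for s in scores) / len(scores)

def std_dev(scores):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(scores))

scores = [85, 90, 100, 110, 115]
mean = sum(scores) / len(scores)  # 100.0
sd = std_dev(scores)
# A child qualifying at 2 SDs below the mean would need a score below:
cutoff = mean - 2 * sd
```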

9
Q

Norm referenced scoring/tests

A

Part of: psychometrics and assessment What: A norm-referenced test evaluates a test taker’s performance against a standardized sample; typically used for the purpose of making comparisons with a larger group. Norms should be current, relevant, and representative of the group to which the individual is being compared. Can be problematic when tests are not normed with a culturally diverse population. Example: IQ testing is an example of norm-referenced scoring/testing because an individual’s score is always interpreted in terms of typical performance/results compared to other people in their population.

10
Q

Assessment interview (psychometrics)

A

Part of: psychometrics and clinical practice What: An initial interview conducted for the purpose of gathering information to answer a referral question such as, “Does this child have Autism Spectrum Disorder?” An assessment interview differs from a clinical interview in that once the referral question is answered, the therapist will likely refer the client elsewhere. In a clinical interview, the counselor would gather information about the patient and begin to form a conceptualization of their case and presenting problems to establish the therapeutic relationship and treatment. An assessment interview is more likely to include standardized tests (intelligence, personality, paper-pencil) and to draw on multiple sources.
- May be structured, in which case a certain order of questions and format is strictly adhered to, or unstructured, in which the interviewer is free to follow their own course of questioning.
- Structured interviews are generally more reliable and valid, but lack the freedom of unstructured interviews to pursue a topic of interest or follow an instinct.
EXAMPLE: Lillian’s classroom behavior leads her teacher and school counselor to believe she may have ADHD. The counselor refers Lillian to a child psychologist who specializes in ADHD. The psychologist conducts an assessment interview, including a variety of tests and interviews with the teacher, parents, and school counselor, to determine if an ADHD diagnosis is appropriate.

11
Q

Normal curve

A

Part of: statistics and research What: A normal curve is the bell-shaped curve created by a normal distribution of a population, with symmetry around the central tendencies. Random sampling tends to produce a normal curve. Most statistical procedures in psychology assume normally distributed scores. Parametric statistics are based on normal distributions and have stronger statistical power. EXAMPLE: A researcher looked at IQ level to see whether it correlated with depression and was able to obtain a large random sample. Most IQ scores fell in the middle of the bell-shaped curve, showing a normal distribution and allowing her to continue with parametric statistics to compare IQ with depression.

12
Q

Variance

A

Part of: statistics and data analysis What: Variance is a measure of the variation of, or differences among, subjects in a distribution across the measure X. Variance arises from natural, random differences among subjects, or from environmental variations, measurement error, or researcher error. To calculate variance, average the squared deviations around the mean. A smaller variance means less spread of scores, and a larger variance means more spread-out scores. EXAMPLE: A random-sample study was conducted on a group of individuals to test for anxiety. The spread of scores, or variance, was large, meaning that they were more spread out, with high and low scores being further apart.

13
Q

Standard scores

A

Part of: statistics and assessment What: Standard scores are raw scores converted to z-scores with a fixed mean and SD. Raw scores are converted into standard scores to make objective comparisons across data; the mean z-score is always 0 and the SD is always 1, so a z-score of +1 is the same as +1 SD. The z-score is positive if someone scores higher than the mean and negative if they score lower. EXAMPLE: Two clients are being treated with CBT for depression. The counselor wants to compare the baseline “severity” of depression, but the clients took different measures of depression, the BDI and the PHQ. The counselor converts their scores to standard scores, or z-scores, to compare the two.
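The BDI/PHQ comparison in the example can be sketched numerically. The means and SDs below are made up for illustration (they are not the published norms of either instrument):

```python
def z_score(raw, mean, sd):
    """Convert a raw score to a z-score (standard score)."""
    return (raw - mean) / sd

# Hypothetical norms for two different depression measures:
bdi_z = z_score(31, mean=20, sd=10)  # client A's BDI-style score -> +1.1
phq_z = z_score(12, mean=8, sd=4)    # client B's PHQ-style score -> +1.0
# Both clients sit about 1 SD above the mean, so their baseline
# severity is comparable even though the raw scales differ.
```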

14
Q

Standard error of measurement

A

Part of: statistical analysis What: A common tool in psychological research and standardized testing which provides an estimate of how much an individual’s score would be expected to change on re-testing with the same or an equivalent form of the test; that is, how much observed test scores are spread around the true score. Averaging scores over a (theoretically infinite) number of administrations yields an estimate of the person’s true ability or knowledge (T, the true score); the standard deviation of all those scores is the SEM. The smaller the SEM, the more precise the measurement capacity of the instrument. The SEM creates a confidence band within which a person’s true score would be expected to fall, and it has an inverse relationship with the reliability coefficient (high SEM = low reliability). EXAMPLE: A researcher develops a test to measure depression, then administers it to a sample. They want to analyze the data that they gathered using statistics. They calculate the SEM, and it turns out to be low, which indicates that the measurement is fairly precise. They then decide to carry out further statistical analysis.
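The inverse relationship with reliability comes from the standard formula SEM = SD × √(1 − r), where r is the reliability coefficient. A small sketch with illustrative numbers (an IQ-style SD of 15; the reliability values are invented):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Higher reliability -> smaller SEM (the inverse relationship):
precise = sem(15, 0.96)  # 15 * sqrt(0.04) = 3.0
sloppy = sem(15, 0.51)   # 15 * sqrt(0.49) = 10.5

# Approximate 68% confidence band around an observed score of 100
# on the more reliable test:
band = (100 - precise, 100 + precise)  # (97.0, 103.0)
```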

15
Q

Reliability (types of)

A

Part of: psychometrics and research design What: Reliability refers to the consistency, or repeatability, of test results. Reliability is a foundational characteristic of “psychometric soundness.” Types of reliability:
Inter-rater reliability examines the degree of consistency between different raters’ scorings, particularly in behavioral observation studies. An ethogram helps researchers create operational definitions to increase inter-rater reliability. The correlation between raters’ scores is often expressed with the kappa statistic.
Test-retest reliability refers to the consistency of a measure when the same test is administered to the same person at different points in time. Only works for stable traits; the interval between measurements must be considered (shorter intervals lead to higher carryover), and developmental milestones can confound results. Not often used.
Parallel forms reliability compares scores on two different forms of the same measure, converted to standard scores. Rigorous: more testing, but no carryover effects.
Internal consistency reliability examines the consistency of items within a single test (subcomponents instead of different tests), via split-half correlations, KR-20, or Cronbach’s alpha. Split-half is when the test is split in half and the correlation between the two halves is examined. Used most often.
EXAMPLE: While developing a new version of an IQ test, researchers gave the test to the same group of subjects at several different times to evaluate the instrument’s test-retest reliability.
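Internal consistency can be computed directly from item-level data. This sketch implements the standard Cronbach's alpha formula, α = k/(k−1) × (1 − Σ item variances / variance of totals), on invented Likert responses:

```python
def cronbachs_alpha(items):
    """Cronbach's alpha for `items`: a list of per-item score lists,
    one list per item, same respondents in the same order."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    k = len(items)
    totals = [sum(col) for col in zip(*items)]  # total score per respondent
    return (k / (k - 1)) * (1 - sum(var(i) for i in items) / var(totals))

# 4 hypothetical items answered by 5 respondents on a 1-5 scale;
# the items move together, so alpha should be high.
items = [
    [4, 2, 5, 3, 1],
    [4, 3, 5, 2, 1],
    [3, 2, 4, 3, 2],
    [5, 2, 5, 3, 1],
]
alpha = cronbachs_alpha(items)
```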

16
Q

Projective test

A

Part of: psychometrics, psychological testing What: A test in which the stimulus, the required response, or both are ambiguous. The general idea behind projective tests is that a person’s interpretation of an ambiguous stimulus reflects their unique characteristics.
- Most often personality tests.
- Have fallen out of favor in recent years.
- Tests include the Rorschach inkblot test and the Thematic Apperception Test, among others.
- Usually these tests require extensive training, and evaluator agreement is low.
- Most fall flat when psychometric properties are examined, i.e., low reliability and low validity.
EXAMPLE: You are seeing a client and ask her to interpret a black “blob” while using the Rorschach inkblot test. This is a projective test: the client saying that she sees a crab in the image might be indicative of her mood at the time of testing.

17
Q

Objective test

A

Part of: psychometrics, psychological testing/assessment What: Objective tests are more structured than projective tests. They use multiple-choice, true/false, or Likert-scale formats, and are usually self-report. Answers are scored quantitatively, with clearly stated questions and answers. There is no subjective element, so scores are not influenced by rater variables. EXAMPLE: The psychologist found that when assessing clients with Borderline Personality Disorder, objective tests of personality, such as the MMPI, were more valid in providing personality information than projective tests, such as the Rorschach Test, in which his own personal bias or judgment could hinder the test results, thus affecting the reliability and validity of the measure. The objective tests were also easier to score.

18
Q

Standardization sample

A

Part of: psychometrics and psychological testing/assessment What: When a test is constructed, a standardization sample is a group who has taken the test under standard conditions so that future test takers’ scores can be compared to the scores of the standardization sample. It is crucial that the standardization sample be representative of the population for which the test will be used. Example: Karen is worried about her 8-year-old receiving an IQ test because she’s heard that they are biased and not fair for Black students. You reassure her that the IQ tests that are given now are standardized to account for cultural differences among students.

19
Q

Cross validation

A

A procedure used to assess the utility or stability of a statistical model. A data set (all collected at the same time) is randomly divided into two subsets: the first (the derivation sample) is used to develop the model (internal consistency, factor analysis, convergent evidence), and the second (the cross-validation sample) is used to test and norm it.
One way of telling whether your model is a valid representation of the population (a larger sample size gives better cross-validation).
EXAMPLE:
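The derivation/cross-validation split described above can be sketched as a simple random partition (the `split_sample` helper and the 100-record data set are hypothetical):

```python
import random

def split_sample(data, derivation_fraction=0.5, seed=0):
    """Randomly split one data set into a derivation sample (used to
    build the model) and a cross-validation sample (used to test it)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * derivation_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))  # stand-in for 100 participants' test records
derivation, validation = split_sample(records)
# Develop/refine the model on `derivation`, then check that it
# still holds on `validation`.
```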

20
Q

Clinical v. statistical significance

A

Clinical significance refers to the meaningfulness of change in a client’s life. Statistical significance refers to the reliability of an outcome and is calculated mathematically. Generally in psychology, a result is statistically significant if the p-value is < .05, meaning there is less than a 5% probability of obtaining the result by chance if the null hypothesis is true. Example: In a randomized trial for a new drug to treat depression, results were not statistically significant. However, a small percentage of the participants found that they felt better than they had in months and were able to resume some everyday activities while taking the drug. Their results are clinically significant, since the difference in their lives was noticeable, despite the drug not meeting the bar for statistical significance.

21
Q

Test bias

A

A difference in test scores that can be attributed to demographic variables such as age, sex, and race. A test is considered biased if its design systematically disadvantages certain groups of people over others. Test bias is the tendency of scores on a test to systematically over- or underestimate the true performance of the individuals to whom the test is administered, particularly because they are members of specific groups (e.g., ethnic minorities, a particular gender).
Example: A therapist develops a test for measuring disordered eating, however it is found to be biased as it assesses societal influences according to norms set by cisgender females.