PSYC 549 - Psychometrics Flashcards
Achievement Test
A test that is designed to measure an individual’s level of knowledge in a particular area; generally used in schools and educational settings. Unlike an aptitude test, which measures a person’s ability to learn something, an achievement test focuses specifically on how much a person knows about a specific topic. It measures an individual’s previous learning.
Example: The comps exam is an achievement test, as it is designed to measure how thoroughly clinical counseling students have learned the information in the ten core classes of the program.
Aptitude test
Measures a person’s potential to learn or acquire specific skills. Often used for measuring high school students’ potential for college. Aptitude tests are prone to cultural and socioeconomic bias.
Example: The SAT is an aptitude test designed to predict a student’s potential success in college. There is reason to doubt the predictive validity of the SAT.
Construct
Part of: psychometrics and psychological testing.
What: Constructs are developed to measure complex, abstract concepts that are indirectly observed. They are based on a characteristic which is not directly observable, and is an internal event or process that must be inferred from external behavior. Constructs may be derived from theory, research, or observation.
Example: The counselor administered a paper and pencil assessment measure that solicited responses related to fidgeting, excessive worrying, difficulty concentrating - all representing the construct of anxiety. Anxiety can be measured indirectly by assessing the prevalence of these bxs.
Correlation v. causation
Correlation refers to a relationship between variables, but it does not imply causation (i.e., that one variable causes change in the other). A third variable (a confound) may account for the relationship, or the relationship may be bidirectional (in which case each variable causally influences the other). Causality can only be determined under experimental conditions. An experiment requires random assignment of participants and manipulation of at least one independent variable.
Example: An observational study of third graders found a positive correlation between students who ate breakfast and test scores. The researchers cannot conclude whether eating breakfast causes students to test better, whether students with higher test scores are more likely to eat breakfast, or whether there is some other variable contributing to the relationship.
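A minimal simulation makes the third-variable problem concrete. The sketch below uses invented data (the variable names and effect sizes are assumptions for illustration, not findings from the breakfast study): two variables that share a common cause, but have no causal link to each other, still correlate positively.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical confound: e.g., a stable household routine that
# influences both breakfast habits and test performance.
routine = rng.normal(0, 1, n)

# Neither variable causes the other; both depend on the confound.
breakfast_freq = routine + rng.normal(0, 1, n)
test_score = routine + rng.normal(0, 1, n)

r = np.corrcoef(breakfast_freq, test_score)[0, 1]
print(f"Pearson r = {r:.2f}")  # positive r despite no causal link
```

Without random assignment, a correlation like this cannot distinguish among the causal explanations above.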
Criterion-referenced scoring/tests
Part of: psychometrics and testing
What: Criterion-referenced tests evaluate a specific skill/ability/task that the test taker must demonstrate. Scores are compared to a pre-set criterion score, not compared to a norm or other scores.
Example: The No Child Left Behind Act required state testing of students to use a criterion-referenced scoring model. Students were expected to meet certain benchmark scores rather than being evaluated against their peers.
Criterion-related validity
Part of: psychometrics
What: Extent to which a test corresponds with a particular criterion (the standard against which the test is compared). Typically used when the objective is to predict future performance on a criterion that cannot yet be measured.
Predictive criterion validity refers to a test or measure that predicts future performance/success in relation to a particular criterion (SAT -> success in college).
Concurrent criterion validity refers to the correlation between a test and a criterion measured at the same time (e.g., a written driving test and a road test taken the same day).
Example: Applicants to a software company are given a test of job aptitude. After six months working at the company, job performance is evaluated. The two sets of scores are compared to assess the predictive criterion validity of the job aptitude test.
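As a sketch of how a predictive validity coefficient is computed (the numbers below are invented for illustration), the test scores are simply correlated with the criterion scores collected later:

```python
import numpy as np

# Hypothetical data: aptitude scores at hiring, and supervisor
# performance ratings collected six months later (the criterion).
aptitude = np.array([72, 85, 90, 65, 78, 88, 70, 95, 60, 82])
performance = np.array([3.1, 4.0, 4.2, 2.8, 3.5, 4.1, 3.0, 4.6, 2.5, 3.8])

# The validity coefficient is the test-criterion correlation.
r = np.corrcoef(aptitude, performance)[0, 1]
print(f"Predictive validity coefficient: r = {r:.2f}")
```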
Validity (types of)
What: A psychometric property. Validity is the extent to which an assessment or study measures the construct it is intended to measure; a validity coefficient of .30-.40 is generally considered adequate.
Types of validity:
Face validity: based on logical analysis rather than statistical analysis, face validity is the appearance, at a surface level, that a test measures what it purports to measure.
Content validity: evidence that the content of a test adequately represents the conceptual domain it is designed to cover; test items are a fair sample of the total potential content and relevant to construct being tested. Based on logical analysis v. statistical analysis.
Criterion validity: extent to which a test corresponds with a particular criterion against which it is compared; indicated by high correlations between a test and a well-defined criterion measure. For example, you might examine a suicide risk scale against suicide rates/attempts, or a driving skill test against the number of infractions on the road.
Construct validity: the degree to which the test measures the construct or trait it intends to measure. This may be demonstrated with convergent evidence (a high correlation between two or more tests that purport to assess the same construct) or discriminant evidence (low correlations between tests of unrelated constructs; that is, the tests discriminate between two qualities that are not related to each other).
Example: A business psychologist has developed a new IQ test that requires only 5 minutes per subject, as compared to 90 minutes for the test acknowledged as the gold standard. He administers both tests to a sample population and compares the scores. The correlation between the two sets of scores is high, providing convergent evidence that the new test has construct validity.
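One way to interpret the size of a validity coefficient is to square it: $r^2$ gives the proportion of criterion variance the test accounts for. A quick worked example:

$$r = 0.40 \;\Rightarrow\; r^2 = 0.16$$

So an “adequate” coefficient of .30-.40 explains only about 9-16% of the variance in the criterion, which is why validity evidence is gathered from multiple sources.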
Standard deviation
Part of: statistics and measurement techniques
What: the average amount that scores differ from the mean score of a distribution. Found by taking the square root of the variance, which is the average squared deviation around the mean. Gives an approximation of how much a typical score is above or below the average score.
Always non-negative; an SD of 0 (every score identical) occurs only in theory, because real data always show some variability.
In general, a smaller SD means scores cluster closer to the mean; a larger SD means a greater spread of scores.
Example: When my daughter was being evaluated for a developmental delay, she was assessed using the Battelle Developmental Inventory-2 (BDI-2). In order to qualify for early intervention services, she would need to score 2 standard deviations below the mean, indicating a 40% developmental delay.
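A minimal worked example, using the population formula (a sample SD would divide by $N - 1$ instead):

$$s = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N}}$$

For the scores 2, 4, 6, 8: the mean is 5, the squared deviations are 9, 1, 1, 9, the variance is $20/4 = 5$, and the SD is $\sqrt{5} \approx 2.24$.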
Norm referenced scoring/tests
Part of: psychometrics and assessment
What: A norm referenced test evaluates a test taker’s performance against a standardized sample; typically used for the purpose of making comparisons with a larger group.
Norms should be current, relevant, and representative of the group to which the individual is being compared. Can be problematic when tests are not normed with a culturally diverse population.
Example: The child psychologist tested the adolescent’s IQ and discovered that the child’s IQ was 165, placing him above the 99th percentile, more than 3 SDs above the mean on the normal curve, because IQ is normally distributed. IQ testing is an example of norm referenced scoring/testing because an individual’s score is always interpreted relative to the typical performance of the norm group.
Assessment interview
Part of: psychometrics and clinical practice
What: An initial interview conducted for the purpose of gathering information to answer a referral question such as, “Does this child have Autism Spectrum Disorder?”
An assessment interview differs from a clinical interview in that once the referral question is answered, the therapist will likely refer the client elsewhere. In a clinical interview, the counselor gathers information about the patient and begins to form a conceptualization of the case and presenting problems in order to establish the therapeutic relationship and plan treatment. An assessment interview is more likely to include standardized tests (intelligence, personality, paper-and-pencil) and to draw on multiple informants.
- May be structured, in which case a certain order of questions and format is strictly adhered to, or may be unstructured, in which the interviewer is free to follow their own course of questioning.
- Structured interviews are generally more reliable and valid, but lack the freedom of unstructured interviews to pursue a topic of interest or follow an instinct.
EXAMPLE: Lillian’s classroom behavior leads her teacher and school counselor to believe she may have ADHD. The counselor refers Lillian to a child psychologist who specializes in ADHD. The psychologist conducts an assessment interview, including a variety of tests and interviews with the teacher, parents, and school counselor, to determine whether an ADHD diagnosis is appropriate.
Normal curve
Part of: statistics and research
What: A normal curve is the bell-shaped curve created by a normal distribution; it is symmetric about its center, where the mean, median, and mode coincide. Many psychological traits measured in large random samples are approximately normally distributed, and sampling distributions of means tend toward normality as samples grow. Most statistical procedures in psychology assume normally distributed scores; parametric statistics are based on normal distributions.
EXAMPLE: The child psychologist tested the adolescent’s IQ and discovered that the child’s IQ was 165, placing him in the 99th percentile, more than 3 standard deviations above the mean on the normal curve because IQ is normally distributed.
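The standard areas under the normal curve (the empirical rule) follow directly from the definition:

$$P(\mu - \sigma < X < \mu + \sigma) \approx 68\%, \quad P(\mu - 2\sigma < X < \mu + 2\sigma) \approx 95\%, \quad P(\mu - 3\sigma < X < \mu + 3\sigma) \approx 99.7\%$$

Applied to IQ (mean 100, SD 15): roughly 68% of people score between 85 and 115, 95% between 70 and 130, and 99.7% between 55 and 145, so an IQ of 165 lies in the extreme upper tail.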
Variance
Part of: statistics and data analysis
What: Variance is a measure of the variation among subjects in a distribution on a measure X. Variance arises from natural, random differences among subjects, as well as from environmental variation, measurement error, or researcher error.
To calculate variance, square each score’s deviation from the mean and average the squared deviations. The deviations must be squared because the sum of raw deviations around the mean always equals zero.
EXAMPLE: A clinical psychologist is doing research on a new tx for substance use disorders. She conducts an experiment in which she compares the tx group to a group that received the gold standard tx and to a control group. At first glance it looks like the level of symptomatic reduction is the same in the new tx and gold standard tx groups, but upon further inspection the psychologist notes that the new tx group has a large amount of variance. That is, some people saw significant sx reduction and others saw very minimal change. She needs to investigate this further. What is it that makes the new tx beneficial for some?
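A quick worked example showing why the deviations must be squared. For the scores 2, 4, 6, 8 (mean 5):

$$\sum(x_i - \bar{x}) = (-3) + (-1) + 1 + 3 = 0, \qquad \sum(x_i - \bar{x})^2 = 9 + 1 + 1 + 9 = 20$$

giving a variance of $\sigma^2 = 20/4 = 5$.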
Standard scores
Part of: statistics and assessment
What: Standard scores are raw scores converted to a scale with a fixed mean and SD so that objective comparisons can be made across different measures. The most common standard score is the z-score, which always has a mean of 0 and an SD of 1.
EXAMPLE: Two clients are being treated with CBT for depression. The counselor wants to compare the baseline “severity” of depression. The clients took different measures of depression, the BDI and the QIDS (Quick Inventory of Depressive Symptomatology). The counselor converts their scores to standard scores, or z-scores, to compare the two.
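The conversion itself is one line of arithmetic. Using hypothetical numbers for the BDI (a raw score of 30 against a sample mean of 20 and an SD of 8; illustrative values, not actual BDI norms):

$$z = \frac{x - \bar{x}}{s} = \frac{30 - 20}{8} = 1.25$$

If the other client’s QIDS z-score were, say, 0.50, the counselor could conclude that the first client’s baseline depression is more severe relative to each measure’s norms, even though the raw scores are on different scales.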
Standard error of measurement
Part of: psychometrics and statistics
What: An index of the amount of error in a test or measure. The standard error of measurement is the standard deviation of the distribution of scores that would be expected if the same person took the same (or an equivalent) test repeatedly; it is essentially an estimate of how much an individual’s score would be expected to change on re-testing. A person’s “true score” is expected to fall somewhere within the confidence band created by the standard error of measurement.
Example: An IQ test has a mean of 100 for a particular sample and a standard deviation of 14. A school counselor wants to know the range of the true score for someone whose score was 106. Calculating the standard error of measurement, which also requires the test’s reliability coefficient, will provide the range within which that student’s “true score” is expected to fall.
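The SEM is computed from the test’s SD and its reliability coefficient $r_{xx}$. Continuing the example with SD = 14 and an assumed reliability of .90 (the reliability value is an assumption added for illustration):

$$SEM = s\sqrt{1 - r_{xx}} = 14\sqrt{1 - 0.90} \approx 4.43$$

A 68% confidence band is the observed score ±1 SEM (about 101.6 to 110.4 for a score of 106); a 95% band is ±1.96 SEM (about 97.3 to 114.7).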
Reliability (types of)
Part of: psychometrics and research design
What: Reliability refers to the accuracy, dependability, consistency, or repeatability of test results. Reliability is a foundational characteristic of “psychometric soundness.”
Types of reliability:
Inter-Rater Reliability examines the degree of consistency between different raters’ scorings, particularly in behavioral observation studies. An ethogram helps researchers create operational definitions of behaviors, which increases inter-rater reliability.
- Quantified by agreement between raters’ scores (e.g., Cohen’s Kappa statistic for categorical ratings, or a correlation for continuous scores)
Test-Retest Reliability refers to the consistency of a measure when the same test is administered to the same person at different points in time.
- Only works for stable traits
- The interval between measurements must be considered: shorter intervals -> stronger carryover effects
- Be careful of developmental milestones
Parallel Forms Reliability compares scores on two equivalent forms of the same test
Internal Consistency Reliability examines the consistency of items within a test
- Done via split-half, KR-20, and Cronbach’s Alpha (see the sketch after this card)
- Split-half reliability splits the test in half and examines the correlation between the two halves (often corrected upward with the Spearman-Brown formula)
EXAMPLE: While developing a new version of an IQ test, researchers gave the test to the same group of subjects at several different times to evaluate the instrument’s test-retest reliability.
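As a sketch of how internal consistency is computed, the snippet below calculates Cronbach’s Alpha from a small hypothetical item-response matrix (the scores are invented for illustration):

```python
import numpy as np

# Hypothetical responses: 6 test takers x 4 items, each rated 1-5.
scores = np.array([
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
    [2, 3, 2, 2],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / total variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```

A higher alpha means the items covary strongly, i.e., they appear to measure the same underlying construct.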