PSYC 549- Psychometrics Flashcards

1
Q

Achievement Test

A

A test that is designed to measure an individual’s level of knowledge in a particular area; generally used in schools and educational settings. Unlike an aptitude test, which measures a person’s ability to learn something, an achievement test focuses specifically on how much a person knows about a specific topic. It measures an individual’s previous learning.

Example: The comps exam is an achievement test, as it is designed to measure how thoroughly clinical counseling students have learned the information in the ten core classes of the program.

2
Q

Aptitude test

A

Measures a person’s potential to learn or acquire specific skills. Often used for measuring high school students’ potential for college. Aptitude tests are prone to bias.

Example: The SAT is an aptitude test designed to predict a student’s potential success in college. There is reason to doubt the predictive validity of the SAT.

3
Q

Construct

A

Part of: psychometrics and psychological testing.

What: Constructs are developed to measure complex, abstract concepts that can only be observed indirectly. A construct is a characteristic that is not directly observable: an internal event or process that must be inferred from external behavior. Constructs may be derived from theory, research, or observation.

Example: The counselor administered a paper and pencil assessment measure that solicited responses related to fidgeting, excessive worrying, difficulty concentrating - all representing the construct of anxiety. Anxiety is a construct that can’t be directly observed but can be measured indirectly or inferred by assessing the prevalence of these bxs.

4
Q

Correlation v. causation

A

Correlation refers to a relationship between variables but does not imply causation (i.e., that one variable causes change in the other). A confounding third variable may explain the relationship, or the relationship may be bidirectional (each variable influencing the other). Causality can only be determined under experimental conditions: an experiment requires random assignment of participants and manipulation of at least one independent variable.

Example: An observational study of third graders found a positive correlation between students who ate breakfast and test scores. The researchers cannot conclude whether eating breakfast causes students to test better, whether students with higher test scores are more likely to eat breakfast, or whether there is some other variable contributing to the relationship.
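
The breakfast example can be made concrete with a quick computation. A minimal sketch in Python (standard library only), using invented data for days of breakfast eaten per week and test scores; even a strong Pearson r here says nothing about which variable, if either, is causal:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: days breakfast was eaten per week vs. test score
breakfast = [0, 1, 2, 3, 4, 5, 6, 7]
scores = [61, 64, 70, 68, 75, 78, 80, 85]

r = pearson_r(breakfast, scores)  # strongly positive, yet still not causal
```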

5
Q

Criterion-referenced scoring/tests

A

Part of: psychometrics and testing

What: Criterion-referenced tests evaluate a specific skill/ability/task that the test taker must demonstrate. Scores are compared to a pre-set criterion score, not compared to a norm or other scores.

Example: The No Child Left Behind Act required state testing of students to use a criterion-referenced scoring model. Students were expected to meet certain benchmark scores rather than being evaluated against their peers.

6
Q

Criterion-related validity

A

Part of: psychometrics

What: Extent to which a test corresponds with a particular criterion (the standard against which the test is compared). Typically used when the objective is to predict future performance on an unknown criterion.

Predictive criterion validity refers to a test or measure that predicts future performance/success in relation to a particular criterion (SAT -> success in college).

Concurrent criterion validity refers to a criterion measured at the same time as the test (work samples -> job performance; written driving test -> physical driving test).

Example: Applicants to a software company are given a test of job aptitude. After six months working at the company, job performance is evaluated. The two scores are compared to assess the criterion validity of the job aptitude test.

7
Q

Validity (types of)

A

What: A psychometric property. Validity is the extent to which an assessment or study measures the construct it is intended to measure; a validity coefficient of 0.3-0.4 is generally considered adequate.

Types of validity:

Face validity: based on logical rather than statistical analysis, face validity is the appearance, at a surface level, that a test measures what it purports to measure.

Content validity: evidence that the content of a test adequately represents the conceptual domain it is designed to cover; test items are a fair sample of the total potential content and relevant to construct being tested. Based on logical analysis v. statistical analysis.

Criterion validity: extent to which a test corresponds with a particular criterion against which it is compared; indicated by high correlations between a test and a well-defined measure. For example, you might examine a suicide risk scale and suicide rates/attempts or driver skill test and number of infractions on the road.

Construct validity: the degree to which the test measures the construct or trait it intends to measure. This may be demonstrated with convergent evidence (a high correlation between two or more tests that purport to assess the same construct) or discriminant evidence (low correlations between tests of unrelated constructs; that is, the test discriminates between qualities that are not related to each other).

Example: A business psychologist has developed a new IQ test that requires only 5 minutes per subject, compared to 90 minutes for the test acknowledged as the gold standard. He administers both tests to a sample and compares the scores. The correlation between the two sets of scores is high, providing convergent evidence that the new test has construct validity.

8
Q

Standard deviation

A

Part of: statistics and measurement techniques

What: the average amount that scores differ from the mean score of a distribution. Found by taking the square root of the variance, which is the average squared deviation around the mean. Gives an approximation of how much a typical score is above or below the average score.

In general, smaller SD = scores closer to mean; larger SD = larger distribution/spread

Example: A counselor administered a standardized test for depression to her client and found that she scored two standard deviations above the mean. This places the client at roughly the 98th percentile for symptoms of depression on this scale.
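
The definition above (square root of the average squared deviation around the mean) can be sketched in a few lines of Python, using hypothetical scores:

```python
import math

scores = [10, 12, 14, 16, 18]  # hypothetical raw scores

mean = sum(scores) / len(scores)
# Population variance: the average squared deviation around the mean
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
sd = math.sqrt(variance)  # standard deviation
```

(A sample estimate of the variance would divide by n - 1 instead of n.)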

9
Q

Norm referenced scoring/tests

A

Part of: psychometrics and assessment

What: A norm referenced test evaluates a test taker’s performance against a standardized sample; typically used for the purpose of making comparisons with a larger group.

Norms should be current, relevant, and representative of the group to which the individual is being compared. Can be problematic when tests are not normed with a culturally diverse population.

Example: The child psychologist tested the adolescent’s IQ and found that it was 165, more than 4 standard deviations above the mean (M = 100, SD = 15), placing him well above the 99th percentile on the normal curve because IQ is normally distributed. IQ testing is an example of norm-referenced scoring/testing because an individual’s score is always interpreted relative to typical performance.

10
Q

Assessment interview

A

Part of: psychometrics and clinical practice

What: An initial interview conducted for the purpose of gathering information to answer a referral question such as, “Does this child have Autism Spectrum Disorder?”

An assessment interview differs from a clinical interview in that once the referral question is answered, the therapist will likely refer the client elsewhere. In a clinical interview, the counselor gathers information about the patient and begins to form a conceptualization of their case and presenting problems in order to establish the therapeutic relationship and treatment. An assessment interview is more likely to include standardized tests (intelligence, personality, paper-and-pencil) and to draw on multiple informants.

  • May be structured, in which case a certain order of questions and format is strictly adhered to, or may be unstructured, in which the interviewer is free to follow their own course of questioning.
  • Structured interviews are generally more reliable and valid, but lack the freedom of unstructured interviews to pursue a topic of interest or follow an instinct

EXAMPLE: Lillian’s classroom behavior leads her teacher and school counselor to believe she may have ADHD. The counselor refers Lillian to a child psychologist who specializes in ADHD. The psychologist conducts an assessment interview including a variety of tests and interviews with the teacher, parents, and school counselor to determine if an ADHD diagnosis is appropriate.

11
Q

Normal curve

A

Part of: statistics and research

What: A normal curve is the bell-shaped curve created by a normal distribution of a population with symmetry around central tendencies. Random sampling tends to produce a normal curve. Most statistical procedures in psychology assume normally distributed scores. Parametric stats are based on normal distributions.

EXAMPLE: The child psychologist tested the adolescent’s IQ and found that it was 165, more than 4 standard deviations above the mean (M = 100, SD = 15), placing him well above the 99th percentile on the normal curve because IQ is normally distributed.
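
A score’s percentile under the normal curve follows directly from its z-score. A sketch using only Python’s standard library (`math.erf` yields the normal CDF), assuming the usual IQ convention of M = 100, SD = 15:

```python
import math

def normal_percentile(score, mean, sd):
    """Percentile rank of a score under a normal distribution."""
    z = (score - mean) / sd
    # Normal CDF via the error function, scaled to a percentile
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

p_mean = normal_percentile(100, 100, 15)   # the mean sits at the 50th percentile
p_iq165 = normal_percentile(165, 100, 15)  # far into the upper tail
```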

12
Q

Variance

A

Part of: statistics and data analysis

What: variance is a measure of the variation of or differences among subjects in a distribution across the measure X. Variance arises from natural, random differences among subjects or from environmental variations, measurement error, or researcher error.

To calculate variance, average the squared deviations around the mean. The deviations must be squared because the raw deviations around the mean always sum to zero.

EXAMPLE: A clinical psychologist is doing research on a new tx for substance use disorders. She conducts an experiment in which she compares the tx group to a group that received the gold standard tx and to a control group. At first glance it looks like the level of symptomatic reduction is the same in the new tx and gold standard tx groups, but upon further inspection the psychologist notes that the new tx group has a large amount of variance. That is, some people saw significant sx reduction and others saw very minimal change. She needs to investigate this further. What is it that makes the new tx beneficial for some?
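
The point about squaring can be verified directly: raw deviations around the mean always cancel out. A short Python check with hypothetical scores:

```python
scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

mean = sum(scores) / len(scores)  # 5.0

# Raw deviations sum to zero, so they cannot measure spread on their own
raw_dev_sum = sum(s - mean for s in scores)

# Squaring the deviations before averaging yields the variance
variance = sum((s - mean) ** 2 for s in scores) / len(scores)
```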

13
Q

Standard scores

A

Part of: statistics and assessment

What: Standard scores are raw scores converted to a scale with a fixed mean and SD; the most common standard score is the z-score, for which the mean is always 0 and the SD is always 1. Raw scores are converted to standard scores to make objective comparisons across different measures: z = (observed score - mean) / standard deviation.

EXAMPLE: Two clients are being treated with CBT for depression. The counselor wants to compare the baseline “severity” of depression. The clients took different measures of depression, the BDI and the QIDS (Quick Inventory of Depressive Symptomatology). The counselor converts their scores to standard scores, or z-scores, to compare the two.
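
The conversion in the example can be sketched as follows. The means and SDs below are invented for illustration, not the published norms of the BDI or QIDS:

```python
def z_score(raw, mean, sd):
    """Convert a raw score to a standard (z) score."""
    return (raw - mean) / sd

# Hypothetical norms, chosen only to illustrate the comparison
client_a = z_score(29, mean=20, sd=6)  # BDI-style scale
client_b = z_score(14, mean=8, sd=4)   # QIDS-style scale

# Both clients land 1.5 SDs above the mean, so their baseline severity
# is comparable even though the raw scales differ.
```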

14
Q

Standard error of measurement

A

Part of: psychometrics and statistics

What: The standard deviation of the errors of measurement in a test. An estimate of how much an individual’s score would be expected to change on re-testing with the same or an equivalent form of the test. Describes how repeated measurements of a person on the same instrument tend to be distributed around his or her “true” score. Based on Classical Test Theory, which posits that error is normally distributed. Used to calculate confidence intervals around a test’s scores. As the reliability of a test increases, the SEM decreases.

Example: A researcher develops a test to measure depression, then administers it to a sample. They want to analyze the data that they gathered using statistics. They calculate the SEM and it turns out to be low which indicates that the measurement is fairly precise. They then decide to carry out further statistical analysis.
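
Under Classical Test Theory the SEM is computed as SD·sqrt(1 − reliability), which also shows why the SEM shrinks as reliability rises. A sketch with hypothetical numbers:

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 10, reliability coefficient = 0.91
error = sem(10, 0.91)  # 3.0

# Approximate 95% confidence interval around an observed score of 50
observed = 50
ci = (observed - 1.96 * error, observed + 1.96 * error)
```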

15
Q

Reliability (types of)

A

Part of: psychometrics and research design

What: Reliability refers to the dependability, consistency, or repeatability of test results. Reliability is a foundational characteristic of “psychometric soundness.”

Types of reliability:

Inter-Rater Reliability examines the degree of consistency between different raters’ scorings, particularly in behavioral observation studies. An ethogram helps researchers create operational definitions to increase inter-rater reliability.
  • Quantified by inter-rater agreement statistics such as Cohen’s kappa

Test-Retest Reliability refers to the consistency of a measure when the same test is administered to the same person at different points in time.

  • Only works for stable traits
  • The interval between measurements must be considered: shorter intervals -> higher carryover
  • Be careful of developmental milestones

Parallel Forms Reliability compares scores on two equivalent forms of the same test

Internal Consistency Reliability examines the consistency of items within a test

  • Assessed via split-half correlations, KR-20, and Cronbach’s Alpha
  • In the split-half method, the test is split in half and the correlation between the two halves is examined

EXAMPLE: While developing a new version of an IQ test, researchers gave the test to the same group of subjects at several different times to evaluate the instrument’s test-retest reliability.
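
The split-half method from the list above can be sketched as follows; the scores are invented, and the Spearman-Brown step is the standard correction (not named on the card) for the fact that each half is only half as long as the full test:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den

# Hypothetical per-person totals on the odd- and even-numbered items
odd_half = [10, 12, 15, 9, 14, 11]
even_half = [11, 13, 14, 8, 15, 10]

half_r = pearson_r(odd_half, even_half)
# Spearman-Brown correction projects the half-test correlation up to
# the reliability of the full-length test
full_r = 2 * half_r / (1 + half_r)
```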

16
Q

Projective test

A

Part of: psychometrics, psychological testing

What: A test in which the stimulus, the required response, or both are ambiguous. The general idea behind projective tests is that a person’s interpretation of an ambiguous stimulus reflects their unique characteristics.

  • Most often personality tests
  • Have fallen out of favor in recent years.
  • Tests include the Rorschach inkblot test and the Thematic Apperception Test among others.
  • Usually these tests require extensive training to administer and yield little evaluator agreement
  • Most fall flat when their psychometric properties are examined (i.e., low reliability and low validity)

EXAMPLE: You are seeing a client and you ask her to interpret a black ‘blob’ while administering the Rorschach inkblot test. On this projective test, the client’s report that she sees a crab in the image might be interpreted as indicative of her mood at the time of testing.

17
Q

Objective test

A

Part of: psychometrics, psychological testing/assessment

What: Objective tests are more structured than projective tests. They use multiple-choice, true/false, or Likert-scale formats and are usually self-report. Questions and answers are clearly stated, and answers are scored quantitatively. Because there is no subjective element in scoring, results are not influenced by rater variables.

EXAMPLE: The psychologist found that when assessing clients with Borderline Personality Disorder, objective tests of personality, such as the MMPI, were more valid in providing personality information than projective tests, such as the Rorschach, in which his own personal bias or judgment could hinder the results, affecting the reliability and validity of the measure. The objective tests were also easier to score.

18
Q

Standardization sample

A

Part of: psychometrics and psychological testing/assessment

What: AKA norm group. When a test is constructed, the standardization sample is the group of test-takers whose scores represent the population against which all future test-takers’ scores will be compared. Standardization samples should be large and representative of that population.

Example: A school principal wants to administer an IQ test to all of the fourth-grade students at her school. She discards one potential test because the standardization sample used in the test’s creation was composed mostly of white children, while 80% of the students at her school are Black.

19
Q

Cross validation

A

The process of evaluating a test or a regression equation on a sample other than the one used in the original studies. This involves predicting performance in a group of subjects other than the one to which the equation was originally applied. A standard error of estimate can then be obtained for the relationship between the values predicted by the equation and the values actually observed.

EXAMPLE: The researchers created a new test to measure anxiety in children and administered it to two new samples of children beyond the sample it was originally tested on. This cross-validation was necessary because chance and other factors, such as cultural differences and SES, may have influenced the original validation.

20
Q

Clinical v. statistical significance

A

Clinical significance refers to the meaningfulness of change in a client’s life. Statistical significance refers to the reliability of an outcome and is calculated mathematically. Generally in psychology, a result is statistically significant if the p-value is < .05, meaning that if there were truly no effect, a result at least this extreme would occur by chance less than 5% of the time.

Example: In a randomized trial for a new drug to treat depression, results were not statistically significant. However, a small percentage of the participants found that they felt better than they had in months and were able to resume some everyday activities while taking the drug. Their results are clinically significant, since the difference in their lives was noticeable, despite the drug not meeting the bar for statistical significance.
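
The meaning of a p-value can be made concrete with an exact permutation test: count how often a group difference at least as large as the observed one arises when group labels are shuffled. A sketch with invented symptom-reduction scores:

```python
import itertools
import statistics

treatment = [7, 9, 8, 10, 9]  # hypothetical symptom-reduction scores
control = [5, 6, 7, 5, 6]

observed = statistics.mean(treatment) - statistics.mean(control)

# Under the null hypothesis, group labels are arbitrary: enumerate every
# way of splitting the pooled scores into two groups of the same sizes
pooled = treatment + control
n = len(treatment)
count = total = 0
for combo in itertools.combinations(range(len(pooled)), n):
    group = [pooled[i] for i in combo]
    rest = [pooled[i] for i in range(len(pooled)) if i not in combo]
    diff = statistics.mean(group) - statistics.mean(rest)
    total += 1
    if diff >= observed:
        count += 1

p_value = count / total  # one-tailed exact p-value
```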

21
Q

Test bias

A

Part of: psychometrics

What: a systematic error in the measurement due to a flaw in the test itself. Test bias contributes to differences in test scores that can be attributed to identity variables such as age, sex, and race. Tests are considered biased if a test design systematically disadvantages certain groups of people over others. Test bias can occur due to a non-representative or small standardization sample.

Example: A therapist develops a test for measuring disordered eating, however, it is found to be biased as it assesses societal influences according to norms set by cisgender females.

Example: An African American high school student comes to therapy after having been diagnosed with Social Anxiety Disorder (SAD) by a school psychologist. The school psychologist administered a Fear Questionnaire to him after he was referred to her by teachers who said the student seemed very nervous in class and did not interact with others. He would also miss classes. The student’s mother suggested he go to a therapist with a multicultural background. The student had just started high school at a school where he was one of the only people of color. The therapist decided to further assess the student for SAD because she knows there is a possibility of test bias in assessments that do not account for differences in experiences of multicultural individuals.