Assessment & Testing (Appraisal) Flashcards
- **Appraisal can be defined as**
- **a. the process of assessing or estimating attributes.**
- **b. testing which is always performed in a group setting.**
- **c. testing which is always performed on a single individual.**
- **d. a pencil and paper measurement of assessing attributes.**
a. the process of assessing or estimating attributes.
Appraisal is a broad term which includes more than merely
“testing clients.” Appraisal could include a survey, observations,
or even clinical interviews.
score has been assigned to the person's attribute or performance. An effective counselor will always inform clients about the limitations of any test that he or she administers. Some evidence indicates that neophyte counselors are sometimes tempted to administer tests merely to boost their credibility. I think it is safe to say this is not a desirable practice.
- **A test can be defined as a systematic method of measuring a sample of behavior. Test format refers to the manner in which test items are presented. The format of an essay test is considered a(n) _______ format.**
- **a. subjective.**
- **b. objective.**
- **c. very precise.**
- **d. concise.**
a. subjective.
A "subjective" paradigm relies mainly on the scorer's opinion. If the rater knows the test taker's attributes, the rater's "personal bias" can significantly impact the rating.
- **The National Counselor Exam (NCE) is a(n) _______ test because the scoring procedure is specific.**
- **a. subjective.**
- **b. objective.**
- **c. projective.**
- **d. subtest.**
b. objective.
Since the NCE uses an a, b, c, d alternative format, the rater's "subjective" feelings and thoughts would not be an issue.
- **A short answer test is a(n) _______ test.**
- **a. objective.**
- **b. culture free.**
- **c. forced choice.**
- **d. free choice.**
d. free choice.
Some exams will call this a "free response" format. In any case, the salient point is that the person taking the test can respond in any manner he or she chooses. Although free choice response patterns can yield more information, they often take more time to score and increase subjectivity (i.e., there is more than one correct answer). I should mention that although testing is often controversial, schools now employ psychoeducational tests more than at any time in history.
- **The NCE is a(n) _______ test.**
- **a. free choice.**
- **b. forced choice.**
- **c. projective.**
- **d. intelligence.**
b. forced choice.
"Forced choice" items are sometimes known as "recognition items." This book is composed of forced choice/recognition items. On some tests this format is used to control for the "social desirability phenomenon," in which the person marks the answer he or she feels is socially acceptable. The MMPI-2, or Minnesota Multiphasic Personality Inventory, for example, uses forced choices to create a "lie scale" composed of human frailties we all possess.
- **The _______ index indicates the percentage of individuals who answered each item correctly.**
- **a. difficulty.**
- **b. critical.**
- **c. intelligence.**
- **d. personal.**
a. difficulty.
The higher the number of people who answer a question correctly, the easier the item is, and vice versa. A .5 difficulty index (also called a difficulty value) would suggest that 50% of those tested answered the question correctly, while 50% did not. Most theorists agree that a "good measure" provides a wide range of item difficulties, including some easy items that even a poor performer will answer correctly.
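Since the difficulty index is just the proportion of examinees who answered an item correctly, the arithmetic can be checked in a couple of lines (the response data below are made up for illustration):

```python
# Difficulty index (difficulty value): the proportion of examinees
# who answered an item correctly. Hypothetical responses:
# 1 = correct, 0 = incorrect.
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]  # 10 examinees

difficulty = sum(responses) / len(responses)
print(difficulty)  # 0.6 -> 60% answered correctly, a fairly easy item
```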
- **Short answer tests and projective measures utilize free response items. The NCE and the CPCE use forced choice or so-called _______ items.**
- **a. vague.**
- **b. subjective.**
- **c. recognition.**
- **d. numerical.**
c. recognition.
Recognition items give the examinee two or more alternatives.
- **A true/false test has _______ recognition items.**
- **a. similar.**
- **b. free choice.**
- **c. dichotomous.**
- **d. no.**
c. dichotomous.
“Dichotomy” simply means that you are presented with two
opposing choices.
- **A test format could be normative or ipsative. In the normative format**
- **a. each item depends on the item before it.**
- **b. each item depends on the item after it.**
- **c. the client must possess an IQ within the normal range.**
- **d. each item is independent of all other items.**
d. each item is independent of all other items.
Ipsative measures compare traits within the same individual; they do not compare a person to other persons who took the instrument. The Kuder Occupational Interest Survey (KOIS), now called the Kuder Career Search with Person Match, is one such example. The ipsative test allows the person being tested to compare items.
- **A client who takes a normative test**
- **a. cannot legitimately be compared to others who have taken the test.**
- **b. can legitimately be compared to others who have taken the test.**
- **c. could not have taken an IQ test.**
- **d. could not have taken a personality test.**
b. can legitimately be compared to others who have taken the test.
Technically, a normative interpretation is one in which the individual's score is evaluated by comparing it to others who took the same test. A percentile rank is an excellent example. Say your client scores an 82 on a nationally normed test and this score corresponds to the percentile rank of 60. This tells you that 60% of the individuals who took the test scored 82 or less.
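The percentile-rank logic above can be sketched as a short function (the norm-group scores here are hypothetical):

```python
# Percentile rank: the percentage of examinees scoring at or below a
# given score. The norm-group scores below are hypothetical.
norm_scores = [55, 60, 70, 74, 78, 80, 82, 85, 88, 95]

def percentile_rank(scores, score):
    """Percentage of examinees who scored at or below `score`."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)

print(percentile_rank(norm_scores, 82))  # 70.0 -> 70% scored 82 or less
```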
- **In an ipsative measure the person taking the test must compare items to one another. The result is that**
- **a. an ipsative measure cannot be utilized for career guidance.**
- **b. you cannot legitimately compare two or more people who have taken an ipsative test.**
- **c. an ipsative measure is never valid.**
- **d. an ipsative measure is never reliable.**
b. you cannot legitimately compare two or more people who have taken an ipsative test.
Since the ipsative measure does not reveal absolute
strengths, comparing one person’s score to another is relatively
meaningless. The person is measured in response to his or her
own standard of behavior. The ipsative measure points out the
highs and lows that exist within a single individual. Hence, when
a colleague tells you that Mr. Johnson’s anxiety is improving, she
has given you an ipsative description. This description, however,
would not lend itself to comparing Mr. Johnson’s anxiety to Mrs.
McBee’s.
- **Tests are often classified as speed tests versus power tests. A timed typing test used to hire secretaries would be**
- **a. a power test.**
- **b. neither a speed test nor a power test.**
- **c. a speed test.**
- **d. a fine example of an ipsative measure.**
c. a speed test.
A good timed/speed test is purposely set up so that nobody finishes it. A "power test" (see choice "a") is designed to evaluate the level of mastery without a time limit. A timed test is really a type of speed test, but one that a high percentage of the test takers complete; it is usually more difficult and has a time limit (think NCE).
- **A counseling test consists of 300 forced response items. The person taking the test can take as long as he or she wants to answer the questions.**
- **a. This is most likely a projective measure.**
- **b. This is most likely a speed test.**
- **c. This is most likely a power test.**
- **d. This is most likely an invalid measure.**
c. This is most likely a power test.
Like the speed test, it will ideally be designed so that nobody receives a perfect score. Choice "a," projective measure, is incorrect since projective tests rely on a "free response" format.
- **An achievement test measures maximum performance while a personality test or interest inventory measures**
- **a. typical performance.**
- **b. minimum performance.**
- **c. unconscious traits.**
- **d. self-esteem by always relying on a Q-Sort design.**
a. typical performance.
Interest inventories are popular with career counselors because such measures focus on what the client likes or dislikes. The Strong Interest Inventory (SII) is an excellent example. Choice "d," the Q-Sort, often used to investigate personality traits, involves a procedure in which an individual is given cards with statements and asked to place them in piles from "most like me" to "least like me." The subject then completes another sort to describe the "ideal self." The ideal self can then be compared to his or her current self-perception in order to assess self-esteem.
- **In a spiral test**
- **a. the items get progressively easier.**
- **b. the difficulty of the items remains constant.**
- **c. the client must answer each question in a specified period of time.**
- **d. the items get progressively more difficult.**
d. the items get progressively more difficult.
A spiral test is one in which the themes being evaluated are distributed throughout the test, instead of being grouped together, and the items become increasingly difficult as the test progresses. Just remember that a spiral staircase seems to get more difficult to climb as you walk up higher.
- **In a cyclical test**
- **a. the items get progressively easier.**
- **b. the difficulty of the items remains constant.**
- **c. you have several sections which are spiral in nature.**
- **d. the client must answer each question in a specified period of time.**
c. you have several sections which are spiral in nature.
In each section the questions would go from easy ones to those which are more difficult.
- **A test battery is considered**
- **a. a horizontal test.**
- **b. a vertical test.**
- **c. a valid test.**
- **d. a reliable test.**
a. a horizontal test.
In a test battery, several measures are used to produce results that could be more accurate than those derived from merely using a single source. Say, this can get confusing. Remember that in the section on group processes I talked about vertical and horizontal interventions. In testing, a vertical test would have versions for various age brackets or levels of education (e.g., a math achievement test for preschoolers and a version for middle-school children). A horizontal test measures various factors (e.g., math and science) during the same testing procedure.
- **In a counseling research study two groups of subjects took a test with the same name. However, when they talked with each other they discovered that the questions were different. The researcher assured both groups that they were given the same test. How is this possible?**
- **a. The researcher is not telling the truth. The groups could not possibly have taken the same test.**
- **b. The test was horizontal.**
- **c. The test was not a power test.**
- **d. The researcher gave parallel forms of the same test.**
d. The researcher gave parallel forms of the same test.
When a test has two versions or forms that are interchangeable, they are termed parallel forms or equivalent forms of the same test. From a statistical/psychometric standpoint each form must have the same mean, standard error, and other statistical components.
- **The most critical factors in test selection are**
- **a. the length of the test and the number of people who took the test in the norming process.**
- **b. horizontal versus vertical.**
- **c. validity and reliability.**
- **d. spiral versus cyclical format.**
c. validity and reliability.
Validity refers to whether the test measures what it says it measures, while reliability tells how consistently a test measures an attribute.
- **Which is more important, validity or reliability?**
- **a. Reliability.**
- **b. They are equally important.**
- **c. Validity.**
- **d. It depends on the test in question.**
c. Validity.
Experts nearly always consider validity the number one factor in the construction of a test. A test must measure what it purports to measure. Reliability, choice "a," is the second most important concern. A scale, for example, must measure body weight accurately if it is a valid instrument. In order to be reliable, it will need to give repeated readings which are nearly identical for the same person if the person keeps stepping on and off the scale.
- **In the field of testing, validity refers to**
- **a. whether the test really measures what it purports to measure.**
- **b. whether the same test gives consistent measurement.**
- **c. the degree of cultural bias in a test.**
- **d. the fact that numerous tests measure the same traits.**
a. whether the test really measures what it purports to measure.
There are five basic types of validity you should familiarize yourself with for your exam. First, content validity, or what is sometimes called rational or logical validity. Second, construct validity, which refers to a test's ability to measure a theoretical construct like intelligence, self-esteem, artistic talent, mechanical ability, or managerial potential. Third is concurrent validity, which deals with how well the test compares to other instruments that are intended for the same purpose. Fourth, predictive validity, also known as empirical validity, which reflects the test's ability to predict future behavior according to established criteria. On some exams, concurrent validity and predictive validity are lumped under the umbrella of "criterion validity," since both are actually different types of criterion-related validity. Fifth, a small body of literature speaks of consequential validity, which simply tries to ascertain the social implications of using tests.
- **A counselor peruses a testing catalog in search of a test which will repeatedly give consistent results. The counselor**
- **a. is interested in reliability.**
- **b. is interested in validity.**
- **c. is looking for information which is not available.**
- **d. is magnifying an unimportant issue.**
a. is interested in reliability.
Thus, a test can have a high reliability coefficient but still have a low validity coefficient. Reliability places a ceiling on validity, but validity does not set the limits on reliability.
- **Which measure would yield the highest level of reliability?**
- **a. The TAT, a projective test popular with psychodynamic helpers.**
- **b. The WAIS-III, a popular IQ test.**
- **c. The MMPI-2, a popular personality test.**
- **d. A very accurate scale.**
d. A very accurate scale.
In the real world physical measurements are more reliable than
psychological ones.
- **Construct validity refers to the extent that a test measures an abstract trait or psychological notion. An example would be**
- **a. height.**
- **b. weight.**
- **c. ego strength.**
- **d. the ability to name all men who have served as U.S. presidents.**
c. ego strength.
Any trait you cannot “directly” measure or observe can be
considered a construct.
- **Face validity refers to the extent that a test**
- **a. looks or appears to measure the intended attribute.**
- **b. measures a theoretical construct.**
- **c. appears to be constructed in an artistic fashion.**
- **d. can be compared to job performance.**
a. looks or appears to measure the intended attribute.
Most experts technically no longer list "face validity" as a sixth type of validity. Face validity, like a person's face, merely tells you whether the test looks like it measures the intended trait. Face validity is not required test information according to the 1974 committee that drafted Standards for Educational and Psychological Tests.
- **A job test which predicted future performance on a job very well would**
- **a. have high criterion/predictive validity.**
- **b. have excellent face validity.**
- **c. have excellent construct validity.**
- **d. not have incremental validity or synthetic validity.**
a. have high criterion/predictive validity.
Here you are concerned that the test will measure an independent or external outside "criterion," in this case the "future prediction" of the job performance. Choice "d" introduces you to the terms incremental validity and synthetic validity. Although incremental validity and synthetic validity are not considered two of the five or six major types of validity, don't be too surprised if they pop up on an advanced exam question.
- **A new IQ test which yielded results nearly identical to other standardized measures would be said to have**
- **a. good concurrent validity.**
- **b. good face validity.**
- **c. superb internal consistency.**
- **d. all of the above.**
a. good concurrent validity.
Criterion validity could be "concurrent" or "predictive." Concurrent validity answers the question of how well your test stacks up against a well-established test that measures the same behavior, construct, or trait. Evidence for reliability and validity is expressed via correlation coefficients. Suffice it to say that the closer they are to 1.00 the better. The relationship or correlation of a test to an independent measure or trait is known as convergent validity. Convergent validity is actually a method used to assess a test's construct/criterion validity by correlating test scores with an outside source. The test also should show discriminant validity. This means the test will not reflect unrelated variables.
- **When a counselor tells a client that the Graduate Record Examination (GRE) will predict her ability to handle graduate work, the counselor is referring to**
- **a. good concurrent validity.**
- **b. construct validity.**
- **c. face validity.**
- **d. predictive validity.**
d. predictive validity.
The Graduate Record Examination (GRE), the Scholastic Aptitude Test (SAT), the American College Test (ACT), and public opinion polls are effective only if they have high predictive validity, which is the power to accurately describe future behavior or events. Again, the subtypes of criterion validity are concurrent and predictive.
- **A reliable test is _______ valid.**
- **a. always.**
- **b. 90%.**
- **c. not always.**
- **d. 80%.**
c. not always.
A reliable test is not always valid. Reliability, nonetheless, determines the upper level of validity.
- **A valid test is _______ reliable.**
- **a. not always.**
- **b. always.**
- **c. never.**
- **d. 80%.**
b. always.
A valid test is always reliable.
- **One method of testing reliability is to give the same test to the same group of people two times and then correlate the scores. This is called**
- **a. test–retest reliability.**
- **b. equivalent forms reliability.**
- **c. alternate forms reliability.**
- **d. the split-half method.**
a. test–retest reliability.
The well-known test–retest method discussed here tests for "stability," which is the ability of a test score to remain stable or fluctuate over time when the client takes the test again. When using the test–retest paradigm the client generally takes the same test after waiting at least seven days. The test–retest procedure is only appropriate for traits such as IQ which remain stable over time and are not altered by mood, memory, or practice effects.
- **One method of testing reliability is to give the same population alternate forms of the identical test. Each form will have the same psychometric/statistical properties as the original instrument. This is known as**
- **a. test–retest reliability.**
- **b. equivalent or alternate forms reliability.**
- **c. the split-half method.**
- **d. internal consistency.**
b. equivalent or alternate forms reliability.
Here a single group of examinees takes parallel forms of a test and a reliability correlation coefficient is figured on the two sets of scores. Counterbalancing is necessary when testing reliability in this fashion. That is to say, half of the individuals get parallel form A first and half get form B initially. This controls for variables such as fatigue, practice, and motivation.
- **A counselor doing research decided to split a standardized test in half by using the even items as one test and the odd items as a second test and then correlating them. The counselor**
- **a. used an invalid procedure to test reliability.**
- **b. was testing reliability via the split-half method.**
- **c. was testing reliability via the equivalent forms method.**
- **d. was testing reliability via the inter-rater method.**
b. was testing reliability via the split-half method.
In this situation the individual takes the entire test as a whole and then the test is divided into halves. The correlation between the half scores yields a reliability coefficient.
- **Which method of reliability testing would be useful with an essay test but not with a test of algebra problems?**
- **a. test–retest.**
- **b. alternate forms.**
- **c. split-half.**
- **d. interrater/interobserver.**
d. interrater/interobserver.
Interscorer/interrater/interobserver reliability is an assessment of the correlation between two or more raters, observers, or scorers: the degree to which they agree on the scoring of a test or the interpretation of observed behaviors. In choice "d," several raters assess the same performance. This method has been called "scorer reliability" and is utilized with subjective tests such as projectives to ascertain whether the scoring criteria are such that two persons who grade or assess the responses will produce roughly the same score.
- **A reliability coefficient of 1.00 indicates**
- **a. a lot of variance in the test.**
- **b. a score with a high level of error.**
- **c. a perfect score which has no error.**
- **d. a typical correlation on most psychological and counseling tests.**
c. a perfect score which has no error.
As stated earlier, this generally occurs only in physical measurement.
- **An excellent psychological or counseling test would have a reliability coefficient of**
- **a. 50.**
- **b. .90.**
- **c. 1.00.**
- **d. −.90.**
b. .90.
Ninety percent of the score measures the attribute in question, while 10% of the score is indicative of error.
- **A researcher working with a personality test discovers that the test has a reliability coefficient of .70, which is somewhat typical. This indicates that**
- **a. 70% of the score is accurate while 30% is inaccurate.**
- **b. 30% of the people who are tested will receive accurate scores.**
- **c. 70% of the people who are tested will receive accurate scores.**
- **d. 30% of the score is accurate while 70% is inaccurate.**
a. 70% of the score is accurate while 30% is inaccurate.
Seventy percent of the obtained score on the test represented the true score on the personality attribute, while 30% of the obtained score could be accounted for by error. Seventy percent is true variance while 30% constitutes error variance.
- **A career counselor is using a test for job selection purposes. An acceptable reliability coefficient would be _______ or higher.**
- **a. .20.**
- **b. .55.**
- **c. .80.**
- **d. .70.**
c. .80.
This is a tricky question. Although .70 is generally acceptable for
most psychological attributes, for admissions for jobs, schools,
and so on, it should be at least .80 and some experts will not
settle for less than .90.
- **The same test is given to the same group of people using the test–retest reliability method. The correlation between the first and second administration is .70. The true variance (i.e., the percentage of shared variance or the level of the same thing measured in both) is**
- **a. 70%.**
- **b. 100%.**
- **c. 50%.**
- **d. 49%.**
d. 49%.
To demonstrate the variance of one factor accounted for by another, you merely square the correlation (i.e., the reliability coefficient). So .70 × .70 = .49, and .49 × 100 = 49%. Your exam could refer to this principle as the coefficient of determination.
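The squaring step can be verified directly:

```python
# Coefficient of determination: squaring the correlation gives the
# proportion of shared variance between the two administrations.
r = 0.70                      # test-retest correlation
shared_variance = r ** 2      # approximately 0.49
print(f"{shared_variance:.0%}")  # 49%
```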
- **IQ means**
- **a. a query of intelligence.**
- **b. indication of intelligence.**
- **c. intelligence quotient.**
- **d. intelligence questions for test construction.**
c. intelligence quotient.
IQ testing has been the center of more heated
debates among experts than any other type of testing.