Assessment and Testing Flashcards

Question

What is convergent validity?

Answer 1

This is a method used to assess a test's construct/criterion validity by correlating test scores with an outside source (e.g., seeing if someone with a known phobia has test results that indicate the phobia).

Answer 2

This means that the test will not reflect unrelated variables. So if phobias are unrelated to IQ, there should not be a correlation between someone's scores on a phobia test and on an IQ test. When a researcher is engaged in test validation, both convergent and discriminant validity should be examined.

Answer 3

Yes, a valid test is always reliable. BUT a reliable test is not always valid.

Answer 4

This is a method for testing reliability in which you give the same test to the same group of people two times and then correlate scores. This method tests for **stability**, which is the tendency of a test score to remain stable rather than fluctuate if the client takes the test again. This method is generally only valid for traits like IQ that remain stable over time.

Answer 5

This is when a single group of examinees takes parallel forms of a test and the researcher figures out a reliability coefficient based on the two sets of scores. (I.e. one group takes two tests that are designed to be equivalent and then scores are compared). Doing this well requires **counterbalancing** which means you split the group and one half gets Test A first and the other half gets Test B first – this controls for things like fatigue, practice, and motivation.

Answer 6

In this situation, the individual takes the entire test as a whole and then the test is divided in half. The correlation between the half scores yields a reliability coefficient. But this only works if the researcher splits the test using random numbers or even/odd item numbers (vs. first and second half of the test), because the split must account for practice and fatigue.
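
The odd/even split described above can be sketched in Python. This is a minimal illustration, not a standard scoring routine: the function names and the response data are hypothetical, and items are assumed to be scored 1 (correct) or 0 (incorrect).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_reliability(responses):
    """Correlate each examinee's odd-item total with their even-item total.

    responses: one list of item scores per examinee (assumed 1 or 0).
    """
    odd_totals = [sum(person[0::2]) for person in responses]   # items 1, 3, 5, ...
    even_totals = [sum(person[1::2]) for person in responses]  # items 2, 4, 6, ...
    return pearson(odd_totals, even_totals)

# Hypothetical data: 5 examinees, 6 items each.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
half_r = split_half_reliability(responses)
print(f"split-half correlation: {half_r:.2f}")
```

Splitting by odd and even items spreads practice and fatigue effects across both halves, which a first-half/second-half split would not do.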

Answer 7

This is when several raters assess the same performance. This method is also called **scorer reliability** and is utilized with subjective tests like projectives to ascertain whether the scoring criteria are such that two people who grade or assess the responses will get roughly the same score.

Answer 8

This would indicate a perfect score and generally only occurs in physical measurement. An excellent psychological or counseling test would have a reliability coefficient of .90, which indicates that 90% of the score measures the attribute in question and 10% of the score is indicative of error. A personality test typically has a reliability coefficient around .70 (70% of the score is accurate and 30% is inaccurate). Although .70 is generally acceptable for psychological attributes, tests used for admissions to jobs, schools, etc. should have a coefficient of at least .80, and some experts will not settle for less than .90.

Answer 9

This is when you have to determine the variance of one factor accounted for by another. To do that, you merely square the correlation (I.e. reliability coefficient). So if the correlation between two instances of a test to the same population (test-retest) is .70, you would square that to get .49 which would be the coefficient of determination.
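
The arithmetic above is a single squaring step. A minimal Python sketch (the function name is hypothetical, chosen for illustration):

```python
def coefficient_of_determination(r):
    """Square a correlation to get the proportion of shared variance."""
    return r ** 2

r = 0.70  # the test-retest correlation from the example above
shared = coefficient_of_determination(r)
print(f"r = {r:.2f}, r squared = {shared:.2f}")  # r = 0.70, r squared = 0.49
```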

Answer 10

Intelligence quotient. The early ratio formula for the Binet IQ score was Mental Age/chronological age x 100. The score indicated how you compared to others in your age group. IQ testing has been the subject of heated debate.

Answer 11

Sir Francis Galton of England has been recognized as one of the major pioneers in the study of individual differences. He believed that exceptional mental abilities were genetic and ran in families. He did research and concluded that intelligence was normally distributed like height and weight and that it was primarily genetic. He felt that intelligence was a single or so-called unitary factor.

Answer 12

In 1904, he postulated two factors that were thought to be applicable to any mental task: a general ability **G** and a specific ability **S**.

Answer 13

**Fluid intelligence** is flexible, culture-free, and adjusts to the situation. **Crystallized intelligence** is rigid and doesn't change or adapt.

Answer 14

JP Guilford isolated 120 factors which added up to intelligence. Two of the dimensions – **convergent** and **divergent thinking** – are still popular terms today. **Convergent thinking** occurs when divergent thoughts and ideas are combined into a singular concept. **Divergent thinking** is the ability to generate a novel idea.

Answer 15

Alfred Binet and Theodore Simon. In 1904, the French government appointed a committee to distinguish between normal and “feeble-minded” Parisian children so that the kids with an intellectual disability could be taught separately. Binet led the committee and, by 1905, he and his coworker, Theodore Simon, had created a 30-item test with school-related items of increasing difficulty. Binet used his own daughters as test subjects and is also cited as a pioneer in projective testing based on his work with inkblots. In 1916, after testing nearly 3,000 children, **Lewis M Terman** of Stanford published an American version of the Binet that was translated into English – they added “Stanford” into the name and it became the **Stanford-Binet IQ test.**

Answer 16

This is a method used to establish inter-item consistency, whether each item on the test is measuring the same thing as every other item. The Kuder-Richardson reliability/item consistency estimates, which are often denoted as KR-20 or KR-21, measure exactly this. Though the split-half method measures internal consistency reliability, it does not do it on a per-item basis.

Answer 17

Inter-item consistency, also known as **internal consistency** or **homogeneity of items**, means that someone wants to find out if each item on a test is measuring the same thing as every other item. Is performance on one item truly related to performance on another? This can be done by using the Kuder-Richardson reliability/item consistency estimates (also known as the KR-20 or KR-21 formulas).
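
As a rough illustration, KR-20 can be computed from a table of right/wrong item scores. The card does not give the formula; the sketch below uses the commonly cited form, KR-20 = (k / (k - 1)) * (1 - sum(p * q) / variance of total scores), where k is the number of items, p is the proportion passing an item, and q = 1 - p. The function name, the data, and the use of the population variance are assumptions made for this example.

```python
def kr20(responses):
    """KR-20 internal-consistency estimate for items scored 1 or 0.

    responses: one list of item scores per examinee.
    Assumption: total-score variance is the population variance (divide by n).
    """
    n = len(responses)       # number of examinees
    k = len(responses[0])    # number of items
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n

    pq_sum = 0.0
    for item in range(k):
        p = sum(person[item] for person in responses) / n  # proportion passing
        pq_sum += p * (1 - p)                              # item variance p * q

    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Hypothetical data: 5 examinees, 6 items each.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(f"KR-20: {kr20(responses):.2f}")
```

Unlike a split-half estimate, this uses every item's pass rate, so the result does not depend on how the test happens to be divided.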

Answer 18

Cross-validation takes place when a researcher further examines the criterion validity (and in rare cases, the construct validity) of a test by administering the test to a new sample. This procedure is necessary to guard against error factors that are likely to be present in the original validity coefficient if the original sample size is small. In most cases, a cross-validation coefficient is indeed smaller than the initial validity coefficient. This phenomenon is called **shrinkage.**

Answer 19

A standardized measure is one in which the scoring and administration procedures are formal and well delineated. Measures that are not standardized lack procedural guidelines for scoring or administration and do not include quantitative information relating to standards of performance. The Stanford-Binet IQ test is an example of a standardized test.

Answer 20

MA (mental age)/CA (chronological age) x 100. The test is Binet's, but the famous formula was created by German psychologist William Louis Stern. The formula produced what is known as a **ratio IQ.** Today a “deviation IQ” is used, which compares the individual to a norm (i.e., to others in their age group). Although we still use the term IQ, the Binet today actually relies on a **standard age score (SAS)** with a mean of 100 and a standard deviation of 16. So IQ today is no longer really a quotient.
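
The contrast between the two kinds of score can be sketched in Python. The ratio formula is the one given above. The deviation-style function is only an illustration of placing a score on a scale with mean 100 and standard deviation 16 via a z-score; it is not the publisher's actual scoring procedure, and the function names and norm-group values are hypothetical.

```python
def ratio_iq(mental_age, chronological_age):
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100

def deviation_score(raw, norm_mean, norm_sd, scale_mean=100, scale_sd=16):
    """Convert a raw score to a standard score relative to an age-group norm.

    Assumption: norm_mean and norm_sd describe the examinee's age group.
    """
    z = (raw - norm_mean) / norm_sd   # distance from the age-group mean in SDs
    return scale_mean + scale_sd * z

print(ratio_iq(10, 8))              # 125.0: mental age 10 at chronological age 8
print(deviation_score(60, 50, 10))  # 116.0: one SD above a hypothetical norm
```

The second function involves no division of ages, which is why the modern score is no longer a quotient.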

Answer 21

That would be something that 50% of 9 year olds tested could answer correctly.

Answer 22

Some experts believe that the Stanford-Binet is a more accurate test for assessing extremes of intellect (high and low) while the Wechsler is a better test for those who fall in the average range.

Answer 23

This is a popular family therapy/systems theory term that means that dysfunctional families are either too open or too closed (I.e. letting too much information in or not enough information in). The healthy family is said to be in a balanced state known as **negative entropy.**

Answer 24

The Binet ultimately didn't seem to be the best test for adults. Wechsler felt that the Binet was slanted towards verbal skills and thus, on Wechsler's test, he added “performance skills” to ascertain certain attributes which might have been cultivated in a background that did not stress verbal proficiency.

Answer 25

* **The WPPSI -** This is the Wechsler Preschool and Primary Scale of Intelligence, which is suitable for kids from 2.5 years to 7 years, 7 months.
* **The WISC-IV -** This is the Wechsler Intelligence Scale for Children and is appropriate for kids ages 6 to 16 years, 11 months.
* **The WAIS-IV -** This is the Wechsler Adult Intelligence Scale and is intended for ages 16 - 90.

The WAIS-IV and the WISC-IV no longer provide verbal and performance IQ scores.

Answer 26

This is the Wechsler Adult Intelligence Scale, intended for people ages 16-90.

* It is based on neurocognitive research and the Cattell-Horn-Carroll theory, a leading theory of human intelligence.
* It can be administered and scored online and takes 60-90 minutes to complete.
* The test used to have object assembly and picture arrangement sections, but those have been dropped.
* It is made up of 10 subject areas, also called **subtests**, which make up 4 **index scores:** verbal comprehension index, perceptual reasoning index, working memory index, and processing speed index.
* **FSIQ** stands for **full scale IQ.** It has a mean of 100 with a standard deviation of 15.
* There is less emphasis than before on crystallized intelligence.
* This test measures IQ from 40-160. The Stanford-Binet, however, has a wider range, so it is a better measure for extremely low IQs or giftedness.

Answer 27

On any given test, the **floor** refers to the lowest possible score and the **ceiling** is the highest possible score.

Answer 28

This is a personality inventory based on Carl Jung's analytic psychology. The MBTI uses dichotomous types (extraversion vs. introversion, sensing vs. intuition, thinking vs. feeling and judging vs. perceiving). The test results in a 4 letter score like INTJ. When a test is guided by theory like this one is, it is known as a **theory-based test or inventory.**

Answer 29

Group tests are quicker to administer. School districts, government, and industry often prefer tests which can be administered to many people simultaneously. The catch, however, is that group tests are less accurate and have lower reliability.

Answer 30

With the Army Alpha and the Army Beta during World War I.

Answer 31

This is a test in which items are known to the subject regardless of his or her culture. The culture-fair test attempts to expunge items which would be known only to an individual due to his or her background. Ethical guidelines now hold that a test should not be administered to a client from a given population unless that particular test or inventory has been normed on that specific population. For example, if you gave a Black client a test that had not been normed for Black people, this would be considered a violation of ethics.

Answer 32

Arthur Jensen sparked tremendous controversy when he suggested in a 1969 article that the closer people are genetically, the more alike their IQ scores will be. Jensen then leveled the charge that white people score 11-15 IQ points higher than African Americans regardless of social class. His theory stated that due to slavery, it was possible that African Americans were bred for strength rather than intelligence. Urie Bronfenbrenner claimed that Jensen relied on twin studies with poor internal validity. Others felt that genetic influences contributed less than 50% to IQ.

Answer 33

John Ertl claimed he invented an electronic machine to analyze neural efficiency and take the place of a paper and pencil IQ test. The device relies on a computer, an EEG, a strobe light, and an electrical helmet. The theory is that the faster a person processes a perception, the more intelligent that person is. Most counselors don't buy this idea.

Answer 34

He is known for the concepts of **fluid intelligence** (inherited neurological intelligence that decreases with age and isn't dependent on culture) and **crystallized intelligence** (intelligence from experiential, cultural, and educational interaction). Crystallized intelligence is measured by tests that focus on content. Fluid intelligence has been called “content-free reasoning” and is tapped by tasks such as a block design or analogy problem.

Answer 35

Robert Williams is an African American psychologist who created the Black Intelligence Test of Cultural Homogeneity (BITCH) to demonstrate that Black people often excelled when given a test laden with questions that would be familiar to Black people. Williams charged that tests like the Binet and the Wechsler were part of scientific racism. Williams, a victim of the system himself, scored an 82 on an IQ test when he was 15 years old and a counselor suggested bricklaying because he was good with his hands. Williams rejected the advice and went on to get a PhD. IQ tests, though controversial, are often excellent predictors of school success since schools emphasize values that have been heavily influenced by European culture.

Answer 36

In this now oft-quoted court battle, it was initially ruled that IQ tests were racially biased against African American children, who were overrepresented in EMR (educable mentally retarded) classes (proper terminology at the time) based on IQ scores.

Answer 37

The MMPI-2 is a standardized personality test, known as a “self-report” personality inventory. The client can respond true or false to 567 questions. The new MMPI-2 is intended to help clinicians diagnose and treat patients – and is an updated version of the MMPI that also attempted to eliminate sexist language. The MMPI is suitable for people over 18 and requires a 6th grade reading level.

Answer 38

In a projective test, the client is shown neutral stimuli. The idea here is that the client will project his or her personality if given an unstructured task. More specifically, there are several acceptable formats for projective tests:

* **association -** e.g., “what comes to mind when you look at this inkblot”
* **completion -** e.g., “complete these sentences with real feelings”
* **construction -** e.g., drawing a person

The theory is that self-reports like the MMPI do not reveal hidden unconscious impulses. In order to accomplish this, the client is shown vague, ambiguous stimuli like an inkblot. Some believe that by using projective measures a client will have more difficulty faking his or her responses and that he or she will be able to expand on answers. Examiner bias is common when using projective measures – a therapist using projective measures needs more training than someone who only works with self-report tests.

Answer 39

The 16PF is the **16 Personality Factor Questionnaire** and was developed by **Raymond B. Cattell**. The test is suitable for people over 16 and measures key personality factors like assertiveness, emotional maturity, and shrewdness. A couple can even decide that they will each take the 16PF, and then both individual and joint profiles can be compiled for use in marital counseling. Tests and inventories like the 16PF, which analyze data outside of a given theory, are called **factor-analytic tests** or **inventories** rather than **theory-based tests.**

Answer 40

He created the *Mental Measurements Yearbook* which was the first publication to ever review available tests. The University of Nebraska then set up a center to continue to produce MMY books to help counselors pick appropriate tests.

Answer 41

Psychodynamic. Projective measures try to access the unconscious mind and unconscious impulses.

Answer 42

Predictive validity – because the test is supposed to measure someone's potential.

Answer 43

The TAT consists of 31 cards with pictures on them. The test can be given to people ages 4 and up and uses up to 20 of those cards (19 selected to fit the age and sex of the client + one blank card). The pictures on each card are intentionally ambiguous and the client is asked to make up a story for each of them.

Answer 44

This is a projective test in which the subject completes an incomplete sentence with a real feeling.

Answer 45

Test bias often comes from the test being normed solely on white middle-class clients.

Answer 46

This is an expressive projective measure which first and foremost is known for its ability to discern whether or not brain damage is evident. It is suitable for ages 4 and up and asks the client to copy 16 geometric figures, which they can look at while constructing their drawing.

Answer 47

Interest inventories work best with individuals who are of high school age or older, as interests are not always that stable before then. Interests become quite stable around age 25.

Answer 48

* They emphasize professional positions and minimize blue collar jobs.
* Interests and abilities are not actually highly correlated, e.g., a client could have a lot of interest in music but could really dislike being a musician.
* The person often tries to answer in a socially acceptable manner (**social desirability**).

Answer 49

They are reliable and are generally non-threatening to the test taker.

Answer 50

This is one of the ACA divisions and is an organization for counselors who are primarily interested in testing.

Answer 51

This is the tendency for people to try to answer a question in a socially acceptable manner.

Answer 52

This is when someone purposely gives unusual responses. This is the opposite of **social desirability.**

Answer 53

When a client always agrees with something.

Answer 54

This is a test like the GRE that attempts both to test your knowledge and to predict some kind of performance. Examples of aptitude-achievement tests include the GET, MAT, MCAT, SAT, etc.

Answer 55

How accurate or inaccurate a test score is. If a client took the same test over and over again, you could plot a distribution of the scores. The standard deviation of that distribution would be the standard error of measurement for the instrument. The lower the standard error of measurement the better – a low standard error of measurement means high reliability.
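
The link between reliability and the standard error of measurement can be sketched in Python. The card does not state a formula; this sketch assumes the standard relationship SEM = SD * sqrt(1 - reliability). The function names and example values are hypothetical.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM from the test's standard deviation and its reliability coefficient."""
    return sd * sqrt(1 - reliability)

def score_band(observed, sd, reliability, z=1.0):
    """Range of plus or minus z SEMs around an observed score."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test with SD = 15 at two levels of reliability.
print(f"SEM at r = .90: {standard_error_of_measurement(15, 0.90):.2f}")
print(f"SEM at r = .70: {standard_error_of_measurement(15, 0.70):.2f}")
low, high = score_band(100, 15, 0.90)
print(f"band around a score of 100: {low:.1f} to {high:.1f}")
```

As reliability rises toward 1.0 the SEM shrinks toward zero, which is the sense in which a low SEM means high reliability.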

Answer 56

Social loafing describes a phenomenon in which a person in a group puts forth less effort than if he or she were attempting to accomplish the same goal individually.

Answer 57

Increasing a test's length raises reliability. Decreasing a test's length lowers reliability (so the reliability coefficient would go down).
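
The effect of test length can be illustrated with the Spearman-Brown prophecy formula, which the card does not name but which is the usual way to estimate it: new reliability = n * r / (1 + (n - 1) * r), where n is the factor by which the test is lengthened. This is a sketch under that assumption (it presumes the added items are comparable to the existing ones), with a hypothetical function name.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by length_factor.

    length_factor = 2 doubles the test; 0.5 cuts it in half.
    """
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

r = 0.70
print(f"doubled test: {spearman_brown(r, 2):.2f}")    # doubled test: 0.82
print(f"halved test:  {spearman_brown(r, 0.5):.2f}")  # halved test:  0.54
```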

Answer 58

CPT (Current Procedural Terminology) codes are used to let insurance companies, managed care firms, etc. know which service you provided (e.g., individual therapy or family therapy).

Answer 59

These are things like self-reports, case notes, checklists, sociograms of groups, interviews, journals, etc that are considered informal assessments.

Answer 60

Reading the test manual should indicate the target population for the test.

Answer 61

Computer-assisted testing and computer interpretations

Answer 62

Some want to rely on tests more while others want to rely on them less. Some counselors would like to see future tests that assess creative and motivational factors.

Answer 63

A projective test

Answer 64

Infant IQ tests ("toddler tests") are generally more unreliable than those given later in life – though they are sometimes capable of picking up on gross abnormalities like severe intellectual disabilities.

Answer 65

To never generalize on the basis of a single test score.

Answer 66

This law states that people over 18 can inspect their own records and those of their children. The Family Educational Rights and Privacy Act also stipulates that information cannot be released without adult consent.

Answer 67

Terman was associated with Stanford University and Americanized the Binet. The test later came to be known as the Stanford-Binet.

(91 cards)