Research Design, Statistics, Tests, and Measurements Flashcards

Question

F ratio

Answer 1

See physical flashcard.

Answer 2

Each level of a given independent variable occurs with each level of the other independent variables.
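
The arrangement described here, in which the independent variables are fully crossed, can be sketched by enumerating every combination of levels. The variable names and levels below are hypothetical and are not taken from the card.

```python
from itertools import product

# Hypothetical independent variables and their levels (illustration only).
drug_dose = ["placebo", "low", "high"]
therapy = ["none", "cbt"]

# Every level of one IV is paired with every level of the other IV.
conditions = list(product(drug_dose, therapy))

for dose, treatment in conditions:
    print(dose, treatment)
```

With three levels of one variable and two of the other, the sketch yields 3 × 2 = 6 conditions.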

Answer 3

Meta-analysis is a statistical procedure that can be used to make conclusions on the basis of data from different studies. If researcher A publishes a study on therapeutic outcomes, and researcher B publishes a similar study using different methods, we can use meta-analysis to combine the results of these studies and come up with a more general conclusion.

Answer 4

Face validity refers to the degree to which a procedure _appears_ to measure what it is supposed to measure. If you are interested in measuring knowledge of 20th-century American history, but you give subjects a test on 20th-century European history, the test will lack face validity.

Answer 5

See physical flashcard.

Answer 6

There are two types of ability tests: **aptitude tests** and **achievement tests**. Aptitude tests are used to predict what one can accomplish through training. In other words, they are used to predict _future performance_. Intelligence tests are aptitude tests. Achievement tests, on the other hand, attempt to assess what one knows or can do _now_.

Answer 7

Formally, in taking a _test_, the subjects are instructed to do their best; in completing an _inventory_, they are instructed to represent their typical reactions. A **personality inventory** is a self-rating device usually consisting of somewhere between 100 and 500 statements. The subject is asked to determine if the given statements apply to him or her. Although these structured tools are quite **reliable**, the veracity of responses is not guaranteed. For example, if an item says, "I occasionally steal," most people will tend to answer "no" regardless of whether or not they occasionally steal. The perceived social acceptability of a response is just one factor that can affect the accuracy of inventories that involve self-reporting.

Answer 8

In 1989, a revision of the MMPI, the MMPI-2, added content scales. These scales were formed using items derived from theoretical concerns rather than from an empirical criterion-keying approach. For example, to form the low self-esteem content scale, the authors selected items that ought to be related to low self-esteem. Hence, the original clinical scales have been supplemented with content scales that were developed using a more theoretical approach.

Answer 9

The MMPI is one of the major personality inventories. It consists of 550 statements to which subjects respond "true," "false," or "cannot say." The MMPI yields scores on ten clinical scales, measuring things such as depression, schizophrenia, and masculinity/femininity. It has scales that can indicate whether the person is careless, faking answers, misrepresenting him- or herself, or distorting responses, and whether the distortion is being done intentionally or unintentionally. The purpose of the MMPI is to aid in the assessment of various clinical disorders. All scores on the MMPI are expressed as standard scores with a mean and standard deviation derived from standardization samples.

Answer 10

A **standardization sample** is a population of individuals who have previously well-documented intelligence and/or achievement levels, which is used to "standardize" new or revised test instruments to assure that they are reliably measuring what they are intended to measure.

Answer 11

**Projective tests** are different from **personality inventories** in two basic ways: first, the stimuli in a projective test are relatively ambiguous; and second, the test taker is not limited to a small number of possible responses. A test taker is presented with stimuli and asked to interpret what he or she sees. This means that the scoring of a projective test is subjective, whereas the scoring of personality inventories is objective.

Answer 12

The **Rorschach inkblot test** is a famous **projective test** created by **Hermann Rorschach**. The test is made up of 10 cards that are reproductions of inkblots. The cards are presented to the subject in a specific order with very specific instructions to describe what it is that the blots remind the subject of. The clinician then interprets the results based upon what the person saw and the spontaneous remarks that the person may have made.

Answer 13

The **Blacky pictures** is a projective test devised especially for children. The test consists of 12 cartoon-like pictures that feature a little dog named Blacky. Developed according to psychoanalytic theory, each picture depicts Blacky in a situation designed to correspond to a particular stage of psychosexual development. The test taker is asked to tell stories about the pictures.

Answer 14

The **Rotter Incomplete Sentences Blank** is a **projective test**. The test taker is provided with 40 sentence stems and asked to complete them. The theory is that the test taker will fill in the blanks with whatever is on his or her mind.

Answer 15

See physical flashcard.

Answer 16

The **Barnum effect** is the tendency to accept certain information as true, such as character assessments or horoscopes, even when the information is so vague as to be worthless. The Barnum effect is a form of pseudo validation.

Answer 17

**Interest testing** is usually used to assess an individual's interest in different lines of work. The best-known test of this kind is the **Strong-Campbell Interest Inventory**. This inventory is organized like a personality inventory, and in fact, like the MMPI, was developed using an empirical criterion-keying approach. Test takers are given lists of interests and asked to indicate whether they like or dislike the interest listed. In other sections of the test, the test taker is asked to indicate his or her preference for one of two paired items. The interpretation of the results is based, at least partly, on **John L. Holland's model of occupational themes**. Holland divided interests into six types: realistic, investigative, artistic, social, enterprising, and conventional. That's why it is sometimes called the **RIASEC** system. John L. Holland was an American psychologist who lived from 1919 – 2008.

Answer 18

**David Wechsler** developed 3 major IQ tests:

* Wechsler Preschool and Primary Scale of Intelligence (WPPSI)
* Wechsler Intelligence Scale for Children (WISC)
* Wechsler Adult Intelligence Scale (WAIS)

All have been revised and are now called the WPPSI-R, WISC-R, and WAIS-R, and are used with preschoolers, school-aged children (5 – 16 years old), and adults (16 years and older), respectively. The WAIS-IV is the current version utilized for adult intelligence testing.

Answer 19

See physical flashcard.

Answer 20

**Cross-validation**, sometimes called **rotation estimation** or **out-of-sample testing**, is any of various similar model validation techniques for assessing how the results of a statistical analysis will generalize to an independent data set. When assessing the **criterion validity** of a test, cross-validation involves repeating the assessment of criterion validity on a second sample after you have demonstrated validity using an initial sample.

Answer 21

A **significance test** is one tool researchers use to draw conclusions about populations based on research conducted on samples. The idea is to show that the observed results are unlikely to have been observed due to chance, and therefore we should reject the null hypothesis and accept the research, or alternative, hypothesis. The cutoff we use to decide whether to reject the null hypothesis is called the **criterion of significance**. By convention, psychologists usually use 5% as their criterion of significance. The P-value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the null hypothesis is true. If our P-value is less than or equal to the criterion of significance (also called the **alpha level**, or simply **alpha**), our results are _statistically significant_, and we reject the null hypothesis.
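
The decision rule above can be illustrated with a small worked case. This is a minimal sketch, assuming a made-up experiment (16 heads in 20 coin flips) and a fair-coin null hypothesis. The function name `two_sided_binomial_p` is hypothetical, and an exact binomial test is just one of many significance tests.

```python
from math import comb

ALPHA = 0.05  # conventional criterion of significance (alpha level)

def two_sided_binomial_p(successes: int, trials: int) -> float:
    """Two-sided P-value under a fair-coin null hypothesis (p = 0.5).

    Adds up the probability of every outcome at least as far from the
    expected count (trials / 2) as the observed count.
    """
    expected = trials / 2
    observed_distance = abs(successes - expected)
    extreme_outcomes = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if abs(k - expected) >= observed_distance
    )
    return extreme_outcomes / 2 ** trials

# Made-up data: 16 heads in 20 flips.
p_value = two_sided_binomial_p(16, 20)
reject_null = p_value <= ALPHA
print(f"p = {p_value:.4f}, reject null: {reject_null}")
```

Here the P-value is about 0.012, which is below the 5% criterion, so the null hypothesis of a fair coin would be rejected.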

Answer 22

For an approximately normal data set, the values within one standard deviation of the mean account for about 68% of the set; within two standard deviations account for about 95%; and within three standard deviations account for about 99.7%. Note: for easy divisibility by 2, as a heuristic it may be helpful to use 96% instead of 95%.
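
These percentages can be checked numerically. The sketch below assumes an exactly normal distribution and uses the standard relationship between the normal curve and the error function. The helper name `share_within` is made up for illustration.

```python
from math import erf, sqrt

def share_within(k: float) -> float:
    """Proportion of a normal distribution lying within k standard
    deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```

The printed values come out at roughly 68%, 95%, and 99.7%, matching the rule of thumb.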

Answer 23

The range, standard deviation, and variance are measures of **variability**, or **dispersion**. If the scores in the distribution are all the same, then there is no variability. If the scores are very spread out, then the variability is high. The **range** is the smallest number in the distribution subtracted from the largest number. The **standard deviation** provides a measure of the typical distance of scores from the mean. The **variance** is the square of the standard deviation (the standard deviation is the positive square root of the variance). Both the standard deviation and the variance must be either zero or a positive number.
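
As a quick illustration, the three measures can be computed for a small made-up set of scores. The sketch assumes the population formulas (dividing by N). The card does not say whether population or sample formulas are intended.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data

score_range = max(scores) - min(scores)   # largest minus smallest
variance = statistics.pvariance(scores)   # population variance
std_dev = statistics.pstdev(scores)       # population standard deviation

print(score_range, variance, std_dev)
```

For these scores the range is 7, the variance is 4, and the standard deviation is 2, so the variance is the square of the standard deviation as stated above.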

Answer 24

**Demand characteristics** are cues that inform the subject how he or she is expected to behave. One possible remedy for demand characteristics is the use of _deception_.

Answer 25

* **Naturalistic observation**: Researcher does not intervene; measures behavior as it naturally occurs; also called **field study**
* **Correlational**: IV not manipulated
* **Quasi-experiment**: IV manipulated; subjects not randomly assigned to groups
* **True experiment**: IV manipulated; subjects randomly assigned to groups

Answer 26

Together they published the first intelligence test, known as the **Binet-Simon Intelligence Scale**. The purpose of the test was to assess the intelligence of French schoolchildren to ascertain which children were too intellectually disabled to benefit from ordinary schooling. Binet also introduced the concept of **mental age**, or the level at which a person functions intellectually, regardless of their actual chronological age.

Answer 27

See physical flashcard.

Answer 28

American psychologist who in 1916 revised the Binet-Simon Intelligence Scale for use in the United States. This became known as the **Stanford-Binet Intelligence Scale**. (Terman was a professor at Stanford University. One of his doctoral students was Harry Harlow.)

Answer 29

(1832 – 1920) Founded the first psychology laboratory in 1879. Wundt brought together earlier work in philosophy, physiology, and psychophysics to create psychology as a science. Wundt is often remembered as a structuralist, and for the rather narrow utility of his experimental strivings to reduce consciousness to its elements. Actually, Wundt himself believed that experimental psychology had a very limited use, and could not be used to study the higher mental processes such as memory, thinking, and language. To study the higher mental processes, Wundt proposed a sort of cultural psychology.

Answer 30

Devised **Thematic Apperception Test (TAT),** a projective test consisting of 20 simple pictures which depict scenes with ambiguous meanings. For example, one picture might be a boy staring sadly at a violin. The test taker is told to tell a story about what is happening in the picture, and to provide an ending. Like the **Rorschach test**, there is no standardized scoring method for the TAT. Scoring is qualitative and the clinician has to rely on his or her clinical skills.

Answer 31

Construct validity is the extent to which the measurement or manipulation of a variable accurately represents the theoretical variable being studied. Convergent and discriminant validity are the two subtypes that make up construct validity. **Convergent validity** refers to the degree to which two measures of constructs that theoretically should be related are, in fact, related. **Discriminant validity** refers to whether constructs that are supposed to be unrelated are, in fact, unrelated.

Answer 32

**Validity** is the extent to which a test actually measures what it purports to measure. All types of validity assessment examine the relationship between performance on the test in question and other independent and objective sources of information about the knowledge or behaviors of interest. Types of validity include **criterion validity** (**concurrent** and **predictive**), **construct validity** (**convergent** and **discriminant**), **content validity**, and **face validity**.

Answer 33

See physical flashcard.

Answer 34

The original MMPI was developed by **Starke R. Hathaway** and **J. C. McKinley**, faculty of the University of Minnesota, and first published in 1943. Hathaway and McKinley used the **empirical criterion-keying approach**. They tested thousands of questions and retained those that differentiated between patient and nonpatient populations, even if the item didn't seem to have anything to do with abnormality. The authors examined the responses of patient groups with different diagnoses. Each criterion group's responses formed the basis of a particular clinical scale, so that if a new patient answered questions in the same way that, say, the depressive criterion group did, that patient would receive a high depression score.

Answer 35

A major group of intelligence tests is the **Wechsler scales**. Unlike the **Stanford-Binet**, which was not organized by content, the Wechsler scales have all items of a given type grouped into subtests. These items are arranged in order of increasing difficulty within each subtest. The Wechsler scales have two broad subscales: a **verbal scale**, which is based on information, vocabulary, and related skills; and a **performance scale**, which is derived from tests of manipulative skill, eye-hand coordination, and speed.

Answer 36

Reliability is the consistency with which a test measures whatever it is that the test measures. In practice, no test is perfectly reliable. The **standard error of measurement (SEM)** estimates how repeated measures of a person on the same instrument tend to be distributed around his or her "true" score. The true score is always an unknown. The smaller a test's SEM, the more reliable the test is. There are 3 basic types of reliability / methods of assessing reliability: test-retest reliability, alternate-form reliability, and split-half reliability. Each type involves a correlation. In **test-retest reliability**, one test is administered twice to the same group of individuals. In **alternate-form reliability**, two forms of a test are administered to the same group of people. In **split-half reliability**, a single test is divided into equal halves, and scores on one half are correlated with scores on the other half. In all of these methods, a correlation coefficient greater than +0.80 indicates a high level of reliability.
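
Because each method comes down to a correlation, a test-retest check can be sketched in a few lines. The scores below are invented. The SEM line uses the common textbook formula SEM = SD × √(1 − r), which the card itself does not state, so treat it as an assumption.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mx) ** 2 for a in x))
    spread_y = sqrt(sum((b - my) ** 2 for b in y))
    return cross / (spread_x * spread_y)

# Made-up scores for six people tested twice with the same instrument.
first_administration = [10, 12, 15, 18, 20, 25]
second_administration = [11, 12, 14, 19, 21, 24]

r = pearson_r(first_administration, second_administration)

# Assumed textbook formula: SEM = SD * sqrt(1 - reliability).
sem = pstdev(first_administration) * sqrt(1 - r)

print(f"test-retest r = {r:.2f}, SEM = {sem:.2f}")
```

The same `pearson_r` helper would apply to alternate-form reliability (scores on form A versus form B) or split-half reliability (scores on one half versus the other half).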

Answer 37

(1860 – 1944) American psychologist who studied under Wilhelm Wundt in Germany and later became the first professor of psychology in the United States. Cattell was a long-time editor and publisher of scientific journals and publications, most notably the journal Science.

Answer 38

(1850 – 1909) A contemporary of Wundt, who studied memory using nonsense syllables, thereby showing that at least one of the higher mental processes (memory) could be studied empirically using good experimental methodology, contrary to Wundt's assertion.

Answer 39

The **California Psychological Inventory (CPI)** is another personality inventory that is based on the MMPI. It was developed to be used with normal populations from age 13 and up. It is especially oriented to high school and college students. The CPI consists of 20 scales, including three validity scales, used to assess test-taker attitudes. Through a series of 462 true-false items, the CPI measures such personality traits as dominance, sociability, self-control, and femininity. Like the MMPI, all scores are expressed as standard scores with a mean and standard deviation derived from standardization samples.

Answer 40

**Content validity**, also called **logical validity**, refers to the extent to which a measure represents all facets of a given construct. For example, a depression scale may lack content validity if it only assesses the affective dimension of depression while failing to take into account the behavioral dimension. An element of subjectivity exists in relation to determining content validity, which requires a degree of agreement about what a particular personality trait such as extroversion represents.

Answer 41

In psychometrics, **criterion validity**, aka **concrete validity**, is the extent to which a measure is related to an outcome. Criterion validity is often divided into concurrent validity and predictive validity. **Concurrent validity** refers to a comparison between the measure in question and an outcome assessed at the same time. **Predictive validity**, on the other hand, compares the measure in question with an outcome assessed at a later time.

Answer 42

Wilhelm Wundt believed that whenever you think of something, an image forms in your mind, i.e. there can be no thought without a mental image. Oswald Külpe disagreed. Külpe strongly believed that there can be imageless thought, and he performed experiments to prove his hypothesis. Külpe lived from 1862 – 1915, and was a protégé of Wundt. Both Wundt and Külpe were part of the structuralist school.

Answer 43

(1896 – 1981) David Wechsler was a Romanian-American psychologist who developed the **Wechsler Intelligence Scales**.

Answer 44

The intelligence quotient (IQ) is a measure of intelligence aptitude using an equation comparing mental age to chronological age. IQ is mental age divided by chronological age, multiplied by 100. An IQ of 100 indicates that a person's mental age is equal to his or her chronological age. This concept is known as the **ratio IQ** and was developed by **William Stern**. One of the problems with the ratio IQ is that after a certain age, chronological age increases while mental age does not. Therefore, even if your mental age remains constant, your IQ will decrease with age. In order to get around this problem, the 1960 revision of the Stanford-Binet used **deviation quotients**. Essentially, a **deviation IQ** **score** tells us how far away a person's score is from the average score for the particular age group the subject is a member of.
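
The two scoring approaches can be compared in a short sketch. The ratio formula follows the card directly. The deviation version assumes a scale with mean 100 and standard deviation 15, which is a common convention but is not stated on the card. The function and parameter names are hypothetical.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100

def deviation_iq(score: float, group_mean: float, group_sd: float,
                 scale_mean: float = 100, scale_sd: float = 15) -> float:
    """Deviation IQ: distance from the age-group average, rescaled.

    scale_mean and scale_sd are assumed values, not taken from the card.
    """
    z = (score - group_mean) / group_sd
    return scale_mean + scale_sd * z

# The problem with the ratio IQ: mental age stays at 16 while
# chronological age keeps increasing.
print(ratio_iq(16, 16))  # 100.0
print(ratio_iq(16, 32))  # 50.0

# One standard deviation above the age-group average.
print(deviation_iq(60, group_mean=50, group_sd=10))  # 115.0
```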

Answer 45

An **adaptive test** is a computerized achievement test that adapts to the test taker's ability by assessing the accuracy of previously answered questions. A test taker with a high ability will be faced with more difficult questions than a test taker with a low ability.

Answer 46

**Norm-referenced testing** involves assessing an individual's performance in comparison to others. For example, "Erika did better than 99% of second graders tested." Test norms are derived from standardization samples; the samples should be large and representative of the population to whom the particular test will be administered. One problem with norm-referenced testing is that the population to whom the tests will be administered can, and often does, change. If the population of interest changes, then the original standardization sample would no longer be representative of the population. **Domain-referenced testing**, also called **criterion-referenced testing**, is concerned with the question of what the test taker knows about a specified content domain. Performance is described in terms of what the test taker _knows_ or _can do_, _not_ how you score in relation to your peers. An example of domain-referenced testing is the written test you must take for your driver's license.

Answer 47

* **t-test**: used to compare the means of 2 groups
* **ANOVA (analysis of variance)**: used to compare the means of more than 2 groups; also used to determine whether there is any interaction between 2 or more IVs (i.e. the effects of one independent variable are not consistent for all levels of the other independent variables); ANOVAs estimate how much group means differ from each other by comparing the between-group variance to the within-group variance using a ratio, called the **F ratio**.
* **Chi-square test**: tests the equality of two frequencies; chi-square tests work with **categorical data**, also called **nominal data**
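
The F ratio mentioned above can be computed by hand for a one-way design. This is a minimal sketch with made-up scores for three groups. It covers only the single-IV case, not interactions, and the function name `f_ratio` is hypothetical.

```python
from statistics import mean

def f_ratio(groups):
    """One-way ANOVA F ratio: between-group variance estimate divided
    by within-group variance estimate."""
    all_scores = [x for group in groups for x in group]
    grand_mean = mean(all_scores)
    k = len(groups)        # number of groups
    n = len(all_scores)    # total number of scores

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = 0.0
    for g in groups:
        group_mean = mean(g)
        ss_within += sum((x - group_mean) ** 2 for x in g)

    ms_between = ss_between / (k - 1)   # between-group mean square
    ms_within = ss_within / (n - k)     # within-group mean square
    return ms_between / ms_within

# Made-up scores for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(round(f_ratio(groups), 2))
```

For these data the group means (2, 3, and 6) differ far more than the scores vary inside each group, so the F ratio is large (about 13).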

Answer 48

From time to time, a question pops up on the GRE Psychology Test about what would happen if you converted every score in a distribution to a z-score. Remember that if you have a distribution of z-scores and calculate the mean and standard deviation, the mean of the distribution of z-scores will always be zero and the standard deviation will always be 1. This is true regardless of whether the distribution is normal, and regardless of the mean and the standard deviation of the original distribution.
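
This property is easy to verify on a deliberately skewed, made-up distribution. The sketch assumes the population standard deviation is used both for the conversion and for the check.

```python
from statistics import mean, pstdev

def z_scores(scores):
    """Convert each score to (score - mean) / standard deviation."""
    m = mean(scores)
    s = pstdev(scores)  # population SD; an assumption of this sketch
    return [(x - m) / s for x in scores]

raw = [3, 7, 8, 12, 30]  # made-up, clearly non-normal scores
z = z_scores(raw)

print(round(mean(z), 10), round(pstdev(z), 10))
```

Apart from floating-point rounding, the mean of the converted scores is 0 and their standard deviation is 1, even though the original distribution is skewed.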

Answer 49

Any intentional or unintentional influence that the experimenter exerts on subjects to confirm the hypothesis under investigation. Alternatively, the experimenter might also let his or her expectations affect how the results of the experiment are interpreted. One remedy for experimenter bias is **double-blinding**.

Answer 50

* **Simple random sampling**: Every member of the population has an equal probability of being selected for the sample.
* **Stratified random sampling**: The population is divided into subgroups (also called **strata**), then random sampling techniques are used to select sample members from each stratum.
* **Cluster sampling**: The researcher identifies "clusters" of individuals, then all individuals in each cluster are included in the sample; what makes cluster sampling a form of probability sampling is the way in which the clusters are selected.
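
The three techniques can be sketched with Python's standard `random` module. The population, strata, and cluster names are invented. The cluster version assumes the clusters themselves are chosen by simple random sampling, which is one way (not the only way) to select them.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being drawn."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a random sample separately from each stratum."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [person for name in chosen for person in clusters[name]]

rng = random.Random(0)  # seeded only so the sketch is repeatable

strata = {
    "first_year": ["a1", "a2", "a3", "a4"],
    "second_year": ["b1", "b2", "b3", "b4"],
}
clusters = {
    "classroom_1": ["c1", "c2", "c3"],
    "classroom_2": ["d1", "d2", "d3"],
    "classroom_3": ["e1", "e2", "e3"],
}
population = [p for members in strata.values() for p in members]

srs = simple_random_sample(population, 3, rng)
stratified = stratified_sample(strata, 2, rng)
clustered = cluster_sample(clusters, 2, rng)
print(srs, stratified, clustered, sep="\n")
```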

Answer 51

In **probability sampling**, each member of the population has a specifiable probability (chance) of being chosen. In **nonprobability sampling**, the probability (chance) of any particular member of the population being chosen is unknown.

Answer 52

(1871 – 1938) German psychologist and philosopher who developed a function to compare mental age with chronological age. Stern coined the term **intelligence quotient**, or **IQ**, to designate the output of this function.

Answer 53

A tentative and testable explanation of the relationship between one or more **independent variables** and one or more **dependent variables**.

Answer 54

The **independent variable** is the variable whose effect is being studied. The **dependent variable** is the variable expected to change due to variations in the independent variable.

Answer 55

A factor that varies in amount or kind and can be measured.

Answer 56

**Operational definitions** state how the researcher will measure the variables.

Answer 57

The **population** is the group to which the researcher wishes to generalize her results. A **sample** is a subset of the population.

Answer 58

A **representative sample** is a sample which matches as many characteristics as possible of the population as a whole.

Answer 59

Characteristics of individuals, such as age, gender, ethnic group, nationality, birth order, personality, or marital status. These variables are by definition nonexperimental; they cannot be manipulated, they can only be measured.

Answer 60

In a **between-subjects design**, each subject is exposed to only one level of each independent variable. A **matched-subjects design** is like a between-subjects design, except that every subject in one group is "matched" with an "equivalent" subject in another group. The idea is to negate the effect of confounding variables. In a **within-subjects design** (also called **repeated-measures design**), the subject's own performance is the basis of comparison. Each subject experiences multiple levels of the IV.

Answer 61

**Counterbalancing** is an attempt to counteract order effects in **within-subjects design** (also called **repeated-measures design**). Half the subjects might be given IV_A on day one and IV_B on day two, while the other half would be given IV_B on day one and IV_A on day two.
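
The assignment described above can be sketched as follows. The subject labels and the helper name `counterbalance` are hypothetical. The sketch rotates through every possible order of the conditions (full counterbalancing), which for two conditions reduces to the half-and-half split in the example.

```python
from itertools import permutations

def counterbalance(subjects, conditions):
    """Assign each subject an order of conditions, cycling through all
    possible orders so each order is used equally often."""
    orders = list(permutations(conditions))
    return {
        subject: orders[i % len(orders)]
        for i, subject in enumerate(subjects)
    }

subjects = ["s1", "s2", "s3", "s4"]  # made-up subject labels
assignment = counterbalance(subjects, ["A", "B"])

for subject, order in assignment.items():
    print(subject, "day one:", order[0], "day two:", order[1])
```

In practice the list of subjects would typically be shuffled first so that which subjects receive which order is left to chance. That step is omitted here to keep the sketch short.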

Answer 62

Unintended independent variables.

Answer 63

**Control group design** means treating every group identically in all respects except for carefully varying the levels of one or more independent variables.

(87 cards)