Research Design, Statistics, Tests, and Measurements Flashcards
Hermann Ebbinghaus
Showed that higher mental processes could be studied empirically with experimental methods by studying memory using nonsense syllables.
Wilhelm Wundt
Founded the first psychology laboratory in 1879 (Leipzig). Believed that experimental methodology couldn’t be used to study higher mental processes like memory, thinking, and language. Also believed that there could be no thought without a mental image.
Oswald Külpe
Found that there could be imageless thought
James McKeen Cattell
Studied under Wundt; introduced mental testing to the U.S.
Binet & Simon
Collaborated to publish the Binet-Simon scale (1905), the first intelligence test. Also introduced the concept of mental age (the age level at which a child performs intellectually).
William Stern
Developed the IQ (intelligence quotient): a ratio comparing mental age to chronological (actual) age, used as a measure of intelligence/aptitude
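Stern's quotient compares mental age (MA) with chronological age (CA). The sketch below uses the familiar scaled form, MA / CA × 100 (the ×100 step is usually credited to Terman). The function name `ratio_iq` is hypothetical, and modern tests use deviation IQs rather than this ratio.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Classic ratio IQ: mental age divided by chronological age, times 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    # Multiply first so that simple cases stay exact in floating point
    return mental_age * 100 / chronological_age


# A 10-year-old performing like a typical 12-year-old
print(ratio_iq(12, 10))  # 120.0
# A 10-year-old performing like a typical 8-year-old
print(ratio_iq(8, 10))   # 80.0
```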
Terman
Revised the Binet-Simon test for use in the U.S.; the revision became known as the Stanford-Binet Intelligence Test
Operational Definition
States how the researcher defines the variables so that they are measurable
True experiments, quasi-experiments, and correlational studies
True: random assignment; the researcher manipulates the IV
Quasi: no random assignment, so there is less control over extraneous variables
Correlational: the IV is not manipulated; variables are measured as they naturally occur
Naturalistic observation
Researcher doesn’t intervene; measure natural behavior
Representative sample
Sample is a miniature version of the population
Random sample
Every population member has an equal chance of being selected
Stratified random sample
Relevant subgroups (strata) of the population are identified, and each is randomly sampled in proportion to its size in the population
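A minimal sketch of proportional stratified sampling, assuming each member's stratum can be read off with a function. The helper name `stratified_sample` and the undergrad/grad population are hypothetical. Each stratum gets a share of the sample equal to its share of the population, then a simple random sample is drawn inside each stratum.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, n, seed=None):
    """Proportional stratified random sample (hypothetical helper).

    population: list of members
    stratum_of: function returning the subgroup (stratum) a member belongs to
    n: desired total sample size (must not exceed the population size)
    """
    if n > len(population):
        raise ValueError("sample size exceeds population size")
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)

    sample = []
    for members in strata.values():
        # Slots are allocated in proportion to the stratum's share of the population.
        # Rounding means the total can occasionally differ slightly from n.
        k = round(n * len(members) / len(population))
        sample.extend(rng.sample(members, k))  # simple random sample within the stratum
    return sample


# Hypothetical population: 60 undergraduates and 40 graduate students
population = [("undergrad", i) for i in range(60)] + [("grad", i) for i in range(40)]
sample = stratified_sample(population, lambda member: member[0], n=10, seed=1)
undergrads = sum(1 for m in sample if m[0] == "undergrad")
grads = sum(1 for m in sample if m[0] == "grad")
print(undergrads, grads)  # 6 4
```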
Three common research designs
Between-subjects design, matched-subjects design, within-subjects design
Between-subjects design
Each subject is exposed to only one level of each IV (one condition); participants (P’s) are randomly assigned to groups
Matched-subjects design
Subjects are matched on a variable related to the DV, then members of each matched set are randomly assigned to groups. Ex. Take the two students with the two highest IQs and randomly assign one to each group. Then do the same with the next two highest IQs, and so on.
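The IQ-matching procedure above can be sketched as follows, assuming two groups and a single matching variable. The function name `matched_pairs_assign` and the names/IQ scores are hypothetical. Subjects are ranked, neighbours are paired, and a random shuffle within each pair plays the role of the coin flip.

```python
import random


def matched_pairs_assign(subjects, score, seed=None):
    """Split subjects into two groups matched on `score` (hypothetical helper).

    Subjects are ranked on the matching variable, adjacent subjects are paired,
    and a coin flip decides which member of each pair goes to which group.
    With an odd number of subjects, the lowest-ranked one is left unassigned.
    """
    rng = random.Random(seed)
    ranked = sorted(subjects, key=score, reverse=True)
    group_a, group_b = [], []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random assignment within the matched pair
        group_a.append(pair[0])
        group_b.append(pair[1])
    return group_a, group_b


# Hypothetical IQ scores used as the matching variable
iq = {"Ana": 130, "Ben": 128, "Cy": 115, "Di": 112, "Ed": 101, "Flo": 99}
group_a, group_b = matched_pairs_assign(list(iq), iq.get, seed=0)
print(group_a, group_b)
```

Each group ends up with one member from every IQ pair, so the groups are balanced on IQ while assignment within pairs stays random.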
Within-subjects design
Also called repeated-measures; each subject is exposed to all conditions, removing individual differences as a confound
Control group design
Uses a control group and an experimental group; the experimental group receives the treatment and the control group does not
Nonequivalent group design
Doesn’t use random assignment; ex. using one intact class for one teaching method and another class for another teaching method
Demand characteristics
Any cues that suggest to subjects what the researcher expects of them; they may influence the subjects’ behavior and skew results. Ex. placebo effect
Hawthorne effect
Tendency of people to behave differently when they know they’re being observed. Using a control group that is also observed can control for the Hawthorne effect.
External validity
Extent to which results generalize beyond the study to other people, settings, and times; stronger external validity –> more generalizable to the population
Descriptive v. Inferential statistics
Descriptive: organize, describe, quantify, and summarize data
Inferential: generalize from a sample to the population; estimate population characteristics
Frequency distribution
Characteristic (score) is graphed along the X axis and frequency along the Y axis
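A frequency distribution is just a count of how often each score occurs. This sketch uses Python's `collections.Counter` on a hypothetical set of scores and prints a sideways text histogram: each row is one score value, and the bar length is its frequency.

```python
from collections import Counter

scores = [3, 5, 4, 5, 2, 5, 4, 3, 5, 4]  # hypothetical ratings
freq = Counter(scores)

for value in sorted(freq):
    # one row per score value; number of '#' = how many times it occurred
    print(f"{value}: {'#' * freq[value]}")
# 2: #
# 3: ##
# 4: ###
# 5: ####
```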
Measures of central tendency
Mode, median, mean
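The three measures can be computed with Python's standard `statistics` module. The scores are hypothetical; note how the single high score (9) pulls the mean above the median and mode.

```python
import statistics

scores = [2, 3, 3, 4, 4, 4, 5, 9]  # hypothetical scores

print(statistics.mean(scores))    # 4.25 (sum / count)
print(statistics.median(scores))  # 4.0  (middle value; average of the two middle values here)
print(statistics.mode(scores))    # 4    (most frequent value)
```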
Measures of variability
Range (highest minus lowest score), standard deviation (roughly, the typical distance of scores from the mean), variance (standard deviation squared)
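A quick check of the three measures on hypothetical scores, using the population formulas (`statistics.pvariance`, `statistics.pstdev`, which divide by N). The sample versions `statistics.variance` and `statistics.stdev` divide by N - 1 instead and give slightly larger values.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5

score_range = max(scores) - min(scores)   # highest minus lowest
variance = statistics.pvariance(scores)   # mean squared deviation from the mean
sd = statistics.pstdev(scores)            # square root of the variance

print(score_range, float(variance), sd)   # 7 4.0 2.0
```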
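Normal Distribution
Symmetrical, bell-shaped distribution in which the mean, median, and mode are equal; about 68% of scores fall within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD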
z-score
Tells how many standard deviations a particular score is from the mean.
Subtract the mean of the distribution from the score and divide by the standard deviation: z = (X - mean) / SD
Negative z-score –> falls below mean
Positive z-score –> falls above mean
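A minimal sketch of the z-score formula, plus the standard normal CDF (built from `math.erf`) to turn a z-score into the proportion of scores falling below it. That second step assumes the scores are normally distributed; the function names and the mean-100/SD-15 scale are illustrative.

```python
import math


def z_score(x, mean, sd):
    """Number of standard deviations that score x lies from the mean."""
    return (x - mean) / sd


def normal_proportion_below(z):
    """Proportion of a normal distribution falling below z (standard normal CDF)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical test scaled to mean 100, SD 15
z = z_score(130, 100, 15)
print(z)                                     # 2.0 -> two SDs above the mean
print(round(normal_proportion_below(z), 3))  # 0.977
print(z_score(85, 100, 15))                  # -1.0 -> one SD below the mean
```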
Converting every score to z-scores
Mean is always 0, standard deviation is always 1
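To see why standardized scores always have mean 0 and SD 1, this sketch converts a hypothetical list of raw scores using the population mean and SD (`statistics.fmean`, `statistics.pstdev`). The helper name `standardize` is hypothetical.

```python
import statistics


def standardize(scores):
    """Convert raw scores to z-scores using the population mean and SD."""
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    return [(x - mu) / sigma for x in scores]


zs = standardize([2, 4, 4, 4, 5, 5, 7, 9])    # raw mean 5, raw SD 2
print(zs)                                     # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(statistics.fmean(zs), statistics.pstdev(zs))  # 0.0 1.0
```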
T-scores
T-score distribution has a mean of 50 and standard deviation of 10.
Commonly used in test score interpretation
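Because the T-score scale has mean 50 and SD 10, a z-score converts as T = 50 + 10z. A one-line sketch (the name `t_score` is hypothetical; not to be confused with the t statistic from a t-test):

```python
def t_score(z):
    """Rescale a z-score to the T-score metric (mean 50, SD 10)."""
    return 50 + 10 * z


print(t_score(0.0))   # 50.0 -> exactly average
print(t_score(-1.5))  # 35.0 -> 1.5 SDs below the mean
print(t_score(2.0))   # 70.0 -> 2 SDs above the mean
```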
Skewed Distributions
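Scores are not distributed symmetrically; they pile up on one side, with a long tail on the other.
Mean, median, and mode are not the same; the mean is pulled toward the tail
Skewness is named by the tail: positive skew –> tail to the right (mean > median); negative skew –> tail to the left (mean < median)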
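The "mean is pulled toward the tail" rule can be sketched by comparing mean and median. This is only a rule of thumb (formal skewness coefficients exist and the rule can fail for unusual distributions); the helper name and data are hypothetical.

```python
import statistics


def skew_direction(scores):
    """Rule-of-thumb skew check: the mean is pulled toward the longer tail."""
    mean, median = statistics.mean(scores), statistics.median(scores)
    if mean > median:
        return "positive (tail to the right)"
    if mean < median:
        return "negative (tail to the left)"
    return "no skew detected"


print(skew_direction([1, 2, 2, 3, 3, 3, 14]))      # positive (tail to the right): mean 4, median 3
print(skew_direction([0, 10, 11, 11, 12, 12, 14]))  # negative (tail to the left): mean 10, median 11
```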
Correlation Coefficient
Descriptive statistic used to measure how related two variables are
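Shows direction (+ or -) and degree (strength); ranges from -1 to +1
Can’t show cause/effect
Visualized on a scatter plot with a line of best fit; the closer the points cluster around the line, the stronger the correlation. The coefficient equals the slope of that line only when both variables are in z-score units.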
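A minimal sketch of the Pearson correlation coefficient r, written out by hand so the formula is visible (Python 3.10+ also offers `statistics.correlation`). The second example shows that r is not the raw slope: y = 10x has slope 10 but r = 1.0.

```python
import math


def pearson_r(x, y):
    """Pearson r: covariation of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson_r([1, 2, 3], [10, 20, 30]))  # 1.0, although the slope of y on x is 10
print(pearson_r([1, 2, 3], [3, 2, 1]))     # -1.0 (perfect negative relationship)
```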
Factor analysis
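Attempts to account for the interrelationships among variables by identifying clusters (factors) of variables that hang together, i.e., are highly correlated with one another
Ex. Variables A, B, C are highly correlated with one another but not with D, E, F, AND D, E, F are highly correlated with one another but not with A, B, C –> two underlying factors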
Null hypothesis
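States that there is no effect or no difference (e.g., the sample mean is the same as the population mean); any observed difference is due to chance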
If the observed difference is STATISTICALLY SIGNIFICANT…
Reject the null
Alpha level
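Criterion of significance selected by the researcher before collecting data (commonly 0.05); it is the probability of a Type I error the researcher is willing to accept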
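Errors in Testing Significance
Type I error: rejecting the null hypothesis when it is actually true (probability = alpha)
Type II error: failing to reject the null hypothesis when it is actually false (probability = beta)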
Significance Testing Process
- Formulate null and alt hyp. based on research hyp.
- Decide on an alpha level (e.g., 0.05)
- Collect data
- Perform a significance test on the data to obtain the significance level (p-value): the probability of getting a difference at least this large if the null hypothesis were true. Large p –> difference more likely due to chance. Small p –> difference less likely due to chance, supporting the hypothesized effect.
- Compare the obtained significance level to alpha. If p is less than alpha –> statistically significant. If p is greater than alpha –> not statistically significant.
- Significant –> reject the null. Not significant –> fail to reject (retain) the null.
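The six steps can be walked through end to end with a deliberately simple test. This sketch swaps in an exact two-sided binomial test (not one of the tests named on these cards) on hypothetical coin-flip data: H0 says the coin is fair. The p-value sums the probabilities of every outcome no more likely than the observed one.

```python
from math import comb


def binomial_p_two_sided(k, n, p=0.5):
    """Exact two-sided p-value for k successes in n trials under H0: P(success) = p."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # small tolerance guards against floating-point ties
    return sum(q for q in probs if q <= observed * (1 + 1e-9))


# Step 1: H0 = the coin is fair; H1 = it is not
alpha = 0.05                                   # step 2: chosen before collecting data
heads, flips = 9, 10                           # step 3: hypothetical data
p_value = binomial_p_two_sided(heads, flips)   # step 4: significance test
decision = "reject H0" if p_value < alpha else "fail to reject H0"  # steps 5-6
print(round(p_value, 4), decision)             # 0.0215 reject H0
```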
Types of Significance Tests
t-test, ANOVA, and chi-square
t-test
used to compare the means of two groups
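A minimal sketch of the independent-samples t statistic (Student's version, pooling the two sample variances), on hypothetical data. It returns t and its degrees of freedom; turning t into a p-value needs a t table or a library such as SciPy (`scipy.stats.ttest_ind`).

```python
import math
import statistics


def independent_t(a, b):
    """Student's t for two independent groups, using the pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = math.sqrt(pooled * (1 / na + 1 / nb))  # standard error of the mean difference
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2


t, df = independent_t([4, 5, 6], [7, 8, 9])  # hypothetical group scores
print(round(t, 3), df)  # -3.674 4
```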
ANOVA
Used to estimate how much group means differ from one another by comparing the between-group variance to the within-group variance using the F ratio.
F ratio = Between-Group Variance Estimate / Within-Group Variance Estimate
Factorial design: each level of each IV occurs in combination with each level of every other IV
Interaction: when the effect of one IV is not consistent across all levels of the other IV(s)
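A sketch of both ideas under simple assumptions: `one_way_f` computes the F ratio for a one-way design (between-group variance estimate over within-group variance estimate), and `interaction_contrast` describes a 2×2 interaction as the difference between the effects of IV A at the two levels of IV B. Both names and all data are hypothetical; testing whether an interaction is significant needs a full factorial ANOVA.

```python
def one_way_f(*groups):
    """F ratio = between-group variance estimate / within-group variance estimate."""
    scores = [x for g in groups for x in g]
    k, n = len(groups), len(scores)
    grand_mean = sum(scores) / n
    means = [sum(g) / len(g) for g in groups]
    # between-group: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # within-group: how far each score sits from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


def interaction_contrast(cells):
    """2x2 factorial: cells[(a, b)] is the mean for level a of IV A and level b of IV B.

    Returns how much the effect of A changes across the levels of B;
    0 means no interaction (parallel lines on a plot of cell means).
    """
    effect_of_a_at_b0 = cells[(1, 0)] - cells[(0, 0)]
    effect_of_a_at_b1 = cells[(1, 1)] - cells[(0, 1)]
    return effect_of_a_at_b1 - effect_of_a_at_b0


print(one_way_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # 27.0
print(interaction_contrast({(0, 0): 10, (1, 0): 15, (0, 1): 12, (1, 1): 25}))  # 8
```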
Chi-square tests
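Used for categorical rather than numerical data; compares the observed frequencies with the frequencies expected under the null hypothesis
When summarizing categorical data –> frequencies or proportions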
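A minimal sketch of the Pearson chi-square statistic, the sum of (observed - expected)² / expected over categories, on a hypothetical two-category example. With 2 categories there is 1 degree of freedom, where the critical value at alpha = 0.05 is about 3.84, so 10.0 would be significant.

```python
def chi_square(observed, expected):
    """Pearson chi-square: sum of (O - E)^2 / E over the categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical: 40 people choose between two brands; H0 predicts a 50/50 split
print(chi_square([30, 10], [20, 20]))  # 10.0
```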
Meta-analysis
Statistical procedure that combines the results of many different studies to draw an overall conclusion
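One common way to combine studies is an inverse-variance weighted average of their effect sizes (a fixed-effect model). The card doesn't specify a method, so treat this as an illustrative choice; the helper name and study values are hypothetical. More precise studies (smaller variance) get more weight.

```python
def fixed_effect_mean(effects, variances):
    """Inverse-variance weighted average effect size (fixed-effect model)."""
    weights = [1 / v for v in variances]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)


# Two hypothetical studies: effect sizes 0.2 and 0.4; the first is 4x more precise
pooled = fixed_effect_mean([0.2, 0.4], [0.01, 0.04])
print(round(pooled, 2))  # 0.24, closer to the more precise study
```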
Norm-referenced Testing v. Domain-referenced Testing
Norm-referenced: assesses an individual’s performance by comparing it with the performance of others (the norm group)
Problem: results are specific to the population tested (the norm group), which can (and often does) change
Domain-referenced (criterion-referenced testing): concerned with what the individual knows about a specific content domain; does not compare to peers, concerned only with individual mastery
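The same raw score can be read both ways. In this sketch, `percentile_rank` gives a norm-referenced reading (percent of a hypothetical norm group scoring below), while `has_mastered` gives a domain-referenced reading against a mastery cutoff. Both names, the norm group, and the 80% cutoff are hypothetical, and percentile-rank definitions vary in how they treat ties.

```python
def percentile_rank(score, norm_group):
    """Norm-referenced: percent of the norm group scoring below `score`."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


def has_mastered(score, max_score, cutoff=0.80):
    """Domain-referenced: did the examinee reach the mastery cutoff?

    The 80% cutoff is a hypothetical value chosen for illustration.
    """
    return score / max_score >= cutoff


norm_group = [55, 60, 62, 70, 75, 78, 80, 85, 90, 95]  # hypothetical peer scores
print(percentile_rank(78, norm_group))  # 50.0 -> middle of the norm group
print(has_mastered(78, 100))            # False -> below the mastery cutoff
```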
Reliability
Consistency with which a test measures whatever it is intended to measure
High reliability –> test produces dependable, reproducible, and consistent measures
Standard Error of Measurement (SEM)
Index of how much, on average, we expect a person’s observed score to vary from their true score (the score the person is really capable of, given their ability)
Desirable SEM = 0, but that is not possible in reality
Speaks to the test’s reliability: the more reliable the test, the smaller the SEM
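The standard formula SEM = SD × √(1 - reliability) makes the link to reliability concrete: perfect reliability gives SEM = 0. The sketch assumes a hypothetical test with SD 15 and reliability 0.91, and builds the usual band of ±1 SEM (about 68% confidence) around an observed score.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


sem = standard_error_of_measurement(15, 0.91)  # about 4.5 points
observed = 110
# roughly 68% band: observed score +/- 1 SEM
print(round(observed - sem, 1), round(observed + sem, 1))  # 105.5 114.5
```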
Three methods used to establish the reliability of a test
test-retest, alternate-form, split-half reliability
test-retest method
Consistency of scores when the same test is administered to the same people more than once
alternate-form method
consistency when examinees are given two different forms of the test at two different times
split-half reliability
Test-takers take only one test, but it is divided into two equal halves (e.g., odd vs. even items) that are scored separately
Strong correlation coefficient between the two halves? –> high reliability
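A minimal sketch of split-half reliability: score odd and even items separately, correlate the two half-scores, and then (a standard step not mentioned on the card) apply the Spearman-Brown correction, 2r / (1 + r), to estimate reliability for the full-length test. The response matrix is hypothetical (1 = correct, 0 = wrong).

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(item_matrix):
    """item_matrix: one row of item scores per test taker."""
    odd_half = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_halves = pearson_r(odd_half, even_half)
    # Spearman-Brown: estimate full-length reliability from the half-test correlation
    return r_halves, 2 * r_halves / (1 + r_halves)


responses = [
    [1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
]
r_halves, full_length = split_half_reliability(responses)
print(r_halves, round(full_length, 3))  # 0.5 0.667
```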
Validity and the 6 subtypes
Extent to which a test actually measures what it is supposed to measure (concerned with accuracy rather than consistency)
Content, face, criterion (concurrent and predictive), construct (convergent and discriminant)
Content Validity
Extent to which the test’s items cover the particular thing (content domain) it seeks to measure
ex. If a test is supposed to measure knowledge about a particular topic, test items should include questions about that topic.
Face validity
Does the test APPEAR to measure what it is supposed to measure
Criterion Validity
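How well test scores correlate with an external criterion, such as performance on an established test designed to measure the same thing (concurrent: criterion measured at the same time; predictive: criterion measured later)
Cross-validation: re-testing the criterion validity on a second, independent sample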
Construct Validity
Refers to how well performance on a test fits into the theoretical framework related to what you want to measure
Ex. If a relationship between sociability and intelligence is well supported by research, then your test measuring sociability should also show high scores among people with high intelligence.
Convergent Validity
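Performance on the test correlates highly with other measures of the same or related constructs; one kind of evidence for construct validity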
Discriminant validity
performance on the test is not correlated with other variables that it should not be related to
Ex. test-taking experience
A test can show perfect reliability and very little validity, but the opposite is not possible: a test cannot be valid without being reliable.
Four Types of Measurement Scales
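NOMINAL (categorical; labels with no order)
ORDINAL (rank order; distances between ranks are not necessarily equal)
INTERVAL (equal distances between values but no true zero point, like temperature in degrees Fahrenheit)
RATIO (includes a true zero point that indicates absence of the quality measured, like income)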
Ability Tests
Aptitude: predict what an ind. can accomplish through training; predict future performance (like IQ)
Achievement: assess what one knows/can do NOW
Wechsler tests
Two broad subscales: verbal and performance
Verbal: subtests such as information and vocabulary
Performance: Tests of manipulative skill, eye-hand coordination, and speed
Three major tests of intelligence: Wechsler Preschool and Primary Scale of Intelligence (WPPSI; preschool), Wechsler Intelligence Scale for Children (WISC; ages 6-16), and Wechsler Adult Intelligence Scale (WAIS; 16+)
Have all been revised: WPPSI-R, WISC-R, and WAIS-R
The WAIS-III was later superseded by the WAIS-IV (2008) for adult IQ testing
Personality Inventory
Self-rating device (consisting of 100-500 statements)
Minnesota Multiphasic Personality Inventory, MMPI
550 statements to which subjects respond T, F, or “cannot say”
Yields ten clinical scales
Used to aid in the assessment of various clinical disorders
empirical criterion-keying approach
Used by Hathaway and McKinley to develop the MMPI
Tested a large pool of items and retained those that differentiated between patient and nonpatient groups
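The core idea of empirical criterion keying (keep an item only if the criterion group answers it differently from the comparison group) can be sketched as below. This is a simplified illustration: a fixed difference threshold stands in for whatever statistical criterion the MMPI authors actually used, and the helper name, item IDs, and endorsement rates are all hypothetical.

```python
def empirical_key(patient_rates, control_rates, min_difference=0.20):
    """Keep items whose 'True' endorsement rate differs enough between groups.

    patient_rates / control_rates: {item_id: proportion answering True}
    min_difference: hypothetical cutoff chosen for illustration
    """
    key = {}
    for item, p in patient_rates.items():
        diff = p - control_rates[item]
        if abs(diff) >= min_difference:
            # score the item in whichever direction the patient group leaned
            key[item] = "True" if diff > 0 else "False"
    return key


patients = {"item1": 0.70, "item2": 0.52, "item3": 0.15}
controls = {"item1": 0.30, "item2": 0.50, "item3": 0.45}
print(empirical_key(patients, controls))  # {'item1': 'True', 'item3': 'False'}
```

Note that item content plays no role in the selection, which is the defining feature of the empirical approach (in contrast to the theory-driven content scales of the MMPI-2).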
MMPI-2
Added content scales, formed on the basis of theoretical considerations
To form each content scale, the test authors selected items that “ought” to be related to the construct by theory (e.g., self-esteem items for a self-esteem scale)
California Psychological Inventory (CPI)
Personality inventory based on the MMPI
Developed for use with normal populations, age 13 and up (esp. high school and college students)
Consists of 20 scales, including 3 validity scales
~460 T/F items, measures personality traits (like sociability)
Projective Tests
Stimuli are ambiguous, and responses are not limited
Test taker is asked to interpret ambiguous stimuli
Scoring is subjective
Rorschach
Inkblot test
10 inkblot cards presented to the subject in a specific order with specific instructions; the clinician then interprets the results based on what the person reported seeing and any spontaneous remarks they made
Thematic Apperception Test (TAT)
Developed by Morgan and Murray
20 pictures depicting scenes with ambiguous meanings
Subject is asked to tell what is happening, describe what led to the event, and give an ending
No standardized scoring
Blacky pictures
Projective test devised for children
12 cartoonlike pictures that feature a small dog (Blacky)
Pictures depict Blacky in situations designed to correspond to particular stages of psychosexual development
Subject is asked to tell stories about the pictures shown
Rotter Incomplete Sentences Blank
Sentence completion task; projective technique used by clinicians and researchers
Barnum effect
Tendency of people to accept, and approve of, general personality interpretations given to them as accurate descriptions of themselves
A form of pseudovalidation
Interest Testing
Assesses an individual’s interest in different lines of work
Ex. Strong-Campbell Interest Inventory
Strong-Campbell Interest Inventory
Developed using an empirical criterion-keying approach
Test takers are given list of interests; asked to indicate whether they like/dislike the interest listed
A second section asks test-takers to indicate preference for one of two paired items
Divides interests into 6 types: realistic, investigative, artistic, social, enterprising, and conventional
(Holland’s RIASEC types)
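Results on the six types are often summarized as a three-letter Holland code: the test taker's highest-scoring types, in order. This sketch assumes one score per type and breaks ties by RIASEC order; the helper name and scores are hypothetical, and the actual inventory reports normed scores rather than raw sums.

```python
def holland_code(scores, letters=3):
    """Top RIASEC types, highest score first; ties keep RIASEC order."""
    order = "RIASEC"
    ranked = sorted(order, key=lambda t: (-scores[t], order.index(t)))
    return "".join(ranked[:letters])


# Hypothetical theme scores for one test taker
scores = {"R": 12, "I": 30, "A": 25, "S": 18, "E": 9, "C": 14}
print(holland_code(scores))  # IAS
```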