Exam #2 Flashcards
Ch. 5 Self-report measure+
A method of measuring a variable in which people answer questions about themselves in a questionnaire or interview. For example, asking people how much they appreciate their partner and asking about gender identity are both self-report measures.
Observational measures+
A method of measuring a variable by recording observable behaviors or physical traces of behaviors. Also called behavioral measure. For example, a researcher could operationalize happiness by observing how many times a person smiles. Intelligence tests can be considered observational measures, because the people who administer such tests in person are observing people’s intelligent behaviors. A drawback: people may change their behavior because they know they are being watched.
Physiological measures+
A method of measuring a variable by recording biological data, such as heart rate, galvanic skin response, or blood pressure. Physiological measures usually require equipment to record and analyze biological data.
Ex: measuring hormone levels or brain activity; measuring cortisol in children’s saliva and examining how it relates to their behavior.
Nominal scale (categorical variable)+
A nominal scale is the measurement scale for categorical variables, in contrast to the quantitative scales (ordinal, interval, and ratio).
A categorical variable is a variable whose levels are categories, with no numeric scale (ex: right- vs. left-handedness). Also called nominal variable.
Examples are sex, whose levels are male and female, and species, whose levels in a study might be rhesus macaque, chimpanzee, and bonobo. A researcher might decide to assign numbers to the levels of a categorical variable (e.g., using “1” to represent rhesus macaques, “2” for chimps, and “3” for bonobos), but the numbers are only labels and carry no numeric meaning.
Quantitative variables: definition? What are the subtypes?+
A variable whose values can be recorded as meaningful numbers. The quantitative scales are ordinal, interval, and ratio.
e.g. Height and weight are quantitative because they are measured in numbers, such as 170 centimeters or 65 kilograms. IQ score, level of brain activity, and amount of salivary cortisol are also quantitative variables.
Ordinal scale+
A quantitative measurement scale whose levels represent a ranked order, and in which distances between levels are not equal.
- Rank ordering (ex: letter grades)
- For example, a bookstore’s website might display the top 10 best-selling books.
- The intervals may be unequal. E.g., the #1 and #2 best sellers might be only 10 sales apart, while the #2 and #3 best sellers are 150,000 sales apart.
Interval scale+
A quantitative measurement scale that has no “true zero,” and in which the numerals represent equal intervals (distances) between levels (e.g., temperature in degrees).
- A person can get a score of 0, but the 0 does not literally mean “nothing”
e.g. Body temperature in degrees Celsius is an example of an interval scale: the intervals between levels are equal, but a temperature of 0 degrees does not mean a person has “no temperature.” The same is true of IQ: a score of 0 on an IQ test does not mean a person has “no intelligence.”
Ratio scale+
A quantitative measurement scale in which the numerals have equal intervals and the value of zero truly means “none” of the variable being measured. On a test scored by number of correct answers, for example, 0 truly represents “nothing correct” (0 answers correct).
e.g. The number of eyeblinks is a ratio scale because 0 would represent zero eyeblinks. Because ratio scales do have a meaningful zero, a researcher can say something like “Alek answered twice as many problems as Hugo.”
What’s reliability? What are the kinds of reliability?+
The consistency or stability of the results of a behavioral measure. The kinds are test-retest reliability, alternate-forms reliability, interrater reliability, internal reliability, and split-half reliability.
Correlation coefficient+
A single number, ranging from –1.0 to 1.0, that indicates the strength and direction of an association between two variables. It tells how strongly two variables are related to each other.
Ranges from -1.00 to +1.00; the closer to +1.00 or -1.00, the stronger the correlation.
The numbers below the scatterplots are the correlation coefficients, or r. The r indicates the same two things as the scatterplot: the direction of the relationship and the strength of the relationship.
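Not from the book, but a minimal sketch of what r actually computes may help: Pearson’s r is the covariance of the two variables divided by the product of their standard deviations. The variable names and numbers below are invented for illustration.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: covariance scaled by both standard deviations."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()   # deviations from each mean
    return (dx * dy).sum() / np.sqrt((dx**2).sum() * (dy**2).sum())

# Hypothetical data for 6 people: hours of sleep and a mood rating
sleep = [5, 6, 7, 7, 8, 9]
mood  = [2, 3, 4, 5, 5, 6]

print(round(pearson_r(sleep, mood), 2))          # ~0.96: strong positive r
print(round(np.corrcoef(sleep, mood)[0, 1], 2))  # same value via NumPy
```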
Slope direction?+
The upward, downward, or neutral slope of the cluster of data points in a scatterplot.
- slope direction can be positive, negative, or zero—that is, sloping up, sloping down, or not sloping at all.
Test-retest reliability?+
The consistency in results every time a measure is used.
- Test-retest reliability is assessed by measuring the same individuals at two points in time and comparing results. High correlation between test and retest indicates reliability.
For example, a trait like intelligence is not usually expected to change over a few months, so if we assess the test-retest reliability of an IQ test and obtain a low r, we would be doubtful about the reliability of this test. In contrast, if we were measuring flu symptoms or seasonal stress, we would expect test-retest reliabilities to be low, simply because these constructs do not stay the same over time.
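Not in the book, but a quick simulation makes the logic concrete: a stable construct should yield a high test-retest r, while a genuinely changing construct yields a low r even when the measure itself is fine. All names and numbers below are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Stable construct (e.g., IQ): time-2 scores stay close to time-1 scores
iq_t1 = rng.normal(100, 15, n)
iq_t2 = iq_t1 + rng.normal(0, 3, n)   # only small measurement noise added

# Changing construct (e.g., flu symptoms): time-2 is unrelated to time-1
flu_t1 = rng.normal(5, 2, n)
flu_t2 = rng.normal(5, 2, n)          # re-drawn, because the state changed

print(round(np.corrcoef(iq_t1, iq_t2)[0, 1], 2))   # high r: good test-retest
print(round(np.corrcoef(flu_t1, flu_t2)[0, 1], 2)) # near 0, yet not the test's fault
```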
Interrater reliability?+
The degree to which two or more observers give consistent ratings of a set of targets.
- Interrater reliability is the correlation between the observations of different RATERS.
- A high correlation indicates raters agree in their ratings.
- To test the interrater reliability of some measure, we might ask two observers to rate the same participants at the same time, and then we would compute r. If r is positive and strong (according to many researchers, r = .70 or higher), we would have very good interrater reliability.
For example, suppose you are assigned to observe the number of times each child smiles in 1 hour at a childcare playground. If, for one child, you record 12 smiles during the first hour and your lab partner also records 12 smiles in that hour, there is interrater reliability.
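In practice this check is just the correlation between the two raters’ scores, compared against the r = .70 benchmark above. A sketch with invented smile counts (scipy.stats.pearsonr is used here; np.corrcoef would work too):

```python
from scipy.stats import pearsonr

# Hypothetical smile counts for 8 children, recorded by two observers
rater_a = [12, 5, 9, 14, 3, 8, 11, 6]
rater_b = [11, 6, 9, 15, 2, 8, 10, 7]

r, _ = pearsonr(rater_a, rater_b)   # second return value is the p-value
print(f"interrater r = {r:.2f}")
# Apply this card's benchmark: r >= .70 counts as good interrater reliability
print("acceptable" if r >= 0.70 else "retrain observers / clarify the codebook")
```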
Alternate-forms reliability+
The consistency of test results between two different forms of the same test. Two forms of the same test are used instead of repeating the identical test.
- This avoids problems with participants remembering and repeating earlier responses.
- Repeating tests with the same people can be impractical
Internal reliability (consistency)+
Also called internal consistency.
- In a measure that contains several items, the consistency in a pattern of answers, no matter how a question is phrased.
Split-half reliability (not in the book; ask)+
Correlation of the total score on one half of the test with the total score on the other half.
- High correlation indicates that the questions on the test are measuring the same thing.
(Split-half testing is a measure of internal consistency — how well the test components contribute to the construct that’s being measured).
1. Split a test into two halves. For example, one half may be composed of even-numbered questions while the other half is composed of odd-numbered questions.
2. Administer each half to the same individual.
3. Repeat for a large group of individuals.
4. Find the correlation between the scores for both halves.
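Those four steps translate directly into code. A minimal sketch with invented data (rows are respondents, columns are ten items scored 1 = correct, 0 = incorrect):

```python
import numpy as np

# Hypothetical data: 5 respondents x 10 test items
scores = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    [1, 0, 1, 1, 1, 1, 0, 1, 1, 1],
])

# Steps 1-3: total score on odd-numbered vs. even-numbered items for everyone
odd_half  = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5, 7, 9
even_half = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6, 8, 10

# Step 4: correlate the two half-scores across respondents
r = np.corrcoef(odd_half, even_half)[0, 1]
print(f"split-half r = {r:.2f}")   # high r -> the halves measure the same thing
```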
Cronbach’s alpha (or coefficient alpha) +
Average of all possible split-half reliability coefficients.
A correlation-based statistic that measures a scale’s internal reliability.
- The closer Cronbach’s alpha is to 1.0, the better the scale’s reliability. For self-report measures, researchers are looking for Cronbach’s alpha of .80 or higher.
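The “average of all possible split-halves” idea is computed in practice with the standard variance formula: alpha = k/(k-1) x (1 - sum of the item variances / variance of the total scores). A sketch of that formula with invented Likert-scale data:

```python
import numpy as np

def cronbach_alpha(items):
    """items: 2-D array, rows = respondents, columns = scale items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 5 respondents x 4 Likert items (1-5) from one scale
ratings = np.array([
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])

print(f"alpha = {cronbach_alpha(ratings):.2f}")  # .80+ is the usual target
```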
What’s validity?+
How accurate an assessment/test/measure is.
- The appropriateness of a conclusion or decision.
Construct validity?+
An indication of how well a variable was measured or manipulated in a study.
- It can be used in observational research as well, e.g., when observing how much people eat in fast-food restaurants.
Construct validity is especially important when a construct is not directly observable. Take happiness: We have no means of directly measuring how happy a person is. We could estimate it in a number of ways, such as scores on a well-being inventory, daily smile rate, blood pressure, stress hormone levels etc.
Face validity+
The extent to which a measure is subjectively considered a plausible operationalization of the conceptual variable in question.
-!!The content of the measure appears to reflect the construct being measured.
Ex: Head circumference has high face validity as a measurement of hat size, but it has low face validity as an operationalization of intelligence. In contrast, speed of problem solving, vocabulary size, and curiosity have higher face validity as operationalizations of intelligence.
- Does the measure look right? Face validity is the weakest kind of validity evidence.
Content validity+
The extent to which a measure captures all parts of a defined construct.
Ex: measure all anxiety domains. Consider this conceptual definition of intelligence, which contains distinct elements, including the ability to “reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly, and learn from experience”. To have adequate content validity, any operationalization of intelligence should include questions or items to assess each of these seven components.
Criterion validity+
Evaluates whether the measure under consideration is associated with a concrete behavioral outcome with which it should be associated.
- It measures a current outcome, in contrast to predictive validity, which concerns future outcomes.
- We make sure that no other events can influence the outcome.
Predictive validity (ask; not in book)+
Refers to the ability of a test or other measurement to predict a future outcome. The outcome can be a behavior, performance, or even a disease that occurs at some point in the future.
e.g. A pre-employment test has predictive validity when it can accurately identify the applicants who will perform well after a given amount of time, such as one year on the job.
Convergent validity+
Refers to how closely a test is related to other tests that measure the same (or similar) constructs; scores on the measure should correlate with other measures of the same or similar construct.
e.g. Suppose you use two different methods to collect data about anger: observation and a self-report questionnaire. If the scores of the two methods are similar, this suggests they indeed measure the same construct. A high correlation between the two sets of scores suggests convergent validity.
Ex: scores match across different measures of anxiety; a measure of depression should correlate with a different measure of the same construct, depression.
Discriminant validity+
Scores on the measure are not related to other measures that are theoretically different. Discriminant validity specifically measures whether constructs that theoretically should not be related to each other are, in fact, unrelated.
e.g. depression is not the same as a person’s perception of their overall physical health
For example, the scores of two tests measuring security and loneliness theoretically should not be correlated; individuals scoring high in security are not expected to score high in loneliness. If that proves true, these two tests would have high discriminant validity.
Surveys definition? (ch. 6)
A method of posing questions to people on the telephone, in personal interviews, on written questionnaires, or via the Internet. Also called polls.
The term survey is more often used when people are asked about consumer products.
Polls definition
A method of posing questions to people on the telephone, in personal interviews, on written questionnaires, or via the Internet. Also called survey.
The term poll is more often used when people are asked about their social or political opinions.
Observational research
The process of watching people or animals and systematically recording how they behave or what they are doing.
- Some claims are based on observational data.
- Observing how much people talk, how they behave, etc.
- Strength: works even for people who struggle with introspection, because it does not rely on self-report.
Observer bias?
A bias that occurs when observers’ expectations influence their interpretation of participant behavior or the outcomes of the study. Instead of rating behaviors objectively, observers rate behaviors according to their own expectations or hypotheses.
Observer/expectancy effects
Observers inadvertently change the behavior of the participants they are observing.
- Observers not only see what they expect to see; sometimes they even cause those they are observing (even rats) to behave in ways that conform to their expectations.
Ethics for observational research
- Public settings: usually OK.
Most psychologists believe it is ethical to watch people in museums and classrooms, at sports events, or even at the sinks of public bathrooms.
- Private settings: require more attention and policies.
In most cases, psychologists doing research must obtain permission in advance to watch or to record people’s private behavior. If hidden video recording is used, the researcher must explain the procedure at the conclusion of the study.
Ways to reduce observer bias & effects
- Use multiple observers; researchers can assess the construct validity of a coded measure this way.
- Masked design/blind design: observers are unaware of the purpose of the study and of the conditions/groups participants are assigned to.
- Training for observers: if there is disagreement, the researchers may need to train their observers better and develop a clearer coding system for rating the behaviors.
- “Blend in”: one way to avoid observer effects is to make unobtrusive observations, i.e., make yourself less noticeable.
- “Wait it out”: a researcher who plans to observe at a school might let the children get used to his or her presence until they forget they’re being watched.
- “Indirect measures”: instead of observing behavior directly, researchers measure the traces a particular behavior leaves behind. e.g. The number of empty liquor bottles in residential garbage indicates how much alcohol is being consumed in a community.
- Researchers develop clear rating instructions, often called codebooks, so the observers can make reliable judgments with minimal bias.
Constructing Leading Questions to Ask (simplicity)
The way a question is worded and presented in a survey can make a tremendous difference in how people answer. A leading question is one whose wording nudges respondents toward a particular answer, which weakens construct validity.
Constructing Double-barreled Questions to Ask
A type of question in a survey or poll that is problematic because it asks two questions in one, thereby weakening its construct validity. People might be responding to the first half of the question, the second half, or both.
e.g. Do you enjoy swimming and running?
Constructing Negatively-worded Questions to Ask
A question in a survey or poll that contains negatively phrased statements, making its wording complicated or confusing and potentially weakening its construct validity.
Ex: “People who do not drive with an expired license should never be punished.” Or: “It’s impossible that it never happened.” In order to give your opinion, you must be able to unpack the double negative (“do not” … “never”). So instead of measuring people’s beliefs, the question may be measuring people’s working memory.
Constructing Acquiescence Questions to Ask
One potential response set is acquiescence, or “yea-saying.” This occurs when people say “yes” or “strongly agree” to every item instead of thinking carefully about each one. For example, a respondent might answer “5” to every item on Diener’s scale of subjective well-being, not because the respondent is happy, but because that person is using a yea-saying shortcut. It can threaten construct validity because instead of measuring the construct of true feelings of well-being, the survey could be measuring the lack of motivation to think carefully.
Open-ended vs forced-choice (closed-ended) questions
Open-ended - A survey question format that allows respondents to answer any way they like. They might ask people to name the public figure they admire the most or comment on their experience at a hotel.
Ex: What do you think of this food? (lots of possible answers)
Closed-ended (forced-choice) - A survey question format in which respondents give their opinion by picking the best of two or more options.
Ex: Do you like this food? (yes/no answer)
Would you vote for the Republican or the Democrat?
Forced-choice questions are also used to measure personality.
Rating scales: semantic differential format?
A survey question format using a response scale whose numbers are anchored with adjectives.
e.g. on the Internet site RateMyProfessors.com, students assign ratings to a professor using the following adjective phrases.
“Profs get F’s too 1 2 3 4 5 A real gem”
Another example is the star ratings used by Internet sites like Yelp: one star means “poor” (on Yelp, “Eek! Methinks not”), and five stars means “outstanding” (on Yelp, “Woohoo! As good as it gets!”).