Reading Key Points Flashcards
Two key features of an Experiment?
• Manipulation: Researchers manipulate, or systematically vary, the level of the independent variable. The different levels of the independent variable are called conditions.
• Control: The researcher controls, or minimises the variability in, variables other than the independent and dependent variable. These other variables are called extraneous variables.
*They manipulate the independent variable by systematically changing its levels and control other variables by holding them constant.
Manipulation of the Independent Variable:
To manipulate an independent variable means to change its level systematically so that different groups of participants are exposed to different levels of that variable (between-subjects), or the same group of participants is exposed to different levels at different times (within-subjects).
The different “levels” of an independent variable are called conditions.
Involves the active intervention of the researcher (i.e., the researcher produces the difference between the two groups; it is not a pre-existing subject variable on which they differ).
Manipulation of the IV eliminates third-variable problems because researchers work to ensure that the only difference between the two groups is the level of the IV to which the experimenter exposes them.
It is sometimes unethical to manipulate the IV, so an experiment cannot be conducted (e.g., medical studies, or inducing emotions, traits, or behaviours that cause harm or distress to participants).
The IV is a construct that is indirectly measured through operational variables.
A manipulation check is a separate measure of the construct used to verify that the researchers have successfully manipulated the variable (e.g., self-reported stress and blood pressure as checks on a stress manipulation).
Control of Extraneous Variables:
An extraneous variable is anything that varies in the context of a study other than the independent and dependent variables.
These include individual differences and situational/task variables.
They pose a problem because they are likely to exert an effect on the dependent variable.
This makes it harder to separate the effects of the IV from those of extraneous variables (i.e., confounds). Therefore, researchers control extraneous variables by keeping them constant.
Extraneous Variables as “Noise”
Extraneous variables make it hard to detect the effects of the IV in two ways:
• Adding variability or “noise” to the data, which makes it harder to detect the effects of the IV on the DV.
• Acting as confounds (covered in the next section).
One way to control for extraneous variables is to keep them constant: keeping situational and task variables equal across conditions, using a standardised format, providing the same materials, interacting with participants in the same way, or applying inclusion/exclusion criteria to participants.
Putting restrictions on participant inclusion limits the external validity of the study (i.e., how generalisable it is to the population).
In many studies the importance of having a representative sample outweighs the benefits of minimising noise.
Extraneous Variables as Confounds
The second way extraneous variables make it hard to detect the effects of the IV is that they act as confounds.
Confounding variables are extraneous variables that differ on average across the levels of the independent variable(s) (e.g., intelligence, if there is not an equal mix of high- and low-IQ participants in each condition).
To confound means to confuse: confounds provide alternative explanations for the effect found on the DV that cannot always be ruled out or disproven.
One way to avoid confounds is to hold extraneous variables constant.
Another way is to randomly assign participants to conditions.
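As an illustration, here is a minimal sketch of random assignment in Python (the participant IDs and condition names are hypothetical):

```python
import random

def randomly_assign(participants, conditions=("treatment", "control")):
    """Shuffle participants, then deal them out to the conditions in turn,
    so that extraneous participant variables are spread across conditions
    by chance rather than differing systematically between them."""
    shuffled = list(participants)   # copy so the caller's list is untouched
    random.shuffle(shuffled)
    assignment = {condition: [] for condition in conditions}
    for i, participant in enumerate(shuffled):
        assignment[conditions[i % len(conditions)]].append(participant)
    return assignment

# Example: six participants randomly assigned to two equal-sized conditions
print(randomly_assign(["P1", "P2", "P3", "P4", "P5", "P6"]))
```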
Summary:
Chapter 6
Experimental Design
An experiment is a type of empirical study that features the manipulation of an independent variable, the measurement of a dependent variable, and control of extraneous variables.
An extraneous variable is anything that varies in the context of a study other than the independent and dependent variables. Extraneous variables make it difficult to detect the effect of the independent variable because they add variability or “noise” to the data.
A confounding variable is an extraneous variable that differs on average across levels of the independent variable. Because they differ across levels of the independent variable, confounding variables provide an alternative explanation for an effect on the dependent variable.
Experiments can be conducted using either between-subjects or within-subjects designs. Deciding which to use in a particular situation requires careful consideration of the pros and cons of each approach.
Random assignment to conditions in between-subjects experiments or to orders of conditions in within-subjects experiments is a fundamental element of experimental research. Its purpose is to control extraneous variables so that they do not become confounding variables.
Experimental research on the effectiveness of a treatment requires both a treatment condition and a control condition, which can be a no-treatment control condition, a placebo control condition, or a wait-list control condition. Experimental treatments can also be compared with the best available alternative.
Studies are high in internal validity to the extent that the way they are conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Experiments are generally high in internal validity because of the manipulation of the independent variable and control of extraneous variables.
Studies are high in external validity to the extent that the result can be generalised to people and situations beyond those actually studied. Although experiments can seem “artificial”—and low in external validity—it is important to consider whether the psychological processes under study are likely to operate in other people and situations.
There are several effective methods you can use to recruit research participants for your experiment, including through formal participant pools, advertisements, and personal appeals. Field experiments require well-defined participant selection procedures.
It is important to standardise experimental procedures to minimise extraneous variables, including experimenter expectancy effects.
It is important to conduct one or more small-scale pilot tests of an experiment to be sure that the procedure works as planned.
Summary:
Chapter 1
What is Science
There is a history of biased research that was labelled science being used as a tool to justify European colonisation and other injustices. It is important that psychological science is conducted ethically and free from bias.
Science is a general way of understanding the natural world. Its three fundamental features are systematic empiricism, empirical questions, and public knowledge.
Scientific psychology takes the scientific approach to understanding human behaviour.
Pseudoscience refers to beliefs and activities that are claimed to be scientific but lack one or more of the three features of science. It is important to distinguish the scientific approach to understanding human behaviour from the many pseudoscientific approaches.
Research in psychology can be described by a simple cyclical model. A research question based on the research literature leads to an empirical study, the results of which are published and become part of the research literature.
Scientific research in psychology is conducted mainly by people with doctoral degrees in psychology and related fields, most of whom are university academic staff. They do so for professional and for personal reasons, as well as to contribute to scientific knowledge about human behaviour.
Basic research is conducted to learn about human behaviour for its own sake, and applied research is conducted to solve some practical problem. Both are valuable, and the distinction between the two is not always clear-cut.
People’s intuitions about human behaviour, also known as folk psychology, often turn out to be wrong. This is one primary reason that psychology relies on science rather than common sense.
Researchers in psychology cultivate certain critical-thinking attitudes. One is scepticism. They search for evidence and consider alternatives before accepting a claim about human behaviour as true. Another is tolerance for uncertainty. They withhold judgement about whether a claim is true or not when there is insufficient evidence to decide.
The scientific approach to psychology has tended to view the individual as an isolated unit, and critics have argued that this has resulted in psychology research overlooking social issues. Social constructionism and mātauranga Māori embed social connections as a fundamental part of our psychology.
The clinical practice of psychology—the assessment and treatment of psychological problems—is one important application of the scientific discipline of psychology.
Scientific research is relevant to clinical practice because it provides detailed and accurate knowledge about psychological problems and their treatment.
Summary:
Chapter 5
What is Measurement
Measurement is the assignment of scores to individuals so that the scores represent some characteristic of the individuals. Psychological measurement can be achieved in a wide variety of ways, including self-report, behavioural, and physiological measures.
Psychological constructs such as intelligence, self-esteem, and depression are variables that are not directly observable because they represent behavioural tendencies or complex patterns of behaviour and internal processes. An important goal of scientific research is to conceptually define psychological constructs in ways that accurately describe them.
For any conceptual definition of a construct, there will be many different operational definitions or ways of measuring it. The use of multiple operational definitions, or converging operations, is a common strategy in psychological research.
Variables can be measured at four different levels—nominal, ordinal, interval, and ratio—that communicate increasing amounts of quantitative information. The level of measurement affects the kinds of statistics you can use and conclusions you can draw from your data.
Psychological researchers do not simply assume that their measures work. Instead, they conduct research to show that they work. If they cannot show that they work, they stop using them.
There are two distinct criteria by which researchers evaluate their measures: reliability and validity. Reliability is consistency across time (test-retest reliability), across items (internal consistency), and across researchers (interrater reliability). Validity is the extent to which the scores actually represent the variable they are intended to.
Validity is a judgement based on various types of evidence. The relevant evidence includes the measure’s reliability, whether it covers the construct of interest, and whether the scores it produces are correlated with other variables they are expected to be correlated with and not correlated with variables that are conceptually distinct.
The reliability and validity of a measure are not established by any single study but by the pattern of results across multiple studies. The assessment of reliability and validity is an ongoing process.
Good measurement begins with a clear conceptual definition of the construct to be measured. This is accomplished both by clear and detailed thinking and by a review of the research literature.
You often have the option of using an existing measure or creating a new measure. You should make this decision based on the availability of existing measures and their adequacy for your purposes.
Several simple steps can be taken in creating new measures and in implementing both existing and new measures that can help maximise reliability and validity.
Once you have used a measure, you should re-evaluate its reliability and validity based on your new data. Remember that the assessment of reliability and validity is an ongoing process.
Pseudoscience:
o Refers to work that claims to be, and at first glance appears to be, science but does not adopt rigorous scientific methodology and bases its claims on anecdotal evidence.
o E.g., trepanation or biorhythm theory:
o The idea is that people’s physical, intellectual, and emotional abilities run in cycles that begin when they are born and continue until they die. Allegedly, the physical cycle has a period of 23 days, the intellectual cycle a period of 33 days, and the emotional cycle a period of 28 days.
Criteria for Pseudoscience:
o A set of beliefs or activities can be said to be pseudoscientific if (a) its adherents claim or imply that it is scientific but (b) it lacks one or more of the three features of science.
o i.e., systematic empiricism, empirical questions (observations can provide evidence to support or disprove a claim), and public knowledge.
Why care about pseudoscience?
o Helps highlight the core components of science and why they’re important.
o The acceptance of false beliefs can have detrimental effects on society, and learning about pseudoscience can help us evaluate theories and spot false claims.
o Pseudopsychology exists, and it’s important for students of psychology to know this.
Examples:
o Cryptozoology:
The study of “hidden” creatures like Bigfoot, the Loch Ness monster, and the chupacabra.
o Pseudoscientific psychotherapies:
Past-life regression, rebirthing therapy, and bioscream therapy, among others.
o Homeopathy:
The treatment of medical conditions using natural substances that have been diluted sometimes to the point of no longer being present.
o Pyramidology:
Odd theories about the origin and function of the Egyptian pyramids (e.g., that they were built by extraterrestrials) and the idea that pyramids in general have healing and other special powers.
Can We Only Rely on Common Sense?
Folk psychology: intuitive beliefs about people’s thoughts, feelings, and behaviour.
Intuition can be inaccurate and scientific research can often disprove such claims:
E.g., false confessions are common, and venting anger by yelling and screaming only makes you angrier.
Many other widely held intuitive beliefs turn out to be myths.
How Could We Be So Wrong?
How can our intuitive beliefs be so wrong?
Forming detailed and accurate beliefs requires powers of observation, memory, and analysis to an extent that we do not naturally possess.
Thus, we tend to rely on mental shortcuts or heuristics (e.g., confirmation bias; believing something is likely to be true because it is endorsed by others or experts; or believing a myth because it would benefit us if it were true).
Psychologists are just as prone to false beliefs, so we work on cultivating a level of scepticism when we digest new information (i.e., considering its reliability, validity, and alternative explanations).
Scientists also cultivate a tolerance for uncertainty. They accept that there are many things that they simply do not know.
An operational definition is a definition of a variable in terms of precisely how it is to be measured.
These measures generally fall into one of three broad categories:
o Self-report measures are those in which participants report on their own thoughts, feelings, and actions, as with the Rosenberg Self-Esteem Scale.
o Behavioural measures are those in which some other aspect of participants’ behaviour is observed and recorded. This is an extremely broad category that includes the observation of people’s behaviour both in highly structured laboratory tasks and in more natural settings.
o Physiological measures are those that involve recording any of a wide variety of physiological processes, including heart rate and blood pressure, galvanic skin response, hormone levels, and electrical activity and blood flow in the brain.
o For any given variable or construct, there will be multiple operational definitions:
When psychologists use multiple operational definitions of the same construct—either within a study or across studies—they are using converging operations.
The idea is that the various operational definitions are “converging” or coming together on the same construct.
When scores based on several different operational definitions are closely related to each other and produce similar patterns of results, this constitutes good evidence that the construct is being measured effectively and that it is useful.
This is what allows researchers eventually to draw useful general conclusions, such as “stress is negatively correlated with immune system functioning”, as opposed to more specific and less useful ones, such as “people’s scores on the Perceived Stress Scale are negatively correlated with their white blood cell counts”.
Psychological Constructs:
Some variables are straightforward to measure like demographic information such as sex, age, height, weight or birth order.
Most variables are not straightforward or simple to measure. These variables are called “constructs” and include personality traits, emotional states, attitudes and abilities.
Psychological constructs cannot be observed directly. One reason is that they often represent tendencies to think, feel, or act in certain ways (which vary across settings), or internal states that are not directly observable.
The conceptual definition of a psychological construct describes the behaviours and internal processes that make up the construct, along with how it relates to other variables.
o E.g., neuroticism can be conceptually defined as “people’s tendency to experience negative emotions such as anxiety, anger, and sadness across a variety of situations. It has a strong genetic component, is a stable trait, and positively correlated with the tendency to experience pain or other physical symptoms”.
o Psychologists write definitions that are more detailed and precise than dictionary definitions, which allows us to test them empirically and refine them if needed.
Levels of Measurement:
o Stevens (1946) suggested four different levels of measurement (which he called “scales of measurement”) that correspond to four different levels of quantitative information that can be communicated by a set of scores: nominal, ordinal, interval, and ratio.
Nominal:
• Categorical data indicating whether participants are members of a certain category (e.g., male/female, old/young, high/low self-esteem, ethnicity, favourite colour).
• The lowest level of measurement; it merely categorises responses and doesn’t imply any rank or order among them.
Ordinal:
• Assigning scores so that they represent the rank order of the individuals.
• Ranks provide information about whether individuals are in the same category or not, and which individuals are higher or lower on the variable (e.g., consumer satisfaction rankings).
• Missing information:
o The intervals between ranks or points on the scale cannot be assumed to be equal.
Interval:
• Assigning scores using a numerical scale where there is equal distance between each point on the scale (e.g., degrees Celsius or IQ).
• The scale doesn’t have a true zero point (zero is arbitrary; it does not communicate the absence of the trait).
• Communicating interval data as if it were ratio data doesn’t make sense (e.g., “20 degrees is twice as hot as 10 degrees” is an unsupported statement; see the sketch below).
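A quick worked check of why that claim fails: converting the same temperatures to the Kelvin scale, which does have a true zero, shows the real ratio (a minimal sketch):

```python
def celsius_to_kelvin(celsius):
    """Kelvin has a true zero (absolute zero), so ratios of Kelvin values are meaningful."""
    return celsius + 273.15

# "20 °C is twice as hot as 10 °C" does not survive conversion to a true ratio scale:
ratio = celsius_to_kelvin(20) / celsius_to_kelvin(10)
print(round(ratio, 3))  # 1.035, not 2.0
```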
Ratio:
• A true zero point exists and indicates the absence of the trait (e.g., height, weight, number of correct answers on an exam, the Kelvin scale, money).
• Subsumes the properties of the other three levels: 1) nominal, in that each value identifies a category; 2) ordinal, in that values are ranked; 3) interval, in that there is equal distance between points on the scale; and 4) ratio, in that equal ratios between points on the scale have equivalent meanings.
o Why are levels of measurement important?
They emphasise the generality of the concept of measurement: there are four different levels, each with its own features and uses.
They serve as a rough guide to the statistical procedures that can be conducted, and the conclusions that can be drawn, given the level of measurement you have.
• Nominal = mode only
• Ratio = ratio statements (e.g., “twice as big”)
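To make that guide concrete, a minimal sketch with made-up data of statistics each level supports (the variables here are hypothetical):

```python
from statistics import mode

nominal = ["red", "blue", "red", "green", "red"]  # categories only: the mode is meaningful
print(mode(nominal))                               # -> 'red'

heights_cm = [150, 160, 170, 180]                  # ratio scale: a true zero exists
print(heights_cm[3] / heights_cm[0])               # ratio statements are meaningful -> 1.2
```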
Reliability:
Reliability refers to the consistency of a measure. Psychologists consider three types of consistency: over time (test-retest reliability), across items (internal consistency), and across different researchers (inter-rater reliability).
- Test-Retest Reliability:
The extent to which measurements are consistent over time. It’s important to consider whether the trait being studied is stable or unstable (mood is unstable from day to day, though it may be stable when averaged over a month; intelligence and self-esteem are expected to be more stable).
Assessing test-retest reliability requires using the measure on a group of people at one time, using it again on the same group of people at a later time, and then looking at test-retest correlation between the two sets of scores. This is typically done by graphing the data in a scatterplot and computing Pearson’s r.
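For instance, a minimal sketch with made-up scores for the same five people on two occasions:

```python
import numpy as np

# Hypothetical self-esteem scores for the same five people at time 1 and time 2
time1 = np.array([18, 25, 22, 30, 27])
time2 = np.array([17, 26, 20, 31, 28])

r = np.corrcoef(time1, time2)[0, 1]  # Pearson's r between the two occasions
print(round(r, 2))                   # the test-retest correlation
```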
In general, a test-retest correlation of +.80 or greater is considered to indicate good reliability.
- Internal Consistency:
The consistency of people’s responses across the items on a multiple-item measure.
In general, all the items on such measures are supposed to reflect the same underlying construct, so people’s scores on those items should be correlated with each other.
If people’s responses to the different items are not correlated with each other, then it would no longer make sense to claim that they are all measuring the same underlying construct.
This is as true for behavioural and physiological measures as for self-report measures.
Like test-retest reliability, internal consistency can only be assessed by collecting and analysing data. One approach is to look at a split-half correlation. This involves splitting the items into two sets, such as the first and second halves of the items or the even- and odd-numbered items. Then a score is computed for each set of items, and the relationship between the two sets of scores is examined. A split-half correlation of +.80 or greater is generally considered good internal consistency.
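For example, a minimal sketch of a split-half correlation, assuming a made-up participants × items score matrix:

```python
import numpy as np

# Hypothetical responses: 6 participants x 8 items on a single-construct scale
scores = np.array([
    [4, 5, 4, 4, 5, 4, 5, 4],
    [2, 1, 2, 2, 1, 2, 1, 2],
    [3, 3, 4, 3, 3, 4, 3, 3],
    [5, 5, 5, 4, 5, 5, 5, 5],
    [1, 2, 1, 1, 2, 1, 2, 1],
    [3, 4, 3, 3, 4, 3, 4, 3],
])

odd_half = scores[:, 0::2].sum(axis=1)    # total score on items 1, 3, 5, 7
even_half = scores[:, 1::2].sum(axis=1)   # total score on items 2, 4, 6, 8
split_half_r = np.corrcoef(odd_half, even_half)[0, 1]
print(round(split_half_r, 2))             # +.80 or greater is generally considered good
```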
The most common measure is Cronbach’s α (the Greek letter alpha); a value of +.80 or greater is generally taken to indicate good internal consistency.
- Inter-Rater Reliability:
Many behavioural measures involve significant judgement on the part of an observer or a rater. Inter-rater reliability is the extent to which different observers are consistent in their judgements. Different observers’ ratings should be highly correlated with each other.
Inter-rater reliability is often assessed using Cronbach’s α when the judgements are quantitative or an analogous statistic called Cohen’s κ (the Greek letter kappa) when they are categorical.
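As an illustration, a minimal sketch of Cohen's κ computed from scratch, with hypothetical categorical judgements from two raters:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters' categorical judgements,
    corrected for the agreement expected by chance alone."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical judgements of ten video clips as aggressive ("A") or not ("N")
rater_a = ["A", "N", "A", "A", "N", "N", "A", "N", "A", "N"]
rater_b = ["A", "N", "A", "N", "N", "N", "A", "N", "A", "A"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6 here; higher means better agreement
```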
Validity:
Validity is the extent to which the scores from a measure represent the variable they are intended to.
Face Validity:
• Face validity is the extent to which a measurement method appears “on its face” to measure the construct of interest.
• Face validity is at best a very weak kind of evidence that a measurement method is measuring what it is supposed to. One reason is that it is based on people’s intuitions about human behaviour, which are frequently wrong.
• A measure can lack face validity and still work: it is not the participants’ literal answers to indirect questions that are of interest, but rather whether the pattern of the participants’ responses to a series of questions matches those of individuals who, for example, tend to suppress their aggression.
Content Validity:
• Content validity is the extent to which a measure “covers” the entire construct of interest.
• For example, if a researcher conceptually defines test anxiety as involving both sympathetic nervous system activation (leading to nervous feelings) and negative thoughts, then his measure of test anxiety should include items about both nervous feelings and negative thoughts. Or consider that attitudes are usually defined as involving thoughts, feelings, and actions toward something.
• By this conceptual definition, a person has a positive attitude toward exercise to the extent that they think positive thoughts about exercising, feel good about exercising, and actually exercise. So, to have good content validity, a measure of people’s attitudes toward exercise would have to reflect all three of these aspects.
• Like face validity, content validity is not usually assessed quantitatively. Instead, it is assessed by carefully checking the measurement method against the conceptual definition of the construct.
Criterion Validity:
• Criterion validity is the extent to which people’s scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with. For example, people’s scores on a new measure of test anxiety should be negatively correlated with their performance on an important school exam.
• A criterion can be any variable that one has reason to think should be correlated with the construct being measured, and there will usually be many of them. For example, one would expect test anxiety scores to be negatively correlated with exam performance and course grades and positively correlated with general anxiety and with blood pressure during an exam.
• When the criterion is measured at the same time as the construct, criterion validity is referred to as concurrent validity; however, when the criterion is measured at some point in the future (after the construct has been measured), it is referred to as predictive validity (because scores on the measure have “predicted” a future outcome).
• Criteria can also include other measures of the same construct. For example, one would expect new measures of test anxiety or physical risk taking to be positively correlated with existing measures of the same constructs. This is known as convergent validity.
Discriminant Validity:
• Discriminant validity, on the other hand, is the extent to which scores on a measure are not correlated with measures of variables that are conceptually distinct.
• For example, self-esteem is a general attitude toward the self that is fairly stable over time. It is not the same as mood, which is how good or bad one happens to be feeling right now. So people’s scores on a new measure of self-esteem should not be very highly correlated with their moods.
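Putting convergent and discriminant validity together, a minimal sketch with made-up scores for a hypothetical new self-esteem measure:

```python
import numpy as np

# Hypothetical scores for six people on a new self-esteem measure,
# an established self-esteem measure, and a mood measure
new_self_esteem = np.array([12, 20, 15, 25, 18, 22])
old_self_esteem = np.array([11, 21, 14, 24, 19, 21])  # conceptually the same construct
mood = np.array([3, 10, 9, 2, 7, 5])                  # conceptually distinct construct

print(round(np.corrcoef(new_self_esteem, old_self_esteem)[0, 1], 2))  # high -> convergent validity
print(round(np.corrcoef(new_self_esteem, mood)[0, 1], 2))             # low -> discriminant validity
```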
Deciding on an Operational Definition:
There are two options: using an existing measure or creating your own measure.
Using an Existing Measure:
• The advantages of using an existing measure that has been validated in previous literature are:
you save the time and trouble of creating your own
there is already some evidence that the measure is valid (if it has been used successfully),
your results can more easily be compared with and combined with previous results.
• If you choose to use an existing measure, you may still have to choose among several alternatives. You might choose the most common one, the one with the best evidence of reliability and validity, the one that best measures a particular aspect of a construct that you are interested in, or even the one that would be easiest to use.
• Where possible, use the full scale from the original citation.
Creating your Own Measure:
• Instead of using an existing measure, you might want to create your own. Perhaps there is no existing measure of the construct you are interested in or existing ones are too difficult or time-consuming to use. Or perhaps you want to use a new measure specifically to see whether it works in the same way as existing measures—that is, to evaluate convergent validity.
• Issues when creating your own behavioural, self-report or physiological scale:
Be aware that most new measures in psychology are really variations of existing measures, so you should still look to the research literature for ideas. Perhaps you can modify an existing questionnaire, create a paper-and-pencil version of a measure that is normally computerised (or vice versa), or adapt a measure that has traditionally been used for another purpose.
When you create a new measure, you should strive for simplicity. Remember that your participants are not as interested in your research as you are and that they will vary widely in their ability to understand and carry out whatever task you give them. You should create a set of clear instructions using simple language that you can present in writing or read aloud (or both). It is also a good idea to include one or more practice items so that participants can become familiar with the task, and to build in an opportunity for them to ask questions before continuing. It is also best to keep the measure brief to avoid boring or frustrating your participants to the point that their responses start to become less reliable and valid.
The need for brevity, however, needs to be weighed against the fact that it is nearly always better for a measure to include multiple items rather than a single item. There are two reasons for this. One is a matter of content validity. Multiple items are often required to cover a construct adequately. The other is a matter of reliability. People’s responses to single items can be influenced by all sorts of irrelevant factors—misunderstanding the particular item, a momentary distraction, or a simple error such as checking the wrong response option. But when several responses are summed or averaged, the effects of these irrelevant factors tend to cancel each other out to produce more reliable scores. Remember, however, that multiple items must be structured in a way that allows them to be combined into a single overall score by summing or averaging.
Finally, the very best way to assure yourself that your measure has clear instructions, includes sufficient practice, and is an appropriate length is to test several people. (Family and friends often serve this purpose nicely). Observe them as they complete the task, time them, and ask them afterwards to comment on how easy or difficult it was, whether the instructions were clear, and anything else you might be wondering about.
Evaluating the Measure:
In most research designs, it is not possible to assess test-retest reliability because participants are tested at only one time.
It is also customary to assess internal consistency for any multiple-item measure—usually by reporting Cronbach’s α.
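A minimal sketch of Cronbach's α computed from its standard formula, assuming a made-up participants × items matrix:

```python
import numpy as np

def cronbach_alpha(item_scores):
    """item_scores: participants x items matrix for a multiple-item measure.
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    item_scores = np.asarray(item_scores, dtype=float)
    k = item_scores.shape[1]                          # number of items
    item_variances = item_scores.var(axis=0, ddof=1)  # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical responses: 5 participants x 4 items
scores = [[4, 5, 4, 4], [2, 1, 2, 2], [3, 3, 4, 3], [5, 5, 5, 4], [1, 2, 1, 1]]
print(round(cronbach_alpha(scores), 2))  # +.80 or greater is generally reported as good
```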
Convergent and discriminant validity can be assessed in various ways. For example, if your study included more than one measure of the same construct or measures of conceptually distinct constructs, then you should look at the correlations among these measures to be sure that they fit your expectations. Note also that a successful experimental manipulation also provides evidence of criterion validity.
Recall that MacDonald and Martineau manipulated participants’ moods by having them think either positive or negative thoughts, and after the manipulation their mood measure showed a distinct difference between the two groups. This simultaneously provided evidence that their mood manipulation worked and that their mood measure was valid.
But what if your newly collected data cast doubt on the reliability or validity of your measure? The short answer is that you have to ask why. It could be that there is something wrong with your measure or how you administered it. It could be that there is something wrong with your conceptual definition. It could be that your experimental manipulation failed. For example, if a mood measure showed no difference between people whom you instructed to think positive versus negative thoughts, maybe it is because the participants did not actually think the thoughts they were supposed to or that the thoughts did not actually affect their moods. In short, it is “back to the drawing board” to revise the measure, revise the conceptual definition, or try a new manipulation.