Midterm 1 Flashcards
explain the scientific method research process
1) ask a question
2) assume a natural cause for the phenomenon
3) consult past research
4) state a testable hypothesis
5) make a conclusion
6) submit report to peer-reviewed journal
explain the method for testing a hypothesis
1) design study to test hypothesis
2) seek ethical approval
3) collect data
4) analyze data
5) revise hypothesis
6) repeat 1-5 (usually a few times)
hypothesis
a possible answer (may be true or untrue) to the question asked
falsifiable hypothesis
- must be specific
- must take risks
- must be testable (genuine test tries to refute rather than confirm)
what is the purpose of the concept of falsifiability?
evaluates the scientific status of a theory
what theory talked about in class is falsifiable?
Einstein’s relativity
- took risks
- opportunity to test
what theory talked about in class is not falsifiable?
Freud’s psychoanalysis
- consistent with every outcome, so no way to refute or revise it; confirmations only build confidence
a hypothesis that is consistent with every possible outcome is essentially useless (it should be incompatible with some outcomes). what is an example of this and why?
predict that weather tomorrow can be sunny, cloudy, rainy, or snowy –> not specific, takes no risks, always correct
what is the scientific status of a theory based on?
- falsifiability
- refutability
- testability
hypotheses generate models of the world to help us:
1) predict phenomena
2) determine causes of phenomena
3) explain phenomena
4) control phenomena
NOT DESCRIBE
what is important about new data for a hypothesis?
must account for everything that old data does, and provide additional info
how long do hypotheses survive for?
until data that they can’t account for is uncovered
when is a hypothesis unfalsifiable?
- when no empirical evidence is obtainable
- when its predictions are irrefutable
- when additional assumptions are introduced after it is refuted by data
falsifiability in practice
some theories not immediately discarded after contrary evidence is obtained
- revised to improve experimental methods
- useful but not testable (hope that will be testable one day) –> ex: string theory
- can’t be as specific because of lack of knowledge (ex: neuroscience more complex than physics)
operational definition
a specific description of how a concept will be measured
operationalization
links concepts to data collection
operational variables
quantities of interest that serve as substitutes for measuring concept of interest
- ex: number of smiles to show happiness
what is the purpose of operational definitions?
- allow us to consistently quantify and measure concepts
- communicate ideas to others
what makes a good operational definition?
- reliability
- validity
- absence of bias (ex: external factors)
- cost (ex: low cost)
- practicality (ex: easy to measure)
- objectivity (ex: physical measurement not subjective)
- high acceptance (ex: many others have used)
bias
difference between the measurement made and the “true” value of that variable
reliability and bias must be determined over ___ measurements.
many
reliability
- reproducibility of repeated measurements
- must be based on concrete observable behaviours
- facilitates consistency across measurements
what is the same as bias
systematic error
what is the same as reliability
precision, consistency
what is the opposite of bias
accuracy, validity
what is the opposite of reliability
variability, random error, noise
theory to prediction path
theory –> hypothesis (maybe many) –> operational definition –> prediction (based on OD)
hypothesis vs. prediction
hypothesis:
- framed as a statement about something (phenomenon) that may or may not be true
- often present tense
- derived from broader theory
prediction:
- conclusion related to specific methodological details of the study
- often future tense
- derived from a more general hypothesis
what is the same as validity
accuracy
validity
- whether it measures what it’s intended to measure (“true value”)
- must be based on relevant behaviours
- facilitates accuracy of measurements
what is the opposite of validity
bias, systematic error
what is the same as variability
random error, noise
what is the opposite of variability
reliability, precision, consistency
variability
how spread out repeated measurements are
measurement = ?
true score + measurement error (systematic + random)
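The measurement model above can be sketched in simulation (the true score, bias, and noise values below are made up for illustration):

```python
import random

def measure(true_score, bias=2.0, noise_sd=1.5):
    # measurement = true score + systematic error (bias) + random error (noise)
    return true_score + bias + random.gauss(0, noise_sd)

random.seed(0)
readings = [measure(100) for _ in range(10_000)]
avg = sum(readings) / len(readings)
# averaging many readings cancels the random error but NOT the bias,
# so avg sits near 102 (true score 100 + bias 2), not near 100
```

This is why bias (systematic error) must be assessed against a known standard: no amount of repetition averages it away.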
what are some factors that may contribute to measurement error?
- specificity of operational definition not good enough
- internal noise of measurement device (living or nonliving)
interrater reliability
use multiple raters and compare the extent to which measurements agree
test-retest reliability
administer a test that measures a specific quantity twice and compare the results (ex: IQ test)
2 types of test-retest reliability
1) same test
2) alternate forms
limitation of test-retest reliability
memory of the first test can affect results, so it won’t reliably measure changes
3 types of internal consistency reliability
1) split-half reliability
2) Cronbach’s alpha
3) item-total
split half reliability
randomly split the test items in half and compare one half with the other –> test if the halves are consistent
Cronbach’s alpha
measure how closely related set of items are as a group
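Cronbach’s alpha can be computed directly from item scores; a minimal sketch (the item data below are made up):

```python
def cronbach_alpha(items):
    # items: one list per test item, each holding that item's score for every respondent
    k, n = len(items), len(items[0])
    def var(xs):  # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    totals = [sum(item[r] for item in items) for r in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))

# three items that move together perfectly across 4 respondents
alpha = cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])  # ≈ 1.0
```

Alpha near 1 means the items behave as a consistent group; unrelated items drive it toward 0.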
item-total
correlate each item with the rest of the test –> look at items individually
correlation coefficient (r)
one of the best ways to quantify relationship bw 2 coders
- r > 0 = positive
- r = 0 = no relationship (OD not specific enough)
- r < 0 = negative (something wrong with coders)
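Pearson’s r can be computed by hand from the two coders’ ratings; a sketch with made-up smile counts:

```python
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

coder_a = [3, 5, 2, 8, 7]   # hypothetical smile counts from two coders
coder_b = [2, 6, 1, 9, 8]
r = pearson_r(coder_a, coder_b)   # ≈ 0.99: coders agree -> high interrater reliability
```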
indicators of construct validity (how well the OD is constructed)
1) face validity
2) content validity
3) predictive validity
4) concurrent validity
5) convergent validity
6) discriminant validity
face validity + example
- degree to which a test subjectively (based on what individuals think) covers the concept it’s supposed to measure (looks like it measures what it should)
- ex: face memory test
content validity + example
- degree to which a test measures all the things relevant to what it’s supposed to measure
- ex: autism spectrum quotient (items corresponding to social skills, communication skills, imagination, attention to detail, attention switching)
- almost the opposite of face validity
predictive validity + example
- degree to which data accurately predicts a future event based on a criterion
- ex: SAT indicate gpa in uni
concurrent validity + example
- degree to which data for a present event correlates with a previously validated criterion
- ex: course grade
convergent validity + example
- degree to which two measurements that should be related actually are
- ex: new data agrees with other(s) in literature with same hypothesis
discriminant validity + example
- degree to which a measure is NOT related to another concept it shouldn’t be related to
- ex: new data shows little relation to measures in the literature for a different construct
predictive/concurrent vs convergent/discriminant
predictive/concurrent
- based on gold standard (well known & agreed upon by many)
convergent/discriminant
- based on other measures (in literature, etc)
variable
any event, situation, behaviour, or individual characteristic that can take more than one value (can change)
divisions on variables
- quantitative
- categorical
quantitative variables + example
- have specific numbers
- ex: measure magnitude
categorical variables + example
- have different levels, not numbers on defined scale
- ex: eye color
how to distinguish between quantitative and categorical variables?
subtraction test –> subtract lower level from higher level
- if differences all have same meaning = quantitative
- if differences have diff meaning = categorical
types of quantitative variables
- discrete
- continuous
how to distinguish between discrete and continuous quantitative variables?
midway test –> take 2 levels and go midway between
- have no meaning = discrete
- still have meaning = continuous
discrete example
number of siblings
continuous example
time
divisions of quantitative variables
- interval
- ratio
interval scale + example
- have equal intervals but no meaningful zero
- ex: IQ
ratio scale + example
- have equal intervals and a meaningful zero (means lack of something)
- ex: speed
divisions of categorical variables
- ordinal
- nominal
ordinal scale + example
- has order
- rank differences don’t need to reflect constant change
- ex: military rank
nominal scale + example
- no particular order
- ex: eye color
Likert scale –> interval or ratio?
- treated as interval when analyzing data (technically ordinal)
- 5-point or 7-point –> but usually self reported
- ex: can’t have zero happiness
positive linear relationship
2 variables change in same direction by set amounts (x up, y up)
negative linear relationship
2 variables change in different direction by set amounts (x up, y down)
curvilinear relationship
still a definite relationship at any given point, but the direction of the relationship is not monotonic (ex: positive from 1-5 seconds, negative from 6-10 seconds)
no relationship
usually a somewhat circular shape; a change in one variable bears no relation to the other
linear relationship
variables change by set amount each time
non-linear relationship
variables do not change by set amount each time
monotonic relationship
- overall relationship curve moves in the same direction (doesn’t matter if linear or not)
- ex: positive non-linear
non-monotonic relationship
- overall relationship changes direction in some places
- ex: curvilinear
non-experimental method
- observations only –> both variables measured
- aka correlational method
experimental method
- at least one variable manipulated (independent), one variable measured (dependent)
non-experimental vs. experimental method –> which one prefer?
experimental method
how to interpret correlation data?
1) correlation is spurious
2) A causes B
3) B causes A
4) third variable causes A and B –> A & B not correlated directly
limitation of non-experimental method
- correlation doesn’t imply causation –> spurious or third variable problem
- directionality problem (A –> B or B –> A?)
spurious correlation
- just a coincidence
- seems to happen when looking at many things –> at least some will happen to have similar patterns
confounding variables
variables intertwined with another independent variable, so can’t determine which is operating in a given situation (alternate explanation)
types of confounding variables
- operational definitions –> ex: poor validity
- participant factors –> ex: social status (based on individuals personal situation)
- order effect –> ex: fatigue, practice (treatment effects)
- group factors –> ex: self-selection
how to minimize confounding variables?
random assignment to conditions –> a potential confound is as likely to affect one group as the other
internal validity
degree to which all confounding variables have been controlled (how confidently cause can be inferred)
limitations of experimental method
plausible alternative explanations need to be eliminated
random assignment
each participant has an equal chance of being placed into any experimental group/condition
random sampling
randomly choose portion of larger group for doing experiment
descriptive statistics
help us organize, summarize, and describe data –> usually based on samples
inferential statistics
help us generalize from the sample to the population –> based on descriptive stats
what to look at in descriptive stats / frequency distributions?
- shape
- spread (variability)
- outliers (variability)
- central tendency
graphical representations
- pie chart
- bar graph
- histogram
- frequency polygon
benefits of pie chart
- simple and easy to make
- can quickly see data results
limitations of pie charts
hard to include measures of variation such as error bars (therefore not used in research)
the law of large numbers
- as sample size increases, sample statistics become less variable and more closely estimate population values
- random phenomena are unpredictable in short term but more predictable in long term
- ex: coin toss (4 tosses vs. 100 tosses); the casino example
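The coin-toss example from the card can be sketched as a simulation (toss counts chosen for illustration):

```python
import random
random.seed(1)

def heads_proportion(n_tosses):
    # proportion of heads in one sample of fair coin tosses
    return sum(random.random() < 0.5 for _ in range(n_tosses)) / n_tosses

def avg_miss(n_tosses, reps=1000):
    # average distance of the sample proportion from the true value 0.5
    return sum(abs(heads_proportion(n_tosses) - 0.5) for _ in range(reps)) / reps

# 4 tosses per sample wanders far from 0.5; 100 tosses per sample hugs it
print(avg_miss(4), avg_miss(100))
```

Unpredictable in the short term, predictable in the long term: exactly the casino’s edge.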
plots to use for categorical variables
- pie charts
- bar graph (especially nominal)
plots to use for quantitative variables
- histogram
- frequency polygons
histogram
- shows frequency of occurrence of each score (distribution)
- x axis = each score; y axis = frequency of occurrence
advantage of frequency polygons
easier to visualize 2 series of data compared to histograms
frequency polygons special feature
have connecting lines indicating things are happening between the points
shape of frequency distributions
- symmetry –> symmetric, skewed
- modality –> unimodal, bimodal, multimodal, uniform
symmetric distribution
- can be divided into two halves that are mirror images of each other
- common
positively skewed distribution
- has score values with low frequencies trailing off towards positive numbers
- central tendency falls early (toward the low end) –> often when values are bounded below
- ex: people coming into a bank right when it opens
negatively skewed distributions
- has score values with low frequencies that trail off towards negative numbers
- central tendency late
- ex: investment returns
unimodal distribution
has one peak
bimodal distribution
has two peaks (ex: 2 distinct populations)
uniform distribution
- does not have well-defined mode (straight box)
- all values on x axis equally likely to occur
- ex: rolling dice
central tendency
- a score value that corresponds to the centre of the distribution
- a typical or representative score
purpose of central tendency
- summarize distribution
- allows comparison to other distributions
- used in inferential stat procedures
3 main measures of central tendency
- mean
- median
- mode
central tendency measures: nominal
- mode
central tendency measures: ordinal
- median
- mode
central tendency measures: interval
- mean
- median
- mode
central tendency measures: ratio
- mean
- median
- mode
the mode
the score with the highest frequency
advantages of the mode
can be used with any type of data (including nominal)
limitations of the mode
ignores much of data available
the median
- the middle score, with half the measurements below it and half above it
- 50th percentile of a distribution
advantages of median
- robust against outliers
- better summary of skewed data
- can be used with ordinal (and everything except nominal)
disadvantages of median
limits use of many statistical tests
the mean
numerical average obtained by summing all scores in a distribution and dividing by the number of scores
advantages of mean
- easy to obtain & use
- good estimator to represent normal distribution (out of the 3 central tendency measures)
disadvantages of mean
- sensitive to outliers –> poor measure of “central tendency” for highly skewed distributions
- not suitable for nominal or ordinal data
central tendency for unimodal symmetric distribution
mean = median = mode
central tendency for unimodal positively skewed distribution
mode, then median, then mean
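Both ordering facts can be checked with Python’s statistics module (the score lists below are made up):

```python
from statistics import mean, median, mode

symmetric = [2, 3, 3, 4, 4, 4, 5, 5, 6]   # unimodal, symmetric
skewed = [1, 2, 2, 2, 3, 3, 4, 6, 13]     # positively skewed (long right tail)

print(mean(symmetric), median(symmetric), mode(symmetric))  # 4 4 4 — all equal
print(mode(skewed), median(skewed), mean(skewed))           # 2 3 4 — mode < median < mean
```

Note how the single outlier (13) drags the mean up while the median and mode stay put.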
mean vs. median vs. mode –> which is preferred
mean
mean vs median vs mode
mean
- interval and ratio data (quantitative)
- symmetric distribution
- enables use of sophisticated statistical tests
median
- ordinal, interval, ratio data
- skewed distribution
- outliers have little effect (so good representation of entire data)
- limits the applicability of statistical tests
mode
- ordinal and nominal data (categorical) –> all data
- multimodal distributions
survey
self-report measure administered through an interview or questionnaire
categories of survey questions
- attitudes & beliefs
- facts & demographic
- behaviours
who created and what was the first survey
- Charles Darwin (1860s)
- questionnaire on emotional expressions to determine if facial expressions are universal
types of surveys
- open-ended
- close-ended
purpose of surveys
- provides method for asking people to tell about themselves
- can be used to study relationships bw variables
- can serve as an important complement to experimental research findings
pros to open-ended surveys
gives the opportunity to freely say everything the individual is thinking without bias from others
cons to open-ended surveys
difficult to categorize data
pros to closed-ended surveys
easier to code & analyze data
cons to closed-ended surveys
may not give an option that an individual wants as an answer –> not a super accurate representation
problems with survey question wording
- unnecessary complexity
- vague questions / statements
- loaded / leading questions
- double-barreled questions
- negative wording
- yea-saying and nay-saying
surveys: unnecessary complexity
- unfamiliar technical terms
- phrasing that overloads working memory
surveys: vague questions / statements
- imprecise terms
- ungrammatical sentence structure
surveys: loaded / leading questions
- embedding question with misleading info
- written to bias responses
surveys: double-barreled questions
- asking about two or more things at once
surveys: negative wording
- negative wording (direction in question) –> influence answers a bit
surveys: yea-saying and nay-saying
- hard to distinguish genuine responses to several items in a row asked in the same direction from a participant who simply agrees/disagrees with every item
- pseudo question can help
rating scale types
- likert scale
- graphical scale (happy faces)
problems with surveys
- tendency to answer all questions in particular manner
- “faking good” –> social pressure
- topics too sensitive to talk about
solving problems that arise from surveys
assuring privacy, anonymity, confidentiality, etc
features questionnaire should have
- appear attractive and professional
- neatly typed
- error free
- points scales consistent
- ask interesting questions first
- keep as short as possible
how to administer questionnaires
- in person –> groups or individuals
- mail surveys
- internet surveys
- other technologies
benefits of questionnaires
- less costly than interviews
- ensures anonymity
- can reach out to a large number of people
limitations of questionnaires
- no in person clarification of questions
- boredom / distraction can occur
population
a set of individuals of interest to the researcher
confidence interval
- a range of plausible values for the population value
- help allow a more accurate generalization of entire population
- aka margin of error
sampling
- smaller group to rely on when can’t survey everyone
non-probability sampling
- phenomena that are expected to be relatively similar across the population –> will occur for most individuals
- convenience sampling for this
convenience sampling
anyone will do
probability
- phenomena that are expected to vary across the population
- probability sampling for this
probability sampling
sample “representative” of the population in question
types of probability sampling
- simple random sampling
- stratified random sampling
- cluster sampling
simple random sampling
- every member of the population has an equal probability of being selected
- ex: randomly pick 10 winners
stratified random sampling
- population divided into subgroups (strata) and random samples taken from each strata
- prevents small subgroups from going unrepresented
- ex: pick 7 winners from contact lens wearers and 3 from non-wearers
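The contact-lens example can be sketched with `random.sample` (population makeup is invented for illustration):

```python
import random
random.seed(42)

# hypothetical population: 70 contact-lens wearers, 30 non-wearers
population = [("person%d" % i, i < 70) for i in range(100)]

# simple random sampling: every member has an equal chance of selection
simple = random.sample(population, 10)

# stratified random sampling: sample each stratum in proportion (7 wearers, 3 non-wearers)
wearers = [p for p in population if p[1]]
non_wearers = [p for p in population if not p[1]]
stratified = random.sample(wearers, 7) + random.sample(non_wearers, 3)
```

The stratified sample is guaranteed to match the 70/30 split; the simple sample only matches it on average.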
cluster sampling
- when you don’t know ALL individuals relating to the wanted phenomenon, so randomly select some clusters and study all the individuals belonging to those clusters
- ex: from 2 decks of cards, randomly pick a suit (spades/hearts/diamonds/clubs); all individuals in that group win
types of non-probability sampling
- convenience sampling
- purposive sampling
- quota sampling
convenience sampling
- sampling whoever is most convenient
- aka haphazard sampling
- ex: everyone in front
purposive sampling
- sample meets predetermined criterion
- ex: every girl in second row
quota sampling
- sample reflects the numerical composition of various subgroups in the population
- ex: 7 people with glasses and 3 without in third row wins
sample size effect on confidence
- larger sample size reduces size of the confidence interval (increase confidence)
- don’t need to keep increasing sample size as population size increases in order to keep precision
limitations to sample size effects on confidence
must consider cost / benefit of increasing sample size and find a balance
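The diminishing returns of larger samples show up in the margin-of-error formula for a sample proportion (normal approximation; the numbers are just an illustration):

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    # half-width of a 95% confidence interval for a sample proportion
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# quadrupling the sample only halves the margin of error (1/sqrt(n) scaling)
print(margin_of_error(0.5, 100))   # ≈ 0.098
print(margin_of_error(0.5, 400))   # ≈ 0.049
```

Going from n=100 to n=400 costs 4x the data for only 2x the precision, hence the cost/benefit balance.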
independent groups (“between subjects”) experimental design
- the different participants experience different levels of the independent variable
- each person = only one treatment ever
benefit of independent group experimental design
- avoid order effects
- avoid demand characteristics
- treatments with relatively permanent effects can be done
- similarity to “real-world” permanent effects
limitation of independent group experimental design
- any detected difference between conditions may be attributed to individual group differences
- low power –> any true differences may not be detected
repeated measures (“within subjects”) experimental design
- the same participants experience all levels of the independent variable
- each person = try each treatment
benefit of repeated measures experimental design
- reduces measurement error by eliminating random error due to individual differences
- differences among conditions cannot be attributed to participant differences
- greater power (less measurement error) –> fewer participants needed, more likely to detect true differences
limitation of repeated measured experimental design
order effects (can all play role at same time)
- practice effect –> learning and memory
- fatigue effect –> bored or tired
- contrast effect –> what saw before affects now
demand characteristics
within subjects not possible when…
- independent variable is subject variable –> animal study (WT vs transgenic), human studies (ASD vs typical)
- there are order effects –> relatively permanent treatment, like surgery, applied
independent group vs repeated measures –> preference?
repeated measures if all else equal
demand characteristics
any clue within the study that gives participants an idea of what the hypothesis is
how to counteract order effects?
- counterbalancing
- partial counterbalancing
- time interval between conditions
- use independent group design
when can you use counterbalancing?
on any experiment with multiple conditions (multiple levels of the IV) –> figure out the number of orders for an experiment
how to use counterbalancing?
randomly split subjects into the N! orders –> N = number of independent-variable conditions (levels)
limitations of counterbalancing
impractical after a certain point (ex: 4!) because NEED at least 1 participant per order
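Full counterbalancing just enumerates all N! orders; a quick sketch of why it gets impractical:

```python
from itertools import permutations
from math import factorial

conditions = ["A", "B", "C"]
orders = list(permutations(conditions))
print(len(orders))            # 3! = 6 orders, so at least 6 participants needed

# n! grows fast: 2, 6, 24, 120, 720 for n = 2..6
for n in range(2, 7):
    print(n, factorial(n))
```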
types of partial counterbalancing
- latin square
- random order
- reverse counter-balancing
latin square rules
1) each condition occurs in each position (order) once
2) each condition precedes and follows each condition once
purpose of partial counterbalancing
make things easier than counterbalancing
random order
give each participant who comes in a random order of the conditions
reverse counter-balancing
obtain one order (ex: ABC), then reverse it (ex: CBA) to get your groups
purpose of time interval between conditions
separate time of tests to allow order effects (ex: fatigue) to have less of an effect
how to create equivalent groups
- random assignment to conditions
- matched pairs
purpose of equivalent groups
should always try to reduce any error
what are matched pairs? examples?
- 2 subjects that are controls for each other
- ex: twins, spouses of patients, gaze contingent displays (eye movement example)
caveat
one can never be sure if matching was 100% effective in creating equivalent groups
latin square design
- N orders for even N, N = # of conditions
- 2N orders for odd N, N = # of conditions
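The two Latin square rules (and the N vs. 2N order counts) can be satisfied with the standard balanced construction; a sketch where conditions are just indices 0..N-1:

```python
def balanced_latin_square(n):
    # first order: 0, 1, n-1, 2, n-2, ... then shift every entry by 1 for each new order
    first = [0] + [(k + 1) // 2 if k % 2 else n - k // 2 for k in range(1, n)]
    orders = [[(c + p) % n for c in first] for p in range(n)]
    if n % 2:                       # odd n: add the reversed orders too -> 2N total
        orders += [order[::-1] for order in orders]
    return orders

for order in balanced_latin_square(4):   # 4 conditions -> 4 orders
    print(order)
```

Each condition lands in every position once, and each condition precedes/follows every other condition once.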
things to consider when conducting studies
- finalizing good study design
- controlling for participant bias
- controlling for experimenter expectations
conducting survey studies
- construct survey (wording, format, privacy)
- recruitment (who, how many, protocol for recruitment)
- data (how to record)
conducting archival studies
- sources of data (from where, who to use)
- data (how to record)
- ex: hockey players data
conducting observational studies
- observe (who, what, where)
- data (how to record)
conducting experimental studies
- prepare experimental material
- recruitment (who, how many, protocol for recruitment)
- jobs of each group member (getting data, etc)
- data (how to record)
how to counteract demand characteristics?
- deception
- disguise the dependent measure
- ask participants what they think hypothesis is after (filter ppl)
bistable system + example
- things that can be seen from two points of view (depending on the individual) without the stimulus changing
- ex: direction cube is turning
- not ex: crater / dome illusion (stable to an individual bc of shading)
disadvantages of demand characteristics + example
- can change data obtained
- cube example: during training, people were shown the cube turning in a specific direction at a specific viewpoint (disambiguated with a pole); later, when asked what they see, people most often conform to that direction even if they don’t see it that way (think it’s the “right answer”); the same did not happen for sound — answers remained stable even without the sound cue — so the data are not solely due to demand characteristics
how to counteract participant bias?
- placebo effects on placebo group
- alternate between easy and hard tests (for reacting to adaptive procedures)
how to counteract bias from reacting to adaptive procedures?
- right = get harder; wrong = get easier
- tell the human it will get harder if they’re right, ask them to remain calm
- alternate bw easy and hard tests, along with reward
what are the effects of experimenter expectations?
- treat participants in different condition differently
- record / interpret data in different condition differently
how to avoid effects of experimenter expectations?
- repeated measures designs
- automated presentation of condition and recording of data (experimenter doesn’t know participant on which)
- double-blind procedures (as opposed to single blind)
main reasons for using multiple levels of IV?
1) detect non-linear relationship between the IV and DV
2) rule out alternative explanations –> eliminate confounds
examples of detecting non-linear relationships between the IV and DV
- the Mozart effect –> hypothesis that listening to classical music improves intelligence (memory performance)
- randomly assign subjects to one of three conditions (no sound, rain (“placebo”), classical music) with latin square design
- participants told that experiment tested relaxation on recall
- did reversed digit span task
- classical was actually not as good as silence (a little better than rain)
- showed order effect has a say (practice effect)
what are complex experimental designs
- an IV with 2 or more levels (more than 2 on slides)
- and/or designs with more than one IV
example of 2 levels of an IV
- caffeine study –> hypothesis that caffeine improves level of alertness
- randomly assign subjects to one of two conditions (doses of caffeine)
- alertness increased initially with some caffeine but decreased with more caffeine
factorial design
when a study includes more than one independent variable
what is the simplest factorial design?
- 2x2 –> 4 conditions total
- 2 independent variables
- 2 levels for each independent variable
ways of representing 2x2 factorial design
- chart (4 boxes; factor A vertical and factor B horizontal)
- graph (can choose whats x axis; factor A is x axis then factor B is plotted; steeper = more of an effect of that factor if there is one)
examples of 2x2 factorial design
- candy experiment –> test how people’s eating habits are affected by others
- 2 vs. 30 candies eaten by companion (same companion)
- thin vs. obese companion (same companion)
what do you interpret in factorial designs?
whether or not there is:
- main effects of an independent variable (test both)
- interaction between the independent variables
how to test for main effect of an independent variable?
- calculate marginal means (should get 4 total for 2x2)
- ex: A vertical –> if the vertical marginal means (averages of the horizontal lines) are different, there’s a main effect of factor A
- ex: B horizontal –> if the horizontal marginal means (averages of the vertical lines) are different, there’s a main effect of factor B
how to test for interaction between the independent variables?
- find the difference between the points of the same column (if A is the x axis; use the same rows if B is the x axis)
- NOTE: difference can be (+) or (-) –> subtract bottom from top, or right from left
- difference same for both (lines on graph parallel or same) = no interaction
- difference different (lines on graph not parallel or not same) = interaction
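The marginal-means and interaction checks can be worked through with made-up 2x2 cell means:

```python
# hypothetical 2x2 cell means: rows = levels of factor A, columns = levels of factor B
cells = [[4.0, 6.0],
         [8.0, 10.0]]

row_means = [sum(row) / 2 for row in cells]                      # [5.0, 9.0]
col_means = [(cells[0][j] + cells[1][j]) / 2 for j in range(2)]  # [6.0, 8.0]
print(row_means[0] != row_means[1])   # True -> main effect of A
print(col_means[0] != col_means[1])   # True -> main effect of B

# interaction: compare the B-differences within each level of A
diffs = [row[1] - row[0] for row in cells]   # [2.0, 2.0]
print(diffs[0] == diffs[1])           # True -> parallel lines -> NO interaction
```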
how many combinations of main effects and interaction outcomes can a 2x2 factorial design yield?
8
independent variable by participant variable (IV x PV) factorial design
- factorial designs with manipulated and non-manipulated variables
- a mixed factorial design
example of a IV x PV design
- hypothesis that boys are better at math
- 2x2 experiment –> gender (PV) x group (minority or same sex – IV)
- show that boys performed better (main effect of A)
- found that gender stereotype actually influences girl’s performance (main effect of B)
why look for main effect AND interaction?
- the main effect is either qualified or explained by the interaction
- just main effect on its own is not as valuable
consider a 2x3x4 factorial design: how many…
- IVs
- levels of each IV
- DVs
- main effects
- interactions
- IVs = 3 –> 2, 3, 4
- levels of each IV = 2, 3, 4 respectively –> ex: gender, drug doses, quarter of day
- DVs = 1 –> always, unless measuring something else separately
- main effects = 3 –> factor 2, factor 3, factor 4
- interactions = 4 –> 2/3, 2/4, 3/4, 2/3/4
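The 2x3x4 counts can be checked programmatically:

```python
from itertools import combinations
from math import prod

levels = [2, 3, 4]                      # one entry per IV
n_ivs = len(levels)                     # 3 IVs
n_conditions = prod(levels)             # 2*3*4 = 24 conditions
main_effects = n_ivs                    # one main effect per IV
interactions = sum(len(list(combinations(range(n_ivs), k)))
                   for k in range(2, n_ivs + 1))   # 3 two-way + 1 three-way = 4
print(n_conditions, main_effects, interactions)
```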
consider a 2x3x4 factorial study: how many conditions does each subject go through for…
- repeated measures
- independent groups
- mixed-factorial
- repeated measures = all 24 conditions –> so 1 group of subjects
- independent groups = 1 condition –> so 24 groups of subjects
- mixed-factorial = 12 conditions –> so 2 groups of subjects
mixed factorial design
have both within subjects (repeated measures) and between subjects (independent) variables
how many orders = ?, how many conditions = ?
orders = factorial of the # of conditions (for full counterbalancing); conditions = product of the # of levels of each IV
confound variable vs. third variable vs. mediating variable
- confound: A or C –> B
- third: C –> A and B
- mediating: A –> C –> B