Exam Revision Flashcards
There are constraints on getting valuable info for a hypothesis from a study's design, such as - (2)
duration of the study
how many people you can recruit
What is a sample?
A sample is the specific group that you will collect data from.
What is a population?
A population is the entire group that you want to draw conclusions about.
Example of population vs sample (2)
Population : Advertisements for IT jobs in the UK
Sample: The top 50 search results for advertisements for IT jobs in the UK on 1 May 2020
What is inferential statistics?
Inferential statistics allow you to test a hypothesis or assess whether your data is generalisable to the broader population.
Why is there a focus on parametric tests over other tests in research? - (3)
- they are more rigorous, powerful and sensitive than non-parametric tests to answer your question
- This means that they have a higher chance of detecting a true effect or difference if it exists.
- They also allow you to make generalizations and predictions about the population based on the sample data.
We can obtain multiple outcomes from the
same people
We can obtain outcomes under
different conditions, groups or both
What are the 4 types of outcomes we measure? (4)
- Ratio
- Interval
- Ordinal
- Nominal
What is a continuous variable? - (2)
there is an infinite number of possible values these variables can take on
entities get a distinct score
2 examples of continuous variables (2)
- Interval
- Ratio
What is an interval variable?
: Equal intervals on the variable represent equal differences in the property being measured
Examples of interval variables - (2)
e.g. the difference between 600ms and 800ms is equivalent to the difference between 1300ms and 1500ms. (reaction time)
temperature (Fahrenheit), temperature (Celsius), pH, SAT score (200-800), credit score (300-850)
What is a ratio variable?
The same as an interval variable and also has a clear definition of 0.0.
Examples of ratio variable - (3)
E.g. Participant height or weight
(can have 0 height or weight)
temp in Kelvin (0.0 Kelvin really does mean “no heat”)
dose amount, reaction rate, flow rate, concentration,
What is a categorical variable? (2)
A variable that cannot take on all values within the limits of the variable
- entities are divided into distinct categories
What are 2 examples of categorical variables? (2)
- Nominal
- Ordinal
What is a nominal variable? - (2)
a variable with categories that do not have a natural order or ranking
Has two or more categories
Examples of nominal variable - (2)
genotype, blood type, zip code, gender, race, eye color, political party
e.g. whether someone is an omnivore, vegetarian, vegan, or fruitarian.
What are ordinal variables?
categories have a logical, incremental order
Examples of ordinal variables - (3)
e.g. whether people got a fail, a pass, a merit or a distinction in their exam
socio-economic status ("low income", "middle income", "high income"),
satisfaction rating [Likert Scale] (“extremely dislike”, “dislike”, “neutral”, “like”, “extremely like”).
We use the term 'variables' for both continuous and categorical measures as - (2)
both outcome and predictor are variables
We will see later on that not only the type of outcome but also type of predictor influences our choice of stats test
A Likert scale is an ordinal variable, but sometimes outcomes measured on a Likert scale are treated as - (3)
continuous after inspection of the distribution of the data, and one may argue the divisions on the scale are equal
(i.e., treated as interval if distribution is normal)
gives greater sensitivity in parametric tests
What is measurement error?
The discrepancy between the actual value we’re trying to measure, and the number we use to represent that value.
In reducing measurement error in outcomes, the
values have to have the same meaning over time and across situations
Validity means that the (2)
instrument measures what it set out to measure
refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure)
Reliability means the
ability of the measure to produce the same results under the same conditions
Test-retest reliability is the ability of a
measure to produce consistent results when the same entities are tested at two different points in time
3 types of variation (3)
- Systematic variation
- Unsystematic variation
- Randomisation
What is systematic variation - (2)
Differences in performance created by a specific experimental manipulation.
This is what we want
What is unsystematic variation? (3)
Differences in performance created by unknown factors
Age, gender, IQ, time of day, measurement error, etc.
These differences can be controlled, of course (e.g., inclusion/exclusion of pps by setting an age range of 18-25)
Randomisation (and other approaches) minimises - (2)
effects of unsystematic variation
does not remove unsystematic variation
What is the independent variable (factors)? - (3)
- The hypothesised cause
- A predictor variable
- A manipulated variable (in experiments)
What is the dependent variable (measures)? - (3)
- The proposed effect, change in DV
- An outcome variable
- Measured not manipulated (in experiments)
In all experiments we have two hypotheses, which are (2)
- Null hypothesis
- Alternative hypothesis
What is null hypothesis?
that there is no effect of the predictor variable on the outcome variable
What is alternative hypothesis?
that there is an effect of the predictor variable on the outcome variable
Null Hypothesis Significance Testing computes the (2)
probability of obtaining the observed data (a statistic at least as extreme as the one found) if the null hypothesis were true, by computing a statistic and how likely it is that the statistic has that value by chance alone
This probability is referred to as the p-value
The NHST does not compute the probability of the
null hypothesis
There can be directional and non-directional versions of
an alternative hypothesis
A non-directional alternative hypothesis is…
that there is an effect of the group on the outcome variable
A directional alternative hypothesis is…
that the mean of the outcome variable for group 1 is larger than the mean for group 2
Example of directional alternate hypothesis
There would be far greater engagement in stats lectures if they were held at 4 PM and not 9 AM
For a non-directional hypothesis you will need to divide your alpha value between the
two tails of the normal distribution
The 3 misconceptions of Null Hypothesis Significance Testing (NHST) - (3)
- A significant result means the effect is important
- A non-significant result means the null hypothesis is true
- A significant result means the null hypothesis is false (it just gives the probability of the data occurring given the null hypothesis; it doesn't show that the null hypothesis is categorically false)
P-hacking and HARKing are another issue with
NHST
P-hacking and HARKing are - (2)
researcher degrees of freedom
changes made after the results are in and some analysis has been done
P-hacking refers to a
selective reporting of significant results
Harking is
Hypothesising After the Results are Known
P-hacking and HARKING are often used in
combination
What does EMBERS stand for? (5)
- Effect Sizes
- Meta-analysis
- Bayesian Estimation
- Registration
- Sense
EMBERS can reduce issues of
NHST
Uses of Effect sizes and Types of Effect Size (3)
- There a quite a few measures of effect size
- Get used to using them and understanding how studies can be compared on the basis of effect size
- A brief example: Cohen’s d
Meaning of Effect Size (2)
Effect size is a quantitative measure of the magnitude of the experimental effect.
The larger the effect size the stronger the relationship between two variables.
Formula of Cohen’s d
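d = (M1 − M2) / s_pooled: the difference between the two group means divided by the pooled standard deviation (a standard form of the formula).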
What is meta-analysis?
Meta-analysis is a study design used to systematically assess previous research studies to derive conclusions about that body of research
Meta-analysis brings together.. and assesses (2)
- Bringing together multiple studies to get a more realistic idea of the effect
- Can assess effect sizes that are averaged across studies
Funnel plots in meta-analysis can be used to… (2)
investigate publication bias and other biases in meta-analysis
plot studies by their sample size so bias can be observed
Bayesian approaches capture
probabilities of the data given the hypothesis and null hypothesis
Bayes factor is now often computed and stated alongside
conventional NHST analysis (and effect sizes)
Registration is where (5)
- Telling people what you are doing before you do it
- Tell people how you intend to analyze the data
- Largely limits researcher degrees of freedom (HARKing, p-hacking)
- A peer reviewed registered study can be published whatever the outcome
- The scientific record is therefore less biased to positive findings
Sense is where (4)
- Knowing what you have done in the context of NHST
- Knowing misconceptions of NHST
- Understanding the outcomes
- Adopting measures to reduce researcher degrees of freedom (like preregistration etc..)
most of the statistical tests in this book rely on
having data measured at interval level
To say that data are interval, we must be certain that
equal intervals on the scale represent equal differences in the property being measured.
The distinction between continuous and discrete variables can often be blurred - 2 examples - (2)
continuous variables can be measured in discrete terms; when we measure age we rarely use nanoseconds but use years (or possibly years and months). In doing so we turn a continuous variable into a discrete one.
treat discrete variables as if they were continuous, e.g., the number of boyfriends/girlfriends that you have had is a discrete variable. However, you might read a magazine that says 'the average number of boyfriends that women in their 20s have has increased from 4.6 to 8.9'
a device for measuring sperm motility that actually measures sperm count is not
valid
Criterion validity is whether the
instrument is measuring what it claims to measure (does
your lecturers’ helpfulness rating scale actually measure lecturers’ helpfulness?).
The two sources of variation that are always present in independent and repeated measures designs are
unsystematic variation and systematic variation
The effect of our experimental manipulation is likely to be more apparent in a repeated-measures design than in a between-group design
The effect of the experimental manipulation is more apparent in a repeated design than an independent design since, in an independent design,
differences between the characteristics of the people allocated to each of the groups is likely to create considerable random variation both within
each condition and between them
This means that, other things being equal, repeated-measures designs have
more power to detect effects than independent designs
We can use randomization in two different ways depending on
whether we have an
independent or repeated measures design
Two sources of systematic variation in a repeated measures design - (2)
- Practice effects
- Boredom effects
What are practice effects?
Participants may perform differently in the second condition because
of familiarity with the experimental situation and/or the measures being used.
What are boredom effects?
: Participants may perform differently in the second condition because
they are tired or bored from having completed the first condition.
We can ensure no systematic variation between conditions in a repeated measures design is produced by practice and boredom effects by
counterbalancing the order in which a person participates in a condition
Example of counterbalancing
we randomly determine whether a participant
completes condition 1 before condition 2, or condition 2 before condition 1
What distribution is needed for parametric tests?
A normal distribution
The normal distribution curve is also referred as the
bell curve
Normal distribution is symmetrical meaning
the distribution curve can be divided in the middle to produce two equal halves
The bell curve can be described using two parameters called (2)
- Mean (central tendency)
- Standard deviation (dispersion)
μ is
mean
σ is
standard deviation
Diagram shows:
e.g., if we move 1σ to the right of the mean, that region contains 34.1% of the values
Many statistical tests (parametric) cannot be used if the data are not
normally distributed
The mean is the sum of
scores divided by the number of scores
Mean is a good measure of
central tendency for roughly symmetric distributions
The mean can be a misleading measure of central tendency in skewed distributions as
it can be greatly influenced by scores in tail e.g., extreme values
Aside from the mean, what are the 2 other measures of central tendency? - (2)
- Median
- Mode
The median is where (2)
the middle score when scores are ordered.
the middle of a distribution: half the scores are above the median and half are below the median.
The median is relatively unaffected by … and can be used with… (2)
- extreme scores or skewed distribution
- can be used with ordinal, interval and ratio data.
The mode is the most
frequently occurring score in a distribution, a score that actually occurred
The mode is the only measure of central tendency that can be used with
nominal data
The mode is greatly subject to
sample fluctuations and is therefore not recommended to be used as the only measure of central tendency
Many distributions have more than one
mode
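A quick way to compare the three measures - a minimal Python sketch with a made-up list of scores containing one extreme value:

```python
import statistics

scores = [2, 3, 3, 4, 5, 9]  # hypothetical scores; 9 is an extreme value

print(statistics.mean(scores))       # ~4.33 - pulled upwards by the extreme score
print(statistics.median(scores))     # 3.5 - middle of the ordered scores
print(statistics.multimode(scores))  # [3] - most frequent score(s)
```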
The mean, median and mode are identical in
symmetric distributions
For positive skewed distribution, the
mean is greater than the median, which is greater than the mode
For negative skewed distribution
usually the mode is greater than the median, which is greater than the mean
Kurtosis in Greek means
bulge or bend
What is central tendency?
the tendency for the values of a random variable to cluster round its mean, mode, or median.
Diagram of normal kurtosis, positive excess kurtosis (leptokurtic) and negative excess kurtosis (platykurtic)
What does lepto mean?
prefix meaning thin
What is platy
a prefix meaning flat or wide (think Plateau)
Tests of normality (2)
Kolmogorov-Smirnov test
Shapiro-Wilk test
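Both tests are available in scipy; a minimal sketch, assuming a made-up sample (note the KS p-value is only approximate when the normal's parameters are estimated from the same sample):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=50, scale=10, size=40)  # hypothetical sample

# Shapiro-Wilk: p > .05 suggests no significant deviation from normality
w_stat, p_shapiro = stats.shapiro(data)

# Kolmogorov-Smirnov against a normal with the sample's own mean/SD
ks_stat, p_ks = stats.kstest(data, 'norm', args=(data.mean(), data.std(ddof=1)))

print(p_shapiro, p_ks)
```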
Tests of normality are dependent on
sample size
If you have a massive sample size then you will find these normality tests often come out as … even when your data visually can look - (2)
significant
normally distributed
If you have a small sample size, then the normality tests may come out non-significant, even when the data is not normally distributed, due to
lack of power in the test to detect a significant deviation from normality
There is no hard and fast rule for
determining whether data is normally distributed or not
Plot your data because this helps inform on what decisions you want to make with respect to
normality
Even if the normality test is significant, if the data looks visually normally distributed then you can still do
parametric tests
A frequency distribution or a histogram is a plot of how many times
each score occurs
2 main ways a distribution can deviate from the normal - (2)
- Lack of symmetry (called skew)
- Pointiness (called kurtosis)
In a normal distribution the values of skew and kurtosis are 0 meaning…
tails of the distribution are as they should be
Is age nominal or continuous?
Continuous
Is gender continuous or nominal?
Nominal
Is height continuous or nominal?
Continuous
Which of the following best describes a confounding variable?
A. A variable that affects the outcome being measured as well as, or instead of, the independent variable
B. A variable that is manipulated by the experimenter
C. A variable that has been measured using an unreliable scale
D. A variable that is made up only of categories
A
If a test is valid, what does it mean?
A. The test measures what it claims to measure.
B. The test will give consistent results. (Reliability)
C. The test has internal consistency (a measure of the correlations between different items on the same test = see if it measures the same construct)
D. The test measures a useful construct or variable = a test can measure something useful but still not be valid
A
A variable that measures the effect that manipulating another variable has is known as:
A. DV
B. A confounding variable
C. Predictor variable
D. IV
A
The discrepancy between the numbers used to represent something that we are trying to measure and the actual value of what we are measuring is called:
A. Measurement error
B. Reliability
C. The ‘fit’ of the model
D. Variance
A
A frequency distribution in which low scores are most frequent (i.e. bars on the graph are highest on the left hand side) is said to be:
A. Positively skewed
B. Leptokurtic = distribution with positive kurtosis
C. Platykurtic = negative kurtosis
D. Negatively skewed = frequent scores at the high end
A
Which of the following is designed to compensate for practice effects?
A. Counterbalancing
B. Repeated measures design = practice effects issue in repeated measures
C. Giving participants a break between tasks = this compensates for boredom effects
D. A control condition = provides reference point
A
Variation due to variables that have not been measured is
A. Unsystematic variation
B. Homogeneous variance = the assumption that the variance in each population is equal
C. Systematic variation = due to experimental manipulation
D. Residual variance = reflects how well the constructed regression line fits the actual data
A
Purpose of control condition is to
A. Allow inferences about cause
B. Control for participants’ characteristics = randomisation
C. Show up relationship between predictor variables
D. Rule out tertium quid
A. Allows inferences about cause
If the scores on a test have a mean of 26 and a standard deviation of 4, what is the z-score for a score of 18?
A. -2
B. 11
C. 2
D. -1.41
A: (18-26)/4 = -8/4 = -2
The standard deviation is the square root of the
A. Variance
B. Coefficient of determination = r squared
C. Sum of squares = sum of squared deviances
D. Range = largest - smallest
A
Complete the following sentence: A large standard deviation (relative to the value of the mean itself):
A. Indicates data points are distant from the mean (i.e., poor fit of data)
B. Indicates the data points are close to the mean
C. Indicates that the mean is a good fit of the data
D. Indicates that you should analyse the data with parametric tests
A
The probability is p = 0.80 that a patient with a certain disease will be successfully treated with a new medical treatment. Suppose that the treatment is used on 40 patients. What is the "expected value" of the number of patients who are successfully treated?
A. 32
B. 20
C. 8
D. 40
A = 80% of 40 is 32 (0.80 * 40)
Imagine a test for a certain disease.
Suppose the probability of a positive test result is .95 if someone has the disease, but the probability is only .08 that someone has the disease if his or her test result was positive.
A patient receives a positive test, and the doctor tells him that he is very likely to have the disease. The doctor's response is:
A. Confusion of the inverse
B. Law of small numbers
C. Gambler’s fallacy
D. Correct, because the test is 95% accurate when someone has the disease = incorrect, as the doctor based the assumption on the inverse probability (P(positive|disease) rather than P(disease|positive))
A
Which of these variables would be considered not to have met the assumptions of parametric tests based on the normal distribution?
(Hint: many statistical tests rely on data measured at interval level)
A. gender
B. Reaction time
C. Temp
D. Heart rate
A
The test statistics we use to assess a linear model are usually _______ based on the normal distribution
(Hint: These tests are used when all of the assumptions of a normal distribution have been met)
A. Parametric
B. Non-parametric
C. Robust
D. Not
A
Which of the following is not an assumption of the general linear model?
A. Dependence
B. Additivity
C. Linearity
D. Normally distributed residuals
A = independence is the actual assumption of parametric tests, not dependence
Looking at the table below, which of the following statements is the most accurate?
Hint: The further the values of skewness and kurtosis are from zero, the more likely it is that the data are not normally distributed
A. For the number of hours spent practising, there is not an issue of kurtosis
B. For level of musical skill, the data are heavily negatively skewed
C. For the number of hours spent practising there is an issue of kurtosis
D. For the number of hours spent practising, the data are fairly positively skewed
A - correct
B. Incorrect as the value of skewness is –0.079, which suggests that the data are only very slightly negatively skewed because the value is close to zero
C. Incorrect as the value of kurtosis is 0.098, which is fairly close to zero, suggesting that kurtosis was not a problem for these data
D. Incorrect as the value of skewness for the number of hours spent practising is –0.322, suggesting that the data are only slightly negatively skewed
Diagram of skewness
In SPSS output, if the value of skewness is between -1 and 1 then
all good
In SPSS output, if the value is below -1 or above 1 then
data is skewed
In SPSS output, if the value of skewness is below -1 then
negatively skewed
In SPSS output, if the value of skewness is above 1 then
positively skewed
Diagram of leptokurtic, platykurtic and mesokurtic (normal)
What does kurtosis tell you?
how much our data lies around the ends/tails of our histogram which helps us to identify when outliers may be present in the data.
A distribution with positive kurtosis, so much of the data is in the tails, will be
pointy or leptokurtic
A distribution with negative kurtosis, so the data lies more in the middle, will be
more sloped or platykurtic
Kurtosis is the sharpness of the
peak of a frequency-distribution curve
If our Kurtosis value is 0, then the result is a
normal distribution
If the kurtosis value in SPSS is between -2 and 2 then
all good! = normal distribution
If the kurtosis value in SPSS is less than -2 then
platykurtic
If the kurtosis value is greater than 2 in SPSS then
leptokurtic
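To apply these rules of thumb outside SPSS, a minimal sketch using scipy (the data are made up; scipy's kurtosis returns excess kurtosis by default, so 0 corresponds to a normal distribution, matching the convention above):

```python
from scipy import stats

hours_practising = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]  # hypothetical data

skew = stats.skew(hours_practising)
kurt = stats.kurtosis(hours_practising)  # Fisher's definition: normal = 0

print('skewness OK' if -1 <= skew <= 1 else 'data is skewed')
print('kurtosis OK' if -2 <= kurt <= 2 else 'kurtosis problem')
```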
Are we good for skewness and kurtosis in this SPSS output?
Good because both the skewness is between -1 and 1 and kurtosis values are between -2 and 2.
Are we good for skewness and kurtosis in this SPSS output?
Bad, because although the skewness is between -1 and 1, we have a problem with kurtosis: a value of 2.68, which is outside the range -2 to 2
Correlational research doesn’t allow to rule out the presence of a
third variable = confounding variable
e.g., we find that drownings and ice cream sales are correlated and conclude that ice cream sales cause drowning. Are we correct? Maybe both are driven by the weather
The tertium quid is a variable that you may not have considered that could be
influencing your results, e.g., the weather in the ice cream and drowning example
How to rule out tertium quid? - (2)
Use of RCTs.
Randomized Controlled Trials allow to even out the confounding variables between the groups
Correlation does not mean
causation
To infer causation,
we need to actively manipulate the variable we are interested in, and control against a group (condition) where this variable was not manipulated.
Correlation does not mean causation as according to Andy
causality between two variables cannot be assumed because there may be other measured or unmeasured variables affecting the results”
Aside from checking kurtosis and skewness assumptions in data, also check if it has
linearity or less commonly additivity
Additivity refers to the combined
effect of many predictors
What does this diagram show in terms of additivty /linearity? - (5)
There is a linear effect when the data increases at a steady rate, like the graph on the left.
Your cost increases steadily as the number of chocolate bars increases.
The graph on the right shows a non-linear effect when there is not this steady increase rather there is a sharp change in your data.
So you might feel ok if you eat a few chocolate bars but after that the risk of you having a stomach ache increases quite rapidly the more chocolates you eat.
This effect is super important to check or your statistical analysis will be wrong even if your other assumptions are correct because a lot of statistical tests are based on linear models.
Discrepancy between measurement and actual value in the population is … and not…
measurement error and NOT variance
Measurement error can happen across all psychological experiments from.. to ..
recording instrument failure to human error
What are the 2 types of measurement errors? - (2)
- Systematic
- Random
What is systematic measurement error?
: predictable, typically constant or proportional to the true value and always affect the results of an experiment in a predictable direction
Example of systematic measurement error
for example, if I know I am 5ft 2 and when I go to get measured I'm told I'm 6ft, this is a systematic error and pretty identifiable - these usually happen when there is a problem with your experiment
What is random measurement error?
measurable values being inconsistent when repeated measures of a constant attribute or quantity are taken.
Example of random measurement error
for example, my height is 5ft 2 when I measure it in the morning but it's 5ft when I measure myself in the evening. This is because my measurements were taken at different times, so there would be some variability - for those of you who believe you shrink throughout the day.
What is variance?
Average squared deviation of each number from its mean.
Variability is an inherent part of
things being measured and of the measurement process
Diagram of variance formula
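In symbols: s² = Σ(xᵢ − x̄)² / (N − 1), i.e., each score's squared deviation from the mean, summed and divided by N − 1.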
In central limit theorem - (2)
states that the sampling distribution of the mean approaches a normal distribution, as the sample size increases. This fact holds especially true for sample sizes over 30.
Therefore, as a sample size increases, the sample mean and standard deviation will be closer in value to the population mean μ and standard deviation σ .
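A minimal simulation sketch of the theorem (all values invented for illustration): draw repeated samples of size 30 from a skewed population and look at the distribution of their means.

```python
import numpy as np

rng = np.random.default_rng(42)
population = rng.exponential(scale=2.0, size=100_000)  # deliberately skewed population

# Means of 5,000 samples, each of size 30
sample_means = [rng.choice(population, size=30).mean() for _ in range(5_000)]

# The means are approximately normally distributed, centred on the
# population mean, with SD close to sigma / sqrt(30) (the SEM)
print(np.mean(sample_means), population.mean())
print(np.std(sample_means), population.std() / np.sqrt(30))
```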
What does a histogram look at? - (2)
Frequency of scores
Look at distribution of data, skewness, kurtosis
What does boxplot look at? - (2)
To identify outliers
Shows median rather than mean (good for non-normally distributed data)
What are line graphs?
simply bar charts with lines instead of bars
Bar charts are a good way to
display means (and
standard errors)
What does a scatterplot illustrate? - (2)
a relationship between two variables, e.g. correlation or regression
Only use regression lines for regressions!
What are matrix scatterplots? - (2)
Particular kind of scatterplot that can be used instead of the 3-D scatterplot
clearer to read
Using the data provided, how would you summarise the skew?
A. The data has an issue with positive skew
B.The data has an issue with negative skew
C.The data is normally distributed
B
What is the median number of bullets shot at a partner by females?
67.00
What descriptive statistic does the red arrow represent?
A. Inter quartile range
B. Median
C. Mean
D. Range
A
What is the mean for males and the SD for females? - (2)
Males M = 27.29
Females SD = 12.20
What are the respective standard errors of the mean for females and males?
3.26 & 3.42
Answering the question 'Meets assumptions of parametric tests?' will determine whether our continuous data can be tested
with parametric or non-parametric tests
A normal distribution is a distribution with the same general shape which is a
bell shape
A normal distribution curve is symmetric around
the mean μ
A normal distribution is defined by two parameters - (2)
the mean (μ) and the standard deviation (σ).
Many statistical tests (parametric) cannot be used if the data is not
normally distributed
What does this diagram show? - (2)
μ = 0 is the peak of the distribution
Blocked areas under the curve give us insight into the way data is distributed and the chance of certain scores occurring if they belong to a normal distribution, e.g., 34.1% of values lie within one SD below the mean
A z score in a standard normal distribution reflects the number of
SDs a particular score is above or below the mean
How to calculate a z score?
Take the value for a participant (e.g., 56 years old), subtract the mean of the distribution (e.g., mean age of class is 23) and divide by the SD (class SD of, say, 2)
If a person scored a 70 on a test with a mean of 50 and a standard deviation of 10
Converting the test scores to z scores, an X of 70 would be z = (70 - 50)/10 = 2
What the result means…. - (2)
a z score of 2 means the original score was 2 standard deviations above the mean
We can convert our z scores to
percentiles
Example: What is the percentile rank of a person receving a score of 90 on the test? - (3)
Mean = 80
SD = 5
First calculate the z score: the graph shows that most people scored below 90. Since 90 is 2 standard deviations above the mean, z = (90 - 80)/5 = 2
The z score to percentile conversion can be looked up in a table: a z score of 2 is equivalent to the 97.7th percentile
The proportion of people scoring below 90 is thus .977, and the proportion scoring above 90 is 2.3% (1 - 0.977)
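The table lookup can be reproduced with the normal CDF in scipy; a minimal sketch of the worked example above:

```python
from scipy.stats import norm

score, mean, sd = 90, 80, 5
z = (score - mean) / sd   # (90 - 80) / 5 = 2.0

below = norm.cdf(z)       # proportion scoring below 90
print(z, below)           # 2.0, ~0.977 -> 97.7th percentile
print(1 - below)          # ~0.023 -> 2.3% scored above 90
```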
What is the sample mean?
an unbiased estimate of the population mean.
How can we know that our sample mean estimate is representative of the population mean?
Via computing standard error of mean - smaller SEM the better
Standard deviation is used as a measure of how
representative the mean was of the observed data.
Small standard deviations represented a scenario in which most data points were
close to the mean
Large standard deviation represented a situation in which data points were
widely spread from the mean.
How to calculate the standard error of mean?
computed by dividing the standard deviation of the sample by the square root of the number in the sample
The larger the sample the smaller the - (2)
standard error of the mean
more confident we can be that the sample mean is representative of the population.
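A minimal sketch of the SEM calculation (sample values made up), also showing that a larger N gives a smaller SEM:

```python
import numpy as np

sample = np.array([4.0, 5.5, 6.1, 5.0, 4.8, 5.9, 6.3, 5.2])  # hypothetical scores

sem = sample.std(ddof=1) / np.sqrt(len(sample))  # SD / sqrt(N)
print(sem)

# With four times as many observations (same SD), the SEM halves,
# because SEM scales with 1 / sqrt(N)
print(sample.std(ddof=1) / np.sqrt(4 * len(sample)))
```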
The central limit theorem proposes that
as samples get large (usually defined as greater than 30), the sampling distribution has a normal distribution with a mean equal to the population mean, SD = SEM
The standard deviation of sample means is known as the
SEM (standard error of the mean)
A different approach to assess the accuracy of the sample mean as an estimate of the population mean, aside from SE, is to - (2)
calculate boundaries and range of values within which we believe the true value of the population mean value will fall.
Such boundaries are called confidence intervals.
Confidence intervals are created by
samples
A 95% confidence interval is constructed such that
95% of these intervals (created from repeated samples) will contain the population mean
95% Confidence interval for 100 samples (CI constructed for each) would mean
for 95 of these samples, the confidence intervals we constructed would contain the true value of the mean in the population.
Diagram shows- (4)
- Dots show the means for each sample
- Lines sticking out represent CIs for the sample means
- A vertical line down the plot would represent the population mean
- If confidence intervals don’t overlap then it shows significant difference between the sample means
In fact, for a specific confidence interval, the probability that it contains the population value is either - (2)
0 (it does not contain it) or 1 (it does contain it).
You have no way of knowing which it is.
if our sample means were normally distributed with a mean of 0 and a
standard error of 1, then the limits of our confidence interval
would be –1.96 and +1.96
95% of z scores fall between
-1.96 and 1.96
Confidence intervals can be constructed for any estimated parameter, not just
μ - mean
If the mean represents the true mean well, then the confidence interval of that mean should be
small
if the confidence interval is very wide then the sample mean could be very different from the true mean, indicating that it is a bad representation of the population
Remember that the standard error of the mean gets smaller with the number of observations and thus our confidence interval also gets
smaller - this makes sense, as the more we measure, the more certain we are that the sample mean is close to the population mean
Calculating confidence intervals for sample means - rearranging the z formula
LB = Mean - (1.96 * SEM)
UB = Mean + (1.96 * SEM)
The standard deviation of SAT verbal scores in a school system is known to be 100. A researcher wishes to estimate the mean SAT score and compute a 95% confidence interval from a random sample of 10 scores.
The 10 scores are: 320, 380, 400, 420, 500, 520, 600, 660, 720, and 780.
Calculate CI
* M = 530
* N = 10
* SEM = 100/ square root of 10 = 31.62
* Value of z for a 95% CI is the number of SDs one must go from the mean (in both directions) to contain 0.95 of the scores
* Value of 1.96 was found in z-table
* Since each tail is to contain 0.025 of the scores, you find the value of z below which 1 - 0.025 = 0.975 of the scores fall
* 95% of z scores lie between -1.96 and +1.96
* Lower limit = 530 - (1.96) (31.62) = 468.02
* Upper limit = 530 + (1.96)(31.62) = 591.98
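The same SAT example in a few lines of Python, using the known population SD of 100 as in the card:

```python
import math

scores = [320, 380, 400, 420, 500, 520, 600, 660, 720, 780]
mean = sum(scores) / len(scores)    # 530.0
sem = 100 / math.sqrt(len(scores))  # known sigma / sqrt(N), ~31.62

lower = mean - 1.96 * sem           # ~468.02
upper = mean + 1.96 * sem           # ~591.98
print(mean, (lower, upper))
```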
Think of test statistic capturing
signal/noise
What is a test statistic?
A statistic for which the frequency of particular values is known (t, F, chi-square), and thus we can calculate the
probability of obtaining a certain value, or p-value.
To test whether the model fits the data or whether our hypothesis is a good explanation of the data, we compare
systematic variation against unsystematic
If the probability (p-value) is less than or equal to the significance level, then
the null hypothesis is rejected; When the null hypothesis is rejected, the outcome is said to be “statistically significant”
If the probability (p-value) is greater than the significance level, the
null hypothesis is not rejected.
What is a type 1 error in terms of variance? - (2)
think the variance accounted for by the model is larger than the one unaccounted for by the model (i.e. there is a statistically significant effect but in reality there isn’t)
Type 1 is a false
positive
What is a type II error in terms of variance?
think there was too much variance unaccounted for by the model (i.e. there is no statistically significant effect but in reality there is)
Type II error is false
negative
Example of Type I and Type II error
Type I and Type II errors are mistakes we can make when testing the
fit of the model
Type 1 errors occur when we believe there is a genuine effect in the
population, when in fact there isn’t.
Acceptable level of type I error is usually
an a-level of 0.05
Type II error occurs when we believe there is no effect in the
population when, in reality, there is.
Acceptable probability of a Type II error is the
β-level (often 0.2)
An effect size is a standardised measure of
the size of an effect
Properities of effect size (3)
Standardized = comparable across studies
Not (as) reliant on the sample size
Allows people to objectively evaluate the size of observed effect.
Effect Size Measures
r = 0.1, d = 0.2 (small effect):
the effect explains 1% of the total variance.
Effect size measures
r = 0.3, d = 0.5 (medium effect) means
the effect accounts for 9% of the total variance.
Effect size measures
r = 0.5, d = 0.8 (large effect)
effect accounts for 25% of the variance
Beware of the ‘canned’ effect sizes (e.g., r = 0.5, d = 0.8 and rest) since the size of
effect should be placed within the research context.
We should aim to achieve a power of
.8, or an 80% chance of detecting
an effect if one genuinely exists.
When we fail to reject the null hypothesis, it is either that there truly are no difference to be found,
OR
it may be because we do not have enough statistical power
Power is the probability of
correctly rejecting a false H0 OR the ability of the test to find an effect assuming there is one in the population,
Power is calculated by
1 - β, where β is the probability of making a Type II error
To increase the statistical power of a study you can increase
your sample size
Factors affecting the power of the test: (4):
- Probability of a Type I error or a-level [the level at which we decide an effect is significant - the p-value] –> a bigger [more lenient] alpha means more power
- True alternative hypothesis H1 [effect size] (degree of overlap; less overlap means more power) - if you find a large effect in the literature then you have a better chance of detecting something
- The sample size [N] –> the bigger the sample, the less the noise and the more power
- The particular tests to be employed - parametric tests have greater power to detect a significant effect since they are more sensitive
How do researchers calculate the number of pps they need for a reasonable chance of correctly rejecting the null hypothesis?
Sample size calculation at a desired level of power (usually power set to 0.8 in formula)
With power, we can do 2 things - (2)
- Calculate power of test
- Calculate sample size necessary to detect an decent effect size and achieve a certain level of power based on past research
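One way to do both calculations in Python is statsmodels' power module; a minimal sketch for an independent-samples t-test (the effect size and settings are illustrative):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Sample size per group needed to detect a medium effect (d = 0.5)
# with alpha = .05 and power = .8
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(n_per_group)  # ~64 per group

# Achieved power for a fixed sample of 40 per group
print(analysis.solve_power(effect_size=0.5, alpha=0.05, nobs1=40))  # ~0.6
```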
Diagram of Type I error, Type II error, power and making correct decisions - (4)
Type 1 error p = alpha
Type II error p = beta
Accepting the null hypothesis when it is correct - p = 1 - alpha
Accepting the alternative hypothesis when it is correct - p = 1 - beta
If there is less overlap between H0 and H1 then
the bigger difference means higher power, and we can correctly reject the null hypothesis more often than with distributions that overlap more
If the distributions of H0 and H1 are narrower then
This means that the overlap in distributions is smaller and the power is therefore greater, but this time because of a smaller standard error of our estimate of the means.
Most people want to assess how many participants they need to test to have a reasonable chance of correctly rejecting the null hypothesis (the Power). This formula shows - (2)
us how.
We usually set the power to 0.8.
What are z scores? - (2)
A measure of variability:
The number of standard deviations a particular data point is from the population mean
Z-scores are a standardised measure, hence they ignore measurement units
Why should we care about z scores? - (2)
Z-scores allow researchers to calculate the probability of a score occurring within a standard normal distribution
Enables us to compare two scores that are from different samples (which may have different means and standard deviations)
Diagram of finding percentile of Trish
Trish takes a test and gets 25
Mean of the class is 20
SD = 4
(25 - 20)/4 = 1.25
Z-score = 1.25
Let's say Trish takes a test and scores 25 and the mean is 20. You may calculate the z-score to be 1.25. You would use a z-score table to see what percentile she would be in (marked in red): to read the table you would go down to the value 1.2 and across to 0.05, which totals 1.25, and you can see about 89.4% of other students performed worse.
Diagram of z score and percentile
Josh takes a different test and gets 1150
Mean of the class is 1000
SD = 150
(1150 – 1000)/150 = 1.0
Z score = 1.0
Who performed better Trish or Josh?
Trish had z score of 1.25
We would use our table and look down the column to a z-score of 1 and across to the 0.00 column (in purple) and we can see 84.1% of students performed worse than Josh so Trish performed better than Josh.
Diagram of z scores and normal distribution - (3)
68% of scores are within 1 SD of the mean,
95% are within 2 SDs and
99.7% are within 3 SDs.
What's standard error?
: by taking into account the variability and size of our sample we can estimate how far away from the real population mean our mean is!
If we took infinite samples from the population, 95% of the time the population mean will lie within the
95% confidence interval range
What does narrow CI represent?
high statistical power
Wide CIs represent?
low statistical power
Power being the probability of catching a real effect (as opposed to
missing a real effect – Type II error)
We can never say the null hypothesis is
FALSE (or TRUE).
The p-value or calculated probability is the estimated probability of
us finding an effect when the null hypothesis (H0) is true.
p = probability of observing a test statistic at least as big as the one we have if the
H0 is true
Hence, a significant p value (p <.05) tells us that there is a less than 5% chance of getting a test statistic that is
larger than the one we have found if there were no effect in the population (e.g. the null hypothesis were true)
Statistical signifiance does not equal importance - (2)
p = .049, p = .050 are essentially the same thing- the former is ‘statistically significant’.
Importance is dependent upon the experimental design/aims: e.g., A statistically significant weight increase of 0.1Kg between two adults experimental groups may be less important than the same increase between two groups of babies.
'Children can learn a second language faster before the age of 7'. Is this statement:
A. One-tailed
B. A non scientific
C. Two-tailed
D. Null hypothesis
A, as one-tailed is directional and two-tailed is non-directional
Which of the following is true about a 95% confidence interval of the mean:
A. 95 out of 100 CIs will contain the population mean
B. 95 out of 100 sample means will fall within the limits of the confidence interval.
C. 95% of population means will fall within the limits of the confidence interval.
D. There is a 0.05 probability that the population mean falls within the limits of the confidence interval.
A as If we’d collected 100 samples, calculated the mean and then calculated a confidence interval for that mean, then for 95 of these samples the confidence intervals we constructed would contain the true value of the mean in the population
What does a significant test statistic tell us?
A. That the test statistic is larger than we would expect if there were no effect in the population.
B. There is an important effect.
C. The null hypothesis is false.
D. All of the above.
A, and just because a test statistic is significant does not mean it is an important effect
Of what is p the probability?
(Hint: NHST relies on fitting a ‘model’ to the data and then evaluating the probability of this ‘model’ given the assumption that no effect exists.)
A.p is the probability of observing a test statistic at least as big as the one we have if there were no effect in the population (i.e., the null hypothesis were true).
B. p is the probability that the results are due to chance, the probability that the null hypothesis (H0) is true.
C. p is the probability that the results are not due to chance, the probability that the null hypothesis (H0) is false
D. p is the probability that the results would be replicated if the experiment was conducted a second time.
A
A Type I error occurs when:
(Hint: When we use test statistics to tell us about the true state of the world, we’re trying to see whether there is an effect in our population.)
A. We conclude that there is an effect in the population when in fact there is not.
B. We conclude that there is not an effect in the population when in fact there is.
C. We conclude that the test statistic is significant when in fact it is not.
D. The data we have typed into SPSS is different from the data collected.
A as If we use the conventional criterion then the probability of this error is .05 (or 5%) when there is no effect in the population
True or false?
a. Power is the ability of a test to detect an effect given that an effect of a certain size exists in a population.
TRUE
True or False?
We can use power to determine how large a sample is required to detect an effect of a certain size.
TRUE
True or False?
c. Power is linked to the probability of making a Type II error.
TRUE
True or False?
d. The power of a test is the probability that a given test is reliable and valid.
FALSE
What is the relationship between sample size and the standard error of the mean?
(Hint: The law of large numbers applies here: the larger the sample is, the better it will reflect that particular population.)
A. The standard error decreases as the sample size increases.
B. The standard error decreases as the sample size decreases.
C. The standard error is unaffected by the sample size.
D. The standard error increases as the sample size increases.
A. The standard error (which is the standard deviation of the distribution of sample means), defined as σ_x̄ = σ/√N, decreases as the sample size (N) increases and vice versa
What is the null hypothesis for the following question: Is there a relationship between heart rate and the number of cups of coffee drunk within the last 4 hours?
A. There will be no relationship between heart rate and the number of cups of coffee drunk within the last 4 hours.
B. People who drink more coffee will have significantly higher heart rates.
C. People who drink more cups of coffee will have significantly lower heart rates.
D. There will be a significant relationship between the number of cups of coffee drunk within the last 4 hours and heart rate
A The null hypothesis is the opposite of the alternative hypothesis and so usually states that an effect is absent
A Type II error occurs when :
(Hint: This would occur when we obtain a small test statistic (perhaps because there is a lot of natural variation between our samples.)
A. We conclude that there is not an effect in the population when in fact there is.
B. We conclude that there is an effect in the population when in fact there is not.
C. We conclude that the test statistic is significant when in fact it is not.
D. The data we have typed into SPSS is different from the data collected.
A A Type II error would occur when we obtain a small test statistic (perhaps because there is a lot of natural variation between our samples)
In general, as the sample size (N) increases:
A. The confidence interval gets narrower.
B. The confidence interval gets wider.
C. The confidence interval is unaffected.
D. The confidence interval becomes less accurate
A
Which of the following best describes the relationship between sample size and significance testing?
(Hint: Remember that test statistics are basically a signal-to-noise ratio, so given that large samples have less ‘noise’ they make it easier to find the ‘signal’.)
A. In large samples even small effects can be deemed ‘significant’.
B. In small samples only small effects will be deemed ‘significant’.
C. Large effects tend to be significant only in small samples.
D. Large effects tend to be significant only in large samples.
A
The assumption of homogeneity of variance is met when:
A. The variances in different groups are approximately equal.
B. The variances in different groups are significantly different.
C. The variance across groups is proportional to the means of those groups.
D. The variance is the same as the interquartile range.
A - To make sure our estimates of the parameters that define our model and significance tests are accurate we have to assume homoscedasticity (also known as homogeneity of variance)
Next, the lecturer was interested in seeing whether males and females reacted differently to the different teaching methods.
Produce a clustered bar graph showing the mean scores of teaching method for males and females.
(HINT: place TeachingMethod on the X axis, Exam Score on the Y axis, and Gender in the ‘Cluster on X’ box. Include 95% confidence intervals in the graph).
Which of the following is the most accurate interpretation of the data?
A. Females performed better than males in both the reward and indifferent conditions. Regarding the confidence intervals, there was a large degree of overlap between males and females in all conditions of the teaching method.
B. Males performed better than females in the reward condition, and females performed better than males in the indifferent condition. Regarding the confidence intervals, there was no overlap between males and females across any of the conditions of teaching method.
C. Males performed better than females in all conditions. Regarding the confidence intervals, there was a small degree of overlap between males and females for the reward and indifferent conditions, and a large degree of overlap between males and females for the punish condition.
D. Males performed better than females in the reward condition, and females performed better than males in the indifferent condition. Regarding the confidence intervals, there was a small degree of overlap between males and females for the reward and indifferent conditions, and a large degree of overlap between males and females for the punish condition.
D
Produce a line graph showing the change in mean anxiety scores over the three time points.
NOTE: this is a repeated measures (or within subjects) design, ALL participants took part in the same condition.
Which of the following is the correct interpretation of the data?
A. Mean anxiety increased across the three time points.
B. Mean anxiety scores were reduced across the three time points, and there was a slight acceleration in this reduction between the middle and end of the course.
C. Mean anxiety scores were reduced across the three time points, though this reduction slowed down between the middle and end of the course.
D. Mean anxiety scores did not change across the three time points.
B
A general approach in regression is that our outcomes can be predicted by a model and what remains
is the error
The i in the general model in regression shows
e.g., outcome 1 is equal to model plus error 1 and outcome 2 is equal to model plus error 2 and so on…
For correlation, the outcome is modelled by
scaling (multiplying by a constant) another variable
Equation of correlation model
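In the notation used above: outcomeᵢ = (b × xᵢ) + errorᵢ, where b is the scaling constant applied to the other variable.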
If you have continuous variables which meet the assumptions of parametric tests then you can conduct a
Pearson correlation or regression
Variance is a feature of outcome measurements we have obtained and we want to predict with a model in correlation/regression that…
captures the effect of the predictor variables we have manipulated or measured
Variance of a single variable represents the
average amount that the data vary from the mean
Variance is the standard deviation
squared (s squared)
Variance formula - (2)
for each participant, take xᵢ minus the mean of all participants' scores and square it
sum these values across participants (sigma) and divide by the total number of participants minus 1
Variance is SD squared meaning that it captures the
average of the squared difference the outcome values from the mean of all outcomes (explaining what the formula of variance does)
Covariance gathers information on whether
one variable covaries with another
In covariance, if we are interested in whether 2 variables are related, then we are interested in whether changes in one variable are met with changes in the other,
therefore… - (2)
when one variable deviates from its mean we
would expect the other variable to deviate from its mean in a similar way.
So, if one variable increases then the other, related variable, should also increase or even decrease at the same level.
If one variable covaries with another variable then it means these 2 variables are
related
To get SD from variance then you would
square root variance
What would you do in the covariance formula, in words? - (5)
- Calculate the error between the mean and each subject’s score for the first variable (x).
- Calculate the error between the mean and their score for the second variable (y).
- Multiply these error values.
- Add these values and you get the product deviations.
- The covariance is the average of the product deviations (sketched in code below)
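Those five steps as a minimal Python sketch (the x and y values are made up):

```python
x = [1, 2, 3, 4, 5]  # hypothetical first variable
y = [2, 4, 5, 4, 7]  # hypothetical second variable

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Multiply each pair of deviations from the means, then sum them
cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Sample covariance: divide by N - 1, as in the variance formula earlier
covariance = cross_products / (len(x) - 1)
print(covariance)  # 2.5 here - positive, so x and y tend to rise together
```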
Example of calculating covariance - what does the answer tell you?
The answer is positive: that tells us the x and y values tend to rise together.
What does each element of covariance formula stand for? - (5)
X = the value of ‘x’ variable
Y = the value of ‘y’ variable
X̄ = mean of 'x' - e.g., green
Ȳ = mean of 'y' - e.g., blue
n = the number of items in the data set
covariance will be large when values below
the mean for one variable are paired with values below the mean for the other (and values above with values above)
What does a positive covariance indicate?
as one variable deviates from the mean, the other
variable deviates in the same direction.
What does negative covariance indicate?
a negative covariance indicates that as one variable deviates from the mean (e.g. increases), the other deviates from the mean in the opposite direction (e.g. decreases).
What is the problem of covariance as a measure of the relationship between 2 variables? - (5)
dependent upon the units/scales of measurement used
So covariance is not a standardised measure
e.g., if 2 variables are measured in miles and the covariance is 4.25, then if we convert the data to kilometres we have to calculate the covariance again, and we would see it increase to 11.
Dependence on the scale of measurement is a problem as we cannot compare covariances in an objective way –> we cannot say whether a covariance is large or small relative to another dataset unless both datasets were measured in the same units
So we need to STANDARDISE it.
What is the process of standardisation?
To overcome the problem of dependence on the measurement scale, we need to convert
the covariance into a standard set of units
How to standardise the covariance?
dividing by product of the standard deviations of both variables.
Formula of standardising covariance
Same formula as covariance but divided by the product of the SD of x and the SD of y
Formula of Pearson’s correlation coefficient, r
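r = cov(x, y) / (sₓ × sᵧ): the covariance divided by the product of the standard deviations of x and y.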
Example of calculating Pearson's correlation coefficient, r - (5)
The standard deviation for the number of adverts watched (sx) was 1.67,
and the SD of the number of packets of crisps bought (sy) was 2.92.
If we multiply these together we get 1.67 × 2.92 = 4.88.
Now, all we need to do is take the covariance, which we calculated a few pages ago as being 4.25, and divide by these multiplied standard deviations.
This gives us r = 4.25/4.88 = .87.
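The same calculation can be checked with numpy; the adverts/crisps values below are hypothetical, chosen only to show the calls:

```python
import numpy as np

adverts = np.array([5, 4, 4, 6, 8])    # hypothetical number of adverts watched
crisps = np.array([8, 9, 10, 13, 15])  # hypothetical packets of crisps bought

cov = np.cov(adverts, crisps, ddof=1)[0, 1]  # sample covariance
r = cov / (adverts.std(ddof=1) * crisps.std(ddof=1))

print(r)
print(np.corrcoef(adverts, crisps)[0, 1])  # same value, computed directly
```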
The standardised version of covariance is the
correlational coefficient or Pearson’s r
Pearson's R is the … version of covariance, meaning it is independent of units of measurement
standardised
What does correlation describe? - (2)
Describes a relationship between variables
If one variable increases, what happens to the other variable?
Pearson’s correlation coefficient r was also called the
product-moment correlation
A linear relationship, normally distributed data and interval/ratio, continuous data are assumed in
Pearson’s r correlation coefficient
Pearson Correlation Coefficient varies between
-1 and +1 (direction of relationship)
The larger the Pearson's correlation coefficient R value, the closer the values will
be with each other and the mean
The smaller Pearson's correlation coefficient R values indicate
there is unexplained variance in the data and results in the data points being more spread out.
What does these two graphs show? - (2)
- example of high negative correlation. The data points are close together and are close to the mean.
- On the other hand, the graph on the right shows a low positive correlation. The data points are more spread out and deviate more from the mean.
The Pearson Correlation Coefficient measures the strength of a relationship
between one variable and another hence its use in calculating effect size
A Pearson’s correlation coefficient of +1 indicates
two variables are perfectly positively correlated, so as one variable increases, the other increases by a
proportionate amount.
A Pearson’s correlation coefficient of -1 indicates
a perfect negative relationship: if one variable increases, the other decreases by a proportionate amount.
Pearson’s r
+/- 0.1 means
small effect
Pearson’s r
+/- 0.3 means
medium effect
Pearson’s r
+/- 0.5 means
large effect
In Pearson’s correlation, we can test the hypothesis that - (2)
correlation coefficient is different from zero
(i.e., different from ‘no relationship’)
In Pearson’s correlation coefficient, we can test the hypothesis that the correlation is different from 0
If we find our observed coefficient was very unlikely to happen if there was no effect in the population, then we gain confidence that the relationship we have observed is statistically meaningful.
In the case of a correlation coefficient we can test the hypothesis that the correlation is different from zero (i.e. different from 'no relationship').
There are 2 ways to test this hypothesis
- Z scores
- T-statistic
Confidence intervals tell us something about the
likely correlation in the population
We can calculate confidence intervals for Pearson's correlation coefficient by transforming the formula for the CI
As sample size increases, so the value of r at which a significant result occurs
decreases, e.g., with n = 20 p is not < 0.05, but at 200 pps it is p < 0.05
Pearson’s r = 0 means - (2)
indicates no linear relationship at all
so if one variable changes, the other stays the same.
Correlation coefficients give no indication of direction of… + example - (2)
causality
e.g., although we conclude the number of adverts relates to the number of toffees bought, we can't say watching adverts caused us to buy toffees
We have to be cautious about causality in terms of Pearson's correlation r as - (2)
- Third variable problem - causality between variables cannot be assumed in any correlation
- Direction of causality: correlation coefficients give no indication of which variable causes the other to change
If you have a weak correlation between 2 variables (a weak effect), then you need to take a lot of measurements for that relationship to be
significant
R correlation coefficient gives the ratio of
covariance to a measure of variance
Example of correlations getting stronger
R squared is known as the
coefficient of determination
R^2 can be used to explain the
proportion of the variance in a dependent variable (outcome) that's explained by an independent variable (predictor)
Example of R^2 coefficient of determination - (2)
X = exam anxiety
Y = exam performance
If R^2 = 0.194
19.4% of the variability in exam performance can be explained by exam anxiety
('the variance in y accounted for by x')
R^2 calculates the amount of shared
variance
Example of r and R^2
e.g., if r = 0.1, then R^2 = 0.1 × 0.1 = 0.01
R^2 gives you the true strength of...
the correlation, but without an indication of its direction.
What are the three types of correlations? - (3)
- Bivariate correlations
- Partial correlations
- Semi-partial or part correlations
What's a bivariate correlation?
a relation between 2 variables
What is a partial correlation?
looks at the relationship between two variables while ‘controlling’ the effect of one or more additional variables.
The partial correlation partials out the
effect of one or more variables on both X and Y
A partial correlation controls for third variable which is made from - (3)
- A correlation calculates each data points distance from line (residuals)
- This is the error relative to the model (unexplained variance)
- A third variable might predict some of that variation in residuals
The partial correlation compares the unique variation of one variable with the
filtered (unique) variation of the other
The partial correlation holds the
third variable constant (but we don’t manipulate these)
Example of partial correlation- (2)
For example, when studying the effect of a diet, the level of exercise might also influence weight loss
We want to know the unique effect of diet, so we need to partial out the effect of exercise
Example of Venn Diagram of Partial Correlation - (2)
Partial Correlation between IV1 and DV = D / (D + C)
Unique variance accounted for by the predictor (IV1) in the DV, after accounting for variance shared with other variables.
Example of Partial Correlation - (2)
Partial correlation: Purple / (Red + Purple)
If we were doing just a partial correlation, we would see how much exam anxiety is influencing both exam performance and revision time.
Example of partial correlation and semi-partial correlation - (2)
The partial correlation that we calculated took
account not only of the effect of revision on exam performance, but also of the effect of revision on anxiety.
If we were to calculate the semi-partial correlation for the same data, then this would control for only the effect of revision on exam performance (the effect of revision
on exam anxiety is ignored).
In partial correlation, the third variable is typically not considered as the primary independent or dependent variable. Instead, it functions as a
control variable—a variable whose influence is statistically removed or controlled for when examining the relationship between the two primary variables (IV and DV).
The partial correlation is
The amount of variance the variable explains
relative to the amount of variance in the outcome that is left to explain after the contribution of other predictors has been removed from both the predictor and the outcome.
These partial correlations can be done when variables are dichotomous (including third variable) e.g., - (2)
we could look at the relationship between bladder relaxation (did the person wet themselves or not?) and the number of large tarantulas crawling up your leg controlling for fear of spiders
(the first variable is dichotomous, but the second variable and ‘controlled for’ variables are continuous).
What does this partial correlation output show?
Revision time = control variable (its effect is partialled out)
Exam performance = DV
Exam anxiety = X - (5)
- First, notice that the partial correlation between exam performance and exam anxiety is −.247, which is considerably less than the correlation when the effect of
revision time is not controlled for (r = −.441).
- Although this correlation is still statistically significant (its p-value is still below .05), the relationship is diminished.
- value of R2 for the partial correlation is .06, which means that exam anxiety can now account for only 6% of the variance in exam performance.
- When the effects of revision time were not controlled for, exam anxiety shared 19.4% of the variation in exam scores and so the inclusion of revision time has severely diminished the amount of variation in exam scores shared by anxiety.
- As such, a truer measure of the role of exam anxiety has been obtained.
Partial correlations are most useful for looking at the unique
relationship between two variables when
other variables are ruled out
In a semi-partial correlation we control for the
effect that
the third variable has on only one of the variables in the correlation
The semi-partial (part) correlation partials out the - (2)
effect of one or more variables on only X or only Y.
e.g. the amount revision explains exam performance after the contribution of anxiety has been removed from just one variable (usually the predictor, e.g. revision).
The semi-partial correlation compares the
unique variation of one variable with the unfiltered variation of the other.
Venn diagram of the semi-partial correlation - (2)
- Semi-partial correlation between IV1 and DV = D / (D + C + F + G)
Unique variance accounted for by the predictor (IV1) in the DV, after accounting for variance shared with other variables.
Diagram of exam performance, exam anxiety and revision time for the semi-partial correlation - (2)
- Purple / (Red + Purple + White + Orange)
- When we use semi-partial correlation to look at this relationship, we partial out the variance accounted for by exam anxiety (the orange bit) and look for the variance explained by revision time (the purple bit).
Summary of partial correlation and semi-partial correlation - (2)
A partial correlation quantifies the relationship between two variables while accounting for the effects of a third variable on both variables in the original correlation.
A semi-partial correlation quantifies the relationship between two variables while accounting for the effects of a third variable on only one of the variables in the original correlation.
Pearson's product-moment correlation coefficient (described earlier) and Spearman's rho (see section 6.5.3) are examples
of bivariate correlation coefficients.
Non-parametric tests of correlation are... (2)
- Spearman's rho
- Kendall's tau test
In Spearman's rho the variables are not normally distributed and measures are on an
ordinal scale (e.g., grades)
Spearman's rho works by
first ranking the data (numbers converted into ranks), and then running Pearson's r on the ranked data
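A minimal sketch of this equivalence on simulated data (SciPy assumed): Spearman's rho is identical to Pearson's r computed on the ranks.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = x ** 3 + rng.normal(size=30, scale=0.5)  # monotonic but non-linear

rho, p = stats.spearmanr(x, y)
# equivalent: Pearson's r on the rank-transformed data
r_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
print(rho, r_on_ranks)  # the two values agree
```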
Spearman's correlation coefficient, rs, is a non-parametric statistic and so can be used when the
data have violated parametric assumptions, such as non-normally distributed data
Spearman's correlation coefficient is sometimes called
Spearman's rho
For Spearman's rs we can get R squared, but it is interpreted slightly differently, as the
proportion of
variance in the ranks that two variables share.
Kendall's tau is used rather than Spearman's coefficient - (2)
when you have a small data set with a large number of
tied ranks.
This means that if you rank all of the scores and many scores have the same rank, then Kendall's tau should be used
Kendall’s tau test - (2)
For small datasets, many tied ranks
Better estimate of correlation in population than Spearman’s ρ
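A minimal sketch on a small hypothetical data set with many tied ranks (SciPy assumed):

```python
from scipy import stats

# small hypothetical data set with many tied ranks
x = [1, 2, 2, 2, 3, 3, 4]
y = [2, 2, 3, 3, 3, 4, 4]

tau, p_tau = stats.kendalltau(x, y)
rho, p_rho = stats.spearmanr(x, y)
print(tau, rho)  # tau is noticeably smaller than rho
```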
Kendall’s tau is not numerically similar to r or rs (spearman) and so tau squared does not tell us about
proportion of
variance shared by two variables (or the ranks of those two variables).
Kendall's tau is 66-75% smaller than both Spearman's rs and Pearson's r, so
tau is not comparable to r and rs
There is a benefit to using Kendall's statistic rather than Spearman's, as it has been shown that - (2)
Kendall's statistic is actually a better estimate of the correlation in the population
we can draw more accurate generalizations from Kendall's statistic than from Spearman's.
What's the decision tree for Spearman's correlation? - (4)
- What type of measurement = continuous
- How many predictor variables = one
- What type of predictor variable = continuous
- Meets assumptions of parametric tests = No
The output of Kendall and Spearman can be interpreted the same way as
Pearson’s correlation coefficient r output box
The biserial and point-biserial correlation coefficients are used when
one of the two variables is dichotomous (e.g., pregnant or not pregnant)
What is the difference between biserial and point-biserial correlations?
depends on whether the dichotomous variable is discrete or continuous
The point–biserial correlation coefficient (rpb) is used when
one variable is a
discrete dichotomy (e.g. pregnancy),
biserial correlation coefficient (rb) is used
when - (2)
one variable is a continuous dichotomy (e.g. passing or failing an exam).
e.g. An example is passing or failing a statistics test: some people will only just fail while others will fail by
a large margin; likewise some people will scrape a pass while others will clearly excel.
Example of when the point-biserial correlation is used - (3)
- Imagine we are interested in the relationship between the gender of a cat and how much time it spends away from home
- Time spent away is measured at the interval level -> meets the assumptions of parametric data
- Gender is a discrete dichotomous variable coded with 0 for male and 1 for female
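A minimal sketch of this example with hypothetical data (SciPy assumed; scipy.stats.pointbiserialr is equivalent to Pearson's r with a 0/1 predictor):

```python
from scipy import stats

# hypothetical data: gender coded 0 = male, 1 = female (discrete dichotomy)
gender    = [0, 0, 0, 0, 1, 1, 1, 1]
time_away = [5.2, 6.1, 5.8, 6.5, 2.1, 3.3, 2.8, 3.0]  # hours away from home

r_pb, p = stats.pointbiserialr(gender, time_away)
print(r_pb, p)  # same result as Pearson's r on the 0/1 coding
```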
Can convert the point-biserial correlation coefficient into the
biserial correlation coefficient
Point-biserial and biserial correlations differ in size as
the biserial correlation is bigger than the point-biserial
Example of a question conducting Pearson's r - (4)
The researcher was interested in whether the amount someone gets paid and the amount of holiday they take from work are related to their productivity at work
- Pay: Annual salary
- Holiday: Number of holiday days taken
- Productivity: Productivity rating out of 10
Example of Pearson’s r scatterplot :
relationship between pay and productivity
If we have r = 0.313 what effect size is it?
medium effect size
±.1 = small effect
±.3 = medium effect
±.5 = large effect
What does this scatterplot show?
- This indicates very little correlation between the 2 variables
What will a matrix scatterplot show?
the relationship between all possible combinations of your variables
What does this scatterplot matrix show? - (2)
- For pay and holiday, we can see the line is very flat, indicating the correlation between the two variables is quite low
- For pay and productivity, the line is steeper, suggesting the correlation is fairly substantial between these 2 variables; the same goes for holidays and productivity
What is degrees of freedom for correlational analysis?
N-2
What does this Pearson's correlation r output show? - (4)
- The relationship between pay and holidays is a very low correlation of r = -0.04
- Between pay and productivity, there is a medium-sized correlation of r = 0.313
- Between holidays and productivity there is a medium-to-large effect size of 0.435
- The relationships between pay and productivity and between holidays and productivity are significant, but the correlation between pay and holidays was not significant
Another example of a Pearson's correlation r question - (3)
A student was interested in the relationship between the time spent preparing an essay, the interestingness of the essay topic and the essay mark received.
He got 45 of his friends and asked them to rate, using a scale from 1 to 7, how interesting they thought the essay topic was (1 - I’ll kill myself of boredom, 4 - it’s not too bad!, 7 - it’s the most interesting thing in the world!) (interesting).
He then timed how long they spent writing the essay (hours), and got their percentage score on the essay (essay).
Example of the interval/ratio continuous data needed for Pearson's r for the IV and DV - (2)
- Interval scale: the difference between 10 and 20 degrees C is the same as between 80 and 90 degrees F, but 0 degrees does not mean an absence of temperature
- Ratio: height and weight (0 cm really does mean no height), time
Pearson's correlation r, Spearman and Kendall require
one IV and one DV
What does this SPSS output show?
A. There was a non-significant positive correlation between interestingness of topic and the amount of time spent writing. There was a non-significant positive correlation between time spent writing an essay and essay mark
There was a significant positive correlation between interestingness of topic and essay mark, with a medium effect size
B. There was a significant positive correlation between interestingness of topic and the amount of time spent writing, with a small effect size.There was a significant positive correlation between time spent writing an essay and essay mark, with a large effect size. .There was a non-significant positive correlation between interestingness of topic and essay mark
C. There was a significant negative correlation between interestingness of topic and the amount of time spent writing, with a medium effect size.. There was a non-significant positive correlation between time spent writing an essay and essay mark. There was a non-significant positive correlation between interestingness of topic and essay mark
D. There was a significant positive correlation between interestingness of topic and the amount of time spent writing, with a large effect size. There was a non-significant positive correlation between time spent writing an essay and essay mark There was a non-significant positive correlation between interestingness of topic and essay mark
Answer: D
r = 0.21 effect size is..
in between small and medium effect
Effect size is only meaningful if you evaluate it with regard to
your own research area
Biserial correlation is when
one variable is dichotomous, but there is an underlying continuum (e.g. pass/fail on an exam)
Point-biserial correlation is when
one variable is dichotomous, and it is a true dichotomy (e.g. pregnancy)
Example of a dichotomous relationship - (3)
- Gender is an example of a true dichotomy.
- We can compare the differences in height between males and females.
- Use the dichotomous predictor of gender
What is the decision tree for multiple regression? - (4)
- Continuous
- Two or more predictors that are continuous
- Multiple regression
- Meets assumptions of parametric tests
Multiple regression is the same as simple linear regression except that - (2)
for every extra predictor you include, you have to add a coefficient;
so, each predictor variable has its own coefficient, and the outcome variable is predicted from a combination of all the variables multiplied by their respective coefficients, plus a residual term
Multiple regression equation: Yi = b0 + b1X1i + b2X2i + ... + bnXni + εi
In the multiple regression equation, list all the terms - (6)
- Y is the outcome variable,
- b0 is the intercept (the constant),
- b1 is the coefficient of the first predictor (X1),
- b2 is the coefficient of the second predictor (X2),
- bn is the coefficient of the nth predictor (Xn),
- εi is the difference between the predicted and the observed value of Y for the ith participant.
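A minimal sketch of fitting this equation by least squares on simulated data (NumPy assumed; SPSS does this internally):

```python
import numpy as np

# hypothetical data: two predictors and one outcome
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 100))
y = 2.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=100)

# design matrix: a column of 1s for b0 plus one column per predictor
X = np.column_stack([np.ones(100), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares [b0, b1, b2]
residuals = y - X @ b                      # the epsilon_i terms
print(b)
```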
Multiple regression uses the same principle as linear regression in a way that
we seek to find the linear combination of predictors that correlate maximally with the outcome variable.
Regression is a way of predicting things that you have not measured: predicting
an outcome variable from one or more predictor variables
We can't plot multiple regression (as shown here in 3D)
for more than 2 predictor (X) variables
If you have two predictors that overlap and correlate a lot, then it is a ... model
bad model - the predictors can't uniquely explain the outcome
In Hierarchical regression, we are seeing whether
one model explains significantly more variance than the other
In hierarchical regression predictors are selected based on
past work and the experimenter
decides in which order to enter the predictors into the model
As a general rule for hierarchical regression, - (3)
known predictors (from other research) should be entered into the model first in order of their importance in predicting the outcome.
After known predictors have been entered, the
experimenter can add any new predictors into the model.
New predictors can be entered either all in one go, in a stepwise manner, or hierarchically (such that the new predictor
suspected to be the most important is entered first).
Example of hierarchical regression in terms of album sales - (2)
The first model allows all the shared variance between Ad budget and Album sales to be accounted for.
The second model then only has the option to explain more variance by the unique contribution from the added predictor Plays on the radio.
What is forced entry MR?
method in which all predictors are forced
into the model simultaneously.
Like HR, forced entry MR relies on
good theoretical reasons for including the chosen predictors,
Different from HR, forced entry MR
makes no decision about the order in which variables are entered.
Some researchers believe that forced entry MR
is the only appropriate method for theory testing, because stepwise techniques are influenced by random variation in the data and so rarely give replicable results if the model is retested.
Why select collinearity diagnostics in the Statistics box for multiple regression? - (2)
This option is for obtaining collinearity statistics such as the
VIF and tolerance
and for checking the assumption of no multicollinearity
Multicollinearity poses a problem only for multiple regression because
simple regression requires only one predictor.
Perfect collinearity exists in multiple regression when at least
two predictors are perfectly correlated, e.g., they have a correlation coefficient of 1
If there is perfect collinearity in multiple regression between predictors it
becomes impossible
to obtain unique estimates of the regression coefficients because there are an infinite number of combinations of coefficients that would work equally well.
The good news is that perfect collinearity in multiple regression is rare in
real-life data
If two predictors are perfectly correlated in multiple regression, then the values of b for each variable are
interchangeable
As collinearity increases in multiple regression, three problems arise - (3)
- Untrustworthy bs
- It limits the size of R
- The importance of predictors becomes hard to assess
One way of identifying multicollinearity in multiple regression is to scan a
correlation matrix of all of the predictor
variables and see if any correlate very highly (by very highly I mean correlations above .80
or .90)
The VIF indicates in multiple regression whether a
predictor has a strong linear relationship with the other predictor(s).
If the VIF statistic is above or approaching 10 in multiple regression, then what you would want to do is - (2)
look at your variables to see whether all of them need to go in the model
if there is a high correlation between 2 predictors (measuring the same thing), decide whether it's important to include both variables or to take one out and simplify the regression model
Related to the VIF in multiple regression is the tolerance
statistic, which is its
reciprocal (1/VIF) = the inverse of the VIF
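A minimal sketch of how VIF and tolerance are computed (NumPy assumed; hypothetical data layout): each predictor is regressed on the others, and VIF_j = 1 / (1 − R²_j).

```python
import numpy as np

def vif_and_tolerance(X):
    """X: (n, p) matrix of predictors (no intercept column)."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        # regress predictor j on all the other predictors (plus intercept)
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ b
        r2 = 1 - resid.var() / X[:, j].var()
        vifs.append(1 / (1 - r2))
    vifs = np.array(vifs)
    return vifs, 1 / vifs  # tolerance is the reciprocal of VIF
```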
In Plots in SPSS for multiple regression, you put - (2)
ZRESID on Y and ZPRED on X
a plot of residuals against predicted values, to assess homoscedasticity
What is ZPRED in MR? - (2)
(the standardized predicted values of the dependent variable based on the model).
These values are standardized forms of the values predicted by the model.
What is ZRESID in MR? - (2)
(the standardized residuals, or errors).
These values are the standardized differences between the observed data and the values that the model predicts.
SPSS in multiple linear regression gives descriptive outputs, which are - (2)
- basic means and also a table of correlations between variables.
- This is a first opportunity to determine whether there is high correlation between predictors, otherwise known as multicollinearity
The model summary in SPSS captures how the model or models explain, in MR,
variance in terms of R squared, and more importantly how R squared changes between models and whether those changes are significant.
Diagram of model summary
What is the measure of R^2 in multiple regression
measure of how much of the variability in the outcome is accounted for
by the predictors
The adjusted R^2 gives us an estimate of in multiple regression
fit in the general population
The Durbin-Watson statistic, if specified in multiple regression, tells us whether the - (2)
assumption of independent errors is tenable (values less than 1 or greater than 3 raise alarm bells)
the closer the value to 2 the better = assumption met
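A minimal sketch of the Durbin-Watson calculation (NumPy assumed; SPSS reports this for you):

```python
import numpy as np

def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    e = np.asarray(residuals)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# independent errors give a value near 2
print(durbin_watson(np.random.default_rng(0).normal(size=200)))
```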
SPSS output for MR = ANOVA table which performs
F-tests for each model
SPSS output for MR contains an ANOVA that tests whether the model is
significantly better at predicting the outcome than using the mean as a 'best guess'
The F-ratio represents the ratio of
improvement in prediction that results from fitting the model, relative to the inaccuracy that still exists in the model
We are told the sum of squares for model (SSM) - MR regression line in output which represents
improvement in prediction resulting from fitting a regression line to the data rather than using the mean as an estimate of the outcome
We are told residual sum of squares (Residual line) in this MR output which represents
total difference between
the model and the observed data
DF for the model sum of squares (regression line) in MR is equal to the
number of predictors (e.g., 1 for the first model, 3 for the second)
DF for the residual sum of squares in MR is - (2)
the number of observations (N) minus the number of coefficients in the regression model
(e.g., M1 has 2 coefficients - one for the predictor and one for the constant; M2 has 4 - one for each of the 3 predictors and one for the constant)
The average sum of squares (mean square) in the ANOVA table is
calculated for each term (SSM, SSR) by dividing the SS by its df.
How is the F ratio calculated in this ANOVA table?
F-ratio is calculated by dividing the average improvement in prediction by the model (MSM) by the average
difference between the model and the observed data (MSR)
If the improvement due to fitting the regression model is much greater than the inaccuracy within the model, then the value of F will be
greater than 1, and SPSS calculates the exact probability (p-value) of obtaining that value of F by chance
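A minimal sketch of the F-ratio calculation from the sums of squares (NumPy/SciPy assumed; hypothetical inputs):

```python
import numpy as np
from scipy import stats

def regression_f(y, y_hat, k):
    """F = MSM / MSR for a regression with k predictors."""
    n = len(y)
    ss_m = np.sum((y_hat - y.mean()) ** 2)  # improvement over using the mean
    ss_r = np.sum((y - y_hat) ** 2)         # inaccuracy left in the model
    ms_m, ms_r = ss_m / k, ss_r / (n - k - 1)
    f = ms_m / ms_r
    p = stats.f.sf(f, k, n - k - 1)         # probability of F by chance
    return f, p
```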
What happens if b values are positive in multiple regression?
there is a positive relationship between the predictor and the outcome,
What happens if the b value is negative in multiple regression?
it represents a negative relationship between the predictor and the outcome variable
What do the b values in this table tell us about the relationships between predictors and the outcome variable in multiple regression? - (3)
They indicate positive relationships, so as advertising budget increases, record sales (the outcome) increase
as plays on the radio increase, so do record sales
as attractiveness of the band increases, record sales increase
The b-values also tell us, in addition to direction of relationship (pos/neg) , to what degree each in multiple regression
predictor affects the outcome if the effects of all other predictors are held constant:
B-values tell us to what degree each predictor affects the outcome if the effects of all other predictors held constant in multiple regression
e.g., advertising budget - (3)
(b = 0.085):
This value indicates that as advertising budget (x)
increases by one unit, record sales (outcome, y) increase by 0.085 units.
This interpretation is true only if the
effects of attractiveness of the band and airplay are held constant.
Standardised versions of the b-values are much easier to interpret in multiple regression as they are
not dependent on the units of measurement of the variables
The standardised beta values tell us that in multiple regression
the number of standard deviations that the outcome will change as a result of one standard deviation change
in the predictor.
The standardized beta values are all measured in standard deviation
units and so are directly comparable: therefore, in MR they provide
a better insight into the
'importance' of a predictor in the model
If two predictor variables (e.g., advertising budget and airplay) have virtually identical standardised beta values (0.512, and 0.511) it shows that in MR
both variables have a comparable degree of importance in the model
If we collected 100 samples and in MR calculated a CI for b in each, we are saying that 95% of these CIs would contain the
true (population) value of b
A good regression model will have a narrow, small CI, indicating that in MR
the value of b in this sample is close to the true value of b in the population
A bad regression model will have CIs that cross zero, indicating that in MR
in some samples the predictor has a negative
relationship to the outcome, whereas in others it has a positive relationship
In image below, which are the two best predictors based on CIs and one that isn’t as (2) in MR
two best predictors (advertising and airplay) have very tight confidence intervals indicating that the estimates for the current model are likely to be representative of the true population values
interval for attractiveness is wider (but still does not cross zero) indicating that the parameter for this variable is less representative, but nevertheless significant.
If you select part and partial correlations in the Statistics box, there will be another coefficients table, which in MR looks like this:
The zero-order correlations are the simple in MR
Pearson’s correlation coefficients
The partial correlations in MR
represent the relationships between each predictor and the outcome variable, controlling for the effects of the other two predictors.
The part correlations in MR - (2)
represent the relationship between each predictor and the outcome, controlling for the effect that the other two variables have on the outcome.
i.e., representing the unique relationship each predictor has with the outcome
Partial correlations in this example are calculated in MR by - (2)
the unique variance in the outcome explained by the predictor (ignoring all other predictors), divided by the variance in the outcome not explained by all other predictors
A / (A + E)
Part correlations are calculated in MR by - (2)
the unique variance in the outcome explained by the predictor, divided by the total variance in the outcome
A / (A + B + C + E)
If the average VIF is substantially greater than 1, then the MR regression
may be biased
MR Tolerance below 0.1 indicates a
serious problem.
Tolerance below 0.2 in MR indicates
a potential problem
How to interpret this image in terms of colinearity - VIF and tolerance in MR
For our current model the VIF values are all well below 10 and the tolerance statistics all well above 0.2;
therefore, we can safely conclude that there is no collinearity within our data.
We can produce casewise diagnostics in MR to see - (2)
a summary of residual statistics, to examine extreme cases
whether individual scores (cases) influence the modelling of the data too much
SPSS casewise diagnostics show cases that have standardised residuals in MR - (2)
less than -2 or greater than 2
(We expect about 5% of our cases to do that, and 95% to have standardised residuals within about +/- 2.)
If we have a sample of 200 then expect about .. to have standardised residuals outside limits in MR
10 cases (5% of 200)
What does this casewise diagnostic show in MR? - (2)
- 99% of cases should lie within ±2.5, so we expect 1% of cases to lie outside those limits
- From the cases listed, it is clear that two cases (1%) lie outside the limits (cases 164 and 179; case 164 has a residual of 3 and should be investigated further) - at 1%, this conforms to an accurate model
If there are many more such cases in the casewise diagnostics (more than about 5% of the sample size), then in MR we have likely
broken the assumptions of the regression
If cases are a large number of standard deviations from the mean, we may want to in casewise diagnostics in MR
investigate and potentially remove them because they are ‘outliers’
Assumptions we need to check for MR - (8)
- Continuous outcome variable and continuous or dichotomous predictor variables
- Independence = all values of the outcome variable should come from different participants
- Non-zero variance: predictors should have some variation in value, e.g., variance ≠ 0
- No outliers
- No perfect or high collinearity
- Histogram to check for normality of errors
- Scatterplot of ZRESID against ZPRED to check for linearity and homoscedasticity = looking for random scatter
- Independent errors (Durbin-Watson)
Diagram of the assumption of homoscedasticity and linearity: ZRESID against ZPRED in MR
Obvious outliers on a partial plot represent cases that might have in MR
undue influence on a predictor’s b coefficient
What does this partial plot show? - (2) in MR
the partial plot shows the strong positive relationship to album sales.
There are no obvious outliers and the cloud of dots is evenly spaced out around the line, indicating homoscedasticity.
What does this plot show in MR(2)
the plot again shows a positive relationship to album sales, but the dots show funnelling,
There are no obvious outliers on this plot, but the funnel-shaped cloud indicates a violation of the assumption of homoscedasticity.
P-plot and histogram for a normal distribution in MR
P-plot and histogram for a skewed distribution in MR
What if the assumptions for regression are violated in MR?
you cannot generalize your findings beyond your sample
If residuals show problems with heteroscedasticity or non-normality, then in MR try
transforming the raw data - but this won't necessarily affect the residuals!
If you have a violation of the linearity assumption in MR, then you could see whether you can do
logistic regression instead
If R^2 is 0.374 (outcome var in productivity and 3 predictors) then it shows that in MR
37.4% of the variance in productivity scores was accounted for by 3 predictor variables
- The ANOVA table tells us whether the model is significantly improved compared to the baseline model in MR, which is
the model assuming no relation between the predictor variables and the outcome variable (a flat regression line, i.e. no association between these variables)
This table tells us in terms of standardised beta values that (outcome is productivity in MR)
holidays had a standardized beta coefficient of 0.031, whereas cake had a much higher standardized beta coefficient of 0.499, which tells us that the amount of cake given out is a much better predictor of productivity than the amount of holidays taken
For pay we have a beta coefficient of 0.323, which tells us that pay was also a pretty good predictor of productivity in the model, but slightly less so than cake
What does this table tells us in terms of signifiance? - (3) in MR
- P value for holidays is 0.891 which is not significant
- P value for cake is 0.032 is significant
- P value for pay is 0.012 is significant
In the ANOVA table, M2 with all its predictor variables is compared with the
baseline model in MR, not with M1
To see if M2 is an improvement over M1 in HR, we need to look at the ... in the model summary in MR
change statistics
What does this change statistic show in terms of M2 and M1 in MR?
M2 explains an extra 7.5% of the variance, which is significant
In MR, the smaller the value of sig and the larger the value of t, the greater the
contribution of that predictor.
For this output interpret whether predicotrs are sig predictors of record scales and magnitude t statistic on impact of record sales in MR - (2)
For this model, the advertising budget (t(196) = 12.26, p < .001), the amount of radio play prior to release (t(196) = 12.12, p < .001) and attractiveness of the band (t(196) =4.55, p < .001) are all significant predictors of record sales.
From the magnitude of the t-statistics we can see that the advertising budget and radio play had a similar impact,
whereas the attractiveness of the band had less impact.
What is an example of a continuous variable?
a variable with an infinite number of possible real values within a given interval, e.g. height or age
What is an example of dichotomous variable?
variable that can only hold two distinct values like male and female
If outliers are present in data then impact the
line of best fit in MR
You would expect 1% of cases to lie outside ±2.5 standardised residuals, so in a large sample in MR, if you have
one or two outliers then it could be okay
Rule of thumb to check for outliers is to check if there are any data points that in MR
are over 3 SD from the mean
All residuals should lie within ….. SDs for no outliers /normal amount of outliers in MR
-3 and 3 SD
Which variables (if any) are highly correlated in MR?
Weight, Activity, and the interaction between them are statistically significant
What do homoscedasticity and heteroscedasticity mean in MR? - (2)
Homoscedasticity: similar variance of residuals (errors) across the variable continuum, e.g. equally accurate.
Heteroscedasticity: variance of residuals (errors) differs across the variable continuum, e.g. not equally accurate
P plot plots a normal distribution against
your distribution
Diagram of normal, skewed-to-left (positive) and skewed-to-right (negative) p-plots in MR
Durbin-Watson test values of 0, 2 and 4 show that... in MR - (3)
- 0 = errors between pairs of observations are positively correlated
- 2 = independent errors
- 4 = errors between pairs of observations are negatively correlated
A Durbin-Watson statistic between … and … is considered to indicate that the data is not cause for concern = independent errors in MR
1.5 and 2.5
If R2 and adjusted R2 are similar, it means that your regression model
‘generalizes’ to the entire population.
Particularly in MR,
for small N and where results are to be generalized, use the adjusted R2
3 types of multiple regression - (3)
- Standard: To assess impact of all predictor variables simultaneously
- Hierarchical: To test predictor variables in a specific order based on hypotheses derived from theory
- Stepwise: If the goal is accurate statistical prediction from a large number of predictor variables – computer driven
Diagram of the excluded variables table in SPSS in MR - (3)
- Tells us that OCD interpretation of intrusions would not have a significant impact on the model's ability to predict social anxiety
- The beta value of Interpretation of Intrusions is very small, indicating a small influence on the outcome variable
- Beta is the degree of change in the outcome variable for every 1 unit of change in the predictor variable.
What is multicollinearity in MR
When predictor variables correlate very highly with each other
When checking assumption fo regression, what does this graph tell you in MR
Normality of residuals
Which of the following statements about the t-statistic in regression is not true?
The t-statistic is equal to the regression coefficient divided by its standard deviation
The t-statistic tests whether the regression coefficient, b, is significantly different from 0
The t-statistic provides some idea of how well a predictor predicts the outcome variable
The t-statistic can be used to see whether a predictor variables makes a statistically significant contribution to the regression model
Answer: The t-statistic is equal to the regression coefficient divided by its standard deviation (this is the untrue statement - the t-statistic is the coefficient divided by its standard error)
A consumer researcher was interested in what factors influence people’s fear responses to horror films. She measured gender and how much a person is prone to believe in things that are not real (fantasy proneness). Fear responses were measured too. In this table, what does the value 847.685 represent in MR
The residual error in the prediction of fear scores when both gender and fantasy proneness are included as predictors in the model.
A psychologist was interested in whether the amount of news people watch predicts how depressed they are. In this table, what does the value 3.030 represent in MR
The improvement in the prediction of depression by fitting the model
A consumer researcher was interested in what factors influence people’s fear responses to horror films. She measured gender (0 = female, 1 = male) and how much a person is prone to believe in things that are not real (fantasy proneness) on a scale from 0 to 4 (0 = not at all fantasy prone, 4 = very fantasy prone). Fear responses were measured on a scale from 0 (not at all scared) to 15 (the most scared I have ever felt).
Based on the information from model 2 in the table, what is the likely population value of the parameter describing the relationship between gender and fear in MR
Somewhere between −3.369 and −0.517
A consumer researcher was interested in what factors influence people’s fear responses to horror films. She measured gender (0 = female, 1 = male) and how much a person is prone to believe in things that are not real (fantasy proneness) on a scale from 0 to 4 (0 = not at all fantasy prone, 4 = very fantasy prone). Fear responses were measured on a scale from 0 (not at all scared) to 15 (the most scared I have ever felt).
How much variance (as a percentage) in fear is shared by gender and fantasy proneness in the population in MR
13.5%
Recent research has shown that lecturers are among the most stressed workers. A researcher wanted to know exactly what it was about being a lecturer that created this stress and subsequent burnout. She recruited 75 lecturers and administered several questionnaires that measured: Burnout (high score = burnt out), Perceived Control (high score = low perceived control), Coping Ability (high score = low ability to cope with stress), Stress from Teaching (high score = teaching creates a lot of stress for the person), Stress from Research (high score = research creates a lot of stress for the person), and Stress from Providing Pastoral Care (high score = providing pastoral care creates a lot of stress for the person). The outcome of interest was burnout, and Cooper’s (1988) model of stress indicates that perceived control and coping style are important predictors of this variable. The remaining predictors were measured to see the unique contribution of different aspects of a lecturer’s work to their burnout.
Which of the predictor variables does not predict burnout in MR
Stress from research
Using the information from model 3, how would you interpret the beta value for ‘stress from teaching’ in MR
As stress from teaching increases by one unit, burnout decreases by 0.36 of a unit.
How much variance in burnout does the final model explain for the sample in MR
80.3%
A psychologist was interested in predicting how depressed people are from the amount of news they watch. Based on the output, do you think the psychologist will end up with a model that can be generalized beyond the sample?
No, because the errors show heteroscedasticity.
Diagram of no outliers for one assumption of MR
Note that you expect 1% of cases to lie outside this area so in a large sample, if you have one or two, that could be ok
Example of multiple regression - (3)
A record company boss was interested in predicting album sales from advertising.
Data
200 different album releases
Outcome variable:
Sales (CDs and Downloads) in the week after release
Predictor variables
The amount (in £s) spent promoting the album before release
Number of plays on the radio
R is the correlation between
observed values of the outcome, and the values predicted by the model.
Output diagram - what does the output show in MR? - (2)
Difference between no predictors and model 1 (a).
Difference between model 1 (a) and model 2 (b).
Our model 2 is significantly better at predicting the value of the outcome variable than the null model and model 1 (F(2, 197) = 167.2, p < .001) and explains 66% of the variance in our data (R2 = .66)
What does this output show in terms of regression model in MR? - (3)
y = 0.09x1 + 3.59x2 + 41.12
For every £1,000 increase in advertising budget there is an increase of 87 record sales (B = 0.087, t = 11.99, p < .001).
For every additional play on Radio 1 per week there is an increase of 3,589 record sales (B = 3.589, t = 12.51, p < .001).
Report R^2, F statistic and p-value to 2DP for overall model - (3)
o R squared = 0.09
o F statistic = 22.54
o P value = p < 0.001
Report the beta and b values for video games, restrictions and parental aggression to 2DP, and the p-value, in MR
Which of the following statements about the assumptions of homoscedasticity and linearity is correct?
A There is non-linearity in the data
B There is heteroscedasticity in the data
C There is both heteroscedasticity and non-linearity in the data
D There are no problems with either heteroscedasticity or non-linearity
Answer: D - the data points show a random pattern
Determine the proportion of variance in salary that the number of years spent modelling uniquely explains once the models' age was taken into account:
Hierarchical regression
A 2.0%
B 17.8%
C 39.7%
D 42.2%
Answer: A --> the R square change in step 2 was .020
Test for multicollinearity (select tolerance and VIF statistics).
Based on this information, what can you conclude about the suitability of your regression model?
A The VIF statistic is above 10 and the tolerance statistic is below 0.2, indicating that there is no multicollinearity.
B The VIF statistic is above 10 and the tolerance statistic is below 0.2, indicating that there is a potential problem with multicollinearity.
C The VIF statistic is below 10 and the tolerance statistic is above 0.2, indicating that there is no multicollinearity.
D The VIF statistic is below 10 and the tolerance statistic is above 0.2, indicating that there is a potential problem with multicollinearity.
Answer: B
Example of question using hierarchical regression - (2)
A fashion student was interested in factors that predicted the salaries of catwalk models. He collected data from 231 models. For each model he asked how much they earned per day (salary), their age (age), and how many years they had worked as a model (years_modelling).
The student wanted to know if the number of years spent modelling predicted the models’ salary after the models’ age was taken into account.
The following graph shows:
A. Regression assumptions met
B. Non-linearity = could indicate a curve
C. Heteroscedasticity + non-linearity
D. Heteroscedasticity
Answer: A
A consumer researcher was interested in what factors influence people’s fear responses to horror films. She measured gender (0 = female, 1 = male) and how much a person is prone to believe in things that are not real (fantasy proneness) on a scale from 0 to 4 (0 = not at all fantasy prone, 4 = very fantasy prone). Fear responses were measured on a scale from 0 (not at all scared) to 15 (the most scared I have ever felt). What is the likely population value of the parameter describing the relationship between gender and fear?
Somewhere between −3.369 and −0.517
What are the 3 types of t-tests? - (3)
- One-samples t-test
- Paired t-test
- Independent t-test
Whats a one-sample t-test?
Compares the mean of the sample data to a known value
What are the assumptions of the one-sample t-test? - (3)
- DV = continuous (interval or ratio)
- Independent scores (no relation between scores on the test variable)
- Normal distribution, via a frequency histogram (normal shape), a Q-Q plot (straight line) and a non-significant Shapiro-Wilk test
Example of one-sample t-test RQ - (2)
Is the average IQ of Psychology students higher than that of the general population (100)?
A particular factory’s machines are supposed to fill bottles with 150 millilitres of product. A plant manager wants to test a random sample of bottles to ensure that the machines are not under- or over-filling the bottles.
What are the assumptions of the independent samples t-test (listing all of them)? - (6)
- Independence - no relationship between the groups
- Normal distribution, via a frequency histogram (normal shape), a Q-Q plot (straight line) and a non-significant Shapiro-Wilk test
- Homogeneity of variances (i.e., variances approximately equal across groups), via a non-significant Levene's test
- DV = interval or continuous
- IV = categorical
- No significant outliers
What is an RQ example of independent samples t-tesT?
Do dog owners in the country spend more time walking their
dogs than dog owners in the city?
What are the assumptions of the paired t-test (listing all)? - (3)
- DV is continuous
- Related samples: the subjects in each sample, or group, are the same. This means that the subjects in the first group are also in the second group
- Normal distribution, via a frequency histogram (normal shape), a Q-Q plot (straight line) and a non-significant Shapiro-Wilk test
What is an example of RQ of paired t-test?
Do cats learn more tricks when given food or praise as positive feedback?
What is the decision framework for choosing a paired-sample (dependent) t-test? - (5)
- What sort of measurement = continuous
- How many predictor variables = one
- What type of predictor variable = categorical
- How many levels of the categorical predictor = two
- Same or different participants for each predictor level = same
What is the decision framework for choosing an independent t-test? - (5)
- What sort of measurement = continuous
- How many predictor variables = one
- What type of predictor variable = categorical
- How many levels of the categorical predictor = two
- Same or different participants for each predictor level = different
If we are comparing differences between means of two groups in independent/paired t-test then all we are doing is
predicting an outcome based on membership of two groups
Independent and paired t-tests can fit into the ideal of a
linear model
The t-distributed is defined by its
degrees of freedom - related to the sample size.
The t distribution has heavier tails for - (2)
lower degrees of freedom (small-N studies)
increased uncertainty and a higher likelihood of observing extreme values than in large-N studies, where the tails are less heavy as the t distribution approaches the normal distribution
Independent and Paired T-tests have one predictor (X) variable with 2 levels and only …. outcome variable (Y)
one
When is an independent-means t-test used?
When there are 2 experimental conditions and different participants are assigned to each condition
What is independent-means t-test sometimes called as well?
independent-samples t-test
When is a dependent-means t-test used?
Used when there are 2 experimental conditions and same participants took part in both conditions of the experiment
What is dependent-means t-test sometimes referred to?
Matched pairs or paired samples t-test
For independent and paired t-tests, we compare the difference between the sample means that we collected to the difference between sample means that we would expect if
there was no effect (i.e., the null hypothesis was true)
Formula for calculating the t-test statistic (its form depends on whether the same or different participants were used in each experimental condition) in independent/paired t-tests
The formula for the t-statistic shows that we obtain it by dividing the model/effect by the
error in the model
The expected difference when calculating the t-test statistic in most cases is
0 - under the null hypothesis we expect no difference between the group means, so we test whether the difference between the sample means we collected differs from 0
If observed difference between sample means get larger in t-tests then more confident we become that
null hypothesis is rejected and two sample means differ because of experimental manipulation
Both independent t-test and paired t-test are … tests based on normal distribution
parametric tests
Since independent and paired t-tests are parametric tests, they assume that the - (2)
- Sampling distribution is normally distributed - in the paired t-test this means the sampling distribution of the difference scores is normal, not the scores themselves!
- Data are measured at least at the interval level
Since the independent t-test is used to test different groups of people, it also assumes - (2)
- Variances in the populations are roughly equal (homogeneity of variance) = Levene's test
- Scores are independent, since they come from different people
Diagram of the equation for calculating the t-statistic in the paired t-test, explained - (2)
- t = (D̄ − μD) / (sD / √N): it compares the mean difference between our samples (D̄) to the difference we would expect to find between population means (μD), divided by the standard error of differences (sD / √N)
- If H0 is true, then we expect no difference between the population means, hence μD = 0
A small standard error of differences tells us that in paired-t-test
pairs of samples from a population have similar means to population
A large standard error of differences tells us in the paired t-test - (2)
that sample means can deviate quite a lot from the population mean and
the sampling distribution of differences is more spread out
The average difference between a person's score in condition 1 and condition 2 (D̄) in the paired t-test is an indicator of
systematic variation in the data (it represents the experimental effect)
If the average difference (D̄) between our samples is large and the standard error of differences is small in the paired t-test, then we can be confident that
the difference we observed in our sample is not a chance result and is caused by the experimental manipulation
How do we normally calculate the standard error?
SD divided by square root of sample size
How do we calculate the standard error of differences in the paired t-test (σD̄)?
the standard deviation of the differences divided by the square root of the sample size
The t-statistic in the paired t-test is the
ratio of systematic variation in the experiment (the average difference D̄) to unsystematic variation (the standard error of differences)
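A minimal sketch of the paired t-test calculation on hypothetical scores (NumPy/SciPy assumed): the manual D-bar over standard-error-of-differences formula matches scipy.stats.ttest_rel.

```python
import numpy as np
from scipy import stats

# hypothetical scores for the same participants in two conditions
cond1 = np.array([12, 15, 11, 14, 13, 16, 12, 15])
cond2 = np.array([14, 18, 12, 15, 15, 19, 13, 16])

d = cond1 - cond2
t_manual = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))  # D-bar / SE of differences
t_scipy, p = stats.ttest_rel(cond1, cond2)
print(t_manual, t_scipy)  # identical
```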
When would we expect t statistic greater than 1 in paired-t-test equation?
If the experimental manipulation creates any kind of effect,
When would we expect t statistic less than 1 in paired t-test equation?
If the experimental manipulation is unsuccessful then we might expect the variation caused by individual differences to be much greater than that caused by the
experiment
In paired and independent t-tests generally, we compare the obtained value of t against the maximum value we would expect to get by chance alone in a t distribution with the same df; if the value we obtain exceeds that
critical value, we are confident that it reflects an effect of our IV
What does this paired samples correlation show?
people doing well in the first exam were likely to do well in the second exam regardless of the condition they were in, and the scores are significantly correlated (r = 0.664)
What does this SPSS output show? = paired t- test
t(19) = 2.72, p = 0.012
What does a negative t-value mean in a paired t-test?
The first condition had a smaller mean than the second condition
What does the 95% confidence interval of the difference mean in the SPSS output of a paired t-test? - (3)
- In 95% of samples (e.g., if we had 100 samples, then in 95 of them) the constructed CIs contain the true (population) value of the mean difference
- CIs tell us the boundaries within which the true mean difference is likely to lie
- The true value of the mean difference is unlikely to be 0 if the CI does not contain 0
How to calculate effect size for independent and paired t-tests?
Using cohen’s D
Diagram of calculating Cohen's d statistic for sleep vs no sleep (paired)
Subtract the smaller (control group) mean from the larger mean and divide by the control group's SD
What does Cohen’s d of 0.20 represent
difference between groups is a 1/5 of SD
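A minimal sketch of this calculation (NumPy assumed; hypothetical scores; note that other conventions pool or average the SDs instead of using the control group's):

```python
import numpy as np

def cohens_d(experimental, control):
    """d = (M_exp - M_ctrl) / SD of the control group."""
    experimental = np.asarray(experimental, dtype=float)
    control = np.asarray(control, dtype=float)
    return (experimental.mean() - control.mean()) / control.std(ddof=1)

# hypothetical sleep vs no-sleep exam scores
print(cohens_d([66, 70, 62, 68], [58, 61, 55, 60]))
```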
Diagram of writing up paired t-test result
To calculate effect size for independent and paired t-tests, besides Cohen's d, we can also
calculate the effect size r (above 0.50 is a large effect) by converting the t-value to an r-value: r = √(t² / (t² + df))
With independent t-test there are two different equations that can be used depending on whether the samples
contain an equal number of people
With independent t-test since different participants participate in different condition, the pairs of scores will differ not just of experimental manipulation but also because of
other sources of variance (such as individual differences between participants’ motivation, IQ etc..)
With dependent t-test we look at differences between pairs of scores because
scores came from same participant and so individual differences were eliminated
Equation of the independent t-test with equal N sizes for each condition: t = (X̄1 − X̄2) / √(s1²/N1 + s2²/N2)
The equation of the independent t-test with equal N sizes takes its final form because - (3)
- We are looking at the difference between the overall means of the 2 samples and comparing it with the difference we would expect to get between the means of the 2 populations from which the samples come
- If H0 is true, the samples are drawn from the same population
- Therefore under H0, μ1 = μ2, and so μ1 − μ2 = 0
Equation of independent t-test in numbers for equal N sizes
We use variance of sum law to obtain the estimate of standard error for each … in independent t-test equation for equal N sizes
sample group
What does variance sum of law state?
variance of the sampling distribution is equal to the sum of the variances of the two populations from which the samples
were taken
This independent t-test standard error formula combines the
standard error for two samples
In independent t-test when we want to compare two groups that contain different number of participants then equation … is not appropriate
For comparing two groups with unequal numbers of participants in the independent t-test, we use the
pooled variance estimate t-test
The pooled variance estimate t-test takes into account the
difference in sample size by weighting the variance of each sample
Formula for the pooled variance estimate t-test - (2)
Each sample's variance is multiplied by its df and the two are added together, then divided by the sum of the weights (the sum of the two dfs): sp² = ((N1 − 1)s1² + (N2 − 1)s2²) / (N1 + N2 − 2)
Larger samples are better than small ones, as they are closer to the population
In formula of pooled variance estimate t-test it weights the variance of each sample by the
number of degrees of freedom (N-1)
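A minimal sketch of the pooled-variance calculation on hypothetical unequal-sized groups (NumPy/SciPy assumed): the manual result matches scipy.stats.ttest_ind with equal_var=True.

```python
import numpy as np
from scipy import stats

# hypothetical groups of unequal size
a = np.array([60.2, 58.1, 63.4, 55.9, 61.0, 59.5])
b = np.array([89.3, 85.0, 92.1, 88.4, 90.2, 87.7, 91.5])

n1, n2 = len(a), len(b)
# pooled variance: each sample's variance weighted by its df
sp2 = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
t_manual = (a.mean() - b.mean()) / np.sqrt(sp2 / n1 + sp2 / n2)
t_scipy, p = stats.ttest_ind(a, b, equal_var=True)
print(t_manual, t_scipy)  # identical
```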
As with dependent t-test we compare obtained value of t in independent sample against the
maximum value we would expect to get by chance alone in t distribution with same DFs
What does this output show in the independent t-test? - (2)
The sleep condition scored an average exam score of 66.20 and the no-sleep condition an average of 58.73
Effect size (Cohen's d) = (mean of sleep − mean of no sleep) / SD of sleep (control group) = (66.20 − 58.73) / 7.12
In the independent samples t-test we check Levene's test for equality of variances, which determines whether
we have equal variances across the groups or whether the variances are unequal
In independent t-test, Levene’s test we are looking for a non-significant p-value which shows that
no statistically significant difference in variances between the two groups - report results from equal variances assumed
In independent t-test if Levene’s test was significant then it means that
variances between the 2 groups are different and they are statistically significantly different - report data from equal variances not assumed
What does this output show in independent t-test? - (2)
- Levene’s test is not significant (p = 0.970) so no stats sig differences in variance between two groups
- t(28) = 2.87, p = 0.008
Diagram of reporting independent t-test
Paired vs independent t-tests - which has better power?
Paired t-tests
Since paired-t-tests use same participants across conditions the … is reduced dramatically compared to independent t-test
unsystematic variance
The non-parametric counterpart of the dependent t-test is called the
Wilcoxon signed-rank test
The non-parametric counterparts of the independent t-test are the
Wilcoxon rank-sum test and the Mann-Whitney test
What does this SPSS output of the independent t-test's Levene's test show?
homogeneity of variance, as assessed by Levene's Test for Equality of Variances (F = 1.58, p = .219)
Cohen’s d for diet was 4.25
Is this a:
Small effect
Medium effect
Large effect
Answer: Large effect
The probability of a value of t occurring yields the p value for the difference between the means occurring by
chance
Although there are 2 ways to compute effect size, use
Cohen's d
Another example of a two-sample independent t-test scenario (RQ, sample, DV, hypotheses, test, significance level) - (6)
Research question: Which of the two diet formulas is better for puppies?
Sample: 15 were randomly assigned to each of the two diets (A and B).
Dependent variable: Average daily weight gain (ADG, g/day) between 12 to 28 weeks of age.
Hypotheses:
Ho: µA = µB
Ha: µA ≠ µB.
Statistical Test: Two samples
independent t-test
Significance level: .05
We can check that there are no outliers in the independent t-test by looking at
boxplots - there are no outliers here
To check normality of distribution for both independent groups for two-samples independent t-test, we can use..
histogram, q-qplot and tests of normality
Checking normality for the independent t-test - (3)
Research question: Which of the two diet formulas is better for puppies?
Dependent variable: Average daily weight gain (ADG, g/day)
Neither group has significant values in the tests of normality, and the histograms and plots look normal
So we have normality of distribution for both independent groups
Inspection of the Q-Q plots and the non-significant Shapiro-Wilk tests (p > .05) indicate that ADG is normally distributed for both groups
For checking homogeneity of variances in independent/paired designs we use
Levene's test
Checking homogeneity of variance in this two-sample independent t-test, what does it show?
Research question: Which of the two diet formulas is better for puppies?
Dependent variable: Average daily weight gain (ADG, g/day)
There was homogeneity of variance, as assessed by Levene's Test for Equality of Variances (F = 1.58, p = .219)
What do the results of the two-sample independent t-test show?
Research question: Which of the two diet formulas is better for puppies?
Dependent variable: Average daily weight gain (ADG, g/day)
This study found that puppies in diet B had statistically significantly higher average daily weight gain (89.29 ± 9.93 g/day) between 12 and 28 weeks of age compared to puppies in diet A (60.20 ± 6.85 g/day), t(27)= -9.24, p < .001.
In Cohen's d, theoretically 3 SDs can be used, which make very little difference - (3)
- Pooled SD (over conditions)
- Averaged SD
- Control group SD
To calculate Cohen's d for an independent/paired t-test we need to use
control group SD
How do we calculate Cohen's d for the independent two-samples t-test in this example?
Research question: Which of the two diet formulas is better for puppies?
Dependent variable: Average daily weight gain (ADG, g/day) - (2)
d = (89.29 - 60.20) / 6.85
d = 4.25
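The same arithmetic in a couple of lines of Python, using the values quoted on this card:

```python
# Cohen's d using the control group SD (values from the card above)
mean_b, mean_a, sd_control = 89.29, 60.20, 6.85
d = (mean_b - mean_a) / sd_control
print(round(d, 2))  # 4.25 -> well above the d = 0.8 'large' threshold
```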
Cohen's d guidelines for small, medium and large effects - (3)
d = 0.2 be considered a ‘small’ effect size,
d = 0.5 represents a ‘medium’ effect size
d = 0.8 a ‘large’ effect size
What does ANOVA stand for?
Analysis of Variance
What is the decision tree for choosing a one-way ANOVA? - (5)
Q: What sort of measurement? A: Continuous
Q:How many predictor variables? A: One
Q: What type of predictor variable? A: Categorical
Q: How many levels of the categorical predictor? A: More than two
Q: Same or Different participants for each predictor level? A: Different
When is ANOVA used?
when you are comparing more than 2 groups of an IV
Example of ANOVA RQ
Which is the fastest animal in a maze experiment - cats, dogs or rats?
We can't do three separate t-tests (for example, for which is the fastest animal in a maze experiment - cats, dogs or rats) because - (2)
Doing separate t-tests inflates the type I error (false positive - e.g., pregnant man)
The repetition of the multiple tests adds multiple chances of error, which may result in a larger α error level than the pre-set α level - familywise error
What is familywise or experimentwise error rate?
This error rate across statistical tests conducted on the same experimental data
Family wise error is related to
type 1 error
What is the alpha level probability?
the probability of wrongly rejecting the null hypothesis (accepting the alternative when the null is true) = Type I error
If we conduct 3 separate t-tests to compare which is the fastest animal in the experiment - cats, dogs or rats - with an alpha level of 0.05 - (4)
- Each test carries a 5% risk of a Type I error (falsely rejecting H0)
- The probability of making no Type I error is 95% for a single test
- However, for multiple tests the probability of making no Type I error decreases: across 3 tests it is 0.95 × 0.95 × 0.95 = 0.857
- This means the probability of at least one Type I error increases: 1 - 0.857 = 0.143 (a 14.3% chance of making at least one Type I error)
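The same calculation generalises to any number of tests: the familywise error rate is 1 - (1 - α)^k. A one-line sketch:

```python
# Familywise error rate for k tests each run at alpha = .05
alpha, k = 0.05, 3
fwer = 1 - (1 - alpha) ** k  # P(at least one Type I error)
print(round(fwer, 3))        # 0.143 -> a 14.3% chance of a false positive
```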
Much like model for t-tests we can write a general linear model for
ANOVA - 3 levels of categorical variable with dummy variables
When we perform a t-test, we test the hypothesis that the two samples have the same
mean
ANOVA tells us whether three or more means are the same so tests H0 that
all group means are equal
An ANOVA produces an
F statistic or F ratio
The F ratio produced in ANOVA is similar to t-statistic in a way that it compares the
amount of systematic variance in data to the amount of unsystematic variance i.e., ratio of model to its error
ANOVA is an omnibus test which means it tests for and tells us - (2)
overall experimental effect
tells whether experimental manipulation was successful
An ANOVA is an omnibus test, and its F-ratio does not provide specific information about which
groups were affected by the experimental manipulation
Just like the t-test can be represented by a linear regression equation, ANOVA can be represented by a
multiple regression equation; for three means the model accounts for 3 levels of the categorical variable with dummy variables
As compared to independent samples t-test that compares means of two groups, one-way ANOVA compares means of
3 or more independent groups
In one-way ANOVA we use … … to test assumption of equal variances across groups
Levene’s test
What does this one-way ANOVA output show?
Levene's test is non-significant, so equal variances are assumed
What does this SPSS output show in one-way ANOVA?
F(2,42) = 5.94, p = 0.005, eta-squared = 0.22
How is effect size (eta-squared) calculated in one-way ANOVA?
Between groups sum of squares divided by total sum of squares
What is the eta-squared/effect size for this SPSS output and what does this value mean? - (2)
830.207/3763.632 = 0.22
22% of the variance in exam scores is accounted for by the model
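Reproducing the eta-squared arithmetic from this output in Python:

```python
# Eta-squared = SS between groups / SS total (values from the SPSS table)
ss_between, ss_total = 830.207, 3763.632
eta_sq = ss_between / ss_total
print(round(eta_sq, 2))  # 0.22 -> 22% of the variance in exam scores explained
```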
Interpreting eta-squared: what do eta-squared values of 0.01, 0.06 and 0.14 mean in one-way ANOVA? - (3)
- 0.01 = small effect
- 0.06 = medium effect
- 0.14 = large effect
What happens if the Levene’s test is significant in the one-way ANOVA?
then use statistics in Welch or Brown-Forsythe test
In one-way ANOVA, if Levene's test is significant, the Welch or Brown-Forsythe tests make adjustments to the DF, which affects
the statistics you get and whether the p-value is significant
What does this post-hoc table of Bonferroni tests show in one-way ANOVA ? - (3)
- Full sleep vs partial sleep, p = 1.00, not sig
- Full sleep vs no sleep , p = 0.007 so sig
- Partial sleep vs no sleep = p = 0.032 so sig
Diagram of example of grand mean
Mean of all scores regardless of pp's condition
What are the total sum of squares (SST) in one-way ANOVA?
difference of the participant’s score from the grand mean squared and summed over all participants
What is model sum of squares (SSM) in one-way ANOVA?
difference of the model score from the grand mean squared and summed over all participants
What is residual sum of squares (SSR) in one-way ANOVA?
difference of the participant’s score from the model score squared and summed over all participants
The residuals sum of squares (SSR) tells us how much of the variation cannot be
explained by the model and amount of variation caused by extraneous factors
We divide each sum of squares by its
DF to calculate them
For SST, the DF we divide by in one-way ANOVA is
N-1
For SSM, the DF we divide by in one-way ANOVA is
the number of groups (parameters), k, minus 1
For SSM, if we have this design then its DF in one-way ANOVA will be
3-1 = 2
For SSR, the DF we divide by in one-way ANOVA is
total sample size, N, minus the number of groups, k
Formulas for dividing each sum of squares by its DF to calculate the mean squares in one-way ANOVA - (3)
- MST = SST/(N-1)
- MSM = SSM/(k-1)
- MSR = SSR/(N-k)
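Putting the whole partition together: a minimal Python sketch that computes SST, SSM, SSR, the mean squares and F by hand for three invented groups, then checks the F against scipy:

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three groups (e.g., full, partial and no sleep)
groups = [np.array([72, 68, 75, 70, 71]),
          np.array([66, 70, 64, 69, 67]),
          np.array([58, 61, 55, 60, 57])]

scores = np.concatenate(groups)
grand_mean = scores.mean()
N, k = len(scores), len(groups)

sst = ((scores - grand_mean) ** 2).sum()                          # total
ssm = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)  # model
ssr = sst - ssm                                                   # residual

msm = ssm / (k - 1)  # average systematic variation
msr = ssr / (N - k)  # average unsystematic variation
F = msm / msr

print(round(F, 3), stats.f_oneway(*groups))  # the two F values should match
```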
SSM tells us the total variation that the
exp manipulation explains
What does MSM represent?
average amount of variation explained by the model (e.g. the systematic variation),
What does MSR represent?
average amount of variation explained by extraneous variables (the unsystematic variation).
The F-ratio in one-way ANOVA can be calculated by
dividing the model mean squares by the residual mean squares: F = MSM/MSR
If F ratio in one-way ANOVA is less than 1 then it represents a
non-significant effect
Why F less than 1 in one-way ANOVA represents a non-significant effect?
F ratio is less than 1 means that MSR is greater than MSM = more unsystematic than systematic
If F is greater than 1 in one-way ANOVA then it shows likelihood of … but doesn't tell us - (2)
indicates that experimental manipulation had some effect above and beyond effect of individual differences in performance
Does not tell us whether F-ratio is large enough to not be a chance result
When F statistic is large in one-way ANOVA then it tells us that the
MSM is greater than MSR
To discover if F statistic is large enough not to be a chance result in one-way ANOVA then
compare the obtained value of F against the maximum value we would expect to get by chance if the group means were equal in an F-distribution with the same degrees
of freedom
High values of F in one-way ANOVA are rare - (3)
by chance
Low degrees of freedom result in long tails of the distribution, so much like other statistics
large values of F are more common to crop up by chance in studies with low numbers of participants.
The F-ratio in one-way ANOVA tells us whether the model fitted to the data accounts for more variation than extraneous factors do, but does not tell us where
differences between groups lie
If F-ratio in one-way ANOVA is large enough to be statistically significant then we know
that one or more of the differences between means is statistically significant (e.g. either b2 or b1 is statistically significant)
It is necessary after conducting an one-way ANOVA to carry out further analysis to find out
which groups differ
The power of F statistic is relatively unaffected by
non-normality
when group sizes are not equal the accuracy of F is
affected by skew, and non-normality also affects the power of F in quite unpredictable ways
When group sizes are equal, the F statistic can be quite robust to
violations of normality
What tests do you do after performing a one-way ANOVA and finding significant F test? - (2)
- Planned contrasts
- Post-hoc tests
What do post-hoc tests do? - (2)
- compare all pairwise differences in mean
- Used if no specific hypotheses concerning differences has been made
What is the issue with post-hoc tests?
- because every pairwise combination is considered the type 1 error rate increases, so normally the type 1 error rate is reduced by modifying the critical value of p
Are post-hoc tests like a two-tailed or one-tailed hypothesis?
two-tailed
Are planned contrasts like a one-tailed or two-tailed hypothesis?
One-tailed hypothesis
What is the most common modification of the critical value for p in post-hoc in one-way ANOVA?
Bonferroni correction, which divides the standard critical value of p=0.05 by the number of pairwise comparisons performed
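A quick sketch of that adjustment for all pairwise comparisons among k groups:

```python
# Bonferroni-corrected critical p for all pairwise comparisons of k groups
k = 3
m = k * (k - 1) // 2     # number of pairwise comparisons (3 for k = 3)
p_crit = 0.05 / m
print(round(p_crit, 4))  # 0.0167 -> only accept tests with p below this
```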
Planned contrasts are used to investigate a specific
hypothesis
Planned contrasts do not test for every
pairwise difference so are not penalized as heavily as post hoc tests that do test for every difference
With planned contrasts you derive the hypotheses before the
data is collected
In planned contrasts when one condition is used it is
never used again
In planned contrasts (one-way ANOVA), the number of independent contrasts you can make is
k (number of groups) minus 1
How do planned contrasts work in SPSS?
Coefficients add to 0 for each contrast (-2 + 1 + 1), and once a group has been used alone in a contrast, the next contrasts set its coefficient to 0 (e.g., -2 to 0)
Polynomial contrasts in one-way ANOVA can also look at trends more complex than linear, such as
quadratic, cubic and quartic
The Bonferroni post-hoc test in one-way ANOVA ensures that the Type I error rate stays below
0.05
While the Bonferroni correction reduces Type I error (being conservative about the Type I error for each comparison), in one-way ANOVA it also
lacks statistical power (the probability of a Type II error [false negative] will be high), increasing the chance of missing a genuine difference in the data
What post-hoc tests to use if you have equal sample sizes and are confident that your group variances are similar, in one-way ANOVA?
Use REGWQ or Tukey, as they have good power and tight control over the Type I error rate
What post hoc tests to use if your sample sizes are slightly different in one way ANOVA?
Gabriel's procedure, because it has greater power
What post-hoc test to use if your sample sizes are very different, in one-way ANOVA?
Hochberg's GT2
What post-hoc test to run if Levene's test of homogeneity of variance is significant, in one-way ANOVA?
Games-Howell
What post-hoc test to use if you want guaranteed control over the Type I error rate, in one-way ANOVA?
Bonferroni
What does this ANOVA error line graph show? - (2)
- Linear trend: as the dose of Viagra increases, so does the mean level of libido
- Error bars overlap, indicating no between-group differences
What does the within-groups row give details of in the ANOVA table?
SSR (unsystematic variation)
The between groups label in ANOVA table tells us
SSM (systematic variation)
What does this ANOVA table demonstrate? - (2)
- Linear trend is significant (p = 0.008)
- Quadratic trend is not significant (p = 0.612)
When we do planned contrasts we arrange the weights such that we compare any group with a positive weight
against any group with a negative weight
What does this output show if we conduct two planned comparisons of:
one to test whether the control group was different to the two groups which received Viagra, and one to see
whether the two doses of Viagra made a difference to libido
- (2)
the table of weights shows that contrast 1 compares the placebo group against the two experimental groups,
contrast 2 compares the low-dose group to the high-dose group
What does this table show if Levene's test is non-significant (= equal variances assumed)?
To test hypothesis that experimental groups would increase libido above the levels seen in the placebo group (one-tailed)
To test another hypothesis that a high dose of Viagra would increase libido significantly more than a low dose
one-way ANOVA
- (3)
The significance value given in the table is two-tailed, and since our hypotheses were one-tailed we divide it by 2
For contrast 1, we can say that taking Viagra significantly increased libido compared to the control group (p = .029/2 = .0145)
The significance of contrast 2 tells us that a high dose of Viagra increased libido significantly more than a low dose (p(one-tailed) = .065/2 = .0325)
If making a few pairwise comparisons with an equal number of pps in each condition then use …; if making a lot then use … (one-way ANOVA) - (2)
Bonferroni
Tukey
Assumptions of ANOVA - (5)
- Independence of data
- DV is continuous; IV categorical (3 groups)
- No significant outliers;
- DV approximately normally distributed for each category of the IV
- Homogeneity of variance = Levene's test not significant
ANOVA compares many means without increasing the chance of
type 1 error
In one-way ANOVA, we partition the total variance into
variance explained by the IV (the model, SSM) and unexplained error variance (SSR)
An independent t-test is used to test for:
A Differences between means of groups containing different participants when the sampling distribution is normal, the groups have equal variances and data are at least interval.
B Differences between means of groups containing different participants when the data are not normally distributed or have unequal variances.
C Differences between means of groups containing the same participants when the data are normally distributed, have equal variances and data are at least interval.
D Differences between means of groups containing the same participants when the sampling distribution is not normally distributed and the data do not have unequal variances.
A Differences between means of groups containing different participants when the sampling distribution is normal, the groups have equal variances and data are at least interval
If you use a paired samples t-test
A The same participants take part in both experimental conditions.
B There ought to be less unsystematic variance compared to the independent t-test.
C Other things being equal, you do not need as many participants as you would for an independent samples design.
D All of these are correct.
D All of these are correct
Which of the following statements about the t distribution is correct?
A It is skewed
B In small samples it is narrower than the normal distribution
C As the degrees of freedom increase, the distribution becomes closer to normal
D It follows an exponential curve
C As the DF increase, the distribution becomes closer to normal
Which of the following sentences is an accurate description of the standard error?
A It is the same as the standard deviation
B It is the observed difference between sample means minus the expected difference between population means (if the null hypothesis is true)
C It is the standard deviation of the sampling distribution of a statistic
D It is the standard deviation squared
C It is the standard deviation of the sampling distribution of a statistic
A psychologist was interested in whether there was a gender difference in the use of email. She hypothesized that because women are generally better communicators than men, they would spend longer using email than their male counterparts. To test this hypothesis, the researcher sat by the computers in her research methods laboratory and when someone started using email, she noted whether they were male or female and then timed how long they spent using email (in minutes). Based on the output, what should she report?
(NOTE: Check for the assumption of equality of variances).
A Females spent significantly longer using email than males, t(14) = –1.90, p = .079
B Females and males did not significantly differ in the time spent using email, t(7.18) = –1.90, p = .099
C Females and males did not significantly differ in the time spent using email, t(7.18) = –1.90, p < .003
D Females and males did not significantly differ in the time spent using email, t(14) = –1.90, p = .079
B Females and males did not significantly differ in the time spent using email, t(7.18) = –1.90, p = .099
Other things being equal, compared to the paired-samples (or dependent) t-test, the independent t-test:
A Has more power to find an effect.
B Has the same amount of power; the data are just collected differently.
C Has less power to find an effect.
D Is less robust.
C Has less power to find an effect.
Differences between group means can be characterized as a regression (linear) model if:
A The outcome variable is categorical.
B The groups have equal sample size.
C The experimental groups are represented by a binary variable (i.e. code 1 and 0).
D The difference between group means cannot be characterized as a linear model; they must be analysed as an independent t-test.
The experimental groups are represented by a binary variable (i.e. code 1 and 0)
An experiment was done to look at whether different relaxation techniques could predict sleep quality better than nothing. A sample of 400 participants were randomly allocated to one of four groups: massage, hot bath, reading or nothing. For one month each participant received one of these relaxation techniques for 30 minutes before going to bed each night. A special device was attached to the participant’s wrist that recorded their quality of sleep, providing them with a score out of 100. The outcome was the average quality of sleep score over the course of the month.
Which test could we use to analyse these data?
A Regression only
B ANOVA only
C Regression or ANOVA
D Chi-square
C (multiple) regression or (independent) ANOVA, as regression and ANOVA are the same model
The question did not mention a hypothesis of prediction, or it would be regression
Chi-square is only used when you have one categorical predictor and the outcome is categorical
A researcher testing the effects of two treatments for anxiety computed a 95% confidence interval for the difference between the mean of treatment 1 and the mean of treatment 2. If this confidence interval includes the value of zero, then she cannot conclude that there is a significant difference in the treatment means: true or false.
TRUE OR FALSE
TRUE
The student welfare office was interested in trying to enhance students’ exam performance by investigating the effects of various interventions. They took five groups of students before their statistics exams and gave them one of five interventions: (1) a control group just sat in a room contemplating the task ahead; (2) the second group had a yoga class to relax them; (3) the third group were told they would get monetary rewards contingent upon the grade they received in the exam; (4) the fourth group were given beta-blockers to calm their nerves; and (5) the fifth group were encouraged to sit around winding each other up about how much revision they had/hadn’t done (a bit like what usually happens). The final percentage obtained in the exam was the dependent variable. Using the critical values for F, how would you report the result in the table below?
A Type of intervention did not have a significant effect on levels of exam performance, F(4, 29) = 12.43, p > .05.
B Type of intervention had a significant effect on levels of exam performance, F(4, 29) = 12.43, p < .01.
C Type of intervention did not have a significant effect on levels of exam performance, F(4, 33) = 12.43, p > .01.
D Type of intervention had a significant effect on levels of exam performance, F(4, 33) = 12.43, p < .01.
B Type of intervention had a significant effect on levels of exam performance, F(4, 29) = 12.43, p < .01.
Imagine you compare the effectiveness of four different types of stimulant to keep you awake while revising statistics using a one-way ANOVA. The null hypothesis would be that all four treatments have the same effect on the mean time kept awake. How would you interpret the alternative hypothesis?
A. All four stimulants have different effects on the mean time spent awake
B. All stimulants will increase mean time spent awake compared to taking nothing
C. At least two of the stimulants will have different effects on the mean time spent awake
D. None of the above
C. At least two of the stimulants will have different effects on the mean time spent awake
When the between-groups variance is a lot larger than the within-groups variance, the F-value is ____ and the likelihood of such a result occurring because of sampling error is _____
A small; high
B small; low
C. large; high
D. large; low
D. large; low
Subsequent to obtaining a significant result from an exploratory one-way independent ANOVA, a researcher decided to conduct three post hoc t-tests to investigate where the differences between groups lie.
Which of the following statements is correct?
A. The researcher should accept as statistically significant tests with a probability value of less than 0.016 to avoid making a Type I error
B. The researcher should have conducted orthogonal contrasts instead of t-tests to avoid making a Type I error
C. This is the wrong method to use. The researcher did not make any predictions about which groups will differ before running the experiment, therefore contrasts and post hoc tests cannot be used
D. None of these options are correct
The researcher should accept as statistically significant tests with a probability value of less than 0.016 to avoid making a Type I error
A psychologist was looking at the effects of an intervention on depression levels. Three groups were used: waiting list control, treatment and post-treatment (a group who had had the treatment 6 months before). The SPSS output is below. Based on this output, what should the researcher report?
A. The treatment groups had a significant effect on depression levels, F(2, 45) = 5.11.
B. The treatment groups did not have a significant effect on the change in depression levels, F(2, 35.10) = 5.11.
C. The treatment groups did not have a significant effect on depression levels, F(2, 26.44) = 4.35.
D. The treatment groups had a significant effect on the depression levels, F(2, 26.44) = 4.35.
D. The treatment groups had a significant effect on the depression levels, F(2, 26.44) = 4.35.
Imagine we conduct a one-way independent ANOVA with four levels on our independent variable and obtain a significant result. Given that we had equal sample sizes, we did not make any predictions about which groups would differ before the experiment and we want guaranteed control over the Type I error rate, which would be the best test to investigate which groups differ?
A. Orthogonal contrasts
B. Helmert
C. Bonferroni
D. Hochberg’s GT2
C. Bonferroni
The student welfare office was interested in trying to enhance students’ exam performance by investigating the effects of various interventions.
They took five groups of students before their statistics exams and gave them one of five interventions: (1) a control group just sat in a room contemplating the task ahead (Control); (2) the second group had a yoga class to relax them (Yoga); (3) the third group were told they would get monetary rewards contingent upon the grade they received in the exam (Bribes); (4) the fourth group were given beta-blockers to calm their nerves (Beta-Blockers); and (5) the fifth group were encouraged to sit around winding each other up about how much revision they had/hadn’t done (You’re all going to fail).
The student welfare office made four predictions: (1) all interventions should be different from the control; (2) yoga, bribery and beta-blockers should lead to higher exam scores than panic; (3) yoga and bribery should have different effects than the beta-blocker drugs; and (4) yoga and bribery should also differ.
Which of the following planned contrasts (with the appropriate group codings) are correct to test these hypotheses?
ANSWER 1
ANSWER 2
ANSWER 3
ANSWER 4
ANSWER 1 - sum of all weights should be 0
Deciding what post hoc tests to run
Example of RQ for one way ANOVA - (3)
Is there a statistically significant difference in Frisbee throwing distance with respect to education status
IV = Education with 3 levels = high school, graduate, postgrad
DV = Frisbee throwing distance
What does this one-way ANOVA output show?
Research question: Is there a statistically significant difference in Frisbee throwing distance with respect to education status?
Variables:
IV - Education, which has three levels:
High School, Graduate and PostGrad;
DV - Frisbee Throwing Distance
There was homogeneity of variance as assessed by Levene’s Test for Equality of Variances (F (2,47) = 1.94, p = .155)
What does the results of one-way ANOVA show?
Research question: Is there a statistically significant difference in Frisbee throwing distance with respect to education status?
Variables:
IV - Education, which has three levels:
High School, Graduate and PostGrad;
DV - Frisbee Throwing Distance
There was a statistically significant difference between groups as demonstrated by one-way ANOVA (F(2, 47) = 3.50, p = .038).
What does the results of one-way ANOVA show? –> post hoc
Research question: Is there a statistically significant difference in Frisbee throwing distance with respect to education status?
Variables:
IV - Education, which has three levels:
High School, Graduate and PostGrad;
DV - Frisbee Throwing Distance
A Tukey post hoc test shows that the PostGrad group was able to throw the frisbee statistically significantly further than the High School group (p = .034). There was no statistically significant difference between the Graduate and High School groups (p = .691) nor between the Graduate and PostGrad groups (p = .099).
What are the IV and DV of one-way ANOVA?
IV = 1 categorical predictor with more than 2 levels
DV = 1 continuous variable
one-way ANOVA is also called
a between-subjects ANOVA
regression equation for ANOVA
can be extended to include one or more continuous variables that predict the outcome (or dependent variable).
these continuous variables are not
part of the main experimental manipulation but have an influence on the dependent variable, are known as covariates and they can be included
in an ANOVA analysis.
What does ANCOVA involve?
When we measure covariates and include them in an
analysis of variance
Continuous variables, that are not part of the main experimental manipulation (don’t want to study them) but have an influence on the dependent variable, are known as
covariates
From what we know of the hierarchical regression model, if we enter the covariate into the regression model first, then the dummy variables representing the experimental manipulation after… - (2)
then we can see what effect an IV has after the effect of covariate
We partial out the effect of covariate
What are the two reasons for including covariates in ANOVA? - (2)
- To reduce within-group error variance: if we can explain some of the unexplained variance (SSR) in terms of other variables (covariates), we reduce SSR and can more accurately assess the effect of SSM
- Elimination of confounds: remove the bias of unmeasured variables that confound the results and influence the DV
ANCOVA has the same assumptions as ANOVA (e.g., normality and homogeneity of variance via Levene's test) except it has two more important assumptions, which are… - (2)
- Independence of the covariate and treatment effect
- Homogeneity of regression slopes
For ANCOVA to reduce within-group variance by allowing the covariate to explain some of the error variance the covariate must be
independent from the experimental/treatment effect - (IVs - categorical predictors) ( ANCOVA assumption)
People should not use ANCOVA when the effect of the covariate overlaps with the experimental effect, as it means the
experimental effect is confounded with the effect of covariate = interpretation of ANCOVA is compromised
In ANCOVA, the effect of the covariate should be independent of the
experimental effect
When an ANCOVA is conducted we look at the overall relationship between DV and covariate meaning we fit a regression line to
entire dataset and ignore which groups pps fit in
When is homogeneity of regression slopes not satisfied in ANCOVA?
when the relationship between the
outcome (dependent variable) and covariate differs across the groups, the overall regression model is inaccurate (it does not represent all of the groups)
What is the best way to test the homogeneity of regression slopes assumption in ANCOVA?
imagine plotting a scatterplot for each experimental condition with the covariate on one axis and the outcome on the other and calculate its regression line
Diagram of regression slopes satisfying homogeneity of regression slopes in ANCOVA
- exhibits the same slopes for the control and 15-minute groups
Diagram of regression slopes not satisfying homogeneity of regression slopes in ANCOVA
- the 30-minutes-of-therapy group exhibits a different slope compared to the others
What design, variables and test would you use for this research scenario? - (5)
- ANCOVA
- Independent samples-design
- One IV , two conditions, interval regime and steady state
- One covariate (age in years)
- One DV (Race time)
What does this ANCOVA output show?
- IV = Regime –> steady or interval
- Covariate = Age
- DV = Racetime- (2)
- Age F(1,27) = 5.36, p = 0.028, partial eta-squared = 0.17 (large and sig main effect)
- Regime F(1,27) = 4.28, p = 0.048, partial eta-squared = 0.14 (large and sig main effect)
What DF do you report from this ANCOVA table for age for example…
DF for age and DF for error
Guidelines for interpreting partial eta-squared - (3)
η2 = 0.01 indicates a small effect.
η2 = 0.06 indicates a medium effect.
η2 = 0.14 indicates a large effect
What does this SPSS output for ANCOVA show? - (3)
- Interval has a marginal mean of race times of 56.57
- Steady state has a marginal mean of race times 62.97
- Estimated marginal means partial out the effect of age: they show the mean race times for interval and steady state if the mean age (30.07) across the two groups were held constant
What does this output show in terms of homogeneity of regression slopes?
age is covariate and regime is IV and DV is race times - (2)
- Interaction effect of regime * age has a p-value of 0.980
- Since p-value is not significant the assumption of homogeneity of regression slopes has been met
What happens if the interaction effect of the IV and covariate is significant when testing homogeneity of regression slopes?
the relationship between the covariate and DV differs significantly between the two (or more) groups you have and the assumption is not satisfied
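Outside SPSS, the same check amounts to fitting a linear model that includes the IV × covariate interaction and inspecting the interaction term; a sketch with invented regime/age/race-time data (statsmodels assumed to be available):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Invented data: regime (IV), age (covariate), racetime (DV)
df = pd.DataFrame({
    "regime":   ["interval"] * 5 + ["steady"] * 5,
    "age":      [25, 30, 28, 35, 27, 26, 32, 29, 34, 31],
    "racetime": [55, 58, 56, 61, 54, 60, 64, 62, 66, 63],
})

# The * expands to main effects plus the regime x age interaction;
# a non-significant interaction supports homogeneity of regression slopes
model = ols("racetime ~ C(regime) * age", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```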
For testing the assumption of independence of the covariate and experimental effect (IV) in SPSS, we need to add
the covariate (e.g., age) in the DV box instead of the covariate box, with the IV (e.g., regime) as the factor
What does this SPSS output show in terms of independence of covariate and exp effect (IV)?
age is covariate (treated as DV) , regime is IV - (2)
- The p-value is not significant (p = 0.528), so there is no significant difference in age across the training regimes
- and so the covariate and the independent variable are assumed to be independent.
What do positive and negative b-values for covariates in the ANCOVA parameter estimates box indicate? - (2)
If the b-value for the covariate is positive then it means that the covariate and the outcome variable have a positive relationship
If the b-value is negative it means the opposite: that the covariate and the outcome variable have a negative relationship
What does this table of parameter estimates show for ANCOVA where..
DV = Pp's Libido, IV = Dose of Viagra, Covariate is Partner's Libido - (3)
- b for covariate is 0.416
- Other things being equal, if a partner's libido increases by one unit, then the person's libido should
increase by just 0.416 units - since b is positive, the partner's libido has a positive relationship with the pp's libido
How is DF calculated for these t-tests in ANCOVA table? - (2)
N - p -1
N is total sample size, p is number of predictors (2 dummy variables and covariate )
What post-hoc tests can you do with ANCOVA? - (3)
- Tukey LSD with no adjustments (not recommended)
- Bonferroni correction (recommended)
- Sidak correction
The sidak correction is similar to what correction?
Bonferroni correction
Sidak correction is less conservative than
Bonferroni correction
The Sidak correction should be selected if you are concerned about
loss of power associated with Bonferroni corrected values.
What does these planned contrast results show in ANCOVA?
DV = Pp's Libido, IV = Dose of Viagra, Covariate is Partner's Libido -
IV Dose: Level 3 = high dose, level 2 = low dose, level 1 = placebo
(3)
- Contrast 1, comparing level 2 (low dose) against level 1 (placebo), is significant (p = 0.045)
- Contrast 2, comparing level 3 (high dose) with level 1 (placebo), is significant (p = 0.010)
What does this Sidak correction post-hoc comparison in ANCOVA output show?
DV = Libido, IV = Dose of Viagra, Covariate is Partner's Libido -
IV Dose: Level 3 = high dose, level 2 = low dose, level 1 = placebo
- (3)
- The significant difference between the high-dose and placebo groups remains (p = .030)
- high-dose and low-dose groups do not significantly differ (p = .93)
- Low dose and placebo groups do not significantly differ (p value = 0.130)
What do these scatterplots of regression lines show in terms of homogeneity of regression slopes?
DV = Libido, IV = Dose of Viagra, Covariate is Partner's Libido -
IV Dose: Level 3 = high dose, level 2 = low dose, level 1 = placebo
(3)
For placebo and low dose there appears to be a positive relationship between the pp's libido and that of their partner
However, in the high-dose condition there appears to be no relationship at all between the participant's libido and that of their partner - if anything, a slightly negative one
This casts doubt on whether homogeneity of regression slopes is satisfied, as not all the slopes are the same (they do not all go in the same direction)
What effect sizes can we use for ANCOVA/ANOVA? - (4)
- eta-squared
- partial-eta squared (ANCOVA)
- omega squared = used when there is an equal number of pps in each grp
- r
How is eta-squared calculated?
Dividing the effect of interest SSM by total variance in the data SST
How is partial eta-squared calculated for ANCOVA?
SS Effect/ SS Effect + SS Residual
What is the difference between partial and eta-squared?
This differs from eta squared in that it looks
not at the proportion of total variance that a variable explains, but at the proportion of variance that a variable explains that is not explained by other variables in the analysis
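A small numeric sketch of that difference, using invented sums of squares for a model with two effects:

```python
# Invented sums of squares: the effect of interest, another effect, and error
ss_effect, ss_other, ss_error = 830.2, 500.0, 2433.4
ss_total = ss_effect + ss_other + ss_error

eta_sq = ss_effect / ss_total                        # share of ALL variance
partial_eta_sq = ss_effect / (ss_effect + ss_error)  # other effects excluded
print(round(eta_sq, 2), round(partial_eta_sq, 2))    # partial is the larger
```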
What test is used to investigate this question and how is it conducted? - (2)
We want to know whether or not studying technique (3 levels) has an impact on exam scores,but we want to account for the grade that the student already has in the class.
- ANCOVA
- ANCOVA is conducted to determine if there is a statistically significant difference between different studying techniques (IV) on exam score (DV) after controlling for current grade (covariate)
In ANCOVA, we partition the total variance into
variance explained by the IV, variance explained by the covariate, and unexplained error variance
In ANCOVA, examine influence of categorical IVs on DV while removing the effect of
covariate factor(s)
In ANCOVA, the covariate correlates with the … but not the ..
correlates with outcome DV but not with IV
What is an example of covariate?
baseline pre-test scores can be used as a covariate to control for initial group differences in test performance
In ANCOVA, the IVs, covariates and DVs are.. - (3)
- IVs are categorical
- Covariates are metric (quantitative) and independent of the IVs
- DV is metric
In ANCOVA, you have - (2)
1 DV: Continuous
2 predictor variables with 2 or more levels that are categorical and continuous
What is an example of continuous? - (3)
infinite number of possible values the variable can take on
e.g., interval = equal intervals on the variable represent equal differences in what is measured, like the difference between 600ms and 800ms equals the difference between 1300ms and 1500ms
e.g., ratio = same as interval but with a clear definition of 0, like height or weight
What is example of categorical variable? - (3)
A variable that cannot take on all values within the limits of the variable - entities are divided into distinct categories
e.g., nominal = 2 or more categories, e.g., whether someone is vegan or vegetarian
e.g., ordinal = categories have a logical order, like whether people got a fail, pass, merit or distinction
What does independence of covariate mean in ANCOVA?
Independence of the covariate and treatment effect means that the categorical predictors and the covariate should not be dependent on each other
What does homogenity of regression slopes mean in ANCOVA?
Homogeneity of regression slopes means that the covariate has a similar relationship with the outcome measure, irrespective of the level of the categorical variable - in this case the group
For violations of homogeneity of regression slopes in ANCOVA, there are
alternative, somewhat more advanced, methods to account for such differences; the differences are not, in general, uninteresting, but for the ANCOVA analysis they do present an issue
In ANCOVA between-subjects effects, we quote DF, e.g. for dose, as…
Quote df for the effect and error, e.g. 2,26
In ANCOVA, adjusted means table in SPSS shows.. - (2)
outcome/DV = happiness measure ranging from 0 to 10 (as happy as I can imagine) = continuous = interval
The fixed factor (IV) is dose of therapy: people have either 15 minutes or 30 minutes of puppy therapy
Covariate is how much they love puppies = continuous = interval
The group means can be recalculated once the effect of the covariate is 'discounted' = the impact of the covariate is taken into account and the mean for each level of the predictor variable is adjusted in the mean column
These values can differ markedly from the original group means and help with interpretation.
ANCOVA is an extension of ANOVA as - (2)
- Control for covariates (continuous variables you may not necessarily want to measure)
- Study combinations of categorical and continuous variables – covariate becomes the variable of interest rather than the one you control
What ANCOVA was conducted?
We want to know whether or not studying technique has an impact on exam scores,but we want to account for the grade that the student already has in the class.
A one-way ANCOVA was conducted to determine whether there is a statistically significant difference between different study techniques on students' exam scores after controlling for their current grades.
Assumptions of ANCOVA - (8)
Independent variables should be categorical variables.
The dependent variable and covariate should be continuous variables (measured on an interval scale or ratio scale).
Make sure observations are independent - don't put people into more than one group.
Normality: the dependent variable should be roughly normal for each category of the independent variables.
Data (and regression slopes) should show homogeneity of variance.
The covariate and dependent variable (at each level of the independent variable) should be linearly related.
Your data should be homoscedastic.
The covariate and the independent variable shouldn't interact. In other words, there should be homogeneity of regression slopes.
In one-way ANOVA we partition the total variance into
variance explained by the IV (the model, SSM) and unexplained error variance (SSR)
A psychologist was interested in the effects of different fear information on children’s beliefs about an animal. Three groups of children were shown a picture of an animal that they had never seen before (a quoll). Then one group was told a negative story (in which the quoll is described as a vicious, disease-ridden bundle of nastiness that eats children’s brains), one group a positive story (in which the quoll is described as a harmless, docile creature who likes nothing more than to be stroked), and a final group weren’t told a story at all. After the story children rated how scared they would be if they met a quoll, on a scale ranging from 1 (not at all scared) to 5 (very scared indeed). To account for the natural anxiousness of each child, a questionnaire measure of trait anxiety was given to the children and used in the analysis
what analysis has been used -
Independent analysis of variance
Repeated-measures analysis of variance
Mixed analysis of variance
Analysis of covariance
Analysis of covariance (ANCOVA)
A psychologist was interested in the effects of different fear information on children’s beliefs about an animal. Three groups of children were shown a picture of an animal that they had never seen before (a quoll). Then one group was told a negative story (in which the quoll is described as a vicious, disease-ridden bundle of nastiness that eats children’s brains), one group a positive story (in which the quoll is described as a harmless, docile creature who likes nothing more than to be stroked), and a final group weren’t told a story at all. After the story children rated how scared they would be if they met a quoll, on a scale ranging from 1 (not at all scared) to 5 (very scared indeed). To account for the natural anxiousness of each child, a questionnaire measure of trait anxiety was given to the children and used in the analysis
what is the covariate?
Trait anxiety (the child's natural fear level)
Which of the designs below would be best suited for ANCOVA?
A. Participants were randomly allocated to one of two stress management therapy groups, or a waiting list control group. Their levels of stress were measured and compared after 3 months of weekly therapy sessions.
B. Participants were allocated to one of two stress management therapy groups, or a waiting list control group based on their baseline levels of stress. The researcher was interested in investigating whether stress after the therapy was successful partialling out their baseline anxiety.
C. Participants were randomly allocated to one of two stress management therapy groups, or a waiting list control group. The researcher was interested in the relationship between the therapist’s ratings of improvement and stress levels over a 3-month treatment period.
D.Participants were randomly allocated to one of two stress management therapy groups, or a waiting list control group. Their baseline levels of stress were measured before treatment, and again after 3 months of weekly therapy sessions.
(2)
D, since baseline levels of stress are used as a covariate and act as a control when looking at the impact treatment has had over the 3-month assessment
Not B, since groups were allocated based on baseline levels of stress (covariate and IV correlated - problematic); A and C are one-way independent ANOVA designs
A psychologist was interested in finding a cure for hangovers. She took 50 people out on the town one night and got them drunk. The next morning, she allocated them to either a control condition (drink water only) or an experimental hangover cure condition (a beetroot, raw egg and chilli smoothie). This is the variable ‘Group’. Two hours later she then measured how well they felt on a scale from 0 (‘I feel fine’) to 10 (‘I am about to die’)(Variable = Hangover).
She also realized she ought to ask them how drunk they were the night before and control for this in the analysis, so she measured this on another scale of 0 (‘sober’) to 10 (‘very drunk’) (Variable = Drunk). The psychologist hypothesised that the smoothie drink would lead to participants feeling better, after having accounted for the previous night’s drunkenness.
What test?
ANCOVA
A psychologist was interested in finding a cure for hangovers. She took 50 people out on the town one night and got them drunk. The next morning, she allocated them to either a control condition (drink water only) or an experimental hangover cure condition (a beetroot, raw egg and chilli smoothie). This is the variable ‘Group’. Two hours later she then measured how well they felt on a scale from 0 (‘I feel fine’) to 10 (‘I am about to die’)(Variable = Hangover).
She also realized she ought to ask them how drunk they were the night before and control for this in the analysis, so she measured this on another scale of 0 (‘sober’) to 10 (‘very drunk’) (Variable = Drunk). The psychologist hypothesised that the smoothie drink would lead to participants feeling better, after having accounted for the previous night’s drunkenness.
Identify IV (fixed), DV and covariate - (3)
- IV: Group
- DV: Hangover
- Covariate: Drunk
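For illustration, this ANCOVA could be fitted outside SPSS as a linear model with the covariate entered alongside the IV; a sketch with invented data (statsmodels assumed):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Invented data: Group (IV), Drunk (covariate), Hangover (DV)
df = pd.DataFrame({
    "Group":    ["water"] * 5 + ["smoothie"] * 5,
    "Drunk":    [6, 8, 5, 9, 7, 7, 6, 9, 8, 5],
    "Hangover": [7, 9, 6, 10, 8, 4, 3, 6, 5, 2],
})

# Covariate entered with the IV, so the Group effect is assessed
# after partialling out the previous night's drunkenness
model = ols("Hangover ~ Drunk + C(Group)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```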
What is the decision tree of choosing a two-way independent ANOVA? - (5)
Q: What sort of measurement? A: Continuous
Q:How many predictor variables? A: Two or more
Q: What type of predictor variable? A: Categorical
Q: How many levels of the categorical predictor? A: Not relevant
Q: Same or Different participants for each predictor level? A: Different
Partial eta-squared should be reported for
ANOVA and ANCOVA
What is the drawback of eta-squared?
as you add more variables to the model, the proportion explained by any one variable will automatically decrease.
How is eta-squared calculated?
Sum of squares between (squares of effect M) divided by sum of squared total (squares of everything - effects, errors and interactions)
In one-way ANOVA eta-squared and partial eta-squared will be equal, but this is not true in models with
more than one IV
Two-way Independent ANOVA is also called an
Independent Factorial ANOVA
What is a factorial design?
When experiment has two or more IVs
What are the 3 types of factorial design? - (3)
- Independent factorial design
- Repeated-measures (related) factorial design
- Mixed design
What is independent factorial design?
- There are many IVs or predictors, each measured using different pps (between groups)
What is repeated-measures (related) factorial design?
- Many IVs or predictors have been measured but same pps used in all conditions
What is mixed design?
- Many IVs or predictors have been measured; some measured with diff pps whereas others used same pps
Which design does independent factorial ANOVA use?
Independent factorial design
What is factorial ANOVA?
When we use ANOVA to analyse a situation in which there is two or more IVs
What is difference between one way and two way ANOVA?
A one-way ANOVA has one independent variable, while a two-way ANOVA has two.
Example of two-way independent factorial ANOVA
The study tested the prediction that subjective perceptions of physical attractiveness become inaccurate after drinking alcohol
- What are the IVs and DV? - (3)
IV = Alcohol - 3 levels = Placebo, Low dose, High dose
IV = Face type - 2 levels = unattractive, attractive
DV = Physical attractiveness score
Two way independent ANOVA can be fit into the idea of
linear model
The study tested the prediction that subjective perceptions of physical attractiveness become inaccurate after drinking alcohol
IV = Alcohol - 3 levels = Placebo, Low dose, High dose
IV = Face type - 2 levels = unattractive, attractive
DV = Physical attractiveness score
Create a linear model for this two-way ANOVA scenario which adds an interaction term, and explain why it is important - (3)
- The first equation models the two predictors in a way that allows them to account for variance in the outcome separately, much like a multiple regression model
- The second equation adds a term that models how the two predictor variables interact with each other to account for variance in the outcome that neither predictor can account for alone.
- The interaction is important to us because it tests our hypothesis that alcohol will have a stronger effect on the ratings of unattractive than attractive faces
How do we know the coefficients in the model are significant in two-way ANOVA?
We follow the same routine , similar to one-way ANOVA, to compute sums of squares for each factor of the model (and their interaction) and compare them to the residual sum of squares, which measures what the model cannot explain
How is two-way independent ANOVA similar to one-way ANOVA?
, we still find the total sum of squared errors (SST) and break this variance down into variance that can be explained by the experiment (SSM) and variance that cannot be explained (SSR).
How is two-way INDEPENDENT ANOVA different to one-way INDEPENDENT ANOVA? - (3)
in two-way ANOVA, the variance explained by the experiment is made up of not one experimental manipulation but two.
Therefore, we break the model sum of squares down
into variance explained by the first independent variable (SSA), variance explained by the second independent variable (SSB) and variance explained by the interaction of these two
variables (SSA × B)
How to calculate the total sum of squares SST in two-way independent ANOVA?
the same way as in one-way ANOVA: each score minus the grand mean, squared and summed over all participants
What is SST DF in two-way independent ANOVA?
N- 1
How to compute model sum of squares SSM in two-way independent ANOVA? - (2)
sum over all grps (each pairing of one level of one IV with a level of the other)
n = the number of scores in each grp, multiplied by (the mean of that grp minus the grand mean of all pps regardless of grp) squared
How to compute degrees of freedom of SSM in two-way independent ANOVA?
(g - 1), where g is the number of groups
How many groups are there in this two-way independent ANOVA research design?
IV = Alcohol - 3 levels = Placebo, Low dose, High dose
Iv = face type 2 levels = unattractive, attractive
DV = Physical attractiveness score
placebo + attractive
placebo + unattractive
low dose + attractive
low dose + unattractive
high dose + attractive
high dose + unattractive - 6 groups
How is SSA (face type) computed in two-way independent ANOVA?
IV = Alcohol - 3 levels = Placebo, Low dose, High dose
IV = Face type - 2 levels = unattractive, attractive
DV = Physical attractiveness score - (2)
considering the groups defined by the first IV (SSA) one at a time and adding them together (e.g., the grp of pps who rated attractive faces and the grp who rated unattractive faces)
number of pps in that grp multiplied by (mean of that grp minus the grand mean of all pps) squared
What are the degrees of freedom for SSA in two-way independent ANOVA?
DF = (g-1) so if male and female then 2 -1 = 1
How to compute SSB in two-way independent ANOVA for alcohol type
IV = Alcohol - 3 levels = Placebo, Low dose, High dose
IV = Face type - 2 levels = unattractive, attractive
DV = Physical attractiveness score - (3)
same formula as SSA but for the second IV
added over all grps of pps in the second IV
number of pps in one grp of the second IV multiplied by (mean score of that grp minus the grand mean of all pps regardless of grp) squared
What is DF for SSB in two-way independent ANOVA?
number of grps in second IV minus 1
SS A × B in two-way independent ANOVA calculates how much variance is explained
by the interaction of the 2 variables
How is SS A X B (interaction term) calculated in two-way ANOVA?
SS A X B = SSM - SSA - SSB
How is SS A X B’S DF calculated in two-way independent ANOVA?
df A X B = df M - df A - df B
The SSR in two-way independent ANOVA, is similar to one-way ANOVA as it represents the
individual differences in performance or the variance that can’t be explained by factors that were systematically manipulated.
How to calculate SSR in two-way independent ANOVA?
- use the individual variance of each grp (e.g., attractive face type + placebo), multiply it by one less than the number of people in that group (n - 1), do this for each group and add the results together
How to calculate SSR's DF in two-way independent ANOVA?
number of grps in the study × (number of scores per group minus 1)
Diagram of calculating mean sums of squares in two-way ANOVA independent
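The whole two-way partition (SSA, SSB, SSA × B and SSR) can be reproduced outside SPSS by fitting a linear model with both main effects and their interaction; a sketch with invented beer-goggles-style data (statsmodels assumed):

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Invented data: alcohol (3 levels) x face type (2 levels), 2 scores per cell
df = pd.DataFrame({
    "alcohol": ["placebo", "low", "high"] * 4,
    "face":    ["attractive"] * 6 + ["unattractive"] * 6,
    "rating":  [6, 6, 5, 7, 6, 6, 5, 5, 7, 4, 6, 7],
})

# The * expands to both main effects plus the A x B interaction, so the
# table partitions SSM into SSA, SSB and SSA x B, with SSR as the residual
model = ols("rating ~ C(alcohol) * C(face)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```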
What effect sizes can we calculate with two-way independent ANOVA? - (2)
- Partial eta-squared
- Omega-squared if advised
What to do when assumptions are violated in factorial independent ANOVA? - (3)
- There is no simple non-parametric counterpart of factorial ANOVA
- If the assumption of normality is violated, use the robust methods described by Wilcox (implemented in R)
- If the assumption of homogeneity of variance is violated, implement corrections based on the Welch procedure
Example of a research scenario for a two-way independent ANOVA
Pick out the IVs and DV - (4)
- Independent samples design
- Two IVs, both with 2 conditions: drug type (A, B) and onset (early, late)
- One DV is cognitive performance
- Two way ANOVA
What does this two-way ANOVA independent design SPSS output show?
- Levene's test is not significant, so assume equal variances
What happens if Levene’s test is significant in two-way independent ANOVA?
steps are taken to equalise the variances through data transformation
What does this two-way independent ANOVA table show - (4)
- Drug : F(1,24) = 5.58, p = 0.027, partial eta-squared = 0.19 (large effect + sig effect)
- Onset: F(1,24) = 14.43, p = 0.001, partial eta-squared = 0.38 (large effect + sig effect)
- Interaction Drug * Onset: F(1,24) = 9.40, p = 0.005, partial eta-squared = 0.28 (large effect + sig effect)
- We got two sig main effects and sig interaction effect which are all quite large effect sizes
What does this SPSS output show for two-way independent ANOVA? - (3)
drug B has a higher score on the cognitive test than A and is a sig main effect (CI does not contain 0, and also main effect analysis)
early onset scores higher on average than late onset, a sig main effect (CI does not contain 0, and also main effect analysis)
importantly, main effects ignore the effect of the other IV, so the results for drug at the top apply regardless of early/late onset, for example; they do not tell us anything about the interaction
What does this interaction plot show TWO WAY ANOVA? - (6)
- Blue line is early onset
- Green line is late onset
- For late onset, drug B leads to higher mean scores on the test than drug A
- For early onset, drug A led to slightly higher mean scores than drug B
- Drug A was more effective than drug B for early onset, but the difference is marginal
- Drug B was substantially more effective than drug A for late onset
Non-parallel lines in an interaction plot indicate a
sig interaction effect
We can follow interactions in two-way ANOVA with simple effects analysis which - (2)
- looks at the effect of one IV at individual levels of the other IV
- sees whether the marginal/substantial differences are significant
The SSM in two-way independent ANOVA is broken down into three components:
variance explained by the first independent variable (SSA), variance explained by the second independent variable (SSB) and variance explained by the interaction of these two variables (SSA × B).
Example of difference of one-way ANOVA vs two-way ANOVA (independent) - (2)
- One-way ANOVA has one categorical IV (level of education - college degree, grad degree, high school)
- In two-way ANOVA, you have 2 categorical IVs - level of education (college degree, grad degree, high school) and zodiac sign (Libra, Pisces)
In two-way independent ANOVA, you need how many DV and IV?
1 DV and 2 or more categorical predictors
What test is used for this scenario?
A psychologist wanted to test a new type of drug treatment for ADHD called RitaloutTM. The makers of this drug claimed that it improved concentration without the side effects of the current leading brand of ADHD medication.
To test this, the psychologist allocated children with ADHD to two experimental groups, one group took RitaloutTM(New drug), the other took the current leading brand of medication (Old drug) (Variable = Drug).
To test the drugs’ effectiveness, concentration was measured using the Parker-Stone Concentration Scale, which ranges from 0 (low concentration) to 12 (high concentration) (Variable = Concentration).
In addition, the psychologist was interested in whether the effectiveness of the drug would be affected by whether children had ‘inattentive type’ ADHD or ‘hyperactive type’ ADHD (Variable = ADHD subtype).
Two-way independent ANOVA
A researcher was interested in measuring the effect of 3 different anxiety medications on patients diagnosed with anxiety disorder. They measured anxiety levels before and after treatment of 3 different treatment groups plus a control group. The researchers also collected data on depression levels.
Identify the IV, DV, and covariates! - and design (3)
IV = 3 different types of anxiety medication plus a control grp
DV: Anxiety levels after treatment of grps
Covariates = anxiety before treatment, depression levels
ANCOVA
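A minimal ANCOVA sketch in the same spirit (all data and column names below are invented for illustration): the categorical IV and the covariate enter one linear model, so the group effect is tested after partialling out baseline anxiety.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    'group':    ['control', 'drug1', 'drug2', 'drug3'] * 5,
    'baseline': [30, 28, 32, 31, 27, 29, 33, 30, 31, 28,
                 30, 29, 32, 27, 28, 31, 29, 30, 28, 32],
    'post':     [28, 22, 20, 18, 26, 23, 21, 17, 29, 21,
                 19, 18, 27, 20, 18, 19, 28, 22, 20, 17],
})

# The covariate sits alongside the categorical IV in the model formula.
model = smf.ols('post ~ baseline + C(group)', data=df).fit()
print(anova_lm(model, typ=2))
```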
Researchers wanted to see how much people of different education levels are interested in politics. They also believed that there might be an effect of gender. They measured political interest with a questionnaire in males and females that had either school, college or university education.
Identify the IVs and DV and design - (3)
- IV: Level of education - school, college or uni edu and gender (m, f)
- DV: Political interest in questionnaire
- Two-way independent ANOVA
An experiment was done to look at whether there is an effect of both gender and the number of hours spent practising a musical instrument on the level of musical ability.
A sample of 30 participants (15 men and 15 women) who had never learnt to play a musical instrument before were recruited. Participants were randomly allocated to one of three groups that varied in the number of hours they would spend practising every day for 1 year (0 hours, 1 hours, 2 hours). Men and women were divided equally across groups.
All participants had a one-hour lesson each week over the course of the year, after which their level of musical skill was measured on a 10-point scale ranging from 0 (you can’t play for toffee) to 10 (‘Are you Mozart reincarnated?’).
Identify IVs and DV and design - (3)
- IV: Gender (m, f), number of hrs spent practising
- DV: Level of musical skill after a year
- Two-way independent ANOVA, not t-tests, since there is more than one IV
In these outputs is there an effect of gender, education, or an interaction? TWO-WAY INDEPENDENT ANOVA
- Is there an effect of gender overall?
No, F(1,54) = 1.63, p = .207
Is there an effect of education level?
Yes, F(2,54) = 147.52, p < .001
Is there an interaction effect?
Yes, F(2,54) = 4.64, p = .014
How to interpret these findings?
- Main effect of aspirin: aspirin reduces heart attacks compared to placebo (1)
- Main effect of carotene: beta carotene reduces heart attacks (2)
- Interaction effect: yes, a bigger effect when aspirin and beta carotene are taken together (3) - the more the plotted lines diverge, the stronger the interaction
WHICH STATEMENT BEST DESCRIBES A COVARIATE?
A variable that is not able to be measured directly.
A variable that shares some of the variance of another variable in which the researcher is interested.
A pair of variables that share exactly the same amount of variance of another variable in which the researcher is interested.
A variable that correlates highly with the dependent variable.
A variable that shares some of the variance of another variable in which the researcher is interested.
TWO-WAY ANOVA IS BASICALLY THE SAME AS ONE-WAY ANOVA, EXCEPT THAT:
The model sum of squares is partitioned into two parts
The residual sum of squares represents individual differences in performance
The model sum of squares is partitioned into three parts
We calculate the model sum of squares by looking at the difference between each group mean and the overall mean
C. The model sum of squares is partitioned into three parts
The model sum of squares is partitioned into the effect of each of the independent variables and the effect of how these variables interact (see Section 13.2.7)
D is also true, but it applies to both one-way and two-way ANOVA (see Section 13.2.7).
IF WE WERE TO RUN A FOUR-WAY BETWEEN-GROUPS ANOVA, HOW MANY SOURCES OF VARIANCE WOULD THERE BE?
4
16
12
15
16: with 4 IVs there are 2^4 − 1 = 15 model effects (4 main effects, 6 two-way, 4 three-way and 1 four-way interaction), plus the residual, giving 16 sources of variance (by the same logic a two-way ANOVA would have 2^2 = 4: two main effects, one interaction and the residual)
Which of the following sentences best describes a covariate?
A. A variable that shares some of the variance of another variable in which the researcher is interested.
B. A variable that correlates highly with the dependent variable
C. A variable that is not able to be measured directly
D. A pair of variables that share exactly the same amount of variance of another variable in which the researcher is interested
A
An experiment was done to look at whether there is an effect of both gender and the number of hours spent practising a musical instrument on the level of musical ability.
A sample of 30 participants (15 men and 15 women) who had never learnt to play a musical instrument before were recruited. Participants were randomly allocated to one of three groups that varied in the number of hours they would spend practising every day for 1 year (0 hours, 1 hours, 2 hours). Men and women were divided equally across groups.
All participants had a one-hour lesson each week over the course of the year, after which their level of musical skill was measured on a 10-point scale ranging from 0 (you can’t play for toffee) to 10 (‘Are you Mozart reincarnated?’).
A. Two-way independent ANOVA
B. Two-way repeated ANOVA
C. Three-way ANOVA = no, there are only 2 IVs
D. T-test
A
Which of the designs below would be best suited to ANCOVA?
A. Participants were randomly allocated to one of two stress management therapy groups, or a waiting list control group. Their baseline levels of stress were measured before treatment, and again after 3 months of weekly therapy sessions.
B. Participants were randomly allocated to one of two stress management therapy groups, or a waiting list control group. Their levels of stress were measured and compared after 3 months of weekly therapy sessions.
C. Participants were randomly allocated to one of two stress management therapy groups, or a waiting list control group. The researcher was interested in the relationship between the therapist's ratings of improvement and stress levels over a 3-month treatment period.
D. Participants were allocated to one of two stress management therapy groups, or a waiting list control group based on their baseline levels of stress. The researcher was interested in investigating whether stress after the therapy was successful, partialling out their baseline anxiety.
A - baseline levels of stress used as covariate
We can use the baseline, pre-treatment measures as a control when looking at the impact the treatment has on the 3-month assessment.
A music teacher had noticed that some students went to pieces during exams. He wanted to test whether this performance anxiety was different for people playing different instruments. He took groups of guitarists, drummers and pianists (variable = 'Instru') and measured their anxiety (variable = 'Anxiety') during the exam. He also noted the type of exam they were performing (in the UK, musical instrument exams are known as 'grades' and range from 1 to 8). He wanted to see whether the type of instrument played affected performance anxiety when accounting for the grade of the exam. Which of the following statements best reflects what the effect of 'Instru' in the output table below tells us?
(Hint: ANCOVA looks at the relationship between an independent and dependent variable, taking into account the effect of a covariate.)
A. The type of instrument played in the exam had a significant effect on the level of anxiety experienced, even after the effect of the grade of the exam had been accounted for
B. The type of instrument played in the exam had a significant effect on the level of anxiety experienced
C. The type of instrument played in the exam did not have a significant effect on the level of anxiety experienced
A
A psychologist was interested in the effects of different fear information on children's beliefs about an animal. Three groups of children were shown a picture of an animal that they had never seen before (a quoll). Then one group was told a negative story (in which the quoll is described as a vicious, disease-ridden bundle of nastiness that eats children's brains), one group a positive story (in which the quoll is described as a harmless, docile creature who likes nothing more than to be stroked), and a final group weren't told a story at all. After the story children rated how scared they would be if they met a quoll, on a scale ranging from 1 (not at all scared) to 5 (very scared indeed). To account for the natural anxiousness of each child, a questionnaire measure of trait anxiety was given to the children and used in the analysis. The SPSS output is below.
What analysis has been used?
(Hint: The analysis is looking at the effects of fear information on children's beliefs about an animal, taking into account children's natural fear levels.)
A. ANCOVA
B. Independent analysis of variance
C. Repeated measures analysis of variance
A
Imagine we wanted to investigate the effects of three different conflict styles (avoiding, compromising and competing) on relationship satisfaction, but we discover that relationship satisfaction is known to covary with self-esteem. Which of the following questions would be appropriate for this analysis?
A. What would the mean relationship satisfaction be for the three conflict style groups, if their levels of self-esteem were held constant?
B. What would the mean relationship satisfaction be if levels of self-esteem were held constant?
C. What would the mean self-esteem score be for the three groups if their levels of relationship satisfaction were held constant?
D. Does relationship satisfaction have a significant effect on the relationship between conflict style and self-esteem?
A
A study was conducted to look at whether caffeine improves productivity at work in different conditions. There were two independent variables. The first independent variable was email, which had two levels: ‘email access’ and ‘no email access’. The second independent variable was caffeine, which also had two levels: ‘caffeinated drink’ and ‘decaffeinated drink’. Different participants took part in each condition. Productivity was recorded at the end of the day on a scale of 0 (I may as well have stayed in bed) to 20 (wow! I got enough work done today to last all year). Looking at the group means in the table below, which of the following statements best describes the data?
A. A significant interaction effect is likely to be present between caffeine consumption and email access.
B. There is likely to be a significant main effect of caffeine.
C. The effect of email is relatively unaffected by whether the drink was caffeinated.
D. The effect of caffeine is about the same regardless of whether the person had email access.
A = for decaffeinated drinks there is little difference between email and no email, but for caffeinated drinks there is a large difference
What are the two main reasons for including covariates in ANOVA?
A. 1. To reduce within-group error variance
2. Elimination of confounds
B. 1. To increase within-group error variance
2. To reduce between-group error variance
C. 1. To increase within-group error variance
2. To correct the means for the covariate
D. 1. To increase between-group variance
2. To reduce within-group error variance
A
A psychologist was interested in the effects of different fear information on children’s beliefs about an animal. Three groups of children were shown a picture of an animal that they had never seen before (a quoll). Then one group was told a negative story (in which the quoll is described as a vicious, disease-ridden bundle of nastiness that eats children’s brains), one group a positive story (in which the quoll is described as a harmless, docile creature who likes nothing more than to be stroked), and a final group weren’t told a story at all. After the story children rated how scared they would be if they met a quoll, on a scale ranging from 1 (not at all scared) to 5 (very scared indeed). To account for the natural anxiousness of each child, a questionnaire measure of trait anxiety was given to the children and used in the analysis. Which of the following statements best reflects what the ‘pairwise comparisons’ tell us?
A. Fear beliefs were significantly higher after negative information compared to positive information and no information, and fear beliefs were not significantly different after positive information compared to no information.
B. Fear beliefs were significantly lower after positive information compared to negative information and no information; fear beliefs were not significantly different after negative information compared to no information.
C. Fear beliefs were significantly higher after negative information compared to positive information; fear beliefs were significantly lower after positive information compared to no information.
D. Fear beliefs were all about the same after different types of information.
A
An experiment was done to look at whether there is an effect of both the number of hours spent practising a musical instrument and gender on the level of musical ability. A sample of 30 (15 men and 15 women) participants who had never learnt to play a musical instrument before were recruited. Participants were randomly allocated to one of three groups that varied in the number of hours they would spend practising every day for 1 year (0 hours, 1 hours, 2 hours). Men and women were divided equally across groups. All participants had a one-hour lesson each week over the course of the year, after which their level of musical skill was measured on a 10-point scale ranging from 0 (you can't play for toffee) to 10 ('Are you Mozart reincarnated?'). An ANOVA was conducted on the data from the experiment. Which of the following sentences best describes the pattern of results shown in the graph?
A. The graph shows that the relationship between musical skill and time spent practising was different for men and women.
B. The graph shows that the relationship between musical skill and time spent practising was the same for men and women.
C. The graph indicates that men and women were most musically skilled when they practised for 2 hours per day.
D. Women were more musically skilled than men.
A
What is the decision tree for choosing one-way repeated measures ANOVA? - (5)
Q: What sort of measurement? A: Continuous
Q: How many predictor variables? A: One IV
Q: What type of predictor variable? A: Categorical
Q: How many levels of the categorical predictor? More than two
Q: Same or Different participants for each predictor level? A: Same
The assumption of sphericity in within-subject design ANOVA can be likened to
the assumption of homogeneity of variance in
between-group ANOVA
Sphericity is sometimes denoted as IN REPEATED ANOVA
ε or circularity
What does sphericity refer to in repeated ANOVA?
equality of variances of the differences between treatment levels.
you need at least … conditions for sphericity to be an issue in repeated ANOVA
three
How is sphericity assessed in this dataset? (USED IN REPEATED ANOVA)
How is sphericity calculated? - (2) REPEATED ANOVA
- Calculating the differences between pairs of scores for all treatment levels, e.g., A-B, A-C, B-C
- Calculating the variances of these differences, e.g., variances of A-B, A-C, B-C; see the sketch below
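A small by-hand sketch of this calculation with numpy, using made-up scores (five participants, three conditions):

```python
# By-hand sphericity check: variance of the differences for each pair of
# conditions. The scores below are hypothetical.
import numpy as np

# rows = participants, columns = conditions A, B, C
scores = np.array([
    [10, 12, 8],
    [15, 15, 12],
    [25, 30, 20],
    [35, 30, 28],
    [30, 27, 20],
])

for label, (i, j) in {'A-B': (0, 1), 'A-C': (0, 2), 'B-C': (1, 2)}.items():
    diffs = scores[:, i] - scores[:, j]
    print(f'variance of {label} differences: {diffs.var(ddof=1):.1f}')

# Sphericity holds when these variances are roughly equal; Mauchly's test
# formalises this check.
```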
What does the data from the table show in terms of the assumption of sphericity (calculated by hand) REPEATED ANOVA? - (3)
there is some deviation from sphericity because the variance of the differences between conditions A and B (15.7) is greater than the variance of the differences
between A and C (10.3) and between B and C (10.3).
However, these data have local circularity (or local sphericity) because two of the variances of differences are identical.
The deviation from sphericity does not seem too severe (all variances are roughly equal), but we need to assess whether the deviation is severe enough to warrant action
How to assess the assumption of sphericity in SPSS? REPEATED ANOVA
via Mauchly’s test
If Mauchly's test statistic is significant (p < 0.05) then REPEATED ANOVA
the variances of the differences between conditions are significantly different - we must be wary of the F-ratios produced by the computer
If Mauchly's test statistic is non-significant (p > 0.05) then it is reasonable to conclude that the REPEATED ANOVA
variances of the differences between conditions are equal and do not significantly differ
Significance of Mauchly's test REPEATED ANOVA is dependent on
sample size
Example of the significance of Mauchly's test depending on sample size REPEATED ANOVA - (2)
in big samples small deviations from sphericity can be
significant,
in small samples large violations can be non-significant
What happens if the data violates the sphericity assumption? REPEATED ANOVA - (2)
several corrections that can be applied to
produce a valid F-ratio
or
use multivariate test statistics (MANOVA)
What corrections can be applied to produce a valid F-ratio when the data violates sphericity REPEATED ANOVA? - (2)
- Greenhouse-Geisser correction ε
- Huynh-Feldt correction
The Greenhouse-Geisser correction ε varies between REPEATED ANOVA
1/(k − 1) (where k is the number of repeated-measures conditions) and 1
The closer the Greenhouse-Geisser correction is to 1, the REPEATED ANOVA
more homogeneous the variances of differences, and hence the closer the data are to being spherical.
How to calculate the lower-bound estimate of sphericity for the Greenhouse-Geisser correction when there are 5 conditions REPEATED ANOVA? - (2)
The lower limit of ε̂ is 1/(k − 1), where k is the number of repeated-measures conditions
so… 1/(5 − 1) = 1/4 = 0.25
Huynh and Feldt (1976) reported that when the sphericity estimate is greater than 0.75, the
Greenhouse-Geisser correction is too conservative
Huynh-Feldt correction is less conservative than
Greenhouse-Geisser correction
Why is MANOVA used when data violates sphericity IN REPEATED ANOVA?
MANOVA is not dependent upon the assumption of sphericity
In repeated-measures ANOVA, the effect of our experiment shows up in the within-participant variance rather than
the between-group variance
In independent ANOVA, the within-group variance is our … and it is not contaminated by … - (2)
residual variance (SSR) = variance produced by individual differences in performance
SSR is not contaminated by the experimental effect, because each condition is carried out by different people
In repeated-measures ANOVA, the within-participant variability is made up of
the effect of experimental manipulation SSM and individual differences in performance (random factors outside of our control) - this is error SSR
Similar to independent ANOVA, repeated-measures ANOVA uses the F-ratio to - (2)
compare the size of the variation due to our experimental manipulations to the size of the variation due to random factors
it has the same types of variance as independent ANOVA - a total sum of squares (SST), a model sum of squares (SSM) and a residual sum of squares (SSR)
What is the difference between independent ANOVA and repeated-measures ANOVA?
In repeated-measures ANOVA, the model and residual sums of squares are both part of the within-participant variance.
In repeated-measures ANOVA,
if the variance due to our manipulations is big relative to
the variation due to random factors, we get a … and conclude - (2)
a big value of the F-ratio
we can conclude that the observed results are unlikely to have occurred if there was no effect in the population.
To compute F-ratios we first compute the sum of squares which is the following REPEATED ANOVA… - (5)
- SST
- SSB
- SSW
- SSM
- SSR
How is SST calculated in one-way repeated-measures ANOVA? REPEATED ANOVA
SST = grand variance × (N − 1)
What is the DF of SST? REPEATED ANOVA
N-1
The SSW (within-participant) sum of squares is calculated in one-way repeated ANOVA by…
square of the standard deviation of each participant’s scores multiplied by the number of conditions minus 1, summed over all participants.
What is the DF of SSW of one-way repeated ANOVA? - (2)
DF = N(n-1)
number of participants multiplied by the number of conditions minus 1;
How is SSM calculated in one-way repeated ANOVA? - (2)
square of the differences between the mean of the participant scores for each condition and the grand mean multiplied by the number of participants tested, summed over all conditions.
do this for each condition grp
What is the DF of SSM in one-way repeated ANOVA? - (2)
DF = n-1
n is number of conditions
How is SSR calculated in one-way repeated ANOVA?
the difference between the within-participant sum of squares and the sum of squares for the model.
What is the DF for SSR in one-way repeated ANOVA?
DF of SSW minus DF of SSM (see the worked sketch below)
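Putting the four sums of squares and their degrees of freedom together, here is a worked numpy sketch on hypothetical scores (five participants, three conditions); the variable names are mine, not SPSS's:

```python
import numpy as np

# rows = participants, columns = conditions
scores = np.array([
    [10, 12, 8],
    [15, 15, 12],
    [25, 30, 20],
    [35, 30, 28],
    [30, 27, 20],
], dtype=float)

N, n = scores.shape          # N participants, n conditions
grand_mean = scores.mean()

sst = scores.var(ddof=1) * (scores.size - 1)               # grand variance x (total N - 1)
ssw = (scores.var(axis=1, ddof=1) * (n - 1)).sum()         # within-participant variation
ssm = (N * (scores.mean(axis=0) - grand_mean) ** 2).sum()  # model: condition means vs grand mean
ssr = ssw - ssm                                            # residual = SSW - SSM

df_m = n - 1                 # model DF
df_r = N * (n - 1) - df_m    # residual DF = DF of SSW minus DF of SSM
f_ratio = (ssm / df_m) / (ssr / df_r)
print(f'SST={sst:.1f} SSW={ssw:.1f} SSM={ssm:.1f} SSR={ssr:.1f} F={f_ratio:.2f}')
```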
How do we calculate the mean squares (MSM) and residual mean squares (MSR) to calculate the F-ratio in one-way repeated ANOVA?
MSM = SSM/dfM and MSR = SSR/dfR; the F-ratio is F = MSM/MSR (this is what the sketch above computes)
We don't need to use SSB (between-subject variation) to calculate the F-ratio in
one-way repeated ANOVA
What does SSB represent in one-way repeated ANOVA?
individual differences between cases
Not only does a lack of sphericity produce problems for F in repeated-measures ANOVA, it also causes complications for
post-hoc tests
When sphericity is violated in one-way repeated ANOVA, what post-hoc test should be used, and why? - (2)
Bonferroni method seems to be generally the
most robust of the univariate techniques,
especially in terms of power and control of the Type I error rate.
When sphericity is not violated in one-way repeated ANOVA, what post-hoc test can be used?
Tukey can be used
In either case, whether sphericity is violated or not in one-way repeated ANOVA, there is a post-hoc test called the - (2)
Games–Howell procedure, which uses a pooled error term, and
it may be preferable to Tukey's test.
Due to the complications of sphericity in one-way repeated ANOVA,
the standard post hoc tests used for independent designs are not available for repeated-measures designs
Why is a repeated contrast useful in repeated-measures designs, especially one-way repeated measures?
when the levels of the independent variable have a meaningful order, e.g., the DV was measured at successive time points or increasing doses of a drug were administered
When should the Sidak correction be selected as the post hoc for one-way repeated ANOVA? (see the sketch below)
when concerned about the loss
of power associated with Bonferroni-corrected values.
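A quick sketch of where that extra power comes from (the number of comparisons is an assumed example): the Sidak-corrected per-comparison alpha is slightly larger than Bonferroni's.

```python
# Compare Bonferroni and Sidak per-comparison alpha levels.
m, alpha = 6, 0.05                      # e.g., 6 pairwise comparisons among 4 conditions
bonferroni = alpha / m
sidak = 1 - (1 - alpha) ** (1 / m)
print(f'Bonferroni: {bonferroni:.5f}  Sidak: {sidak:.5f}')
# Bonferroni: 0.00833  Sidak: 0.00851  (Sidak is slightly less strict)
```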
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub) - one-way repeated ANOVA
what do these SPSS outputs show? - (2)
- Left shows the variables representing each level of the IV, which is animal
- Right shows descriptive statistics - the highest mean time to retch was when celebrities ate the stick insect (8.12)
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub) - ONE-WAY repeated ANOVA
What does this Mauchly's Test of Sphericity show? - (2)
- The p-value is 0.047, which is less than 0.05
- Thus, reject the assumption of sphericity that the variances of the differences between levels are equal
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub)
What to do if this Mauchly's Test of Sphericity shows the assumption of sphericity is violated? - (3)
one-way repeated ANOVA
- Since there are 4 conditions, the lower limit of ε̂ is 1/(4 − 1) = 0.333 (the lower-bound estimate in the table)
- The SPSS output shows that the calculated value of ε̂ is 0.533
- 0.533 is closer to the lower limit of 0.33 than to the upper limit of 1, and therefore represents a substantial deviation from sphericity
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub) - one-way repeated ANOVA
What does this main ANOVA table show with sphericity assumed? - (2)
- The value of F = 3.97, which is compared against a critical value for 3 and 21 DF, and the p-value is 0.026
- We conclude there is a significant difference between the 4 animals in their capacity to induce retching when eaten
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub) - one-way repeated ANOVA
What has changed and what has stayed the same in the table? - (2)
- The F-ratios are the same across the rows
- The DF has changed, as has the critical value the F-statistic is compared with
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub) - one-way repeated ANOVA
How are the adjustments made to the DF?
- The adjustment is made by multiplying the DF by the estimate of sphericity.
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub) - one-way repeated ANOVA
What do the results show in terms of the Greenhouse-Geisser and Huynh-Feldt corrections? - (3)
- The observed F-statistic is not significant using Greenhouse-Geisser (p > 0.05)
- Greenhouse-Geisser is quite conservative and can miss true effects that exist
- Huynh-Feldt, however, showed the F-statistic is still significant, with a p-value of 0.048
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub) - one-way repeated ANOVA
What happens if Greenhouse-Geisser is non-significant (p > 0.05) and Huynh-Feldt is significant in this example? - (2)
- Take the average of the two corrected p-values, e.g., (0.063 + 0.048)/2 = 0.056
- Since the average is non-significant, go with the Greenhouse-Geisser correction and conclude the F-ratio is non-significant
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub) - one-way repeated ANOVA
If the two corrections (Greenhouse-Geisser and Huynh-Feldt) give the same conclusion, then you can choose which one to
report
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub) - one-way repeated ANOVA
It is important to use a valid critical value of F - choosing which p-value to report potentially makes the difference between making a
Type 1 error (false positive) or not
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub) - one-way repeated ANOVA
what does this summary table of repeated contrasts show? - (3)
Level 1 vs 2 is stick insect vs kangaroo testicle
Level 2 vs 3 is kangaroo testicle vs fish eyeball
Level 3 vs 4 is fish eyeball vs witchetty grub
- Celebrities took significantly longer to retch after eating the stick insect compared to the kangaroo testicle (Level 1 vs. Level 2), p = 0.002
- Time taken to retch was not significantly different for Level 2 vs 3 or Level 3 vs 4
Researcher measures mean time taken for celebrities to retch for each animal (stick insect, kangaroo testicle, fish eye, witchetty grub) - one-way repeated ANOVA
If the main effect is not significant in the main ANOVA table for these data, then significant contrasts in the table below should be… but if the MANOVA was significant then… - (2)
ignored
we would be inclined to conclude the main effect of animal was significant and proceed with further tests like contrasts
What IV, DV, design and test would you use for this research scenario? - (4)
- Repeated measures design
- One IV (Incentive) , four conditions (week 1, week 2, week 3, week 4)
- One DV (Sales Generated)
- One-way repeated ANOVA
What does the LSD correction (post-hoc option in SPSS) do?
It does not actually make any adjustment to the p-value (critical value), unlike what a post-hoc correction should do
What does output show? - (3)
- Repeated measures design
- One IV (Incentive) , four conditions (week 1, week 2, week 3, week 4)
- One DV (Sales Generated)
- One-way repeated ANOVA
- Sales are increasing across the weeks
- Week 1 starts at 427.93 pounds and sales gradually rise to 642.28 pounds by week 4
- It looks like the incentives are having an effect and seem to generate higher sales
What does this output show in terms of Mauchly's Test of Sphericity? - (2)
- Repeated measures design
- One IV (Incentive) , four conditions (week 1, week 2, week 3, week 4)
- One DV (Sales Generated)
- One-way repeated ANOVA
- The p-value is not significant (p = 0.080)
- The assumption of sphericity is satisfied, so we have equal variances of the differences between conditions
If Mauchly's test of sphericity is not significant in one-way repeated ANOVA, which line do we use in the main ANOVA table? The 'Sphericity Assumed' row
If Mauchly's test of sphericity is significant in one-way repeated ANOVA, which line do we use in the main ANOVA table? A corrected row - Greenhouse-Geisser (or Huynh-Feldt)
What does this main ANOVA table show? - (3)
- Repeated measures design
- One IV (Incentive) , four conditions (week 1, week 2, week 3, week 4)
- One DV (Sales Generated)
- One-way repeated ANOVA
- DF for week is 3 and 57 (sphericity assumed rows for week and error)
- Week: F(3,57) = 26.30, p < 0.001 (p = 0.000), eta-squared = 0.58 - a large effect
- There is an overall effect: sales change across the weeks
What do this Sidak correction table and table of means show you in this output? - (6)
- Repeated measures design
- One IV (Incentive) , four conditions (week 1, week 2, week 3, week 4)
- One DV (Sales Generated)
- One-way repeated ANOVA
- No sig difference between W1 and W2
- Sig difference between W1 and W3 = higher sales in W3 (538.570) compared to W1 (427.933)
- Sig difference between W1 and W4 = higher sales in W4 (642.284) compared to W1 (427.933)
- No sig difference between W2 and W3
- Sig difference between W2 and W4 = higher sales in W4 (642.284) than W2 (481.388)
- Sig difference between W3 and W4 = higher sales in W4 (642.284) than W3 (538.570)
What does this output show in terms of repeated contrasts? - (3)
- Repeated measures design
- One IV (Incentive) , four conditions (week 1, week 2, week 3, week 4)
- One DV (Sales Generated)
- One-way repeated ANOVA
- Did sales increase from W1 to W2? p = 0.010, significant
- Did sales increase from W2 to W3? p = 0.030, significant
- Did sales increase from W3 to W4? p = 0.008, significant
What happens if the post hocs and contrasts are telling different stories? - the contrasts say there was a weekly increase (W1 to W2, W2 to W3 and W3 to W4 all significant), but the post hocs say W1 to W3 and W1 to W4 were significant increases while W2 to W3 was not - (2)
- Repeated measures design
- One IV (Incentive) , four conditions (week 1, week 2, week 3, week 4)
- One DV (Sales Generated)
- One-way repeated ANOVA
- The post hocs lack power because of the many multiple comparisons
- By limiting the number of comparisons, the contrasts get around this problem
Diagram of writing up one-way repeated ANOVA
Two-way repeated ANOVA involves
two IVs (i.e., more than one), both measured on the same participants
What does four-way ANOVA mean?
4 different IV
What does a 2x3 ANOVA mean? - (2)
- One IV with 2 levels
- One IV with 3 levels
What design, IV, DV and test would you use to investigate the following scenario? - (4)
- Repeated measures design
- Two IVs: alcohol (3 conditions) and sleep (2 conditions)
- DV: Reaction Times
- Two-way repeated measures ANOVA
What does this two-way repeated ANOVA SPSS output show? - (2)
- Repeated measures design
- Two IVs: alcohol (3 conditions) and sleep (2 conditions)
- DV: Reaction Times
- Two-way repeated measures ANOVA
- Larger numbers for RT mean slower RTs
- Alcohol seems to have an effect on RT, particularly for 2 pints + no sleep
What does this two-way repeated ANOVA SPSS output show for Mauchly’s Test of Sphericity? - (2)
- Repeated measures design
- Two IVs: alcohol (3 conditions) and sleep (2 conditions)
- DV: Reaction Times
- Two-way repeated measures ANOVA
- Two p-values: alcohol (p = 0.00) and alcohol * sleep [the interaction effect] (p = 0.00) -> significant, so the assumption of sphericity is violated and we report the Greenhouse-Geisser values from the main ANOVA table
- There is no p-value for sleep, as it has only 2 conditions and the test of sphericity needs more than 2
What does this two-way repeated ANOVA main table show? - (3)
- Repeated measures design
- Two IVs: alcohol (3 conditions) and sleep (2 conditions)
- DV: Reaction Times
- Two-way repeated measures ANOVA
- Error DF was 38.
- Test of sphericity was sig -> assumption violated
- Main sig effect of alcohol: F(1.16,22.06) = 51.38, p < 0.001, partial eta-squared = 0.73
- Main sig effect of sleep: F(1,19) = 88.61, p < 0.001, partial-eta-squared = 0.82
- Interaction effect: F(1.15,21.91) = 23.36, p < 0.001, partial-eta squared = 0.55
What does this two-way repeated ANOVA output show in the post hocs? - Sidak correction - (4)
- Repeated measures design
- Two IVs: alcohol (3 conditions) and sleep (2 conditions)
- DV: Reaction Times
- Two-way repeated measures ANOVA
- Condition 1 vs condition 2 was significant
- Condition 1 vs condition 3 was significant
- Condition 2 vs condition 3 was significant
- So all groups differ significantly from each other, and we interpret that higher doses of alcohol have more impact on RT
What does this two-way repeated ANOVA interaction plot show? - (3)
- Repeated measures design
- Two IVs: alcohol (3 conditions) and sleep (2 conditions)
- DV: Reaction Times
- Two-way repeated measures ANOVA
- An interaction effect is present = as the lines continue they cross
- The most pronounced effect was in alcohol group 3 (2 pints)
- When alcohol group 3 had a full night's sleep (2), RT was impaired only very slightly
- When alcohol group 3 had sleep deprivation (1) in combination with 2 pints, RT was impaired by a lot -> use simple effects analysis, as well as a two-way independent ANOVA, to see if the difference in group 3 between the blue and green lines is sig
What happens when assumptions are violated in repeated-measures ANOVA? - (2)
We can do a non-parametric test called Friedman's ANOVA if there is only one IV (see the sketch below)
There is no non-parametric counterpart for repeated designs with more than one IV
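A minimal sketch of Friedman's ANOVA using scipy; the three condition lists are hypothetical scores from the same five participants:

```python
# Friedman's ANOVA: non-parametric fallback for one-way repeated measures.
from scipy.stats import friedmanchisquare

cond_a = [10, 15, 25, 35, 30]
cond_b = [12, 15, 30, 30, 27]
cond_c = [8, 12, 20, 28, 20]

stat, p = friedmanchisquare(cond_a, cond_b, cond_c)
print(f'Friedman chi-squared = {stat:.2f}, p = {p:.3f}')
```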
Assumption of repeated measures ANOVA - (3)
- Normal distribution
- Repeated measures design (same participants)
- Sphericity - Mauchly's test
What does significant Mauchly’s test signify in repeated measures? - (2)
A significant effect means that corrections need to be made later on
Those corrections are listed in the main ANOVA output table
What is the decision tree for two-way repeated ANOVA?
1 continuous DV and 2 or more categorical predictors, each with 2 or more levels, with the same participants at each predictor level
What is decision tree for one-way repeated ANOVA? - (3)
1 continuous DV
1 Predictor categorical with more than 2 levels
Same participants in each predictor level
Just like independent-measures designs, there can be more than one categorical predictor.
When all participants take part in all combinations of those predictors, we have a repeated measures factorial design and can use an ANOVA to test for
significant main effects and interactions
Example of two-way repeated ANOVA - (3)
The variables are the type of drink (Beer - Wine - Water) and the type of imagery used in the advertisement (positive - negative - neutral)
The outcome is how much the participant likes the beverage on a scale from -100 (dislike very much) to 100 (like very much)
Participants underwent all of the conditions (see the sketch below)
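A minimal sketch of this drink x imagery analysis with statsmodels' AnovaRM; the long-format data frame, its column names and the random liking scores are all invented for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = [
    {'subject': s, 'drink': d, 'imagery': i,
     'liking': rng.integers(-100, 101)}   # placeholder ratings on the -100..100 scale
    for s in range(1, 11)
    for d in ['beer', 'wine', 'water']
    for i in ['positive', 'negative', 'neutral']
]
df = pd.DataFrame(rows)

# Both IVs go in 'within' because the same participants did every combination.
res = AnovaRM(df, depvar='liking', subject='subject',
              within=['drink', 'imagery']).fit()
print(res)
```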
Equation of variance
What is mixed design? - (2)
A mixture of between-subject and within-subject
Several independent variables or predictors have been measured; some have been measured with different entities, (pps) whereas others used the same entities (pps)
You will need at least two IVs for
a mixed design
What is decision tree for mixed design ANOVA? - (7)
Q: What sort of measurement? A: Continuous
Q: How many predictor variables? A: Two or more
Q: What type of predictor variable? A: Categorical
Q: How many levels of the categorical predictor? A: Not relevant
Q: Same or Different participants for each predictor level? A: Both
This leads us to a factorial mixed ANOVA
Example of mixed design scenario for ANOVA - (2)
a mixed ANOVA is often used in studies where you have measured a dependent variable (e.g., “back pain” or “salary”) over two or more time points or when all subjects have undergone two or more conditions (i.e., where “time” or “conditions” are your “within-subjects” factor),
but also measure DV when your subjects have been assigned into two or more separate groups (e.g., based on some characteristic, such as subjects’ “gender” or “educational level”, or when they have undergone different interventions). These groups form your “between-subjects” factor.
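A sketch of how such a mixed design might be run in Python, assuming the pingouin library's mixed_anova interface; the data, column names and groups below are hypothetical:

```python
import pandas as pd
import pingouin as pg

df = pd.DataFrame({
    'subject': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6],
    'group':   ['ctrl'] * 6 + ['therapy'] * 6,   # between-subjects factor
    'time':    ['pre', 'post'] * 6,              # within-subjects factor
    'pain':    [7, 6, 8, 7, 6, 6, 7, 4, 8, 3, 7, 4],
})

aov = pg.mixed_anova(data=df, dv='pain', within='time',
                     subject='subject', between='group')
print(aov)  # rows for the between effect, within effect and interaction
```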
An organizational psychologist is hired as a consultant by a person planning to open a coffee house for college students. The coffee house owner wants to know if her customers will drink more coffee depending on the ambience of the coffee house. To test this, the psychologist sets up three similar rooms, each with its own theme (Tropical; Old Library; or New York Café ) then arranges to have thirty students spend an afternoon in each room while being allowed to drink all the coffee they like. (The order in which they sit in the rooms is counterbalanced.) The amount each participant drinks is recorded for each of the three themes.
- Independent variable(s)
- Is there more than 1 IV?
- The levels the independent variable(s)
- Dependent variable
- Between (BS) or within-subjects (WS)?
- What type of design is being used?
Theme
No
Tropical, Old Library,
New York Café
Amount of coffee consumed
Within-subjects
1-way Repeated measures
A manager at a retail store in the mall wants to increase profit. The manager wants to see if the store’s layout (one main circular path vs. a grid system of paths) influences how much money is spent depending on whether there is a sale. The belief is that when there is a sale customers like a grid layout, while customers prefer a circular layout when there is no sale. Over two days the manager alternates the store layout, and has the same group of customers come each day. Based on random assignment, half of the customers told there is a sale (20 % will be taken off the final purchases), while the other half is told there is no sale. At the end of each day, the manager calculates the profit.
- Independent variable(s)
- Is there more than 1 IV?
- The levels the independent variable(s)
- Dependent variable
- Between (BS) or within-subjects (WS)?
- What type of design is being used?
Sale/ No Sale, Store’s layout
Yes
Sale-No Sale, Grid-Circular
Profit
BS (Sale) and WS (Layout)
2-way mixed Measures
A researcher at a drug treatment center wanted to determine the best combination of treatments that would lead to more substance free days. This researcher believed there were two key factors in helping drug addiction: type of treatment and type of counseling. The researcher was interested in either residential or outpatient treatment programs; and either cognitive-behavioral, psychodynamic, or client-centered counseling approaches. As new clients enrolled at the center they were randomly assigned to one of six experimental groups. After 3 months of treatment, each client’s symptoms were measured.
- Independent variable(s)
- Is there more than 1 IV?
- The levels the independent variable(s)
- Dependent variable
- Between (BS) or within-subjects (WS)?
- What type of design is being used?
Type of treatment, Type of counseling.
Yes
Residential or outpatient/ cognitive-behavioural, psychodynamic or client-centered.
Substance-free days
Between subjects
2-way independent measures ANOVA.
Assumptions of mixed ANOVA - (3)
Normal distribution
Independent and repeated factors
Homogeneity of variance for the independent factor + sphericity for the repeated factor
Assumptions of repeated-measures ANOVA -(3)
Normal Distribution
Repeated Measure Design (same participants)
Sphericity (Mauchly’s Test)
Assumptions of independent ANOVA - (3)
Normal Distribution
Independence of Scores
Homogeneity of Variance (Levene’s Test)
Levene's test tests whether the variances in independent groups are similar; would Levene's test be significant in this case?
Levene’s test would likely be significant as the variance between the two groups are quite different.
Sphericity is an assumption of both
repeated and mixed models
If the p-value is significant when checking for sphericity, then - (3)
if GG < 0.75, use the Greenhouse-Geisser correction
if GG > 0.75, use the Huynh-Feldt correction
Since GG is less than 0.75 here, report the adjusted F, DF and sig, which is F(1.24, 21.00) = 212.32, p < 0.001
Homogeneity of variance asks:
are the distributions (variances) of the groups similar?
Sphericity asks: are the distributions of the differences between conditions
similar?
The researcher hypothesized that there would be an interaction between dog breed (Collie or German Shepherd) and week of obedience school training (all dogs measured at 1 week and 5 weeks) as they relate to the number of times the dog growls per week. Specifically, it was hypothesized that Collies would show no difference in growls between 1 week and 5 weeks, but German Shepherds would growl less at 5 weeks than at 1 week.
- Independent variable(s)
- Is there more than 1 IV?
- The levels the independent variable(s)
- Dependent variable
- Between(BS) or within-subjects (WS)?
- What type of design is being used?
- Dog breed and measurement time
- Yes
- Collie-German Shepherd / Week 1-Week 5
- Number of growls
- Dog breed is between and measurement time is within
- 2-WAY mixed ANOVA
What does this 2-way mixed ANOVA show? - (3)
- Independent variable(s)
- Is there more than 1 IV?
- The levels the independent variable(s)
- Dependent variable
- Between(BS) or within-subjects (WS)?
- What type of design is being used?
1) Is there an effect overall? = Yes (green)
2) Is there an effect of breed? = Yes (red)
3) Is there an interaction? = Yes (blue)
Partitioning of variance in one-way vs two-way independent ANOVA
Rules of contrast coding (see the sketch after this list) - (5)
Rule 1: Groups coded with positive weights compared to groups coded with negative weights.
Rule 2: The sum of weights for a comparison should be zero.
Rule 3: For a given contrast, the weights assigned to the group(s) in one chunk of variation should be equal to the number of groups in the opposite chunk of variation.
Rule 4: If a group is not involved in a comparison, assign it a weight of zero
Rule 5: If a group is singled out in a comparison, then that group should not be used in any subsequent contrasts.
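A tiny numeric sketch of these rules for three groups (two experimental groups plus a control); the group means and weights below are illustrative:

```python
import numpy as np

group_means = np.array([8.0, 10.0, 3.0])  # exp1, exp2, control (assumed)

# Contrast 1: experimental groups (+1, +1) vs control (-2).
# Weights sum to zero (rule 2); each chunk's weight equals the number of
# groups in the opposite chunk (rule 3).
c1 = np.array([1, 1, -2])

# Contrast 2: exp1 vs exp2; control was singled out in contrast 1, so it
# gets a weight of zero here (rules 4 and 5).
c2 = np.array([1, -1, 0])

for c in (c1, c2):
    print(f'weights {c}, sum = {c.sum()}, contrast value = {c @ group_means}')
```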
Contrast coding example SPSS how to read
When conducting a Repeated-Measures ANOVA, which of the following assumptions is NOT relevant?
A.Independent residuals
B.Homogeneity of variance
C.Sphericity
D.They are all relevant
B
One advantage of repeated measures designs over independent designs is that we are able to calculate a degree of error for each effect, whereas in an independent design we are able to calculate only one degree of error: true or false?
True or False
True
An experiment was conducted to see how people with eating disorders differ in their need to exert control in different domains. Participants were classified as not having an eating disorder (control), as having anorexia nervosa (anorexic), or as having bulimia nervosa (bulimic). Each participant underwent an experiment that indicated how much they felt the need to exert control in three domains: eating, friendships and the physical world (this final category was a control domain in which the need to have control over things like gravity or the weather was assessed). So all participants gave three responses in the form of a mean reaction time; a low reaction time meant that the person did feel the need to exert control in that domain. The variables have been labelled as group (control, anorexic, or bulimic) and domain (food, friends, or physical laws). Of the following options, which analysis should be conducted?
A. Analysis of covariance
B. Two-way repeated measures ANOVA
C. Two-way mixed ANOVA
D. Three-way independent ANOVA
C
Two IVs = group (control, anorexic, bulimic) and domain (food, friends, physical laws)
Group is between-subjects
Each participant underwent all domains, so domain is within-subjects
DV = mean reaction time for each participant
An experiment was done to compare the effect of having a conversation via a hands-free mobile phone, having a conversation with an in-car passenger, and no distraction (baseline) on driving accuracy. Twenty participants from two different age groups (18–25 years and 26–40 years) took part. All participants in both age groups took part in all three conditions of the experiment (in counterbalanced order), and their driving accuracy was measured by a layperson who remained unaware of the experimental hypothesis.
How do we interpret the main effect of distraction from the SPSS table (next slide)? - (2)
The assumption of sphericity has been met, indicated by Mauchly’s test (p > .05).
There was a significant main effect of distraction (F(2, 36) = 45.95, p < .001). This effect tells us that if we ignore the effect of age, driving accuracy was significantly different in at least two of the distraction groups.
Two-way repeated-measures ANOVA compares:
A. Several means when there are two independent variables, and the same entities have been used in all conditions
B. Two means when there are more than two independent variables, and the same entities have been used in all conditions.
C. Several means when there are two independent variables, and the same entities have been used in some of the conditions.
D. Several means when there are more than two independent variables, and some have been manipulated using the same entities and others have used different entities.
A
When conducting a repeated-measures ANOVA which of the following assumptions is not relevant?
A. Homogeneity of variance
B. Sphericity
C. Independent residuals
D. They are all relevant
A
The table shows hypothetical data from 3 conditions
For these data, sphericity will hold when
(Hint: Sphericity refers to the equality of variances of the differences between treatment levels.)
A.The variances of the differences between treatment levels are roughly equal
B. The variance of each condition is roughly equal
C. The variance of each condition is not equal
D. The variances of the differences between treatment levels are not equal
A
Imagine we were interested in the effect of supporters singing on the number of goals scored by soccer teams. We took 10 groups of supporters of 10 different soccer teams and asked them to attend three home games, one at which they were instructed to sing in support of their team (e.g., 'Come on, you Reds!'), one at which they were instructed to sing negative songs towards the opposition (e.g., 'You're getting sacked in the morning!') and one at which they were instructed to sit quietly. The order of chanting was counterbalanced across groups. Looking at the output below, which of the following sentences is correct?
A.The results showed that the number of goals scored was significantly affected by the type of singing from the supporters, F(2, 18) = 11.24, p = .001.
B. The results showed that the number of goals scored was significantly affected by the type of singing from the supporters, F(1.58, 14.19) = 11.24, p = .002.
C. The results showed that the number of goals scored was significantly affected by the type of singing from the supporters, F(2, 12.4) = 11.24, p = .001.
D. The results showed that the number of goals scored was significantly higher when supporters sang positive songs towards their team than when they sat quietly, F(2, 18) = 11.24, p = .001.
A = Mauchly’s test was non-significant, so we can report the result in the row labelled ‘sphericity assumed’
Imagine we were interested in the effect of supporters singing on the number of goals scored by soccer teams. We took 10 groups of supporters of 10 different soccer teams and asked them to attend three home games, one at which they were instructed to sing in support of their team (e.g., ‘Come on, you Reds!’), one at which they were instructed to sing negative songs towards the opposition (e.g., ‘You’re getting sacked in the morning!’) and one at which they were instructed to sit quietly. The order of chanting was counterbalanced across groups. An ANOVA with a simple contrasts using the last category as a reference was conducted. Looking at the output tables below, which of the following sentences regarding the contrasts is correct?
a.The first contrast revealed that soccer teams scored significantly more goals when their supporters sang positive songs compared to when they did not sing. The second contrast revealed that soccer teams scored significantly fewer goals when their supporters sang negative songs compared to when they did not sing.
b. The first contrast revealed that soccer teams scored significantly fewer goals when their supporters did not sing compared to when they sang negative songs. The second contrast revealed that soccer teams scored a similar amount of goals when their supporters sang positive songs compared to when they did not sing.
c. The first contrast revealed that soccer teams scored significantly more goals when their supporters sang positive songs compared to when they did not sing. The second contrast revealed that soccer teams scored significantly fewer goals when their supporters sang negative songs compared to when they sang positive songs.
d. The first contrast revealed that soccer teams scored significantly more goals when their supporters sang positive songs compared to when they did not sing. The second contrast revealed that soccer teams did not significantly differ in the number of goals scored when their supporters sang negative songs compared to when they did not sing.
a = we can see from the means in the Descriptive Statistics table that positive singing resulted in the highest number of goals scored and negative singing resulted in the lowest number of goals scored
An experiment was done to compare the effect of having a conversation via a hands-free mobile phone, having a conversation with an in-car passenger, and no distraction (baseline) on driving accuracy. Twenty participants from two different age groups (18–25 years and 26–40 years) took part. All participants in both age groups took part in all three conditions of the experiment (in counterbalanced order), and their driving accuracy was measured by a layperson who remained unaware of the experimental hypothesis.
Which of the following sentences is the correct interpretation of the main effect of distraction?
AThere was a significant main effect of distraction, F(2, 36) = 45.95, p < .001. This effect tells us that if we ignore the effect of age, driving accuracy was significantly different in at least two of the distraction groups.
B. There was no significant main effect of distraction, F(2, 36) = 45.95, p = .719. This effect tells us that if we ignore the effect of age, driving accuracy was the same for no distraction, hands-free conversation and in-car passenger conversation.
C. There was a significant main effect of distraction, F(2, 36) = 45.95, p < .001. This effect tells us that driving accuracy was different for no distraction, hands-free conversation and in-car passenger conversation in the two age groups.
D. There was no significant main effect of distraction, F(2, 36) = 45.95, p > .05. This effect tells us that none of the distraction groups significantly distracted participants across both age groups.
A = We can read the results in the row labelled ‘sphericity assumed’, as we can see from the output of Mauchly’s test that the assumption of sphericity has been met, p > .05. However, we would need to do some follow-up tests to investigate exactly where the differences between groups lie
Field and Lawson (2003) reported the effects of giving children aged 7–9 years positive, negative or no information about novel animals (Australian marsupials). This variable was called ‘Infotype’. The gender of the child was also examined. The outcome was the time taken for the children to put their hand in a box in which they believed either the positive, negative, or no information animal was housed (positive values = longer than average approach times, negative values = shorter than average approach times). Based on the output below, what could you conclude?
A. Approach times were significantly different for the boxes containing the different animals, but the pattern of results was unaffected by gender.
B. Approach times were significantly different for the boxes containing the different animals, and the pattern of results was affected by gender.
C. Approach times were not significantly different for the boxes containing the different animals, but the pattern of results was affected by gender.
D.Approach times were not significantly different for the boxes containing the different animals, but the pattern of results was unaffected by gender.
A
What leads to chi-squared test?
Q: What sort of measurement? A: Categorical (in this case counts or frequencies)
Q: How many predictor variables? A: One
Q: What type of predictor variable? A: Categorical
Q: How many levels of the categorical predictor? A: Not relevant
Q: Same or Different participants for each predictor level? A: Different
This leads us to the chi-square test for independence of groups
In the chi-square test, each participant is allocated to one and only one category, such as - (3)
pass or fail,
pregnant or not pregnant,
win, draw or lose
Since each participant is allocated to one category in the chi-squared test, each individual therefore
contributes to the frequency or count with which a category occurs
Table scenario in which cats can be trained to dance more effectively with food or affection as a reward - chi-squared test
Table scenario in which cats can be trained to dance more effectively with food or affection as a reward - chi-squared test
what are the four categories (cells)? - (4)
- danced, food as reward
- danced, affection as reward
- did not dance, food as reward
- did not dance, affection as reward
Table scenario in which cats can be trained to dance more effectively with food or affection as a reward - chi-squared test
highlight the frequencies for the four categories
Table scenario in which cats can be trained to dance more effectively with food or affection as a reward - chi-squared test
what do the rows give?
The row totals give the frequencies of dancing and non-dancing cats
Table scenario in which cats can be trained to dance more effectively with food or affection as a reward - chi-squared test
what do the columns give? - (2)
The column totals give the frequencies of food and affection as reward
These are the numbers in each group
IV and DV in chi-squared tests - (2)
One categorical DV (because of frequencies)
with one categorical IV with different participants at each predictor level
With chi-squared categorical outcomes, the null hypothesis is set
up on the basis of expected frequencies, for all four variable combinations, based on the idea that the variable of interest has no effect on the frequencies
What does the chi-square tests?
whether there is a relationship between two categorical variables.
In chi-square, since we are using categorical variables, we cannot use the
mean or any similar statistic, and hence cannot use any parametric tests
What does chi-square compare?
observed frequencies from the data with frequencies which would be expected if there was no relationship between the two variables.
In chi-square test when measuring categorical variables we are interested in
frequencies (number of items that fall into combination of categories)
Example of scenario using chi-square
We have a list of movie genres; this is our first variable. Our second variable is whether or not the patrons of those genres bought snacks at the theatre. Our idea (or, in statistical terms, our null hypothesis) is that the type of movie and whether or not people bought snacks are unrelated. The owner of the movie theatre wants to estimate how many snacks to buy. If movie type and snack purchases are unrelated, estimating will be simpler than if the movie types impact snack sales.
What is assumptions of chi-square test? - (3)
Data values that are a simple random sample from the population of interest.
Two categorical or nominal variables. Don’t use the independence test with continuous variables that define the category combinations. However, the counts for the combinations of the two categorical variables will be continuous.
For each combination of the levels of the two variables, we need at least five expected values. When we have fewer than five for any one combination, the test results are not reliable
We have a list of movie genres; this is our first variable. Our second variable is whether or not the patrons of those genres bought snacks at the theatre. Our idea (or, in statistical terms, our null hypothesis) is that the type of movie and whether or not people bought snacks are unrelated. The owner of the movie theatre wants to estimate how many snacks to buy. If movie type and snack purchases are unrelated, estimating will be simpler than if the movie types impact snack sales.
Is the Chi-square test of independence an appropriate method to evaluate the relationship between movie type and snack purchases? - (3)
We have a simple random sample of 600 people who saw a movie at our theatre. We meet this requirement.
Our variables are the movie type and whether or not snacks were purchased. Both variables are categorical.
But the last requirement is at least five expected values for each combination of the two variables. To confirm this, we need to know the total counts for each type of movie and the total counts for whether snacks were bought or not = checked below
We have a list of movie genres; this is our first variable. Our second variable is whether or not the patrons of those genres bought snacks at the theatre. Our idea (or, in statistical terms, our null hypothesis) is that the type of movie and whether or not people bought snacks are unrelated. The owner of the movie theatre wants to estimate how many snacks to buy. If movie type and snack purchases are unrelated, estimating will be simpler than if the movie types impact snack sales.
Diagram of contingency table in chi-square: calculating row totals, column totals and the grand total - (7)
Snacks row total: 50 + 125 + 90 + 45 = 310
No-snacks row total: 75 + 175 + 30 + 10 = 290
Genre column total: 50 + 75 = 125
Genre column total: 125 + 175 = 300
Genre column total: 90 + 30 = 120
Genre column total: 45 + 10 = 55
Grand total: 310 + 290 = 600
How to calculate chi-square test statistic? - (4)
- Calculate the difference between the actual and expected count for each Movie-Snacks combination.
- square that difference.
- Divide by the expected value for the combination.
- We add up these values for each Movie-Snacks combination. This gives us our test statistic.
We have a list of movie genres; this is our first variable. Our second variable is whether or not the patrons of those genres bought snacks at the theatre. Our idea (or, in statistical terms, our null hypothesis) is that the type of movie and whether or not people bought snacks are unrelated. The owner of the movie theatre wants to estimate how many snacks to buy. If movie type and snack purchases are unrelated, estimating will be simpler than if the movie types impact snack sales.
Diagram of contingency table in chi-square: calculating expected counts
e.g., for action and snacks it would be the snacks total (310) x the action total (125) divided by the grand total of 600 ≈ 64.6, i.e. about 65
Example of calculating chi-square from table
For this it would be 65.03
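To make the arithmetic concrete, here is a minimal Python sketch (assuming numpy and scipy are available) that reproduces the expected counts and the chi-square statistic for the movie-snacks table above; the genre labels are not named in the original, so the columns are left unlabelled.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the movie-snacks example
# (rows: snacks bought / not bought; columns: the four genres)
observed = np.array([[50, 125, 90, 45],
                     [75, 175, 30, 10]])

# Expected count per cell = row total x column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Chi-square statistic: sum over cells of (observed - expected)^2 / expected
chi2_stat = ((observed - expected) ** 2 / expected).sum()
print(round(chi2_stat, 2))  # ~65.01; the 65.03 above reflects rounded expected counts

# scipy reproduces the statistic plus df and p-value in one call
stat, p, df, exp = chi2_contingency(observed, correction=False)
```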
Does the area of psychology that a person prefers depend on whether they would select a cat or a dog as a pet? - chi-square test of independence
For the chi-square example we need to check the assumptions below - (2)
Independence
Each item or entity contributes to only one cell of the contingency table.
The expected frequencies should be greater than 5.
In larger contingency tables up to 20% of expected frequencies can be below 5, but there is a loss of statistical power.
Even in larger contingency tables no expected frequencies should be below 1.
How to understand your test statistic from chi-squared? - (5) if you have test statistic of 65.03
- Set your significance level = .05
- Calculate the test statistic -> 65.03
- Find your critical value from chi-squared distribution table based on df & significance level
- Degrees of freedom: df = (r - 1) x (c - 1)
For the movie example this is df = (4 - 1) x (2 - 1) = 3, giving a critical value of 7.815 - compare the test statistic with the critical value
65.03 > 7.82 so reject the idea that movie type and snack purchases are independent
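As a quick programmatic check on the critical value, a sketch assuming scipy:

```python
from scipy.stats import chi2

# Critical value for alpha = .05 with df = (4 - 1) x (2 - 1) = 3
critical = chi2.ppf(0.95, df=3)  # ~7.815
# 65.03 > 7.815, so we reject independence of movie type and snack purchases
```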
Example of research question and hypothesis and sig level of chi-square test of independence- (4)
Research question:
Does the area of psychology that a person prefers depend on whether they would select a cat or a dog as a pet?
Hypotheses:
H0: The area of interest in psychology and type of pet preferred are independent of each other.
H1: The area of interest in psychology and type of pet preferred are not independent of each other. That is, the primary area of interest in psychology depends on whether you prefer a cat or a dog.
Significance level: α = .05
Does the area of psychology that a person prefers depend on whether they would select a cat or a dog as a pet? - chi-square test of independence
For this chi-square example we need to check the assumption that the expected frequencies are greater than 5.
What does it show? - (4)
Here we see that all the expected counts in the cat group and one expected count in the dog group are below 5.
We also have one in the cat group that is below 1.
So, SPSS has flagged that we have 60% of the expected counts falling below 5.
So the assumption of expected frequencies greater than 5 is not met
If chi-square assumption that The expected frequencies should be greater than 5 is not satisfied then do - chi-square test of independence
We should use Fisher’s Exact Test which can correct for this.
Does the area of psychology that a person prefers depend on whether they would select a cat or a dog as a pet? - - chi-square test of independence
If assumptions were met (expected frequencies greater than 5) then.. report - (2)
A chi-square independence test was performed to examine whether there was a relationship between their area of studies in psychology and their preference for cats or dogs.
The relationship between these variables was not significant, χ²(4, N = 46) = 1.46, p = .834, so we fail to reject H0.
Are directional hypotheses possible with chi-square?
A.Yes, but only when you have a 2 × 2 design.
B.Yes, but only when there are 12 or more degrees of freedom.
C.Directional hypotheses are never possible with the chi-squared test.
D.Yes, but only when your sample is greater than 200.
A = directional hypotheses are only possible when you have a 2 × 2 design (just 2 variables with 2 categories each); with larger designs you cannot make a directional hypothesis and have to use loglinear or goodness-of-fit tests
Example situations you can do chi-square directional and not possible - (5)
If we are just comparing pet preferences between males and females, we can make a directional hypothesis (2 x 2 – male/female, cats/dogs).
Males prefer cats or females prefer dogs.
However, when we start adding variables to the design it gets complicated.
If we wanted to compare drink preferences at different times of the day for students/lecturers, we couldn’t form a directional hypothesis.
This is because we have 3 main effects and several interactions to consider. We need to use loglinear analyses to do this.
Loglinear analysis is a …. of chi-square
extension
Chi-square only analyses two variables at a time, whilst log-linear models
can determine complex interactions in multidimensional contingency tables with more than two categorical variables.
Loglinear is appropriate when
there’s no clear distinction between response and explanatory variables
How to think of chi-square vs log-linear?
Think of chi-square like t-tests (2 groups) and log-linear like ANOVA (more than 2 groups).
Example of RQ, hypothesis and sig level of loglinear - (3)
Research question: Is the new treatment associated with improvements in health in cats and dogs?
Hypotheses:
H0: Treatment, type of animal and improvements are independent of each other.
H1: Treatment, type of animal and improvements are associated with each other.
Significance level: α = .05
Assumptions of log linear - (2)
Independence
Expected counts > 5
Research question: Is the new treatment associated with improvements in health in cats and dogs?
Checking assumption of expected counts - (3):
Here we have 3 things we are comparing: animal (cat/dog), treatment (yes/no) and improvement (yes/no) all of which are categorical.
We look and see that all of the expected counts are above 5.
So the assumptions of independence and expected counts are met
Research question: Is the new treatment associated with improvements in health in cats and dogs?
Here we have 3 things we are comparing: animal (cat/dog), treatment (yes/no) and improvement (yes/no) all of which are categorical.
In loglinear model selection it begins with - (2)
all terms present (all main effects and all possible interactions)
main effects: Animal, Treatment and Improvement
interactions: Animal * Treatment, Animal * Improvement, Treatment * Improvement and Treatment* Animal* Improvement
Research question: Is the new treatment associated with improvements in health in cats and dogs?
Here we have 3 things we are comparing: animal (cat/dog), treatment (yes/no) and improvement (yes/no) all of which are categorical.
In loglinear model selection, after including all main effects and interactions, it then - (4)
Removes a term and compares the new model with the one in which the term was present.
Starts with the highest-order interaction (including max number of variables/categories)
Uses the likelihood ratio to ‘compare’ models below:
If the new model is no worse than the old, then the term is removed and the next highest-order interactions are examined, and so on.
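A minimal sketch of one such likelihood-ratio comparison in Python, assuming statsmodels is available; the cell counts below are invented for illustration, and the backward elimination is reduced to its first step (saturated model vs the model without the 3-way interaction):

```python
from itertools import product

import pandas as pd
import statsmodels.formula.api as smf
from scipy.stats import chi2

# Invented cell counts for animal x treatment x improvement (long format)
rows = [dict(animal=a, treatment=t, improvement=i)
        for a, t, i in product(["cat", "dog"], ["yes", "no"], ["yes", "no"])]
df = pd.DataFrame(rows)
df["count"] = [28, 4, 10, 21, 20, 14, 11, 17]  # hypothetical frequencies

# Loglinear models are Poisson models on the cell counts
saturated = smf.poisson("count ~ animal * treatment * improvement", data=df).fit(disp=0)
reduced = smf.poisson("count ~ (animal + treatment + improvement) ** 2", data=df).fit(disp=0)

# Likelihood ratio: does removing the 3-way interaction worsen the fit?
lr = 2 * (saturated.llf - reduced.llf)
df_diff = saturated.df_model - reduced.df_model  # parameters dropped (1 here)
p = chi2.sf(lr, df_diff)  # small p => the term matters, keep it in the model
```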
Model selection of loglinear - what does it show?
Research question: Is the new treatment associated with improvements in health in cats and dogs?
Here we have 3 things we are comparing: animal (cat/dog), treatment (yes/no) and improvement (yes/no) all of which are categorical. - (3)
We can see that the model selection worked in a way that it first tried to remove the 3-way interaction.
However, we can see here that it significantly affected the fit of the model, so it was left in.
Since removing the highest-order interaction made a significant difference to the fit of the model, we get a final model that is the saturated model (it contains all main effects and interactions).
Loglinear SPSS K way and Higher order effects what does it show?
Research question: Is the new treatment associated with improvements in health in cats and dogs?
Here we have 3 things we are comparing: animal (cat/dog), treatment (yes/no) and improvement (yes/no) all of which are categorical. - (2)
we are using the likelihood ratio here because that’s how we compare the models to find the best fit
We see that all main effects and interactions are significantly contributing to explaining the variance in the data
loglinear what does K represent and what does K = 1,2 and 3 represent? - (4)
K represents the level of the terms.
For example, K=1 would be the main effects,
K=2 would be our 2-way interactions and
K=3 is our 3-way interaction.
Loglinear SPSS - what does parameter estimates show?
Research question: Is the new treatment associated with improvements in health in cats and dogs?
Here we have 3 things we are comparing: animal (cat/dog), treatment (yes/no) and improvement (yes/no) all of which are categorical. - (3)
There is a significant three-way interaction between animal, treatment and improvement, as well as two significant two-way interactions between animal and improvement and treatment and improvement (p < .001)
a significant 3-way interaction between animal, treatment and improvement, as well as two significant 2-way interactions between animal/improvement and treatment/improvement.
Like our post-hoc tests, this is telling us where the significant differences are.
Loglinear after seeing statistical tests we go to raw data showing that…
Based on the raw data, there seems to be an indication that the cats responded better to treatment than the dogs; this should be followed up with chi-square tests run separately for cats and dogs to determine whether the association between treatment and improvement is present in both cats and dogs
When conducting a loglinear analysis, if our model is a good fit of the data then the goodness-of-fit statistic for the final model should be:
A. Significant (p should be smaller than .05)
B. Non-significant (p should be bigger than .05)
C. Less than 5 but greater than 1
D. Greater than 5
B
The goodness-of-fit test in loglinear tests the
hypothesis that the frequencies predicted by the model (expected frequencies) are significantly different from the actual frequencies in the data (observed)
A significant goodness of fit result mean
our model was significantly different from our data (i.e., the model is a bad fit to the data).
A recent story in the media has claimed that women who eat breakfast every day are more likely to have boy babies than girl babies. Imagine you conducted a study to investigate this in women from two different age groups (18–30 and 31–43 years).
Looking at the output tables below, which of the following sentences best describes the results? = chi-square
A. Women who ate breakfast were significantly more likely to give birth to baby boys than girls.
B. There was a significant two-way interaction between eating breakfast and age group of the mother.
C. Whether or not a woman eats breakfast significantly affects the gender of her baby at any age.
D. The model is a poor fit of the data.
C
Chi square and log linear are both
non-parametric methods
Non-parametric tests used when
When data violate the assumptions of parametric tests we can sometimes find a nonparametric equivalent
eg. normality of distribution
Non-parametric tests work on the principle of
randomization or ranking the data for each group
Ranking data in non-parametric tests gets rid of
outliers and skew
How does ranking work in non-parametric? - (2)
Add up the ranks for the two groups and take the lowest of these sums to be our test statistic
The analysis is carried out on the ranks rather than the
actual data.
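A quick illustration of ranking with tied scores, assuming scipy (the scores are invented):

```python
from scipy.stats import rankdata

scores = [12, 15, 15, 20, 8]
ranks = rankdata(scores)
print(ranks)  # [2.  3.5 3.5 5.  1. ] - tied scores share the average rank
```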
Non-parametric equivalent of independent/unrelated t-tests
Mann-Whitney or Wilcoxon
rank-sum test
Non-parametric equivalent of repeated t-test
Wilcoxon signed-rank test
Non-parametric equivalent of : One-way independent (between-subjects) ANOVA
Kruskal-Wallis or (for trends)
Jonckheere-Terpstra
Non-parametric equivalent of one-way repeated ANOVA
Friedmanʼs ANOVA
Non-parametric equivalent of Multi-way between or
within-subjects ANOVA
Loglinear analysis (categorical
outcome, with participants as a factor)
Non-parametric equivalent of correlation
Spearman’s Rho or Kendall’s Tau
Mann-Whitney/Wilcoxon rank-sum Test - Compares
two independent groups of scores
Wilcoxon signed rank Test - Compare
two dependent groups of scores
Kruskal-Wallis Test - Compares
> 2 independent groups of scores
Friedman’s Test - Compares
> 2 dependent groups of scores
Spearman’s Rho & Kendall’s Tau - Measures the extent to which
two continuous variables are related (pattern of responses across variables)
Logic behind Wilcoxon’s rank sum test, what does SPSS do? - (3)
Step 1: Get some not normally distributed data
Step 2: Rank it (regardless of group)
Step 3: Significance testing
Does one of the groups have more of the higher ranking scores than the other?
What is DF of chi-square?
(r-1)(c-1)
The likelihood ratio in loglinear models is preferred for
small sample sizes
DF of likelihood ratio in loglinear
df = (r-1)(c-1)
Decision tree of Mann-Whitney - (4)
1 DV = ordinal (e.g., high school, bachelors; order is meaningful) or continuous
1 IV = categorical with 2 levels
Different participants
Does not meet assumptions of parametric tests
Wilcoxon rank-sum and Mann-Whitney U are
the same procedure, used to compare two independent groups and assess whether the samples come from the same distribution
For Mann-Whitney U/Wilcoxon Rank Sum they comparing 2 independent conditions the two steps - (2)
Rank all the data on the basis of the scores irrespective of the group
compute the sum of ranks of each group
For the Wilcoxon rank-sum test, the statistic Ws is
the lower of the two sums of ranks
For Mann-Whitney, the statistic U uses the
sum of ranks for group 1, R1, as follows: U = n1n2 + n1(n1 + 1)/2 - R1
Example of a table comparing 2 independent conditions for the Wilcoxon rank-sum or Mann-Whitney U test
Here we have data for two groups; one taking alcohol, the other ecstasy. The scores are from a measure of depression. Scores were obtained on two days: Sunday and Wednesday. The drugs were administered on Saturday.
Example of a table comparing 2 independent conditions for the Wilcoxon rank-sum or Mann-Whitney U test
Here we have data for two groups; one taking alcohol, the other ecstasy. The scores are from a measure of depression. Scores were obtained on two days: Sunday and Wednesday. The drugs were administered on Saturday.
Two steps for both statistics:
Rank all the data on the basis of the scores irrespective of the group
compute the sum of ranks of each group - (5)
The graphic here shows how we can list the scores in order and as a result assign each score a rank.
When scores tie, we give them the average of the ranks.
If we ensure we keep track of the group the scores came from we can relatively easily add the ranks up for each group.
Note that if there was little difference between the groups the sums of their ranks would be similar, as they are for the data shown here for Sunday.
However, the sum of ranks differ considerably for the data obtained on Wednesday.
For the Wilcoxon rank-sum test comparing 2 independent groups, with group sizes n1 and n2, the mean of Ws is given by:
mean(Ws) = n1(n1 + n2 + 1) / 2
For the Wilcoxon rank-sum test comparing 2 independent groups,
the standard error of Ws is given by: SE(Ws) = sqrt(n1n2(n1 + n2 + 1) / 12)
For the Wilcoxon rank-sum test, the z-score of Ws can be calculated as: z = (Ws - mean(Ws)) / SE(Ws)
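A minimal Python sketch of these calculations (numpy/scipy assumed); the two groups of depression scores are invented stand-ins for the alcohol/ecstasy data, since the original values are not reproduced here:

```python
import numpy as np
from scipy.stats import rankdata, norm

# Invented depression scores for two independent groups
ecstasy = np.array([15, 35, 16, 18, 19, 17, 27, 16, 13, 20])
alcohol = np.array([16, 15, 20, 15, 16, 13, 14, 19, 18, 18])

# Rank all scores irrespective of group (ties get the average rank)
ranks = rankdata(np.concatenate([ecstasy, alcohol]))
n1, n2 = len(ecstasy), len(alcohol)
ws = ranks[:n1].sum()  # sum of ranks for group 1 (with equal ns, the choice
                       # of group only flips the sign of z)

mean_ws = n1 * (n1 + n2 + 1) / 2               # mean of Ws
se_ws = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # standard error of Ws
z = (ws - mean_ws) / se_ws
p = 2 * norm.sf(abs(z))                        # two-tailed p
r = z / np.sqrt(n1 + n2)                       # effect size r = z / sqrt(N)
```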
For Mann-Whitney, the statistic U uses the sum of ranks for group 1, R1, as follows:
U = n1n2 + n1(n1 + 1)/2 - R1
What does this equation specify? - (2)
The first terms involving n1 and n2 actually compute the maximum possible sum of ranks for group 1.
U is zero when all those in group one have scores that exceed the scores of those in group 2.
In Mann-Whitney U there is a standardised test statistic, a z-score, that allows you to compute an
effect size: r = z / square root of N (number of participants)
What is the decision tree of the Wilcoxon signed-rank test? - (4)
1 IV categorical with 2 levels
Same participants in each predictor level
1 DV - ordinal or continuous
Does not meet assumptions of parametric tests
Steps of Wilcoxon signed rank test - (4)
- Compute the difference between scores for the two conditions
- Note the sign of the difference (positive or negative)
- Rank the differences ignoring the sign and also exclude any zero differences from the ranking
- Sum the ranks for positive and negative ranks
Example of Wilcoxon signed rank test carrying out steps - (9)
The table shown here has the Depression Scores taken on Sunday and Wednesday for those taking ecstasy on Saturday.
Data for Sunday are in the first column and Wednesday in the second column.
The third column shows the difference between scores obtained on Sunday and Wednesday.
Note some could be negative, some positive. In this example, however, the difference is always positive apart from two values where the difference is zero.
The fourth column notes the sign of the difference or notes it is going to be excluded because the difference was zero.
The fifth column ranks the differences in terms of their size, but not sign.
The sixth and seventh column list the ranks that were for positive and negative differences, respectively.
It is these two columns that are summed to get the relevant statistics, called T+ and T-.
Because T+ and T- are not independent, we take only the T+ value.
For the Wilcoxon signed-rank test with n non-zero differences, the mean of T is given by: mean(T) = n(n + 1) / 4
For the Wilcoxon signed-rank test, the standard error of T is given by: SE(T) = sqrt(n(n + 1)(2n + 1) / 24)
For the Wilcoxon signed-rank test, compute the z-score of T by: z = (T - mean(T)) / SE(T)
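A minimal sketch of these steps in Python (numpy/scipy assumed); the before/after scores are invented, but mimic the pattern described above (all non-zero differences positive, with two zero differences):

```python
import numpy as np
from scipy.stats import rankdata, norm

# Invented depression scores for the same participants on two days
sunday = np.array([15, 35, 16, 18, 19, 17, 27, 16, 13, 20])
wednesday = np.array([28, 35, 35, 24, 39, 32, 27, 29, 36, 35])

diffs = wednesday - sunday
diffs = diffs[diffs != 0]        # exclude zero differences from ranking
ranks = rankdata(np.abs(diffs))  # rank by size, ignoring sign
t_plus = ranks[diffs > 0].sum()  # sum of ranks for positive differences

n = len(diffs)                   # number of non-zero differences
mean_t = n * (n + 1) / 4
se_t = np.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (t_plus - mean_t) / se_t
p = 2 * norm.sf(abs(z))          # two-tailed p from the normal approximation
```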
Kruskal-Wallis decision tree, like one-way independent ANOVA - (4)
1 DV, continuous or ordinal
1 IV, categorical predictor with more than 2 levels
Different participants in each predictor level
Does not meet assumptions of parametric tests
Kruskal Wallis steps - (2)
Rank all the data on the basis of the scores irrespective of the group
Compute the sum of ranks of each group, Ri , where i is the group number
For Kruskal-Wallis, the statistic H is as follows:
H = (12 / (N(N + 1))) x Σ(Ri² / ni) - 3(N + 1), where N is the total sample size and ni is the size of group i
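A minimal sketch computing H by hand and cross-checking with scipy (the three groups of scores are invented and tie-free, so the two results match exactly):

```python
import numpy as np
from scipy.stats import rankdata, kruskal

# Invented scores for three independent groups
groups = [np.array([4, 6, 7, 9]),
          np.array([5, 8, 10, 12]),
          np.array([11, 13, 14, 15])]

ranks = rankdata(np.concatenate(groups))  # rank all scores together
N = sum(len(g) for g in groups)

# H = (12 / (N(N+1))) * sum(Ri^2 / ni) - 3(N+1)
h, start = 0.0, 0
for g in groups:
    r_i = ranks[start:start + len(g)].sum()  # sum of ranks for group i
    h += r_i ** 2 / len(g)
    start += len(g)
h = 12 / (N * (N + 1)) * h - 3 * (N + 1)

stat, p = kruskal(*groups)  # scipy agrees when there are no ties
```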
What is the decision tree of the Friedman test? - (4)
1 DV, continuous or ordinal
1 IV, categorical predictor with more than 2 levels
Same participants in each predictor level
Does not meet assumptions of parametric tests
What are the steps of the Friedman test? - (2)
Rank the scores for each individual - that means you will have ranks varying from 1 to the number of conditions the participant took part in
Compute the sum of ranks, Ri , for each condition
For Friedman, the statistic F is as follows:
F = (12 / (Nk(k + 1))) x ΣRi² - 3N(k + 1)
k = number of conditions
N = number of participants
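A minimal sketch using scipy's built-in Friedman test (the scores are invented; each row is one participant measured under all three conditions):

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Invented scores: rows = participants, columns = the three conditions
data = np.array([[3, 5, 6],
                 [2, 4, 7],
                 [5, 6, 8],
                 [1, 3, 4],
                 [4, 6, 5]])

stat, p = friedmanchisquare(data[:, 0], data[:, 1], data[:, 2])
# stat is the Friedman chi-square statistic, df = k - 1 = 2
```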
Example of when to use the chi-square test - (3)
- In this example, they wanted to look at whether attendance at lectures had an impact on exam performance, i.e. whether students passed or failed
- Attendance was coded as 1 if participants generally attended lectures, barring illness, and 2 if they did not attend
- Exam was scored as 1 = Pass and 2 = Fail
- In this example, they wanted to look at whether attendance at lectures had an impact on exam performance (pass/fail) - chi-square
What does it show? - (4)
- Attendance, Attended Lectures, Count = of the people who attended lectures, 84 passed and 29 failed
- % within attendance gives the same info: 74.3% passed and 25.7% failed when they attended lectures
- For those who did not attend lectures, 22 passed and 35 failed, with the percentages below - percentages are easier to use when writing up
- In this example, they wanted to look at whether attendance at lectures had an impact on their exam performance on whether they passed or failed - chi square
What does it show? - (2)
- In the top row is the Pearson chi-square statistic, 20.617, with df = 1 and p < .001 (displayed as 0.000)
- 0 cells have an expected count less than 5, so the chi-square assumption that expected counts are greater than 5 is met
- DF is always … in two-by-two chi-square
1
If the SPSS chi-square output shows that
0 cells have an expected count less than 5, the assumption of the chi-square test is met
- In this example, they wanted to look at whether attendance at lectures had an impact on their exam performance on whether they passed or failed - chi square
What does this effect size show? - (2)
- χ²(1) = 20.62, p < .001
- Cramer's V = 0.35, indicating a medium effect size
Effect size guideline of r correlation coefficient - (3)
Small effect = 0.1
Medium effect = 0.3
Large = 0.5 and above
Cramer's V can be interpreted similarly to a
correlation coefficient:
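A minimal sketch computing Cramer's V for the lecture-attendance table (numpy/scipy assumed; the counts are the ones used in the odds-ratio card below):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts: rows = attended / did not attend, columns = pass / fail
observed = np.array([[84, 29],
                     [22, 35]])

chi2_stat, p, df, expected = chi2_contingency(observed, correction=False)

# Cramer's V = sqrt(chi2 / (N * (min(rows, cols) - 1)))
n = observed.sum()
v = np.sqrt(chi2_stat / (n * (min(observed.shape) - 1)))  # ~0.35, medium effect
```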
In chi-square we can calculate odds
ratio
Example of calculating odds ratio for chi-square - (3)
odds of passing vs failing for students who attended lectures = no. of students who attended and passed (84) / no. of students who attended and failed (29) = 2.897
odds of passing vs failing for students who did not attend = no. of students who did not attend and passed (22) / no. of students who did not attend and failed (35) = 0.629
Odds ratio = odds (attended) / odds (did not attend) = 2.897 / 0.629 = 4.61, meaning the odds of passing the exam were about 4.6 times higher for students who attended lectures
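The same arithmetic as a short Python sketch:

```python
# Odds ratio for the lecture-attendance example
attended_pass, attended_fail = 84, 29
absent_pass, absent_fail = 22, 35

odds_attended = attended_pass / attended_fail  # ~2.897
odds_absent = absent_pass / absent_fail        # ~0.629
odds_ratio = odds_attended / odds_absent       # ~4.61
```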
Example research scenario of Mann Whitney - (4)
- Independent sample design
- One IV, two conditions = existing vs new medication
- One DV (symptoms), but this time on an ordinal scale (1 to 5), and a combination of non-normally distributed data and a small sample size (very problematic for t-tests)
- Mann Whitney U Test
Example of using Mann-Whitney U = skewed data
What does this Mann-Whitney U output show?
- Independent sample design
- One IV, two conditions = existing vs new medication
- One DV (symptoms), but this time on an ordinal scale (1 to 5), and a combination of non-normally distributed data and a small sample size (very problematic for t-tests)
- Mann Whitney U Test
- This box summarises the p-value ( p = 0.026) and tells you whether to accept or reject the null hypothesis.
What does this output show of Mann Whitney U? - (3)
- The Mann-Whitney U test statistic is 166.000; people also report the standardised test statistic of 2.292, which is a z-score, handy to report since if it is beyond ±1.96 the p-value from the test is significant
- The exact significance is p = 0.026
- There is a significant difference between the 2 groups
- Next we would want to look at the median scores to see which group is scoring highest and lowest after sig Mann Whitney U test
What does this output show? - (3)
- Independent sample design
- One IV, two conditions = existing vs new medication
- One DV (symptoms), but this time on an ordinal scale (1 to 5), and a combination of non-normally distributed data and a small sample size (very problematic for t-tests)
- Mann Whitney U Test
- For existing treatment, median score was 3.
And new treatment the median score was 4.
It suggests the new treatment was more effective in reducing symptoms than the existing treatment
Example research scenario of Friedman ANOVA - (5)
- Again we have ordinal data for the DV; we are not sure the distances between levels are the same
- Related design
- One IV, 3 conditions
- One DV (level reached to video game)
- Friedman’s ANOVA = more than 2 groups in related design
What does this Friedman ANOVA output show?
- We have a total sample size of 30, a test statistic of 21.788, df = 2 and p < .001 (displayed as 0.000), so there is a significant difference between the 3 groups
For Friedman’s ANOVA we do
post hoc tests for pairwise comparisons to look where the differences are
What do the Friedman ANOVA post hoc tests show? - (7)
- First one is Joystick vs Vyper Max
- Second one is Joystick vs Evo Pro, etc.
- Notice it gives two p-values: sig and adjusted sig
- Adjusted sig controls for multiple comparisons and corrects the p-value (use this)
- The difference between Joystick vs Vyper Max was significant at p = 0.005
- The difference between Joystick vs Evo Pro was significant at p < .001
- The difference between Vyper Max vs Evo Pro was non-significant, p = 0.660
- The problem with non-parametric tests is that they have less power
to detect significant effects compared to parametric tests, so median scores may be higher in one group than another and yet the difference may not reach significance
Non-parametric tests are used when
A. The assumptions of parametric tests have not been met.
B. You want to increase the power of your experiment.
C. You have more than the maximum number of tied scores in your data set.
D. All of these.
A = non-parametric tests have fewer assumptions than parametric tests
With 2 × 2 contingency tables (i.e., two categorical variables both with two categories) no expected values should be below ____.
A. 5
B. 1
C. 0.8
D. 10
A
Which of the following statements about the chi-square test is false?
A. The chi-square test can be used on continuous variables.
B. The chi-square test can be used to check how well a model fits the data.
C. The chi-square test is used to quantify the relationship between two categorical variables.
D. The chi-square test is based on the idea of comparing the frequencies you observe in certain categories to the frequencies you might expect to get in those categories by chance.
A = correct, because it is false. Chi-square can be used on categorical variables only
When conducting a loglinear analysis, if our model is a good fit of the data then the goodness-of-fit statistic for the final model should be:
Hint: The goodness-of-fit test tests the hypothesis that the frequencies predicted by the model (the expected frequencies) are significantly different from the actual frequencies in our data (the observed frequencies).)
A. Non-significant (p should be bigger than .05)
B. Significant (p should be smaller than .05)
C.Greater than 5
D. Less than 5 but greater than 1
A = If our model is a good fit of the data then the observed and expected frequencies should be very similar (i.e., not significantly different
What is the parametric equivalent of the Wilcoxon signed-rank test?
A. The paired samples t-test
B. The independent t-test
C. Independent ANOVA
D. Pearson’s r correlation
A
Are directional hypotheses possible with chi-square?
A. Yes, but only when you have a 2 × 2 design.
B. Yes, but only when there are 12 or more degrees of freedom.
C. Directional hypotheses are never possible with the chi-squared test.
D. Yes, but only when your sample is greater than 200.
A = directional hypotheses are only possible with a 2 × 2 design
A psychologist was interested in whether there was a gender difference in the use of email. She hypothesized that because women are generally better communicators than men, they would spend longer using email than their male counterparts. To test this hypothesis, the researcher sat by the computers in her research methods laboratory and when someone started using email, she noted whether they were male or female and then timed how long they spent using email (in minutes). How should she analyse the differences in males and females (use the output below to help you decide)?
A. Mann–Whitney test
B. Paired t-test
C.Wilcoxon signed-rank test
D. Independent t-test
What is the Jonckheere–Terpstra test used for?
A. To test for an ordered pattern to the medians of the groups you’re comparing.
B. To test whether the variances in your data set are approximately equal.
C. To test for an ordered pattern to the means of the groups you’re comparing.
D.To control for the familywise error rate.
A
If the standard deviation of a distribution is 5, what is its variance?
25 = 5^2
A distribution with positive kurtosis (leptokurtic) indicates that:
A Scores are tightly clustered around the centre of the distribution
B Scores are spread widely across the distribution
C Scores are clustered towards the left side of the distribution
D Scores are clustered towards the right side of the distribution
A
If the scores on a test have a mean of 28 and a standard deviation of 3, what is the z-score for a score of 34?
A 3
B 2
C -2
D -3.42
B = (34 - 28) / 3 = 2
Question 4
Which of the following is an assumption of a one-way repeated measures ANOVA but not a one-way independent ANOVA?
A Homogeneity of variance
B Homogeneity of regression slopes
C Sphericity
D Multicollinearity
C
A test statistic with an associated p value of p = .002 tells you that:
A The statistical power of your test is large
B The probability of getting this result by chance is 0.2%, assuming the null hypothesis is correct
C The effect size of this finding is large
D All of the above
B
Question 6
Of the following, which is the most appropriate reason to use a non-parametric test?
A When the DV is measured on an ordinal scale
B When you have unequal sample sizes between conditions of the IV
C When the sample size is small
D When you have a violation of the assumption of homogeneity of variance
A
Question 7
The following are all commonly stated assumptions/requirements for using ANOVA. Which of the 4 is the only one that the procedure always requires?
A Subjects are assigned to treatment conditions / groups using random allocation
B Data is from a normally distributed population
C DV is continuous (interval or ratio)
D Variance in each experimental condition is similar (assumption of homogeneity of variance)
C
Question 8
A researcher runs a single t test and obtains a p value of p = .04. The researcher rejects the null hypothesis and concludes that there is a significant effect of the experimental manipulation in the population. Which of the following are possible?
A The researcher may have made a type 1 error
B The researcher may have made a type 2 error
C The researcher may have made a familywise error
D All of the above are possible
A
Question 9
99% of z-scores lie between:
A ±1.96
B ±2.58
C ±3.29
D ±1
B
Question 10
If predictor X shows a correlation coefficient of -.45 with outcome Y, we can confidently say that:
A X is a significant predictor of Y
B Variance in X accounts for 20.25% (that’s -.45²) of the variance in Y
C X has a causal relationship with Y
D All of the above
B
Question 11
How much variance has been explained by a correlation of r = .50?
A 10%
B 25%
C 50%
D 70%
B = 0.50 squared
Question 12
The relationship between two variables partialling out the effect that a third variable has on both of those variables can be expressed using a:
A Bivariate correlation
B Semi-partial correlation
C Point-biserial correlation
D Partial correlation
D
Question 13
A regression model in which variables are entered into the model on the basis of a mathematical criterion is known as a:
A Forced entry regression
B Hierarchical regression
C Stepwise regression
D Logistic regression
C
Question 14
In the regression equation Y = b_0 + b_1X + ε, what does the parameter b_0 indicate?
A The predicted value of the outcome variable
B The regression slope
C The intercept
D Error variance
C
Question 15
In multiple regression, a high VIF statistic, a low tolerance statistic, and substantial correlations between predictor variables, ALL indicate:
A Multicollinearity
B Heteroscedasticity
C The presence of outliers
D Non-normality of the residuals
A
Question 16
In a multiple regression model, the t test statistic can be used to test:
A Differences between group means
B The significance of the overall model
C The significance of the regression coefficients for each predictor
D The t test statistic is not used in multiple regression
C
Question 17
A Mixed ANOVA design would be appropriate for which of the following situations?
A Different participants are tested in each condition
B All participants are tested in all conditions
C Participants are tested in all conditions for at least one IV, and different participants are tested in each condition for at least one IV
D None of the above
C
Question 18
In a one-way independent ANOVA with 40 participants and 5 conditions of the IV, what are the degrees of freedom for the between-groups Mean Squares (MSbetween)?
A 4
B 5
C 35
D 40
A = k (number of groups) - 1 = 5 - 1 = 4
Question 19
In a two-way ANOVA there are:
A Two IVs and two DVs
B Two IVs and one DV
C One IV and two DVs
D None of the above
B
Question 20
In a two-way factorial design, the SSR (residual sum of squares) consists of:
A Variance due to the independent variables and their interaction
B Variance due to the independent variables, dependent variable(s) and error variance
C Variance accounted for by the interaction only
D Variance which cannot be explained by the independent variables
D
Question 30
Statistics enthusiast and Dub Reggae legend ‘Mad Professor’ conducted a study into the effects of listening to music on a memory task. He ended up with three independent variables and one dependent variable, and he wished to analyse all possible main effects and interaction effects. How many model effects in total will he have?
A 1
B 3
C 6
D 7
D = 2^k - 1 = 2^3 - 1 = 7 (3 main effects + 3 two-way interactions + 1 three-way interaction)
Question 24
A nutritionist was interested in the effectiveness of two of the latest fad diets. The nutritionist took 30 people who wanted to lose weight and allocated them to either the SuperScienceMaxPro weight loss regime, or the SensiNutriPlus diet. He recorded their weight at 4 time points. (The start of the diet, and then every month after that for 3 months). In addition, the nutritionist was interested in whether males and females would differ in recorded weight loss over the 4 time points. What is the design of this study?
A Two factorial with one independent factor and one repeated measures factor
B Three-factorial with two independent factors and one repeated measures factor
C Three-factorial with one independent and two repeated measures factors
D Four-factorial with two independent factors and 4 repeated measures factors
B
Question 26
What is the non-parametric equivalent of a one-way repeated measures ANOVA?
A Wilcoxon sign test
B Mann-Whitney U test
C Kruskal-Wallis test
D Friedman test
D
Question 27
What is a limitation of the Chi-square test?
A It cannot be used when you have more than 2 categorical variables
B Directional hypotheses are not possible when you have more than two conditions of a variable
C A small sample size can result in an unreliable test statistic
D All of the above
D
Distribution of z