unit 3 - ch 12 - F Distribution & 1-Way ANOVA Flashcards
Analysis of variance
1 sample: H0: μ = #
2 samples: H0: μ1 = μ2
anova
3 or more sample means
Numerator of the variance is the basis of comparison
You can compare 2 at a time, but we don’t want to because it inflates alpha
Compare water to tequila
H0: μW = μT
α = 0.05 → 5%
By the time you have run all 6 pairwise comparisons (4 drinks), the overall α becomes 1 - 0.95^6 ≈ 0.265 → 26.5%
Inflates alpha and increases the chance of a Type I error
when does alpha inflate
When you compare 2 samples at a time instead of running one ANOVA
Compare water to tequila
H0: μW = μT
α = 0.05 → 5%
By the time you have run all 6 pairwise comparisons (4 drinks), the overall α becomes 1 - 0.95^6 ≈ 0.265 → 26.5%
Inflates alpha and increases the chance of a Type I error
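The 26.5% figure can be checked with a few lines of code. This is a minimal sketch that assumes the pairwise tests are independent (a simplification); the function name `familywise_alpha` is made up for illustration.

```python
from math import comb

def familywise_alpha(k_groups: int, alpha: float = 0.05) -> float:
    """Chance of at least one Type I error across all 2-at-a-time tests.

    Assumes the pairwise tests are independent, which is a simplification.
    """
    n_tests = comb(k_groups, 2)          # number of pairwise comparisons
    return 1 - (1 - alpha) ** n_tests

# Four drinks: water, boba, energy, tequila
print(comb(4, 2))                        # 6 pairwise comparisons
print(round(familywise_alpha(4), 3))     # 0.265
```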
ANOVA's primary advantage over multiple 2-sample tests:
ANOVA does not inadvertently inflate alpha
Keeps α at 0.05
Assumptions for 1-way ANOVA (test assumptions)
The null is true
At least interval level data
The CLT is satisfied
Random and independent
The variances are equal
Are the variances equal
Exactly 2 samples
2 or more samples
Sample sizes can be unequal
exactly 2 samples
s1 = s2
n1 = n2
TSEV (pooled)
exactly 2 samples
s1 = s2
n1 =/= n2
TSUE (non-pooled)
exactly 2 samples
s1 =/= s2
n1 = n2
TSUE (non-pooled)
exactly 2 samples
s1 =/= s2
n1 =/= n2
TSUE (non-pooled)
3 or more samples
sample sizes
n1 = n2 = ... = nk
1-way ANOVA = OK even if the variances are substantially unequal
= largest variance up to 4x - 5x the smallest
sample sizes
not all n are equal
1-way ANOVA = only if the variances are substantially equal
= largest variance within 1x - 2x the smallest
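One way to read the 4x-5x and 1x-2x rules of thumb is as a limit on the ratio of the largest sample variance to the smallest. The sketch below assumes that reading; the function names, the cutoffs of 5 and 2, and the data are all made up for illustration.

```python
from statistics import variance

def variance_ratio(samples):
    """Largest sample variance divided by the smallest."""
    variances = [variance(s) for s in samples]
    return max(variances) / min(variances)

def anova_variance_check(samples, equal_n_limit=5.0, unequal_n_limit=2.0):
    """Apply the rule of thumb (the two limits are assumptions, not fixed rules)."""
    equal_n = len({len(s) for s in samples}) == 1   # are all sample sizes equal?
    limit = equal_n_limit if equal_n else unequal_n_limit
    return variance_ratio(samples) <= limit

# Hypothetical data
equal_sizes = [[1.0, 2.0, 3.0, 4.0, 5.0],
               [2.0, 4.0, 6.0, 8.0, 10.0],
               [1.0, 3.0, 5.0, 7.0, 9.0]]
unequal_sizes = [[1.0, 2.0, 3.0, 4.0, 5.0],
                 [2.0, 4.0, 6.0, 8.0, 10.0, 12.0],
                 [1.0, 3.0, 5.0, 7.0]]

print(variance_ratio(equal_sizes), anova_variance_check(equal_sizes))      # ratio 4.0, passes
print(variance_ratio(unequal_sizes), anova_variance_check(unequal_sizes))  # ratio 5.6, fails
```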
anova drink example
type of drink → ?
water, boba, energy, tequila
#s
→ Factor, classification, treatment
anova drink example
type of drink
water, boba, energy, tequila → ?
#s
→Categories
anova drink example
type of drink
water, boba, energy, tequila
#s → ?
→ Criterion variable
Factor →
qualitative (nominal)
Categories →
qualitative (nominal)
Criterion variable →
quantitative (interval or ratio)
data variation - 1. Within (data is varied within water etc.)
Due to chance, randomness, error
data variation - 2. Between (data is varied)
1st vs 3rd column etc
Due to factor, classification, treatment
Type of drink (ex)
Old Example - Wife and husband
Variation between husband and wife group and within wife and within husband
Vertical and horizontal
the big picture
[ look at picture on docs ]
28 data points and create 4 samples combined
the big picture
[ look at picture on docs ]
Variation is in the middle X double bar =
grand mean
the big picture
[ look at picture on docs ]
(red dot) X =
we can measure this to X double bar = measure of variation
the big picture
[ look at picture on docs ]
X to x double bar =
SStotal
The distance between these is made up of two distances
the big picture
[ look at picture on docs ]
X to X bar (each x to its own sample mean) =
SSwithin
Sum of the squares within
The remaining distance is x bar to x double bar
the big picture
[ look at picture on docs ]
X bar to x double bar = SSbetween
Sum of squares between
SSwithin (X to xbar)
Variation due to chance
In a sample with a lot of dispersion, the sum of squares within lengthens
In a sample with little dispersion, the sum of squares within shortens
Without squaring, the distances would sum to 0 (don’t want this), so square before we sum
SSwithin = sigma(x - xbar)^2
Sum of the squares (SS)
SSbetween (Xbar to X double bar)
Relationships samples have to each other
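If the 4 samples are spread out, then SSbetween lengthens
If the 4 samples are stacked on top of each other, then SSbetween shortens
Relationship between the samples controls the shape
SSbetween = sigma n(xbar - x double bar)^2 (each sample mean's squared distance is weighted by its sample size n)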
Test stat =
between term / within term
Divide total variation into these two
Underlying theory of anova
Total variation can be partitioned into 2 distinct parts, between and within, and those 2 components can be compared to determine which is affecting the data to a greater degree
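The partition can be seen on a small data set. This is a minimal sketch with made-up numbers (levels cleared per drink); the variable names are only for illustration.

```python
from statistics import mean

# Hypothetical game levels cleared per drink (made-up numbers)
samples = {
    "water":   [4, 5, 6],
    "boba":    [5, 6, 7],
    "energy":  [6, 7, 8],
    "tequila": [9, 10, 11],
}

all_values = [x for group in samples.values() for x in group]
grand_mean = mean(all_values)            # x double bar

# SStotal: every x to the grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

# SSwithin: every x to its own sample mean
ss_within = sum((x - mean(g)) ** 2 for g in samples.values() for x in g)

# SSbetween: every sample mean to the grand mean, weighted by sample size
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples.values())

print(ss_total, ss_between, ss_within)   # total 50 = between 42 + within 8
```

Here most of the variation sits in the between part, which is the pattern that leads to a large test statistic.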
wide dispersion
If between is big (num) and within is small (denom) = big test stat so
between increase/ within decrease = TS =
reject the null
Tight dispersion =
between small (num) / within big (denom) = small TS =
fail to reject null
small numerator
Small distance between
Large within
Chance randomness and error
The HT in excel (with 1-way ANOVA)
Step 1 = run HT
H0: μW = μBoba = μTequila = μEnergy
The alternative is NOT the same statement written with =/= signs
That is WRONG because some of these means can still be equal to each other
H1 = not all population means are equal
OR “at least one population mean is different”
Either wording works, but H1 is written as a sentence
The HT in excel (with 1-way ANOVA)
Step 2 = alpha
α = 0.10
The HT in excel (with 1-way ANOVA)
Step 3 = Test Stat - F value
Single factor -> type of drink
Variance
SS
> Between
> Within
> Total
Formula for variance
(X - xbar)^2
Sum of the squares aka numerator of the variance
Count x variance =
ANOVA (sigma(x -xbar)^2)
df, between =
k-1 (k = number of samples)
df, within =
nt-k
df, total =
nt-1
df column is
additive
MS =
ss / df
F =
MSbetween / MSwithin
SS is
additive (as the lines show)
f-table requires
dfbetween and dfwithin
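The df, MS, and F cards fit together as one table. The sketch below fills it in from two sums of squares, with k samples and nt total data points. The function name and the SS values are hypothetical; only the 4 samples and 28 data points echo the class example.

```python
def anova_table(ss_between, ss_within, k, n_total):
    """Build the SS, df, MS and F entries of a one-way ANOVA table.

    k is the number of samples (groups); n_total is the combined sample size.
    """
    df_between = k - 1
    df_within = n_total - k
    df_total = n_total - 1               # equals df_between + df_within
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "SS": (ss_between, ss_within, ss_between + ss_within),
        "df": (df_between, df_within, df_total),
        "MS": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }

# Hypothetical values: 4 drinks, 28 data points, made-up sums of squares
table = anova_table(ss_between=60.0, ss_within=48.0, k=4, n_total=28)
print(table["df"])   # (3, 24, 27)
print(table["MS"])   # (20.0, 2.0)
print(table["F"])    # 10.0
```

The SS and df columns add up to their totals; the MS column does not.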
The HT in excel (with 1-way ANOVA)
Step 4 = CV =
P-value or F crit
The HT in excel (with 1-way ANOVA)
Step 5 = decision
P < α = REJECT
P > α = FTRN
TS > CV = REJECT
TS < CV = FTRN
One-tailed, right tail: the TS is always positive because everything is squared (think of the formula)
The HT in excel (with 1-way ANOVA)
Step 6 = Summary
Not all population means are equal
Rejected, so different drinks impact people’s ability to get through the levels of the game. Tequila did NOT impair people from progressing through the game; it actually made them better. That is because ___.
Social game or game that rewards risk or rewards aggression
Factor is the reason for variation (type of drink)
FACTOR IS A PART OF THE CONCLUSION, NOT RANDOMNESS
Look at means.. Tequila is higher than other means!!!!!!!!
This is how we know the factors are not equal and we may be REJECTING THE NULL
Student EX: How big is F for F
B/W = TS
TS is around 1 when the null is true
always positive!!!!!
Further away from 1 (larger) = reject
Close to 1 = FTR
Null Ho says that the 2 variance estimates being compared (between and within) are equal
If they are equal TS = 1
Num and Denom can be big or small
F distribution is skewed
Df num and Df denom
Steep decline because it can’t go negative but can get larger
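A quick simulation shows the same shape. This is a rough sketch, assuming 4 samples of 7 drawn from one normal population so the null is true; the seed, the population mean of 50, and the standard deviation of 10 are arbitrary choices.

```python
import random
from statistics import median

def f_statistic(samples):
    """One-way ANOVA F = MSbetween / MSwithin, computed from sums of squares."""
    values = [x for s in samples for x in s]
    grand_mean = sum(values) / len(values)
    k, n_total = len(samples), len(values)
    ss_between = ss_within = 0.0
    for s in samples:
        s_mean = sum(s) / len(s)
        ss_between += len(s) * (s_mean - grand_mean) ** 2
        ss_within += sum((x - s_mean) ** 2 for x in s)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))

random.seed(12)  # arbitrary seed so the run is repeatable
f_values = []
for _ in range(2000):
    # Null is true: all 4 samples of 7 come from the same normal population
    samples = [[random.gauss(50, 10) for _ in range(7)] for _ in range(4)]
    f_values.append(f_statistic(samples))

print(min(f_values) >= 0)    # F cannot go negative
print(median(f_values))      # most values pile up below or near 1
print(max(f_values))         # a few values stretch far to the right (skew)
```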
“Analysis of Variance” (abbreviated ANOVA)
For hypothesis tests comparing averages among more than two groups, statisticians have developed a method called
variances
The purpose of a one-way ANOVA test is to determine the existence of a statistically significant difference among several group means. The test actually uses variances to help determine if the means are equal or not.
In order to perform a one-way ANOVA test, there are five basic assumptions to be fulfilled:
Each population from which a sample is taken is assumed to be normal.
All samples are randomly selected and independent.
The populations are assumed to have equal standard deviations (or variances).
The factor is a categorical variable.
The response is a numerical variable.
F distribution
The distribution used for the hypothesis test is a new one. It is called the F distribution, invented by George Snedecor but named in honor of Sir Ronald Fisher, an English statistician. The F statistic is a ratio (a fraction). There are two sets of degrees of freedom; one for the numerator and one for the denominator.
To calculate the F ratio, two estimates of the variance are made.
- Variance between samples: An estimate of σ² that is the variance of the sample means multiplied by n (when the sample sizes are the same).
- Variance within samples: An estimate of σ² that is the average of the sample variances (also known as a pooled variance).
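For equal sample sizes, the two estimates can be computed directly from the sample means and sample variances. A minimal sketch with made-up data; the variable names are only for illustration.

```python
from statistics import mean, variance

# Hypothetical equal-sized samples (made-up numbers)
samples = [[4, 5, 6], [5, 6, 7], [6, 7, 8], [9, 10, 11]]
n = len(samples[0])                      # common sample size

sample_means = [mean(s) for s in samples]

# Variance between samples: n times the variance of the sample means
var_between = n * variance(sample_means)

# Variance within samples: average of the sample variances (pooled)
var_within = mean([variance(s) for s in samples])

f_ratio = var_between / var_within
print(var_between, var_within, f_ratio)  # about 14, 1 and 14
```

When the sample sizes differ, this shortcut no longer applies and the weighted sums of squares are needed instead.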
SSbetween = the sum of squares that represents the variation among the different samples
SSwithin = the sum of squares that represents the variation within samples that is due to chance.
“sum of squares”
To find a “sum of squares” means to add together squared quantities that, in some cases, may be weighted. We used sum of squares to calculate the sample variance and the sample standard deviation in Chapter 2 Descriptive Statistics.
MS means “mean square.” MSbetween is the variance between groups, and MSwithin is the variance within groups.