Lecture 8 - ANOVA Flashcards
What does ANOVA test?
The difference between two or more population means
What kind of test is ANOVA: parametric or non-parametric?
Parametric test
What are the 4 assumptions of an ANOVA?
1) Random sampling
2) Homoscedasticity (=equal variances)
3) Independent measurements or observations
4) Normal distribution
What is a variable?
A variable is what the experimenter measures = response or dependent variable
What is a Factor?
The effect under investigation = independent variable
e.g. salinity, temperature, etc.
What are Factor Levels?
The different treatment levels in an experiment; they are what the experimenter varies
e.g. PCB concentration or temperature at various levels
What are the 2 types (and sub-types) of ANOVA?
1) Univariate - one variable (response) measured
sub-types: one way (one factor) or multi-way (two or three factors)
2) Multivariate - more than one variable measured
What are two main sources of variation?
1) between sample or population means = factor
2) Within samples or populations = error
What is the variance in ANOVA?
A measure of whether the difference between the population means is high or low
What kind of output do we want with sources of variation in regards to ANOVA?
We want to see a high variance between factor levels (factor variance) and a low error variance within the samples or populations
When are samples unlikely to come from the same population?
If the variation between the sample means is large relative to the variation within the samples
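The comparison of between-sample and within-sample variation can be sketched as a one-way ANOVA F ratio computed by hand. This is a minimal illustration, not a full analysis: the function name one_way_anova and the site values are hypothetical, and the code only returns F and the degrees of freedom (it does not look up a p-value or critical F).

```python
from statistics import mean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of samples (hypothetical helper)."""
    k = len(groups)                           # number of factor levels
    n_total = sum(len(g) for g in groups)     # total number of observations
    grand_mean = mean(x for g in groups for x in g)

    # Between-sample variation (factor): group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-sample variation (error): observations around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Made-up data: same within-sample spread, different spacing of the means
far_apart = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]
close_together = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

f_far, df_b, df_w = one_way_anova(far_apart)
f_close, _, _ = one_way_anova(close_together)
print(f_far, f_close, df_b, df_w)
```

With the same within-sample spread, the data set whose means are far apart gives a much larger F than the one whose means are close together, which is the pattern the card describes.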
What does accuracy mean?
Accuracy is how close a measurement is to the true value. However, the true value is often not known
If the means are almost the same, what happens to the residuals?
The residual becomes zero
What is the ability to detect change in the response?
Sensitivity; it is related to the number of levels
High sensitivity means change can be detected
What happens when sample (level) means are close together?
high internal variability
What is precision related to?
The error and the repeatability of the experiment
What happens when precision is close?
high repeatability
What happens when precision is far?
low repeatability
What happens with the null hypothesis when there is high within variance and low between?
Get closer to accepting (failing to reject) the null hypothesis that the samples come from the same population (there is no real difference between the populations)
What happens to the null hypothesis when there is high within variance and higher between variance?
End up mid-way between accepting and rejecting (not significantly accepting or rejecting)
What happens to the null hypothesis when there is low within variance but higher between variance?
Reject the null: the two means/populations are in fact different from each other. No overlap between the groups means they are significantly different
Why was the Crackenback river univariate ANOVA criticized?
The sites are not independent of each other because they flow into each other, and statistics designed for manipulative experiments were applied to a monitoring study
What was the result of the Crackenback river sites (factors) not being independent?
Huge bias and pseudoreplication
What is pseudoreplication?
Treating data that is dependent as independent
What should we do with faulty ANOVA stats (an improper study, e.g. one with non-independent observations)?
Use descriptive non-parametric statistics
What is a Type I error? How does alpha/critical value relate to this?
A Type I error is rejecting a null hypothesis that is actually true.
Alpha (the critical value for the p-value) is the probability of making this error: you could be rejecting a true null that percentage of the time
e.g. if alpha = 0.05, then 5% of the time you could be rejecting a true null
When do you reject the null hypothesis?
When the calculated p-value is less than the critical value (alpha)
or
when the calculated F-value is greater than the critical F (from a table)
When do you significantly reject the null hypothesis?
When the p-value is very close to zero or the calculated F is far greater than the critical F
What are the 3 test assumptions for ANOVA regarding a residual diagnostics plot?
1) Homogeneity of variance (constant variability): Scale-Location plot and Levene's test
2) Independence: Residuals vs. Fitted plot and Durbin-Watson test
3) Normality: Normal Q-Q plot and Shapiro-Wilk normality test
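As one illustration of the independence check, the Durbin-Watson statistic can be computed by hand from a series of residuals: the sum of squared successive differences divided by the sum of squared residuals. The sketch below assumes the residuals are listed in the order the observations were taken. The function name and the residual values are hypothetical. The statistic ranges from 0 to 4, and values near 2 suggest little serial correlation.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in observation order (hypothetical helper)."""
    # Numerator: squared differences between each residual and the previous one
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    # Denominator: sum of squared residuals
    den = sum(e ** 2 for e in residuals)
    return num / den

# Made-up residual series
alternating = [1.0, -1.0, 1.0, -1.0]   # neighbours always flip sign
constant = [1.0, 1.0, 1.0, 1.0]        # neighbours are identical

print(durbin_watson(alternating))  # above 2: negative serial correlation
print(durbin_watson(constant))     # near 0: strong positive serial correlation
```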
When do we see compliance with homogeneity?
When the residuals appear to fluctuate uniformly about zero
i.e. uniform dispersion of square-root residuals vs. fitted values
No detectable pattern
When do we not see compliance with homogeneity?
When there is heterogeneity indicated by a tendency for the residuals to fan or funnel
Fan can face either direction
Where is the value of the calculated f when it is greater than the critical f?
It is in the rejection region: the tail area under the curve of the F distribution beyond the critical F
This means the samples come from different populations
In an ANOVA table what are the residuals?
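The residuals row is the within-sample (error) variation
Its degrees of freedom are the total number of observations minus the number of factor levels (N - k), which is n - 2 when there are two levels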
If the data doesn’t pass the parametric assumption tests, what can we do to continue with testing?
Log transform the data and test again. If the log data passes, you can continue with parametric statistics
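A small sketch of why a log transform can help with unequal variances, using made-up numbers in which the spread grows with the mean. The group names and values are assumptions for illustration only. A real check would rerun the assumption tests on the transformed data.

```python
import math
from statistics import variance

# Hypothetical groups: the second has 10x the values, so 100x the raw variance
low = [1.0, 2.0, 4.0]
high = [10.0, 20.0, 40.0]

raw_ratio = variance(high) / variance(low)

# Log transform each observation, then compare the variances again
log_low = [math.log(x) for x in low]
log_high = [math.log(x) for x in high]
log_ratio = variance(log_high) / variance(log_low)

print(raw_ratio)  # far from 1: variances are unequal on the raw scale
print(log_ratio)  # about 1: variances are equal on the log scale
```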
What is the TukeyHSD?
Tukey's Honestly Significant Difference test: a pairwise comparison of all possible combinations of factor levels (e.g. sites) to determine which differ from each other and which do not
What is the equation for determining how many comparisons are made in a pairwise comparison?
k × (k - 1) / 2, where k is the number of factor levels being compared
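The pair count can be checked by listing the pairs explicitly. This is a minimal sketch; the site labels and the function name n_pairwise are hypothetical.

```python
from itertools import combinations

def n_pairwise(k):
    """Number of pairwise comparisons among k factor levels: k(k-1)/2."""
    return k * (k - 1) // 2

sites = ["A", "B", "C", "D"]          # four hypothetical sites
pairs = list(combinations(sites, 2))  # every unordered pair of sites

print(n_pairwise(len(sites)))  # 6
print(pairs)
```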
How do you interpret a TukeyHSD plot?
If there is no difference between the means of a pair, the confidence interval will include 0 and the pair is similar or the same
If the confidence interval does not touch 0, the means are different. The further from 0 it lies, the closer the p-value is to 0 and the more strongly the null is rejected
How do you interpret a TukeyHSD with respect to which sides an interval falls upon?
If the interval falls entirely to the right of 0, there is a positive difference
If it falls entirely to the left of 0, there is a negative difference
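The reading rules for a TukeyHSD interval can be sketched as a small function. It assumes the difference in means is plotted on the horizontal axis, so an interval entirely above 0 lies to the right and one entirely below 0 lies to the left. The function name and the interval bounds are hypothetical, not output from a real analysis.

```python
def interpret_interval(lower, upper):
    """Classify a confidence interval for a difference in means (hypothetical helper)."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower <= 0 <= upper:
        # Interval includes (or touches) 0: the pair is not shown to differ
        return "no significant difference"
    # Interval excludes 0: the sign tells which side of 0 it falls on
    return "positive difference" if lower > 0 else "negative difference"

print(interpret_interval(-1.2, 0.8))   # no significant difference
print(interpret_interval(0.5, 2.0))    # positive difference
print(interpret_interval(-3.0, -0.4))  # negative difference
```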