Tests Involving Several Population Means Flashcards
Tests Involving Several Population Means
Aim: To detect differences among several population means
Hypotheses
H_0: µ_1 = µ_2 = µ_3 = µ_4 = . . . = µ_k
i.e. the means are the same. In other words, there is no difference (zero difference) between the means
H_1: not all of µ_1, µ_2, µ_3, µ_4, . . ., µ_k are equal
i.e. at least one of the equalities does not hold
Thus, we can formulate the hypotheses as:
H_0: There is no difference between the means
H_1 : At least one of the equalities does not hold
Assumptions of ANOVA
Sampling from each of the k populations must be independent and random
Each of the k populations must be normally distributed with mean µ_i (the means are not necessarily equal) and a common variance σ^2
Sources of Variation:
The ANOVA test is based on comparing the amount of variation between the k treatments with the variation within them. If the variation from one treatment to the next is significantly high, it can be concluded that the treatments have dissimilar effects on the population. There are three sources of variation:
(i) Total variation: the variation among all n observations taken together
(ii) Between-sample variation: the variation between the sample means of the various treatments administered
(iii) Within-sample variation: the variation among the observations within any one given treatment (sample)
Note: it is by comparing these different sources of variation that ANOVA can be used to test for the equality of means of different populations. Any difference that the treatments may have will be detected by a comparison of these forms of variation.
The Principle behind ANOVA
To determine if the different treatments have different effects on their respective populations, a comparison is made between the variation between sample means and the variation within samples.
The variation between samples can be caused, in part, by differences in the treatments administered to them. The variation within a given sample, on the other hand, is caused by random factors. It is independent of the treatments, since all observations within a sample receive the same treatment. Thus, within-sample variation is the result of random sampling error.
ANOVA compares the “variation between samples” with the “variation within samples” by taking the ratio of the former to the latter. This ratio is the F-ratio.
Note: When population means are different, a treatment effect is present and the variation between samples will be large compared to the error variation within samples. Thus the F-value, which is the ratio of the treatment variation to the error variation, will rise.
The total variation is the sum of the variation caused by different treatments and the variation caused by the random error elements
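The decomposition of total variation into a between-sample part and a within-sample part can be checked numerically. The following is a minimal Python sketch, assuming three made-up treatments (A, B, C) with four observations each; the data and the function names are hypothetical and serve only as illustration.

```python
# Hypothetical data: three treatments with four observations each.
samples = {
    "A": [4, 5, 6, 5],
    "B": [7, 8, 9, 8],
    "C": [5, 6, 7, 6],
}


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def variation_components(groups):
    """Return (total, between, within) sums of squared deviations."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Total: every observation against the grand mean.
    total = sum((x - grand_mean) ** 2 for x in all_values)
    # Between: each sample mean against the grand mean, weighted by sample size.
    between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within: every observation against its own sample mean.
    within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return total, between, within


sst, ssb, ssw = variation_components(list(samples.values()))
print(f"SST={sst:.4f}  SSB={ssb:.4f}  SSW={ssw:.4f}")
print("SST equals SSB + SSW:", abs(sst - (ssb + ssw)) < 1e-9)
```

In this made-up data set most of the total variation comes from the between-sample part, which is the situation in which the F-ratio becomes large.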
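If the F test is significant, the means are not all the same. We may then wish to find out where the differences lie, i.e. which means actually differ. This is done by conducting a post hoc test using the Tukey or LSD (least significant difference) formula. Two sample means are judged to be different when the absolute difference between them exceeds the value given by the formula.
Tukey formula:
T = q_α √(MSE/a)
LSD formula:
LSD = √((2 MSE F_α)/a)
Where
MSE = mean square error (MSW)
a = number of observations in each sample (samples of equal size)
α = level of significance
q_α = value obtained from the studentized range distribution table with k and n-k degrees of freedom
F_α = F value at the specified level of significance, with 1 and n-k degrees of freedom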
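The two post hoc formulas can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: `a` is taken to be the number of observations in each sample, the sample means and MSE are made up, and `q_alpha` and `f_alpha` are placeholder numbers standing in for values that would be looked up in the studentized range and F tables.

```python
import math


def tukey_criterion(q_alpha, mse, a):
    """Tukey criterion: q_alpha * sqrt(MSE / a)."""
    return q_alpha * math.sqrt(mse / a)


def lsd_criterion(f_alpha, mse, a):
    """LSD criterion: sqrt(2 * MSE * F_alpha / a)."""
    return math.sqrt(2 * mse * f_alpha / a)


def differing_pairs(means, criterion):
    """Pairs of samples whose means differ by more than the criterion."""
    names = list(means)
    return [
        (p, q)
        for i, p in enumerate(names)
        for q in names[i + 1:]
        if abs(means[p] - means[q]) > criterion
    ]


# Hypothetical inputs: three sample means, MSE = 6/9, four observations each.
means = {"A": 5.0, "B": 8.0, "C": 6.0}
mse, a = 6 / 9, 4
q_alpha = 3.95  # placeholder for a studentized range table look-up
f_alpha = 5.12  # placeholder for an F table look-up

tukey = tukey_criterion(q_alpha, mse, a)
lsd = lsd_criterion(f_alpha, mse, a)
print("Tukey criterion:", round(tukey, 4), differing_pairs(means, tukey))
print("LSD criterion:  ", round(lsd, 4), differing_pairs(means, lsd))
```

Keeping the table values as plain function arguments avoids depending on any particular statistics library; they can be replaced with looked-up values for the chosen α, k and n-k.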
Note:
F_calculated = MSB/MSW
What are the formulas?
MSB = SSB/df, here df = k-1
MSW = SSW/df, here df = n-k
Where
MSB = mean square between means
MSW = mean square within means
SSB = sum of squares between means
SSW = sum of squares within means
df = degrees of freedom
The major task involves the computation of the sum of squares (SS)
Computation of sum of squares
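SSB = Σ_i [(T_i.)^2 / n_i] - (T..)^2 / n for row treatment; and
SSB = Σ_j [(T_.j)^2 / n_j] - (T..)^2 / n for column treatment
SSW = Σ_i Σ_j (X_ij)^2 - Σ_i [(T_i.)^2 / n_i] for row treatment; and
SSW = Σ_i Σ_j (X_ij)^2 - Σ_j [(T_.j)^2 / n_j] for column treatment
Where T_i. = total of row i, T_.j = total of column j, T.. = grand total of all observations, n_i (or n_j) = number of observations in that treatment, n = total number of observations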
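In simple terms (for column treatments):
SSB = (column 1 total)^2 divided by the number of observations in column 1 + (column 2 total)^2 divided by the number of observations in column 2 + ..... minus (sum of all the column totals)^2 divided by the total number of observations. The divisor for each column is, for example, 4 when each column holds 4 observations.
SSW = sum of the squares of every individual observation minus [(column 1 total)^2 divided by the number of observations in column 1 + (column 2 total)^2 divided by the number of observations in column 2 + .....]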
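The shortcut computation for column treatments can be sketched in Python as below. The data are made up (three columns of four observations each) and the function name is hypothetical; the sketch only mirrors the totals-based formulas for SSB and SSW.

```python
# Hypothetical data: each inner list is one column (treatment).
columns = [
    [4, 5, 6, 5],
    [7, 8, 9, 8],
    [5, 6, 7, 6],
]


def shortcut_sums_of_squares(cols):
    """Compute SSB and SSW from column totals rather than deviations."""
    n = sum(len(c) for c in cols)               # total number of observations
    grand_total = sum(sum(c) for c in cols)     # T..
    correction = grand_total ** 2 / n           # (T..)^2 / n
    # Sum over columns of (column total)^2 / (observations in that column).
    totals_term = sum(sum(c) ** 2 / len(c) for c in cols)
    # Sum of the squares of every individual observation.
    raw_squares = sum(x ** 2 for c in cols for x in c)
    ssb = totals_term - correction
    ssw = raw_squares - totals_term
    return ssb, ssw


ssb, ssw = shortcut_sums_of_squares(columns)
print(f"SSB={ssb:.4f}  SSW={ssw:.4f}")
```

Working from totals avoids computing each sample mean first, which is why this form is convenient for hand calculation.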
ANOVA Table
Source of Variation          df     SS             MS             Test Ratio
Between Means (Treatment)    k-1    SSB or SSTR    MSB or MSTR    F = MSB/MSW
Within Means (Error)         n-k    SSW or SSE     MSW or MSE
Total                        n-1    SST
MSB = SSB/(k-1)
MSW = SSW/(n-k)
F = MSB/MSW
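Filling in the ANOVA table from the sums of squares can be sketched in Python as follows. The inputs (SSB = 56/3, SSW = 6, k = 3, n = 12) are made-up values and the helper names are hypothetical; only the arithmetic follows the table layout.

```python
def anova_rows(ssb, ssw, k, n):
    """Build the three ANOVA table rows as (source, df, SS, MS, F) tuples."""
    msb = ssb / (k - 1)   # mean square between means
    msw = ssw / (n - k)   # mean square within means (error)
    return [
        ("Between Means (Treatment)", k - 1, ssb, msb, msb / msw),
        ("Within Means (Error)", n - k, ssw, msw, None),
        ("Total", n - 1, ssb + ssw, None, None),
    ]


def fmt(value):
    """Blank for missing cells, four decimals for numbers."""
    return "" if value is None else f"{value:.4f}"


# Hypothetical inputs: 3 treatments, 12 observations in total.
rows = anova_rows(ssb=56 / 3, ssw=6.0, k=3, n=12)

print(f"{'Source of Variation':<28}{'df':>4}{'SS':>10}{'MS':>10}{'F':>10}")
for source, df, ss, ms, f in rows:
    print(f"{source:<28}{df:>4}{fmt(ss):>10}{fmt(ms):>10}{fmt(f):>10}")
```

The calculated F in the first row is then compared with the tabulated F value for k-1 and n-k degrees of freedom at the chosen level of significance.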