Chpt 13 Flashcards
What does ANOVA stand for
Analysis of Variance
What is the F ratio
the ratio of 2 independent estimates of the population variance (between-treatments estimate / within-treatments estimate)
What does ANOVA allow us to do?
allows us to compare the means of multiple pops at once, and even subgroups of these pops
- with two factors, it also shows how the factors interact with each other in affecting the response
What question does ANOVA help us answer
do all 3 means come from a common population
- we are not asking if the sample means are exactly equal; we are asking if each sample likely came from the same overall population
What is the Null hypothesis for ANOVA
H0: μ1 = μ2 = μ3
What is the problem with using pairwise comparison for 3 pop means
the Type I error will compound with each t-test
P(no Type I error on any of 3 independent tests at α = .05) = (.95)(.95)(.95) = .857
so the overall α (experimentwise error rate) becomes 1 - .857 = .143
Type I error rate went from 5% (0.05) to 14.3%
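The compounding can be checked numerically. This is a minimal Python sketch, assuming the tests are independent (the same assumption behind the .95 × .95 × .95 calculation); the function name `experimentwise_error` is hypothetical.

```python
def experimentwise_error(alpha: float, num_tests: int) -> float:
    """P(at least one Type I error) over num_tests independent tests, each at level alpha."""
    # P(no Type I error on every test) = (1 - alpha) ** num_tests
    return 1 - (1 - alpha) ** num_tests


for c in (1, 3, 6, 10):
    print(c, "tests:", round(experimentwise_error(0.05, c), 3))
```

The rate climbs quickly as the number of tests grows, which is why pairwise t-tests are not used in place of ANOVA.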
what is partitioning
separating total variance into its component parts
- we do this by using ANOVA
What is the variability between the means
distance from overall mean
If the variability between the means (distance from the overall mean) in the numerator is relatively large compared to the variance within the samples, the ratio will be
much larger than 1
- the samples most likely do NOT come from a common pop
- reject H0 that the means are equal
What is the variability within the samples called
internal spread (the denominator)
If the F ratio's numerator and denominator are similar (F close to 1), what does this tell us
- Fail to reject H0
- the means are fairly close to the overall mean and/or the distributions overlap a bit
If the F ratio's numerator is small and its denominator is large (F less than 1)
- Fail to reject H0
- the means are very close to the overall mean and/or the distributions melt together
What is the formula for the F ratio
F = variance between (among) / variance within (around)
sum of squares b/w (SSTR) + sum of squares w/in (SSE, error) =
Total sum of squares (SST)
Factor definition
independent variable (e.g., assembly method)
What are the required assumptions for ANOVA
- normally distributed
- observations must be independent
- the variance of the response variable (σ²) is the same for all pops
What are the steps to ANOVA
- Calculate sample mean for each pop
- calculate the overall mean for all pops (with equal sample sizes: add up all means / # of means)
- compute each sample variance: sum (observation - sample mean)squared / (n - 1)
- compute the sum of squares between treatments (SSTR)
- compute the mean square between treatments (MSTR)
- calculate the sum of squares due to error (SSE)
- calculate the mean square due to error (MSE)
- set up the ANOVA table
- calculate the F ratio and p-value
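The steps above can be sketched in Python with only the standard library. This is an illustrative sketch, not the textbook's procedure verbatim: the function name `one_way_anova` and the sample data are hypothetical, and the p-value step is left out because the standard library has no F distribution.

```python
from statistics import mean, variance


def one_way_anova(samples):
    """One-way ANOVA quantities for k samples (each a list of numbers)."""
    k = len(samples)                              # number of pops (treatments)
    n_t = sum(len(s) for s in samples)            # total number of observations
    means = [mean(s) for s in samples]            # sample mean of each pop
    grand = sum(sum(s) for s in samples) / n_t    # overall mean (also valid for unequal sizes)
    variances = [variance(s) for s in samples]    # sample variances, divisor n_j - 1
    sstr = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))
    mstr = sstr / (k - 1)
    sse = sum((len(s) - 1) * v for s, v in zip(samples, variances))
    mse = sse / (n_t - k)
    # The p-value would come from the F distribution with (k - 1, n_t - k) df,
    # e.g. scipy.stats.f.sf(F, k - 1, n_t - k) if SciPy is installed (not used here).
    return {"SSTR": sstr, "df_tr": k - 1, "MSTR": mstr,
            "SSE": sse, "df_err": n_t - k, "MSE": mse, "F": mstr / mse}


# Hypothetical data: 3 samples of 5 observations each
result = one_way_anova([[58, 64, 55, 66, 67],
                        [58, 69, 71, 64, 68],
                        [48, 57, 59, 47, 49]])
for name, value in result.items():   # a bare-bones ANOVA "table"
    print(f"{name:>6}: {value:.3f}")
```

Computing the overall mean as (sum of all observations) / nT keeps the sketch correct even when the samples have different sizes.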
What does SSTR stand for
sum of squares b/w treatments
what is MSTR
mean square b/w treatments
what is SSE
sum of squares due to error
What is MSE
mean square due to error
what is the formula for SSTR
SSTR = sum of (sample size j)(sample mean j - overall mean)squared, summed over all k samples
What is the formula for MSTR
SSTR/(k - 1)
What is the formula for SSE
SSE = (n1 - 1)(variance of sample 1) + (n2 - 1)(variance of sample 2) + (n3 - 1)(variance of sample 3)
What is the formula for MSE
SSE/(nT - k)
What is the F-ratio formula
MSTR/MSE
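What is k - 1
the between-treatments degrees of freedom: # of pops - 1 (with 3 pops, 3 - 1 = 2)
What is nT - k
the within-treatments (error) degrees of freedom; nT = total # of observations for all pops (ex. if each of 3 samples contains 5 observations, nT = 5 x 3 = 15)
k = total # of pops (in this case 3)
so df = 15 - 3 = 12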
What is Fisher’s LSD
remember: ANOVA only tells us whether at least 2 of the population means differ, not which ones
- Fisher’s LSD tests 2 specific groups against each other
what does LSD stand for
Least Significant Difference
What is the formula for Fisher’s LSD
LSD = t α/2 × √[MSE(1/n1 + 1/n2)]
What is t α/2
the critical value from the t distribution with an upper-tail area of α/2 and the within (error) degrees of freedom, nT - k
What do you compare the LSD to
|x̄i - x̄j| (the absolute difference between 2 sample means)
- reject H0 if |x̄i - x̄j| is greater than or equal to LSD
- do this for each pair of groups
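A minimal sketch of the pairwise LSD comparison, under stated assumptions: the critical value t α/2 (nT - k df) is looked up beforehand and passed in, since Python's standard library has no t distribution. The function name `fisher_lsd` and the example numbers (sample means, MSE = 340/12, and t.025 = 2.179 for 12 df) are illustrative.

```python
from itertools import combinations
from math import sqrt


def fisher_lsd(means, sizes, mse, t_crit):
    """Compare every pair of sample means using Fisher's LSD.

    t_crit is t_{alpha/2} with n_T - k degrees of freedom (e.g., from a t table).
    Returns tuples (i, j, |xbar_i - xbar_j|, LSD, reject H0: mu_i = mu_j).
    """
    results = []
    for i, j in combinations(range(len(means)), 2):
        lsd = t_crit * sqrt(mse * (1 / sizes[i] + 1 / sizes[j]))  # recomputed per pair
        diff = abs(means[i] - means[j])
        results.append((i, j, diff, lsd, diff >= lsd))
    return results


# Hypothetical: 3 sample means, n = 5 each, MSE = 340/12, t_.025 with 12 df = 2.179
for i, j, diff, lsd, reject in fisher_lsd([62, 66, 52], [5, 5, 5], 340 / 12, 2.179):
    print(f"pop {i + 1} vs pop {j + 1}: |diff| = {diff}, LSD = {lsd:.2f}, reject H0: {reject}")
```

Because the LSD is recomputed for every pair, the same sketch handles unequal sample sizes; with equal sizes every pair shares one LSD value.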
LSD is used to determine
where the differences occur
what is the null hypothesis for LSD
H0: μi = μj
What is the test statistic for LSD
t = (x̄i - x̄j) / √[MSE(1/ni + 1/nj)]
What is the rejection rule for LSD - pvalue approach
reject H0 if p-value is less than or equal to α
What is the rejection rule for LSD - critical value approach
reject H0 if t is less than or equal to -t α/2 or
t is greater than or equal to t α/2
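The critical-value rule can be sketched as a small Python function. The name `lsd_t_test` and the example values (MSE = 340/12, n = 5 per sample, t.025 = 2.179 for 12 df) are hypothetical; t α/2 is supplied by the caller from a t table.

```python
from math import sqrt


def lsd_t_test(mean_i, mean_j, n_i, n_j, mse, t_crit):
    """Critical-value approach for H0: mu_i = mu_j (two-tailed)."""
    t = (mean_i - mean_j) / sqrt(mse * (1 / n_i + 1 / n_j))
    reject = t <= -t_crit or t >= t_crit   # reject in either tail
    return t, reject


# Hypothetical values: MSE = 340/12, n = 5 per sample, t_.025 with 12 df = 2.179
print(lsd_t_test(62, 52, 5, 5, 340 / 12, 2.179))
print(lsd_t_test(62, 66, 5, 5, 340 / 12, 2.179))
```

This gives the same decision as comparing |x̄i - x̄j| with the LSD, because |t| ≥ t α/2 is that inequality divided through by the standard error √[MSE(1/ni + 1/nj)].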
What are the degrees of freedom for LSD and the t distribution
t α/2 is based on a t distribution with nT - k degrees of freedom
What does the T in nT stand for
total: nT is the total sample size across all k samples
LSD and confidence intervals: if the confidence interval for μi - μj (x̄i - x̄j ± LSD) includes the value 0
we cannot reject H0 that the pop means are equal
If the LSD confidence interval does not include the value 0
we can conclude there is a difference in pop means
- reject H0
What is a comparisonwise Type I error rate
the level of significance associated with a single pairwise comparison (e.g., α = 1 - .95 = .05)
What is an experimentwise Type I error rate
the probability of making at least one Type I error across all the tests (e.g., 3 pairwise tests)
1 - (.95)(.95)(.95) = 1 - .857 = .143
- this gets larger the more groups you have
What is the experimentwise Type I error rate denoted as
αEW
How do we control the overall experimentwise error rate
use the Bonferroni adjustment
What is the Bonferroni adjustment
we use a smaller comparisonwise error rate for each test
What is the formula for the Bonferroni adjustment
α = αEW / C (to test C pairwise comparisons)
ex.
with 3 pops there are C = 3 pairwise comparisons, so α = 0.05 / 3 = 0.017
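A minimal Python sketch of the adjustment, assuming every pair of means is compared so that C = k(k - 1)/2; with 3 pops that gives C = 3, matching the example. The function name `bonferroni_alpha` is hypothetical.

```python
def bonferroni_alpha(alpha_ew: float, k: int) -> float:
    """Per-comparison alpha when all pairwise comparisons among k means are tested."""
    c = k * (k - 1) // 2     # number of pairwise comparisons: k choose 2
    return alpha_ew / c


for k in (3, 4, 5):
    print(k, "pops:", k * (k - 1) // 2, "comparisons, alpha =", round(bonferroni_alpha(0.05, k), 4))
```

As the number of pops grows, the per-test α shrinks fast, which makes each individual test more conservative.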
What are some other procedures we could use to control the overall experimentwise error rate
1. Tukey’s procedure
2. Duncan’s multiple range test
When is randomized block design used
useful when the experimental units are heterogeneous (a completely randomized design is used when they are homogeneous)
What do we use if experimental units are heterogeneous
blocking is often used to form homogeneous groups
What problem does the randomized block design address
problems can arise whenever differences due to extraneous factors (ones not considered in the experiment) cause the MSE term to become too LARGE
- this can cause the F value to be small, signaling no difference among treatment means when in fact a difference exists
How do you compute the F ratio for treatments in a randomized block design
F = MSTR/MSE
In our example, what would the workstation be
the factor of interest
In our example of randomized block design, what would the controllers be
the blocks
What would the treatments be in a randomized block design
the pops
- the 3 treatments (or pops) associated with the workstation factor correspond to the 3 workstation alternatives
What is the randomized aspect
the random order in which the treatments (systems) are assigned to the controllers
- 6 controllers were selected at random and assigned to operate each of the systems
- a follow-up interview and a medical exam of each controller in the study provided a measure of stress for each controller on each system
What is SST equal to in a randomized block design
SST = SSTR + SSBL + SSE
What does k represent in randomized block design
the # of treatments
What does b represent in randomized block design
# of blocks
What does nT represent in randomized block design
total sample size (nT = kb)
What are the steps in randomized block design
- compute SST (total sum of squares)
- compute SSTR (sum of squares due to treatments)
- compute SSBL (sum of squares due to blocks)
- compute SSE (sum of squares due to error)
What is the formula in randomized block design for SSE
SSE = SST - SSTR - SSBL
What is the formula in randomized block design for SST
sum over all observations of (observation - overall mean) squared
What is the formula in randomized block design for SSTR
(# of blocks, b) × sum over treatments of (treatment mean - overall mean) squared
What is the formula for SSBL
(# of treatments, k) × sum over blocks of (block mean - overall mean) squared
What does SSBL mean
sum of squares due to blocks
What is SST
Total sum of squares
what is the degrees of freedom for SSTR
k - 1 (# of pops - 1)
what is the degrees of freedom for SSBL in randomized block design
b - 1 (# of blocks - 1)
What is the degrees of freedom for SSE in randomized block design
(k-1)(b-1)
What is the degrees of freedom for SST in randomized block design
nT-1
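The randomized block computations can be sketched as follows. This is an illustrative sketch: the function name `randomized_block_anova` and the 6 × 3 table of stress scores are hypothetical, with rows as blocks (controllers) and columns as treatments (systems).

```python
def randomized_block_anova(data):
    """data[i][j] = response of block i under treatment j (every block gets every treatment)."""
    b = len(data)                  # number of blocks
    k = len(data[0])               # number of treatments
    n_t = k * b                    # total sample size
    grand = sum(sum(row) for row in data) / n_t
    treat_means = [sum(row[j] for row in data) / b for j in range(k)]
    block_means = [sum(row) / k for row in data]

    sst = sum((x - grand) ** 2 for row in data for x in row)
    sstr = b * sum((m - grand) ** 2 for m in treat_means)
    ssbl = k * sum((m - grand) ** 2 for m in block_means)
    sse = sst - sstr - ssbl        # SST = SSTR + SSBL + SSE

    df = {"treatments": k - 1, "blocks": b - 1, "error": (k - 1) * (b - 1), "total": n_t - 1}
    mstr = sstr / df["treatments"]
    mse = sse / df["error"]
    return {"SST": sst, "SSTR": sstr, "SSBL": ssbl, "SSE": sse, "df": df, "F": mstr / mse}


# Hypothetical stress scores: 6 blocks (controllers) x 3 treatments (systems)
scores = [[15, 15, 18],
          [14, 14, 14],
          [10, 11, 15],
          [13, 12, 17],
          [16, 13, 16],
          [13, 13, 13]]
print(randomized_block_anova(scores))
```

Only the treatment F is computed, in line with the point that blocking exists to pull variation out of MSE rather than to test the blocks.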
Describe a factorial experiment
an experimental design that allows simultaneous conclusions about 2 or more factors
Why is it called a factorial experiment
because the experimental conditions include all possible combinations of the factors
Give an example of a Factorial Experiment
study involving the GMAT
- scores range from 200 to 800
- higher scores imply higher aptitude
- to improve GMAT scores, consider 3 prep programs
- factor 1 is the prep program, with 3 treatments (one per program)
- factor 2 is the student’s undergraduate college (business, engineering, or arts), to see whether it affects the GMAT score
How many treatment combinations are there in a factorial design with 2 factors (3 GMAT prep programs and 3 colleges)
factor 1 - the prep program
factor 2 - college attended
3 x 3 = 9 treatment combinations
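The "all possible combinations" idea can be shown with `itertools.product`. The prep-program labels are hypothetical placeholders; the college levels follow the GMAT example, and r = 2 matches the replications card.

```python
from itertools import product

# Hypothetical labels for the 3 prep programs; college levels follow the GMAT example
programs = ["program 1", "program 2", "program 3"]
colleges = ["business", "engineering", "arts"]

treatment_combos = list(product(programs, colleges))   # every (program, college) pair
print(len(treatment_combos))                            # 3 x 3 treatment combinations
r = 2                                                   # replications per combination
print(len(treatment_combos) * r, "observations needed")
```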
What are replications
the sample size of 2 for each treatment combination indicates we have 2 replications
What is the formula in Block design for SST
sum (observation - overall mean)squared (for all observations)
What is the formula in block design for SSTR
(# of blocks)[(treatment mean 1 - overall mean)squared + (treatment mean 2 - overall mean)squared + (treatment mean 3 - overall mean)squared]
ANOVA Table Definition
A table used to summarize the analysis of variance computations and results. It contains columns showing the source of variation, the sum of squares, the degrees of freedom, the mean square, and the F value(s)
Blocking Definition
The process of using the same or similar experimental units for all treatments. The purpose of blocking is to remove a source of variation from the error term and hence provide a more powerful test for a difference in population or treatment means.
Comparisonwise Type I error rate - Definition
The probability of a Type I error associated with a single pairwise comparison.
Completely randomized design - Definition
An experimental design in which the treatments are randomly assigned to the experimental units.
Experimental units - Definition
The objects of interest in the experiment
Experimentwise Type I error rate - Definition
The probability of making a Type I error on at least one of several pairwise comparisons
Factor - Definition
Another word for the independent variable of interest
Factorial Experiment - Definition
An experimental design that allows simultaneous conclusions about two or more factors
Interaction - Definition
The effect produced when the levels of one factor interact with the levels of another factor in influencing the response variable.
Multiple comparison procedures - Definition
Statistical procedures that can be used to conduct statistical comparisons between pairs of population means
Partitioning - Definition
The process of allocating the total sum of squares and degrees of freedom to the various components.
Randomized block design - Definition
An experimental design employing blocking.
Replications - Definition
The number of times each experimental condition is repeated in an experiment
Response variable - Definition
Another word for the dependent variable of interest.
Single-factor experiment - Definition
An experiment involving only one factor with k populations or treatments
Treatments - Definition
Different levels of a factor
If SSTR = 300, SST = 460, and 7 experimental units are used for each of the 5 levels of the factor, what are the degrees of freedom for treatments?
k = 5 levels, so df = k - 1 = 5 - 1 = 4
If SSTR = 300, SST = 460, and 7 experimental units are used for each of the 5 levels of the factor, what are the sum of squares due to error and its degrees of freedom?
SSE = SST - SSTR = 460 - 300 = 160
df = nT - k (total sample size - # of samples)
= (7)(5) - 5 = 35 - 5 = 30
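The same numbers can be carried through to a full ANOVA table. This sketch assumes a balanced completely randomized design; the function name `anova_from_sums` is hypothetical, and the MSTR, MSE, and F values go one step beyond what the cards ask.

```python
def anova_from_sums(sstr, sst, k, n_per_level):
    """Fill in an ANOVA table from SSTR and SST for a balanced completely randomized design."""
    n_t = k * n_per_level          # total sample size
    sse = sst - sstr               # SST = SSTR + SSE
    df_tr, df_err = k - 1, n_t - k
    mstr, mse = sstr / df_tr, sse / df_err
    return {"SSE": sse, "df_tr": df_tr, "df_err": df_err,
            "MSTR": mstr, "MSE": mse, "F": mstr / mse}


print(anova_from_sums(sstr=300, sst=460, k=5, n_per_level=7))
```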
What is ANOVA interested in
the status of the populations that generate the data sets, and not in the data sets themselves
What are some assumptions in ANOVA
- assume that the data in each data set have come from a single pop
- assume that all the pops have the same σ²
How do we interpret the variation in the data sets for ANOVA
we interpret the variation in each data set as being caused by small and random sources, collectively called error
What is a question regarding ANOVA’s Errors
the question, then, is whether we should regard the variation in the values across the data sets as “error” or attribute it to some other source of variation that is not random (i.e., different pops)
What is used as the benchmark in ANOVA
variation within the data sets (SSE)
what is used to compare the benchmark to in ANOVA
variation between the data sets (SSTR) is compared
In ANOVA if the between variation is much larger than the within variation, we may conclude what
that the data sets have in fact been generated from different populations
If we assume that the pops have the same σ², we can pool these variations and obtain one measure of the
within variation (SSE) - this is the benchmark
MSE is an extension of the concept of what
pooled variance
in the case of two samples, the formula for MSE reduces to what
s²p (the pooled variance)
In the case of two samples, we can prove that the F ratio obtained from the ANOVA table equals the
square of the t value obtained from applying the two-sample t test
F = t²
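The identity F = t² can be checked numerically. This sketch computes the pooled two-sample t statistic and the one-way ANOVA F for the same two hypothetical samples; the function names `pooled_t` and `anova_f` are illustrative.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic using the pooled variance s_p^2."""
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))


def anova_f(x, y):
    """One-way ANOVA F ratio for the same two samples (k = 2)."""
    n1, n2 = len(x), len(y)
    grand = (sum(x) + sum(y)) / (n1 + n2)
    sstr = n1 * (mean(x) - grand) ** 2 + n2 * (mean(y) - grand) ** 2
    sse = (n1 - 1) * variance(x) + (n2 - 1) * variance(y)
    mse = sse / (n1 + n2 - 2)          # identical to s_p^2
    return (sstr / 1) / mse            # k - 1 = 1 between-treatments df


x = [58, 64, 55, 66, 67]   # hypothetical sample 1
y = [48, 57, 59, 47, 49]   # hypothetical sample 2
print(pooled_t(x, y) ** 2, anova_f(x, y))
```

The comment on `mse` reflects the card that, with two samples, MSE reduces to s²p.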
The F test is a direct extension of the
t test for testing the equality of the population means of several populations, with the assumption that the populations are normally distributed with a common variance
The total sum of squares and the total degrees of freedom are ________. so, any variation and any degrees of freedom left over from the total variation and total degrees of freedom unaccounted for by the “between” source goes to the ______ _______.
fixed
Within source
What does the mean squares column in the ANOVA table give us
the relative effect of each source.
If, relatively speaking, the systematic effects are larger than the random effects, this results in a ________ F value and the hypothesis of equal means must be ________
large F value
rejected
What is the systematic effect
Between Source
What is the nonsystematic or random effects called
within source
What are the 3 assumptions required to use ANOVA
- for each population, the response variable is normally distributed
- The variance of the response variable (σ²) is the same for all the populations
- The observations must be independent
If the sample sizes are equal, ANOVA is not sensitive to what
to the departure of the assumption of normally distributed populations
If the means for the 3 pops are equal, we would expect what
the three sample means to be close together
the closer the 3 sample means are to one another, the
weaker the evidence we have for the conclusion that the pop means differ
If the variability among the sample means is “small”, it supports what
HO
If the variability among the sample means is “Large” it supports what
Ha
The between-treatments estimate of σ² is based on the assumption that the
null hypothesis is true (Ho is true)
Does the variation within each of the samples have an effect on the conclusion as well
yes
When a simple random sample is selected from each pop, each of the sample variances provides what
an unbiased estimate of σ²
Why is the pooled or within-treatments estimate of σ² unaffected by whether the pop means are equal
because each sample variance provides an estimate of σ² based only on the variation within each sample, the within-treatments estimate of σ² is not affected by whether the pop means are equal
When the sample sizes are equal, the within-treatments estimate of σ² can be obtained by computing what
the average of the individual sample variances
The between-treatments approach provides a good estimate of σ² only if what
the null hypothesis is true
If the null hypothesis is false in ANOVA, what does the between-treatments approach do
it overestimates σ²
What does the within-treatments approach provide for σ²
a good estimate of σ² whether H0 or Ha is true
If the null hypothesis is true in ANOVA, the two estimates will be what
similar and their ratio will be close to 1
When do we need to use multiple comparison procedures?
whenever we are performing a series of tests and are concerned with the overall level of significance attached to the whole experiment. When there are several tests, each at some level of significance α, we still control the probability of a Type I error for each individual test, but we have no such control over the series of tests as a whole. Multiple comparison procedures help us in this regard
What is an example of a multiple comparison procedure for ANOVA
Fisher’s LSD
If the null hypothesis is true, MSTR and MSE provide what
two independent and unbiased estimates of σ²
The between treatments approach in ANOVA provides what
a good estimate of σ² ONLY if H0 is true
if the null hypothesis is true, then this estimate and the within-treatments estimate will be similar and their ratio will be close to 1
The within-treatments approach in ANOVA provides what
a good estimate of σ² whether or not H0 is true
ANOVA is based on the development of two independent estimates of the common what
population variance σ²
What are the two independent estimates of variance in ANOVA
1. MSTR - between treatments (based on SSTR)
2. MSE - within treatments (based on SSE)
By comparing MSTR and MSE (the F ratio), what can we determine
whether the population means are equal
How many pops is ANOVA most often used for
3 or more; it can also test whether the means of two pops are equal, but that rarely happens because the two-sample t test is usually used instead (for two samples, F = t²)
How do you calculate the overall mean if the sample sizes are not all the same?
sum of all of the observations / the total # of observations
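A small Python check of why the "add up all means / # of means" shortcut only works for equal sample sizes. The two samples are hypothetical and deliberately unequal in size.

```python
samples = [[10, 12, 14], [20, 22]]   # hypothetical samples of unequal size

total_obs = sum(len(s) for s in samples)
grand_mean = sum(sum(s) for s in samples) / total_obs                 # 78 / 5
mean_of_means = sum(sum(s) / len(s) for s in samples) / len(samples)  # (12 + 21) / 2

print(grand_mean, mean_of_means)
```

The two values differ because the unweighted average of means gives the smaller sample as much weight as the larger one.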
If H0 is true, MSTR provides
an unbiased estimate of σ²
if the means of the k populations are not equal, MSTR is
not an unbiased estimate of σ²
When does MSTR overestimate σ²
when H0 is false (the population means are not all equal)
What is MSE based on
based on the variation within each of the treatments; it is not influenced by whether the null hypothesis is true.
Is MSE influenced by the null hypothesis HO?
no
If the null hypothesis is false, the value of MSTR/MSE will be ________ because MSTR ________ σ².
inflated
Overestimates
What is the test statistic in ANOVA
F = MSTR/MSE
What can SST be Partitioned into
Two different sums of squares: SSTR and SSE
SSTR + SSE = SST
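The partition can be verified by computing each sum of squares directly from its definition. This is a sketch with a hypothetical function name `partition` and hypothetical data; SSE here sums squared deviations from each sample's own mean rather than using SST - SSTR.

```python
def partition(samples):
    """Return (SST, SSTR, SSE) computed directly from their definitions."""
    all_obs = [x for s in samples for x in s]
    grand = sum(all_obs) / len(all_obs)
    means = [sum(s) / len(s) for s in samples]
    sst = sum((x - grand) ** 2 for x in all_obs)                            # total
    sstr = sum(len(s) * (m - grand) ** 2 for s, m in zip(samples, means))   # between
    sse = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)      # within
    return sst, sstr, sse


sst, sstr, sse = partition([[58, 64, 55, 66, 67], [58, 69, 71, 64, 68], [48, 57, 59, 47, 49]])
print(sst, sstr + sse)
```

The identity also holds when the samples have different sizes.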
If sample sizes are not equal, what must you do for LSD
you must calculate an LSD for each pair of means
When the sample sizes are equal, what can you do with LSD
you only need to calculate one LSD
What is fisher’s LSD used for
to determine where differences occur
Why is Fisher’s LSD referred to as a protected or restricted LSD test
because it is used only if we first find a significant F value by using ANOVA
In a one-way ANOVA (first part of chpt 13), what do we focus on testing
the effect of one independent variable
What might a one-way ANOVA not do
it may not detect differences in means when variation caused by a factor other than the independent variable we are considering inflates the error term (MSE)
How can you overcome this limitation of one-way ANOVA and test the effect of the underlying factor
use the randomized block design
What does the randomized block design allow us to test
the effect of the independent variable as well as the block effect
What does a two way ANOVA allow us to do
test the effect of two or more independent variables and the interaction among these variables
What type of ANOVA design do we use if the experimental units are homogeneous
completely randomized design
What type of ANOVA design do we use if the experimental units are heterogeneous
randomized block design: blocking is often used to form homogeneous groups
What is the purpose of the block design
to control some of the extraneous sources of variation by removing such variation from the MSE term
What does the randomized block design tend to provide
a better estimate of the true error variance and leads to a more powerful hypothesis test in terms of the ability to detect differences among treatment means
Experimental studies in business often involve experimental units that are ____________; as a result, we should use _________________
highly heterogeneous
randomized block design
Blocking in experimental design is similar to what
Stratification in sampling
what does nT represent
total sample size
The experimental design described in block design is a ________ design. What does this mean
complete block design
the word complete indicates that each block is subject to all k treatments
That is all controllers (Blocks) were tested with all 3 systems (treatments)
What is an incomplete block design
experimental designs in which some but not all treatments are applied to each block - not in this text
what is important to note about the F tests in the block design
we have an F value to test for treatment effects but not for blocks
blocking was used to remove variation from the MSE term
could use MSBL/MSE as the statistic to test for significance of the blocks
The error degrees of freedom are ________ for a randomized block design than for a completely randomized design because ________
fewer
b-1 degrees of freedom are lost for the b blocks
if n is small, the potential effects due to blocks can be
masked because of the loss of error degrees of freedom; for large n, the effects are minimized
If we want to draw conclusions about more than one variable or factor what can we use
factorial experiment
what is factorial experiment
an experimental design that allows simultaneous conclusions about two or more factors
why do we use the term factorial in factorial experiment
because the experimental conditions include all possible combinations of the factors
what does interaction in factorial design mean
refers to a new effect that we can now study because we used a factorial experiment
If the interaction effect has a significant impact on what we are studying (e.g., GMAT scores), what can we conclude
that the effect of the type of preparation program depends on the undergraduate college
In a two-factor experiment, we compute a mean square for
Factor A: MSA = SSA/(a - 1)
Factor B: MSB = SSB/(b - 1)
Interaction: MSAB = SSAB/[(a - 1)(b - 1)]
Error: MSE = SSE/[ab(r - 1)]
In a two-factor experiment, we calculate F for
Factor A: MSA/MSE
Factor B: MSB/MSE
Interaction: MSAB/MSE
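The two-factor mean squares and F ratios can be sketched for a balanced design with r replications. The sums-of-squares formulas used here are the standard balanced-design ones (the cards give only the mean squares and F ratios), so treat them as an assumption of this sketch; the function name `two_factor_anova` and the 2 × 2 data are hypothetical.

```python
def two_factor_anova(data):
    """data[i][j] = list of r replicate responses for level i of factor A and level j of factor B."""
    a, b, r = len(data), len(data[0]), len(data[0][0])
    n_t = a * b * r
    grand = sum(x for row in data for c in row for x in c) / n_t
    cell = [[sum(c) / r for c in row] for row in data]                             # combination means
    a_mean = [sum(sum(c) for c in row) / (b * r) for row in data]                  # factor A level means
    b_mean = [sum(sum(data[i][j]) for i in range(a)) / (a * r) for j in range(b)]  # factor B level means

    sst = sum((x - grand) ** 2 for row in data for c in row for x in c)
    ssa = b * r * sum((m - grand) ** 2 for m in a_mean)
    ssb = a * r * sum((m - grand) ** 2 for m in b_mean)
    ssab = r * sum((cell[i][j] - a_mean[i] - b_mean[j] + grand) ** 2
                   for i in range(a) for j in range(b))
    sse = sum((x - cell[i][j]) ** 2
              for i in range(a) for j in range(b) for x in data[i][j])

    msa, msb = ssa / (a - 1), ssb / (b - 1)
    msab = ssab / ((a - 1) * (b - 1))
    mse = sse / (a * b * (r - 1))
    return {"SST": sst, "SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSE": sse,
            "F_A": msa / mse, "F_B": msb / mse, "F_AB": msab / mse}


# Hypothetical 2 x 2 design with r = 2 replications per combination
data = [[[10, 12], [14, 16]],
        [[11, 13], [20, 22]]]
print(two_factor_anova(data))
```

Because SSE is computed directly from deviations around each combination mean, the partition SST = SSA + SSB + SSAB + SSE serves as a built-in check on the other formulas.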
define factor
the independent variable of interest
define treatments
different levels of a factor
single-factor experiment - define
an experiment involving only one factor with k populations or treatments
define response variable
another word for the dependent variable of interest
define experimental units
the objects of interest in the experiment
define completely randomized design
an experimental design in which the treatments are randomly assigned to the experimental units
define ANOVA table
a table used to summarize the analysis of variance computations and results
Define Partitioning
the process of allocating the total sum of squares and degrees of freedom to the various components
define Multiple comparison procedures
statistical procedures that can be used to conduct statistical comparisons between pairs of population means
define comparisonwise Type I error rate
the probability of a Type I error associated with a single pairwise comparison
define experimentwise Type I error rate
the probability of making a type 1 error on at least one of several pairwise comparisons
define blocking
the process of using the same or similar experimental units for all treatments.
What is the purpose of blocking
is to remove a source of variation from the error term and hence provide a more powerful test for a difference in population or treatment means
Define randomized block design
an experimental design employing blocking
define factorial experiment
an experimental design that allows simultaneous conclusions about two or more factors
Define replications
the number of times each experimental condition is repeated in an experiment.
Define interaction
the effect produced when the levels of one factor interact with the levels of another factor in influencing the response variable
What does the two-way ANOVA have the added advantage of
allowing us to study the interaction effect between the variables
With the two-way ANOVA, when interpreting the results, it is a good idea to focus on what
the interaction effect first.
In a two-way ANOVA, if the interaction effect proves to be significant, what do you do
a further detailed analysis can be applied to this aspect
In a two-way ANOVA, if the interaction effect proves to be insignificant, what can you do
focus can be directed on the main effects
What is a factor in ANOVA
the I.V.
The factor is also a
variable of interest
A treatment is
different levels of a factor