Experimental Design Flashcards
Questions to ask when conducting an analysis
- How should the data be arranged?
- Do we conduct a one-way ANOVA?
- Is the response continuous or discrete?
- What are the model assumptions?
- What is the distribution of the error? Is it still normal?
- Are there blocking factors or interactions to consider?
Define Factor
Variable whose influence upon a response variable is being studied in an experiment
Factor Level
Numerical values or settings for a factor
Trial (or run)
application of a treatment to an experimental unit
Treatment or level combination
set of values for all factors in a trial
Experimental unit
Object to which a treatment is applied
Randomization and reasons for doing this
using a chance mechanism to assign treatments to experimental units or run order
Reasons for randomization: It reduces the chance that an unanticipated variable effect will confuse the results of the experiment (protects against unknown variables); most of these unanticipated effects manifest themselves over time. It also reduces the biases that an experimenter may impose on a design. Lastly, it ensures the validity of the estimate of experimental error and provides a basis for inference in analyzing the experiments.
5 categories of Experimental design
- Treatment comparisons
- Variable screening
- Response Surface Modeling
- System optimization
- System robustness
Treatment Comparisons
Purpose is to compare several treatments of a factor
Variable Screening
There are a large number of factors, but only a few are important. The experiment should identify the important few.
Response Surface Exploration
After important factors have been identified, their impact on the system is explored; regression model building
System Optimization
Interested in determining the optimum conditions
System Robustness
Wish to optimize a system and also reduce the impact of uncontrollable (noise) factors. (example: car running well on different road conditions and with different driving habits)
Systematic Approach to experimentation
- State the objective of the study
- Choose the response variable
- Choose factors and levels
- Choose experimental design (plan)
- Perform the experiment
- Analyze the data
- Draw conclusions
Three fundamental principles of experimental design
Replication, Randomization, Local control of error (blocking and covariates)
Define replication, difference between it and repetition
Each treatment is applied to units that are representative of the population. This helps to reduce variance and increase power to detect significant differences.
Repetition is repeating a measurement any number of times on the same unit. Replication repeats the entire measurement process on a new unit.
Define Randomization and list its advantages
Use of a chance mechanism to assign treatments to units or to run order.
It has the following advantages:
- protects against latent variables or “lurking” variables
- Reduces influence of subjective bias in treatment assignments
- ensures validity of statistical inference
Define blocking, notes on effective blocking strategies
A block refers to a collection of homogeneous units (examples: hours, batches, lots, etc.).
Effective blocking: larger between-block variations than within-block
“block what you can and randomize what you cannot”
Run and compare treatments within the same blocks, and randomize within blocks. This eliminates block-to-block variation from the treatment comparisons and reduces the variability of the treatment effect estimates.
Define learning effect
An advantage a unit/person gains over the course of an experiment as they learn the task, so later runs benefit from earlier practice. Mitigated with balanced randomization.
Define balanced randomization
Randomization constrained so that each treatment is assigned to an equal number of units (or run positions).
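A balanced randomization can be sketched in Python; the treatment labels and group size here are hypothetical:

```python
import random

def balanced_randomization(treatments, units_per_treatment, seed=None):
    """Assign each treatment to an equal number of run positions, in random order."""
    rng = random.Random(seed)
    # Build a balanced list of labels, then shuffle the run order.
    assignment = [t for t in treatments for _ in range(units_per_treatment)]
    rng.shuffle(assignment)  # order is random, counts stay balanced
    return assignment

order = balanced_randomization(["A", "B", "C"], units_per_treatment=4, seed=1)
counts = {t: order.count(t) for t in ["A", "B", "C"]}
```

Shuffling a pre-balanced label list keeps the equal-allocation constraint while still protecting against time trends.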
Two things that scientific method require
- Data collection
- Data analysis
Three basic methods of collecting data and explanations
- Retrospective studies (historical data) - least expensive and quickest way since the data are readily available (data mining), but often less than optimal for the research goals and of questionable reliability
- Observational studies - monitor the process without disturbing it (beware the Heisenberg uncertainty principle); employ simple random sampling (most common), stratified random sampling, or systematic sampling
- Designed experiments - intentionally disturb the process and observe the results; manipulate the factors, let the process reach equilibrium, and observe the response
Difference between experimental and observational unit
The experimental unit is the smallest unit to which we apply a treatment combination.
The observational unit is the unit upon which we make the measurement. (May or may not be the experimental unit)
Define experimental error and observational error
Experimental error measures the variability among the experimental units. May be thought of as background noise, represents variability from trying to repeat the application of the specific combination of the factor levels
Observational error measures the variability due to the observational units. It is part of the experimental error, but only a part. (Think of baking pies in two different ovens.)
Basic idea of local control of error
Reduce the random error among the experimental units. Control or account for anything which might affect the response other than the factors.
OLS Estimation for simple linear regression (the model, what to minimize, and the different values and their variances)
Model: yi = B0 + B1 xi + ei. Minimize the sum of squared residuals, sum of (yi - B0 - B1 xi)^2. Estimates: B1 hat = Sxy/Sxx and B0 hat = ybar - B1 hat * xbar. Variances: Var(B1 hat) = sigma^2/Sxx and Var(B0 hat) = sigma^2 (1/n + xbar^2/Sxx).
R2 formulas
R2 = RegrSS/CTSS = 1 - (RSS/CTSS)
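The identity above can be checked numerically. A minimal sketch with hypothetical data, using the closed-form simple-regression fit so that CTSS = RegrSS + RSS holds:

```python
# Hypothetical data; fit simple least squares, then verify
# RegrSS/CTSS == 1 - RSS/CTSS (equivalent since CTSS = RegrSS + RSS under OLS).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
Sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = Sxy / Sxx                      # slope estimate
b0 = ybar - b1 * xbar               # intercept estimate
yhat = [b0 + b1 * xi for xi in x]
CTSS = sum((yi - ybar) ** 2 for yi in y)                 # corrected total SS
RSS = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))     # residual SS
RegrSS = sum((yh - ybar) ** 2 for yh in yhat)            # regression SS
r2_a = RegrSS / CTSS
r2_b = 1 - RSS / CTSS
```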
Another way to express RegrSS
What is another way to express MSE?
RSS/(n - p), which equals sigma hat squared (the estimate of the error variance)
SE(B1 hat)
sqrt(MSE/Sxx)
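Putting the last few cards together, a sketch with hypothetical data computing MSE = RSS/(n - p) and SE(B1 hat) = sqrt(MSE/Sxx):

```python
import math

# Hypothetical data; simple linear regression has p = 2 parameters.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 2.3, 2.9, 4.1, 5.2, 5.8]
n, p = len(x), 2
xbar, ybar = sum(x) / n, sum(y) / n
Sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / Sxx
b0 = ybar - b1 * xbar
RSS = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
MSE = RSS / (n - p)               # sigma hat squared
se_b1 = math.sqrt(MSE / Sxx)      # standard error of the slope
```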
What does det[(x’x)-1] represent?
It is proportional to the reciprocal of the volume of the confidence ellipsoid for the estimated coefficients
Principle of Parsimony
Occam’s razor: “entities should not be multiplied beyond necessity”. So choose fewer variables with sufficient explanatory power. This is a desirable modeling strategy.
Explain a One-way layout design in words
A single-factor experiment with k levels (treatments)
Linear model for the one-way layout
yij = eta + tau_i + eij, with i = 1, ..., k treatments and j = 1, ..., ni replicates, where the errors eij are independent N(0, sigma^2).
ANOVA for one-way layout estimated model
Treatment: k - 1 df, SSTr = sum of ni (ybar_i - ybar)^2. Residual: N - k df, RSS = sum of (yij - ybar_i)^2. Total: N - 1 df, CTSS = SSTr + RSS. Test H0: tau_1 = ... = tau_k with F = MSTr/MSE.
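The one-way F statistic can be computed by hand; a sketch with hypothetical data for k = 3 treatments:

```python
# One-way layout: F = MSTr / MSE with (k - 1, N - k) degrees of freedom.
groups = {
    "A": [5.1, 4.9, 5.3, 5.0],
    "B": [5.8, 6.1, 5.9, 6.2],
    "C": [4.2, 4.5, 4.1, 4.4],
}
k = len(groups)
N = sum(len(v) for v in groups.values())
grand = sum(sum(v) for v in groups.values()) / N
# Between-treatment sum of squares (deviations of group means from grand mean).
SSTr = sum(len(v) * (sum(v) / len(v) - grand) ** 2 for v in groups.values())
# Residual sum of squares (deviations of observations from their group mean).
RSS = sum((y - sum(v) / len(v)) ** 2 for v in groups.values() for y in v)
MSTr = SSTr / (k - 1)
MSE = RSS / (N - k)
F = MSTr / MSE
```

A large F relative to the F(k - 1, N - k) distribution leads to rejecting the hypothesis of equal treatment means.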
Describe over-parameterization
When there are k types of observations but more than k regression parameters. When fitting the model, (X'X)^-1 will not exist because X'X is not of full rank (it is singular). Constraints are needed to make X'X a nonsingular matrix.
What are the two types of constraints for an over-parameterized model?
1) Constraining the treatment effects to sum to zero (the zero-sum constraint)
2) Setting one of the treatment effects to zero (dropping it from the model matrix X; called the baseline constraint)
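The singularity, and the baseline fix, can be seen directly in the model matrix; a sketch with a hypothetical layout of 2 observations at each of k = 3 levels:

```python
# Full dummy coding with an intercept is over-parameterized: the k
# indicator columns sum to the intercept column, so X'X is singular.
levels = ["A", "A", "B", "B", "C", "C"]
names = ["A", "B", "C"]
intercept = [1] * len(levels)
indicators = {t: [1 if l == t else 0 for l in levels] for t in names}

# Linear dependence: element-wise sum of indicator columns == intercept.
col_sum = [sum(indicators[t][i] for t in names) for i in range(len(levels))]
dependent = (col_sum == intercept)

# Baseline constraint: drop one indicator (here "A") from the model matrix,
# leaving intercept + k - 1 columns, which is full rank.
X_baseline = [[1, indicators["B"][i], indicators["C"][i]]
              for i in range(len(levels))]
```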
Describe the purpose of multiple comparisons tests and describe the two methods associated with them
After a global F-test of the treatments rejects the null hypothesis, a multiple comparisons test identifies which pairs of treatments are significantly different.
1) Bonferroni Method - divide alpha by the number of pairwise comparisons k' = (k choose 2); declare a pair significantly different if |t| exceeds the critical value of the t distribution with N - k degrees of freedom at level alpha/(2k')
2) Tukey Method - declare a pair significantly different if |t| exceeds (1/sqrt 2) times the upper alpha quantile of the studentized range distribution with parameters k and N - k
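The Bonferroni bookkeeping can be sketched as follows (hypothetical data; the critical value t_{N-k, alpha_adj/2} itself needs a t-distribution table or a stats library, so only the t statistics and the adjusted level are computed here):

```python
import itertools
import math

# With k treatments there are k' = C(k, 2) pairwise tests, so each
# two-sided t test is run at level alpha / k'.
groups = {
    "A": [5.1, 4.9, 5.3, 5.0],
    "B": [5.8, 6.1, 5.9, 6.2],
    "C": [4.2, 4.5, 4.1, 4.4],
}
alpha = 0.05
k = len(groups)
N = sum(len(v) for v in groups.values())
k_prime = math.comb(k, 2)
alpha_adj = alpha / k_prime          # per-comparison significance level
# Pooled error variance (MSE from the one-way ANOVA).
MSE = sum((y - sum(v) / len(v)) ** 2 for v in groups.values() for y in v) / (N - k)
t_stats = {}
for (a, va), (b, vb) in itertools.combinations(groups.items(), 2):
    diff = sum(vb) / len(vb) - sum(va) / len(va)
    se = math.sqrt(MSE * (1 / len(va) + 1 / len(vb)))
    t_stats[(a, b)] = diff / se      # compare |t| with t_{N-k, alpha_adj/2}
```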
Process to find B for linear and quadratic effects
Fit y = B0 + B1 P1(x) + B2 P2(x) by OLS. Because the polynomial contrasts are orthogonal, each B hat is simply the contrast applied to the responses divided by the contrast's sum of squares.
P1(x) and P2(x) and equation for finding y with orthogonal polynomials
For three evenly spaced levels m - Delta, m, m + Delta: P1(x) = (x - m)/Delta and P2(x) = 3[((x - m)/Delta)^2 - 2/3]. The model is y = B0 + B1 P1(x) + B2 P2(x) + e.
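For three evenly spaced levels the contrasts reduce to (-1, 0, 1) and (1, -2, 1) up to scaling; a sketch with hypothetical level means:

```python
# Orthogonal-polynomial sketch for three evenly spaced levels.
# With orthogonal contrast columns, each coefficient is
# b_j = sum(Pj * ybar_level) / sum(Pj**2)  (no matrix inversion needed).
ybar = [4.1, 5.0, 6.3]          # hypothetical mean response at low/mid/high
P1 = [-1, 0, 1]                 # linear contrast
P2 = [1, -2, 1]                 # quadratic contrast
orthogonal = sum(p * q for p, q in zip(P1, P2)) == 0
b_lin = sum(p * y for p, y in zip(P1, ybar)) / sum(p * p for p in P1)
b_quad = sum(p * y for p, y in zip(P2, ybar)) / sum(p * p for p in P2)
```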
Explanation of one-way random effects model
Basically a one-way fixed effects model, but now the operators considered are drawn from a pool (population) of operators. This gives rise to the variance components: the variance between operators and the variance within operators.
Do you do multiple comparison tests with a random effects model? Why or why not?
No, because interest lies in the variance components rather than in comparing specific levels. Instead, find the expected values of the two mean squares (the expected mean squares) and use them to estimate the variance components.
What is the variance of the average in a random effects model?
MSTr/(nk), since Var(ybar..) = (sigma^2 + n sigma_tau^2)/(nk) and E(MSTr) = sigma^2 + n sigma_tau^2.
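The variance-component estimates and this variance can be sketched with hypothetical data for k operators with n measurements each:

```python
# One-way random effects: E(MSE) = sigma^2 and
# E(MSTr) = sigma^2 + n * sigma_tau^2, so the method-of-moments
# estimates are sigma2_hat = MSE and sigma2_tau_hat = (MSTr - MSE)/n.
groups = [[10.1, 9.8, 10.3],    # hypothetical: one list per operator
          [11.2, 11.0, 11.4],
          [9.5, 9.7, 9.6]]
k = len(groups)
n = len(groups[0])
grand = sum(sum(g) for g in groups) / (n * k)
MSTr = n * sum((sum(g) / n - grand) ** 2 for g in groups) / (k - 1)
MSE = sum((y - sum(g) / n) ** 2 for g in groups for y in g) / (k * (n - 1))
sigma2_hat = MSE
sigma2_tau_hat = max((MSTr - MSE) / n, 0.0)   # truncate negative estimates at zero
var_grand = MSTr / (n * k)                    # variance of the grand average
```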
Before using hypothesis testing and confidence intervals, what model assumptions must be checked?
1) Have all important effects been captured?
2) Are the errors independent and normally distributed?
3) Do the errors have constant variance?