Experimental Design Flashcards

Question 1

Q

Questions to conduct analysis

Answer

A

How should the data be arranged?
Do we conduct a one-way ANOVA?
Is the response continuous or discrete?
What are the model assumptions?
What is the distribution of the error? Is it still normal?
Consider blocking factors and interactions

Question 2

Q

Define Factor

Answer

A

Variable whose influence upon a response variable is being studied in an experiment

Question 3

Q

Factor Level

Answer

A

Numerical values or settings for a factor

Question 4

Q

Trial (or run)

Answer

A

Application of a treatment to an experimental unit

Question 5

Q

Treatment or level combination

Answer

A

Set of values for all factors in a trial

Question 6

Q

Experimental unit

Answer

A

Object to which a treatment is applied

Question 7

Q

Randomization and reasons for doing this

Answer

A

Using a chance mechanism to assign treatments to experimental units or to run order.

Reasons for randomization: It reduces the chance that an unanticipated variable will confound the results of the experiment (it protects against unknown variables); most such effects manifest themselves over time. It also reduces the biases that an experimenter may impose on a design. Lastly, it ensures the validity of the estimate of experimental error and provides a basis for inference in analyzing the experiment.
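
A minimal sketch of a chance mechanism, assuming a pseudo-random generator is acceptable as the source of chance. The unit names, the two treatment labels, the seeds, and the helper `randomize_assignment` are hypothetical and only serve the illustration.

```python
import random


def randomize_assignment(units, treatments, seed=None):
    """Randomly assign treatments to units, each treatment equally often.

    Assumes the number of units is a multiple of the number of treatments.
    """
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    labels = list(treatments) * (len(units) // len(treatments))
    rng.shuffle(labels)  # the chance mechanism
    return dict(zip(units, labels))


units = [f"unit{i}" for i in range(1, 9)]  # hypothetical experimental units
assignment = randomize_assignment(units, ["A", "B"], seed=1)

# Randomizing the run order is a separate shuffle of the same units.
run_order = random.Random(2).sample(units, k=len(units))
```

Fixing the seed only makes the sketch reproducible; in a real experiment the assignment would be generated once and recorded.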

Question 8

Q

5 categories of Experimental design

Answer

A

Treatment comparisons
Variable screening
Response Surface Exploration
System optimization
System robustness

Question 9

Q

Treatment Comparisons

Answer

A

Purpose is to compare several treatments of a factor

Question 10

Q

Variable Screening

Answer

A

Have a large number of factors, but only a few are important. The experiment should identify the important few.

Question 11

Q

Response Surface Exploration

Answer

A

After important factors have been identified, their impact on the system is explored; regression model building

Question 12

Q

System Optimization

Answer

A

Interested in determining the optimum conditions

Question 13

Q

System Robustness

Answer

A

Wish to optimize a system and also reduce the impact of uncontrollable (noise) factors. (example: car running well on different road conditions and with different driving habits)

Question 14

Q

Systematic Approach to experimentation

Answer

A

State the objective of the study
Choose the response variable
Choose factors and levels
Choose experimental design (plan)
Perform the experiment
Analyze the data
Draw conclusions

Question 15

Q

Three fundamental principles of experimental design

Answer

A

Replication, Randomization, Local control of error (blocking and covariates)

Question 16

Q

Define replication, difference between it and repetition

Answer

A

Each treatment is applied to units that are representative of the population. This helps to reduce variance and increase power to detect significant differences.

Repetition is measuring the same unit any number of times. Replication repeats the whole measurement process on a new unit.

Question 17

Q

Define Randomization and list its advantages

Answer

A

Use of a chance mechanism to assign treatments to units or to run order.

It has the following advantages:

Protects against latent or “lurking” variables
Reduces the influence of subjective bias in treatment assignments
Ensures the validity of statistical inference

Question 18

Q

Define blocking, notes on effective blocking strategies

Answer

A

A block is a collection of homogeneous units (examples: hours, batches, lots, etc.).

Effective blocking: larger between-block variations than within-block

“block what you can and randomize what you cannot”

Run and compare treatments within the same block. This eliminates block-to-block variation from the comparison and reduces the variability of the treatment effect estimates. Randomize the assignment of treatments to units within each block.
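
Randomization within blocks can be sketched as follows, assuming each block has exactly as many units as there are treatments. The block names, treatment labels, and the helper `randomize_within_blocks` are hypothetical.

```python
import random


def randomize_within_blocks(blocks, treatments, seed=None):
    """Return, for every block, an independently shuffled treatment order."""
    rng = random.Random(seed)
    plan = {}
    for block in blocks:
        order = list(treatments)
        rng.shuffle(order)  # a separate randomization for each block
        plan[block] = order
    return plan


plan = randomize_within_blocks(["batch1", "batch2", "batch3"], ["A", "B", "C"], seed=0)
for block, order in plan.items():
    print(block, order)
```

Every treatment appears once in every block, so treatment comparisons are made within blocks.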

Question 19

Q

Define learning effect

Answer

A

An improvement in a unit's (e.g., a person's) performance that comes from experience with earlier trials, which gives later treatments an advantage. Mitigated with balanced randomization

Question 20

Q

Define balanced randomization

Answer

A

Randomizing so that each treatment (or treatment order) is assigned to an equal number of units

Question 21

Q

Two things that the scientific method requires

Answer

A

Data collection
Data analysis

Question 22

Q

Three basic methods of collecting data and explanations

Answer

A

Retrospective studies (historical data): the least expensive and quickest way; the data are readily available (data mining), but they are often less than optimal for research goals and of questionable reliability
Observational studies: monitor a process without disturbing it (beware the Heisenberg uncertainty principle: observing a process can change it); employ simple random sampling (most common), stratified random sampling, or systematic sampling
Designed experiments: intentionally disturb the process and observe the results; manipulate the factors, let the process reach equilibrium, and observe the response

Question 23

Q

Difference between experimental and observational unit

Answer

A

The experimental unit is the smallest unit to which we apply a treatment combination.

The observational unit is the unit upon which we make the measurement. (May or may not be the experimental unit)

Question 24

Q

Define experimental error and observational error

Answer

A

Experimental error measures the variability among the experimental units. May be thought of as background noise, represents variability from trying to repeat the application of the specific combination of the factor levels

Observational error measures the variability due to the observational units. It is part of the experimental error, but only a part. (Think of baking pies in two different ovens)

Question 25

Q

Basic idea of local control of error

Answer

A

Reduce the random error among the experimental units. Control or account for anything which might affect the response other than the factors.

Question 26

Q

OLS Estimation for simple linear regression (the model, what to minimize, and the different values and their variances)

Question 27

Q

R² formulas

Answer

A

RegrSS/CTSS = 1 - (RSS/CTSS)

Question 28

Q

Another way to express RegrSS

Question 29

Q

What is another way to express MSE?

Answer

A

RSS/ (n-p)

sig hat^2 (the estimate of the error variance sig^2)

Question 30

Q

SE(B₁ hat)

Answer

A

sqrt (MSE/ Sxx)
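
The regression quantities on the last few cards can be checked numerically. This is a minimal sketch for simple linear regression (p = 2 parameters); the data values and the helper `simple_ols` are made up for illustration.

```python
import math


def simple_ols(x, y):
    """Least squares fit of y = b0 + b1*x with the usual summary quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual SS
    ctss = sum((yi - y_bar) ** 2 for yi in y)               # corrected total SS
    regr_ss = sum((fi - y_bar) ** 2 for fi in fitted)       # regression SS
    mse = rss / (n - 2)                                     # RSS / (n - p), p = 2
    return {
        "b0": b0, "b1": b1, "sxx": sxx,
        "rss": rss, "ctss": ctss, "regr_ss": regr_ss,
        "r2": regr_ss / ctss,
        "mse": mse,
        "se_b1": math.sqrt(mse / sxx),
    }


fit = simple_ols([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])  # made-up data
print(fit["b1"], fit["r2"], fit["se_b1"])
```

Both forms of R² agree up to rounding because CTSS = RegrSS + RSS for a least squares fit with an intercept.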

Question 31

Q

What does det[(x’x)^-1] represent?

Answer

A

It is proportional to the square of the volume of the confidence ellipsoid for the estimated coefficients (the volume is proportional to |X'X|^(-1/2)). A smaller determinant means more precise estimates.

Question 32

Q

Principle of Parsimony

Answer

A

Occam’s razor: “entities should not be multiplied beyond necessity”. So choose fewer variables with sufficient explanatory power. This is a desirable modeling strategy.

Question 33

Q

Explain a One-way layout design in words

Answer

A

A single-factor experiment with k levels (treatments)

Question 34

Q

Linear model for the one-way layout

Question 35

Q

ANOVA for one-way layout estimated model

Question 36

Q

Describe over-parameterization

Answer

A

When there are k treatments (types of observations) but the model has more than k parameters (for example, an overall mean plus k treatment effects). Then X'X is singular (not full rank), so (X'X)^-1 does not exist. Constraints are needed to make X'X a nonsingular matrix.

Question 37

Q

What are the two types of constraints for an over-parameterized model?

Answer

A

1) Constraining the treatment effects to sum to zero (zero-sum constraint)
2) Setting one of the treatment effects to zero (dropping its column from the model matrix X; called a baseline constraint)
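
A small numeric sketch of why the constraints are needed, assuming a one-way layout with k = 2 treatments and 2 replicates each. The model matrices and the helpers `det` and `gram` are illustrative, not part of the cards.

```python
def det(m):
    """Determinant by cofactor expansion (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def gram(rows):
    """Return X'X for a model matrix given as a list of rows."""
    cols = list(zip(*rows))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


# Over-parameterized: intercept, tau_1 indicator, tau_2 indicator
x_over = [[1, 1, 0], [1, 1, 0], [1, 0, 1], [1, 0, 1]]
# Baseline constraint tau_1 = 0: the tau_1 column is dropped
x_baseline = [[1, 0], [1, 0], [1, 1], [1, 1]]
# Zero-sum constraint tau_1 + tau_2 = 0: one column coded +1 / -1
x_zero_sum = [[1, 1], [1, 1], [1, -1], [1, -1]]

det_over = det(gram(x_over))          # zero, so X'X is singular
det_baseline = det(gram(x_baseline))  # nonzero
det_zero_sum = det(gram(x_zero_sum))  # nonzero
print(det_over, det_baseline, det_zero_sum)
```

In the over-parameterized matrix the intercept column equals the sum of the two indicator columns, which is what makes X'X singular.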

Question 38

Q

Describe the purpose of multiple comparisons test and describe the two methods associated with it

Answer

A

After a global F-test rejects the null hypothesis of no treatment differences, a multiple comparisons test identifies which pairs of treatments differ significantly.

1) Bonferroni Method- each pairwise t test (N-k degrees of freedom) is run at level alpha/k', where k' = k choose 2 is the number of pairs; that is, the two-sided critical value is the upper alpha/(2k') quantile of the t distribution
2) Tukey Method- treatments i and j are declared different if |t_ij| exceeds 1/sqrt(2) times the upper alpha quantile of the studentized range distribution with k and N-k degrees of freedom
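
The two decision rules can be written out as follows. The notation is an assumption, since the cards do not fix it: $\bar{y}_{i\cdot}$ is the mean of treatment $i$, $n_i$ its number of replicates, $\hat{\sigma}^2$ the MSE with $N-k$ degrees of freedom, and $q_{k,N-k,\alpha}$ the upper $\alpha$ quantile of the studentized range distribution.

```latex
\[
t_{ij} = \frac{\bar{y}_{j\cdot} - \bar{y}_{i\cdot}}
              {\hat{\sigma}\sqrt{1/n_j + 1/n_i}},
\qquad
\hat{\sigma}^2 = \mathrm{MSE} = \frac{\mathrm{RSS}}{N-k}
\]
\[
\text{Bonferroni: declare } i \neq j \text{ if }
|t_{ij}| > t_{N-k,\;\alpha/(2k')},
\qquad k' = \binom{k}{2}
\]
\[
\text{Tukey: declare } i \neq j \text{ if }
|t_{ij}| > \frac{1}{\sqrt{2}}\, q_{k,\,N-k,\,\alpha}
\]
```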

Question 39

Q

Process to find B for linear and quadratic effects

Question 40

Q

P₁(x) and P₂(x) and equation for finding y with orthogonal polynomials

Question 41

Q

Explanation of one-way random effects model

Answer

A

Basically a one-way fixed effects model but now the operators considered come from a pool (population) of operators. This gives rise to the variance components, variance between the operators and within the operators

Question 42

Q

Do you do multiple comparison tests with a random effects model? Why or why not?

Answer

A

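No. The treatment levels are a random sample from a population, so comparisons among the specific levels are not of interest. The model has two random terms (between and within), and the analysis instead estimates their variance components using the expected mean squares.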

Question 43

Q

What is the variance of the average in a random effects model?

Answer

A

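MSTr/(nk), where MSTr is the treatment mean square, k is the number of treatments, and n is the number of replicates per treatment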
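
A minimal sketch of the variance component estimates for a balanced one-way random effects model, assuming k groups (e.g., operators) with n observations each. The data are made up, and truncating a negative between-group estimate at zero is a common convention, not something stated on the cards.

```python
def variance_components(groups):
    """Method-of-moments estimates for a balanced one-way random effects model."""
    k = len(groups)
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    ms_tr = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_e = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    return {
        "ms_tr": ms_tr,
        "ms_e": ms_e,                                     # estimates the within variance
        "sigma2_between": max((ms_tr - ms_e) / n, 0.0),   # truncated at 0 (assumption)
        "var_grand_mean": ms_tr / (n * k),                # estimated variance of the overall average
    }


vc = variance_components([[10, 12], [14, 16], [18, 20]])  # made-up data, k = 3, n = 2
print(vc)
```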

Question 44

Q

Before using hypothesis testing and confidence intervals, what model assumptions must be made?

Answer

A

1) Have all important effects been captured?
2) Are the errors independent and normally distributed?
3) Do the errors have constant variance?

Question 45

Q

What are the three major properties of residuals? And what is the variance of any residual?

Answer

A

E(r)=0

r and yhat are independent

r ~ Multi Norm (0, sig^2( I - H ) )

Var(r_i) = sig^2 (1-h_ii)

Question 46

Q

What four residual plots will help show model assumptions?

Answer

A

Plot r_i vs yhat_i

Plot r_i vs x_i

Plot r_i vs time sequence, i

Plot r_i vs replicates grouped by treatment

Question 47

Q

What should you do if there is a large number of replicates per treatment? What is helpful about this method?

Answer

A

Use a Box-whisker plot. It enables the location, dispersion, skewness, and extreme values of the replicated observations to be displayed in a single plot

Question 48

Q

IQR, IQR whiskers, implications for outliers and skewness

Answer

A

IQR = Q₃ - Q₁

Whiskers= [Q₁ - 1.5*IQR, Q₃ + 1.5*IQR]

Anything outside the whisker bounds is considered an outlier. If Q₁ and Q₃ are not symmetric about the median then this implies skewness.
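
A short sketch of these box plot quantities using only the standard library. The data are made up, and the quartile convention is whatever `statistics.quantiles` uses by default, which may differ slightly from a textbook definition.

```python
import statistics


def box_plot_summary(data):
    """Quartiles, IQR, whisker bounds, and points flagged as outliers."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "bounds": (lower, upper),
        "outliers": [v for v in data if v < lower or v > upper],
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])  # made-up data
print(summary["bounds"], summary["outliers"])
```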

Question 49

Q

Explain the purpose and process of the normal probability plot

Answer

A

Purpose: to test if the residuals follow a normal distribution

Process: Order the residuals, r_(1) <= ... <= r_(N), and assign to r_(i) the cumulative probability p_i = (i - .5)/N. A plot of p_i vs r_(i) should be roughly S-shaped if the residuals are approximately normally distributed. Typically, however, the p_i are transformed to normal quantiles so that the desired shape is a straight line (think Q-Q plot).
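
The plotting coordinates can be computed with the standard library alone. This sketch returns the (normal quantile, ordered residual) pairs and leaves the actual plotting out; the residual values and the helper `normal_plot_points` are made up.

```python
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each ordered residual r_(i) with the normal quantile of (i - 0.5)/N."""
    n = len(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        (std_normal.inv_cdf((i - 0.5) / n), r)
        for i, r in enumerate(sorted(residuals), start=1)
    ]


points = normal_plot_points([0.3, -1.2, 0.8, -0.1, 0.2])  # made-up residuals
for quantile, residual in points:
    print(round(quantile, 3), residual)
```

If the residuals are roughly normal, these points should fall near a straight line.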

Question 50

Q

Name some experiments with more than one factor

Answer

A

Paired comparison design, randomized block design, two-way and multi-way layout, Latin and Graeco-Latin square designs, balanced incomplete block design (BIBD), split-plot design, ANCOVA

Question 51

Q

Definitions of paired comparison design and unpaired design

Answer

A

Paired comparison design: can be viewed as a randomized block design with block size 2. Each block consists of two homogeneous units, and the two treatments are randomly assigned to the two units within each block.

Unpaired design: There are still two treatments, but the units are not grouped into homogeneous pairs, so the error estimate has more degrees of freedom (2N-2 instead of N-1 for N units per treatment). Because its error estimate includes the unit-to-unit (between-pair) variation, this design has lower power than the paired comparison design when pairing is effective.
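
The power difference can be seen by computing both t statistics on the same numbers. This is a sketch with made-up data in which the pairs differ a lot from each other but the within-pair difference is consistent; the pooled-variance form of the two-sample t is assumed.

```python
import math
import statistics


def paired_t(a, b):
    """t statistic on the within-pair differences (N - 1 degrees of freedom)."""
    d = [bi - ai for ai, bi in zip(a, b)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


def two_sample_t(a, b):
    """Pooled-variance two-sample t statistic (n1 + n2 - 2 degrees of freedom)."""
    n1, n2 = len(a), len(b)
    pooled = ((n1 - 1) * statistics.variance(a)
              + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(pooled * (1 / n1 + 1 / n2))


treatment_1 = [10, 12, 14, 16]  # made-up responses, one per pair
treatment_2 = [11, 14, 15, 18]
t_paired = paired_t(treatment_1, treatment_2)
t_unpaired = two_sample_t(treatment_1, treatment_2)
print(t_paired, t_unpaired)
```

The paired statistic is much larger here because differencing removes the pair-to-pair variation from the error estimate.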

Question 52

Q

t values for paired design

Question 53

Q

t value for unpaired design (two-sampled t-test)

Question 54

Q

Define (complete) randomized block design

Answer

A

k treatments are randomly assigned to the k units within each of b blocks, giving a total sample size of N = bk. For an effective design, the units within each block should be more homogeneous than units in different blocks.

Question 55

Q

Model for randomized block design (mixed effects models)

Question 56

Q

F and t stats for RBD

Question 57

Q

ANOVA for RBD

Question 58

Q

Explain a two-way layout experiment

Answer

A

It involves two treatment factors with fixed levels. There is an interest in assessing the interaction effect between the two treatments

Question 59

Q

Show the model and estimation for the two- way layout

Question 60

Q

F test for two-way layout with sum of squares formulations

Question 61

Q

ANOVA for two way layout

Question 62

Q

Describe Multi-way layout designs

Answer

A

Like the two-way layout but expanded to three or more factors (treatments), each with 2 or more levels

Question 63

Q

Using zero-sum constraints for the three-way layout, show the predicted multi-way layout model and the corresponding formulations

Question 64

Q

ANOVA for three way layout

Answer 49

A

Each of the k Latin letters (ie treatments) appears once in each row and once in each column (these are the two blocking factors of the experiment)

Answer 50

A

Basically a superposition of two Latin square designs. Useful for studying four factors (3 blocking 1 treatment or 2 blocking 2 treatment)

Answer 51

A

The number of treatments, t, is greater than the block size, k, so each block is incomplete. The design is balanced because each pair of treatments appears together in the same number of blocks (denoted by lambda)

Answer 52

A

bk=rt

r(k-1)=lambda(t-1)
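
The two identities give a quick screen for candidate parameter sets. This sketch assumes the usual reading (t treatments, b blocks, r replicates per treatment, block size k, lambda co-occurrences per pair); the conditions are necessary but not sufficient for a design to exist, and the function name is hypothetical.

```python
def satisfies_bibd_identities(t, b, r, k, lam):
    """Check the two counting identities; a design with these values may still not exist."""
    return b * k == r * t and r * (k - 1) == lam * (t - 1) and k < t


# 7 treatments in 7 blocks of size 3, each pair together once
print(satisfies_bibd_identities(t=7, b=7, r=3, k=3, lam=1))
# 4 treatments in 4 blocks of size 2 cannot have every pair together once
print(satisfies_bibd_identities(t=4, b=4, r=2, k=2, lam=1))
```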

Answer 53

A

A split-plot design should be used when certain factors are hard to change. These hard-to-change factors are the whole plot factors, and the subplot factors are varied within each whole plot. Advantages include cost and time effectiveness. Disadvantages include a loss of precision in the whole plot treatment comparisons.

Answer 54

A

ANCOVA should be used when auxiliary covariates are available. In an experiment it may be impractical to create blocks (think continuous variables), so ANCOVA can be used if the correlation between the covariate and the response is high. Essentially, we know the covariate is important, but it is an uncontrollable source of variation. In application it can be viewed as a fusion of one-way treatment comparisons and simple linear regression. Advantages include reducing bias and improving sensitivity (reducing error) relative to the original model. Disadvantages include fitting two models (with and without the covariate term) and relying on the covariate actually accounting for the variation, which might not always be the case.

Answer 55

A

Used to study 2-level factors (for example, the presence or absence of each factor) and their collective effect on a response. Would be used for exploratory analysis where linear trends are expected. Advantages include reproducibility and a wider inductive basis because of the symmetry of the experiments. 2-level full factorial experiments are great for preliminary studies and are cost effective. They highlight interactions as well as isolated (main) effects. Disadvantages include the run size when there are many factors (10 factors means 2^10 = 1024 runs) and the inability to estimate polynomial (curvature) terms because there are only two levels.

Answer 56

A

Balance- each factor level appears in the same number of runs

Orthogonality- for any pair of factors, all level combinations appear the same number of times

Replication- identical treatments applied to similar experimental units
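
Balance and orthogonality are easy to verify for a 2-level full factorial. This sketch assumes -1/+1 coding, under which balance means every column sums to zero and orthogonality means every pair of columns has a zero dot product.

```python
from itertools import combinations, product


def full_factorial(n_factors):
    """All 2^n_factors runs of a two-level design in -1/+1 coding."""
    return [list(run) for run in product([-1, 1], repeat=n_factors)]


design = full_factorial(3)
columns = list(zip(*design))

# Balance: each level of each factor appears in the same number of runs.
balanced = all(sum(col) == 0 for col in columns)

# Orthogonality: for each pair of factors, the four level combinations
# appear equally often, which for balanced columns means a zero dot product.
orthogonal = all(
    sum(a * b for a, b in zip(ci, cj)) == 0
    for ci, cj in combinations(columns, 2)
)
print(len(design), balanced, orthogonal)
```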

Answer 57

A

The difference between the average of all observations at the high level of a factor and the average of all observations at its low level

Answer 58

A

The change in average response, when changing the level of one factor, depends on the level setting of another factor. There are synergistic and antagonistic interactions.

Answer 59

A

ME(B|A+) = z̄(B+|A+) - z̄(B-|A+).
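
A small worked sketch of main effects, conditional main effects, and a two-factor interaction for a 2^2 design. The responses are made up, and the interaction is taken as half the difference of the two conditional main effects, which is an assumed (though standard) convention.

```python
def mean(values):
    return sum(values) / len(values)


# Each run is (level of A, level of B, response); responses are made up.
runs = [(-1, -1, 10.0), (1, -1, 14.0), (-1, 1, 12.0), (1, 1, 20.0)]


def main_effect(runs, factor):
    """Average response at the + level minus average response at the - level."""
    high = [run[2] for run in runs if run[factor] == 1]
    low = [run[2] for run in runs if run[factor] == -1]
    return mean(high) - mean(low)


def conditional_main_effect(runs, factor, given, level):
    """Main effect of `factor` using only runs where `given` is at `level`."""
    return main_effect([run for run in runs if run[given] == level], factor)


def interaction(runs, f1, f2):
    """Half the difference between the conditional main effects of f2 at f1+ and f1-."""
    return 0.5 * (conditional_main_effect(runs, f2, f1, 1)
                  - conditional_main_effect(runs, f2, f1, -1))


me_a = main_effect(runs, 0)
me_b_given_a_plus = conditional_main_effect(runs, 1, 0, 1)
int_ab = interaction(runs, 0, 1)
print(me_a, me_b_given_a_plus, int_ab)
```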

Answer 60

A

(i) It gives the most parsimonious model, that is, with the fewest terms, particularly the omission of higher-order terms like cubic effects and interactions.
(ii) There are no unusual patterns in the residual plots.
(iii) The transformation has good interpretability.

Answer 61

A

1) Effect hierarchy principle: lower-order effects are more likely to be important than higher-order effects, and effects of the same order are equally likely to be important
2) Effect sparsity principle: the number of relatively important effects in a factorial experiment is small
3) Effect heredity principle: in order for an interaction to be significant, at least one of its parent factors should be significant

Answer 62

A

1) Order factorial effect estimates
2) Plot ordered factorial effect estimates against corresponding inverse normal coordinates for (i-.5)/N for i=1,…,N
3) Under H₀, all factorial effects = 0, so the normal plot should be a straight line
4) Any point which falls off the line is considered significant

Answer 63

A

Used because the log transformation turns multiplicative relationships into additive ones, making them easier to model statistically. It is also easy to transform the sample variance back to its original scale by exponentiating.

Answer 64

A

For the 2^q blocks the block size should divide into the run size of the experiment

Usually one of the higher ordered factorial effects needs to represent the assignment of blocks because of the effect hierarchy principle

The block effect estimate will be the main effect of the blocks

For more blocks create more blocking equations

One major assumption is that the block-by-treatment interactions are negligible; that is, the treatment effects do not vary from block to block. Without this assumption, factorial effects would not be estimable under the blocking relations.

Answer 65

A

Confounding: Setting up a relation which connects one design factor with another (in our case a block, eg: B=123 is a confounding relation). Literally means “confused”.

Aberration: For any blocking scheme b, let g_i(b) be the number of i-factor interactions that are confounded with block effects. For two blocking schemes b₁ and b₂, let r be the smallest i such that g_i(b₁) does not equal g_i(b₂). If g_r(b₁) < g_r(b₂), then blocking scheme b₁ has less aberration than b₂.

Estimability: A blocking scheme has estimability of order e if the lowest order of interactions confounded with block effects is e+1. Estimability of order e therefore ensures that all factorial effects of order e or lower are estimable in the blocking scheme.

The best blocking schemes are ones that ensure estimability of order 1 and have minimum aberration among all blocking schemes.

Answer 66

A

2-level fractional factorial designs are subsets (fractions) of full factorial designs. They have a smaller run size, at the cost of aliasing: aliasing relations describe which effects can no longer be estimated separately. We write this as 2^(k-p), where k is the number of factors and the design is a 1/2^p fraction of the full 2^k design.

Advantages include efficiency both in cost and time. Like a full factorial design it is reproducible and uses symmetry as the basis of its design.

Disadvantages include the complexity of aliasing and scheme selection, and the fact that the full space of the experiment is not explored.

Answer 67

A

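Aliasing relation: Describes which factorial effects are confounded with one another. It follows from the defining relation, denoted, for example, I=ABC=BCD (which also implies I=AD, the product of the two words). There are 2^(k-p)-1 aliasing relations, matching the 2^(k-p)-1 degrees of freedom.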

Word: Any confounded factor combination

Resolution: The length of the shortest word in the defining contrast subgroup. It is desirable to have maximum resolution for fractional factorial designs.

There are 2^(k-p) runs in a 2^(k-p) experiment.
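
Words, the defining contrast subgroup, resolution, and aliases can all be derived mechanically from the generator words. The sketch below assumes independent generators written as strings of factor letters; the generators I = ABD = ACE are a hypothetical 2^(5-2) example, not taken from the cards.

```python
from itertools import combinations


def multiply_words(*words):
    """Multiply effect words; a letter appearing an even number of times cancels."""
    letters = set()
    for word in words:
        letters ^= set(word)
    return "".join(sorted(letters))


def defining_contrast_subgroup(generators):
    """All products of one or more generator words (the identity I is omitted)."""
    words = set()
    for size in range(1, len(generators) + 1):
        for combo in combinations(generators, size):
            words.add(multiply_words(*combo))
    return words


def resolution(generators):
    """Length of the shortest word in the defining contrast subgroup."""
    return min(len(word) for word in defining_contrast_subgroup(generators))


def aliases(effect, generators):
    """Effects aliased with `effect`: its product with every word in the subgroup."""
    return {multiply_words(effect, word) for word in defining_contrast_subgroup(generators)}


generators = ["ABD", "ACE"]  # hypothetical 2^(5-2) design
subgroup = defining_contrast_subgroup(generators)
print(sorted(subgroup), resolution(generators), sorted(aliases("A", generators)))
```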

Answer 68

A

Clear: A main effect or two-factor interaction is clear if none of its aliases are main effects or two-factor interactions

Strongly clear: a factorial effect is strongly clear if none of its aliases are main effects, 2 way, or 3 way interactions

1) In any Res IV design, all main effects are clear
2) In any Res V design, all main effects are strongly clear and 2 factor interactions are clear
3) Among Res IV designs, those with largest number of clear 2-factor interactions are best

Answer 69

A

Aliased ambiguities occur when factorial effects are significant but they cannot be distinguished from the experimental data because they are confounded with one another.

Plans include: using domain knowledge to rule out effects that are unlikely to be significant, using the effect hierarchy principle to assume away higher-order effects, and running follow-up experiments using fold-over techniques or optimal design criteria.

Answer 70

A

The fold-over technique reverses the signs of columns in the design matrix to create a second set of runs and then finds the new aliasing relations (this doubles the run size). A new factor (+,-) represents the two halves of the combined design. Use the augmented design matrix to de-alias the effects believed to be important, then analyze the combined design. Starting from a resolution III design, this method is effective for de-aliasing all the main effects, or one main effect and all of its two-factor interactions. Its drawbacks are that the scope of de-aliasing is limited and that the number of runs must be doubled. There are more efficient ways to accomplish this.
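
A minimal fold-over sketch, assuming the smallest resolution III case: a 2^(3-1) design with generator C = AB, folded over by reversing the signs of all columns. The design choice is illustrative only.

```python
from itertools import product

# Original half fraction with C = AB (defining relation I = ABC).
original = [(a, b, a * b) for a, b in product([-1, 1], repeat=2)]

# Fold-over: reverse the sign of every column, which gives C = -AB.
fold_over = [tuple(-level for level in run) for run in original]

combined = original + fold_over


def column_product_sum(runs):
    """Sum over runs of A * (B*C); nonzero means A is aliased with BC."""
    return sum(a * b * c for a, b, c in runs)


print(column_product_sum(original))  # A and BC are completely aliased
print(column_product_sum(combined))  # A is de-aliased from BC
print(len(set(combined)))            # number of distinct runs
```

In this small case the combined design is the full 2^3 factorial, so every main effect is separated from the two-factor interactions.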

Answer 71

A

An optimal design approach is a technique for follow-up experiments that de-aliases ambiguities for the best model identified, using a particular optimal design criterion. The model in use for optimal design should contain

1) All effects and their aliases judged significant a priori
2) A block variable that accounts for differences in average value of the response over different time periods from the original experiment and the follow up experiment
3) An intercept

D-optimal criterion: max_d |X_d'X_d| where d=1,...,2*2^p where p is the number of regressors in the regression equation

D_s-optimal criterion: max_d |X₂'X₂ - X₂'X₁(X₁'X₁)^-1 X₁'X₂|

Can think of these in terms of regression. The volume of the confidence ellipsoid for the estimated coefficients is proportional to |X'X|^(-1/2), so maximizing |X_d'X_d| minimizes the volume of this confidence ellipsoid (i.e., more precise estimation).
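
The D criterion can be illustrated by comparing |X'X| for two candidate designs. This sketch assumes a model with an intercept and two main effects and two hypothetical 4-run candidates; a real search would range over many more candidates.

```python
def det(m):
    """Determinant by cofactor expansion (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total


def gram(rows):
    """Return X'X for a model matrix given as a list of rows."""
    cols = list(zip(*rows))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def d_criterion(runs):
    """|X'X| for the model with an intercept and main effects of A and B."""
    model_matrix = [[1, a, b] for a, b in runs]
    return det(gram(model_matrix))


candidates = {
    "orthogonal": [(-1, -1), (1, -1), (-1, 1), (1, 1)],
    "repeated_corner": [(-1, -1), (1, -1), (-1, 1), (-1, -1)],
}
scores = {name: d_criterion(runs) for name, runs in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

The orthogonal candidate has the larger determinant, which corresponds to the smaller confidence ellipsoid.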

Answer 72

A

Minimum aberration criterion supplemented by the number of clear effects

Answer 73

A

Larger the better problems:

1) Find factor settings that maximize E(y)
2) Find other factor settings that minimize Var(y)

Smaller the better problems:

1) Find factor settings that minimize E(y)
2) Find other factor settings that minimize Var(y)

Answer 74

A

1) Factors may affect the response in a non-monotone fashion. More levels allow the curvature effect to be understood.
2) A qualitative factor may have multiple levels that need to be understood (e.g., three separate settings on a machine).
3) If there is an initial setting in an optimization problem, then it would make sense to study the space around that setting. Therefore multiple levels would be needed.