Experimental Design Flashcards

1
Q

Questions to conduct analysis

A
  • How should the data be arranged?
  • Should we conduct a one-way ANOVA?
  • Is the response continuous or discrete?
  • What are the model assumptions?
  • What is the distribution of the error? Is it still normal?
  • Are there blocking factors and interactions to consider?
2
Q

Define Factor

A

Variable whose influence upon a response variable is being studied in an experiment

3
Q

Factor Level

A

Numerical values or settings for a factor

4
Q

Trial (or run)

A

application of a treatment to an experimental unit

5
Q

Treatment or level combination

A

set of values for all factors in a trial

6
Q

Experimental unit

A

Object to which a treatment is applied

7
Q

Randomization and reasons for doing this

A

using a chance mechanism to assign treatments to experimental units or run order

Reasons for randomization: It reduces the chance that an unanticipated variable will confuse the results of the experiment (it protects against unknown variables); most such unanticipated effects manifest themselves over time. It also reduces the biases that an experimenter may impose on a design. Lastly, it ensures the validity of the estimate of experimental error and provides a basis for inference in analyzing the experiment.

8
Q

5 categories of Experimental design

A
  1. Treatment comparisons
  2. Variable screening
  3. Response Surface Modeling
  4. System optimization
  5. System robustness
9
Q

Treatment Comparisons

A

Purpose is to compare several treatments of a factor

10
Q

Variable Screening

A

There are a large number of factors, but only a few are important. The experiment should identify the important few.

11
Q

Response Surface Exploration

A

After important factors have been identified, their impact on the system is explored; regression model building

12
Q

System Optimization

A

Interested in determining the optimum conditions

13
Q

System Robustness

A

Wish to optimize a system and also reduce the impact of uncontrollable (noise) factors. (example: car running well on different road conditions and with different driving habits)

14
Q

Systematic Approach to experimentation

A
  • State the objective of the study
  • Choose the response variable
  • Choose factors and levels
  • Choose experimental design (plan)
  • Perform the experiment
  • Analyze the data
  • Draw conclusions
15
Q

Three fundamental principles of experimental design

A

Replication, Randomization, Local control of error (blocking and covariates)

16
Q

Define replication, difference between it and repetition

A

Each treatment is applied to units that are representative of the population. This helps to reduce variance and increase power to detect significant differences.

Repetition would be the repetition of a measurement any number of times on one unit. Replication is replicating the measurement process with a new unit

17
Q

Define Randomization and list its advantages

A

Use of a chance mechanism to assign treatments to units or to run order.

It has the following advantages:

  • protects against latent variables or “lurking” variables
  • Reduces influence of subjective bias in treatment assignments
  • ensures validity of statistical inference
18
Q

Define blocking, notes on effective blocking strategies

A

A block is a collection of homogeneous units (examples: hours, batches, lots).

Effective blocking: the between-block variation is larger than the within-block variation.

“block what you can and randomize what you cannot”

Run and compare treatments within the same block. This eliminates block-to-block variation from the comparison and reduces the variability of the treatment effect estimates. Randomize the assignment of treatments within each block.

19
Q

Define learning effect

A

An advantage that a unit or person gains from earlier trials in an experiment (for example, a subject improving with practice). It is mitigated with balanced randomization.

20
Q

Define balanced randomization

A

Randomizing the assignment so that each treatment is applied to an equal number of units.
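A minimal sketch of one way to carry out balanced randomization, assuming the number of units is a multiple of the number of treatments. The function name `balanced_randomization`, the unit labels, and the seed are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def balanced_randomization(units, treatments, seed=None):
    """Randomly assign treatments so each one gets the same number of units."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    rng = random.Random(seed)
    reps = len(units) // len(treatments)
    # Build a list with each treatment repeated equally often, then shuffle it.
    labels = [t for t in treatments for _ in range(reps)]
    rng.shuffle(labels)
    return dict(zip(units, labels))


# Hypothetical example: 12 units, 3 treatments, 4 units per treatment.
units = [f"unit{i}" for i in range(1, 13)]
assignment = balanced_randomization(units, ["A", "B", "C"], seed=0)
counts = Counter(assignment.values())
print(counts)
```

Shuffling a pre-balanced label list guarantees equal counts, whereas drawing a treatment independently for each unit would only balance the design on average.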

21
Q

Two things that scientific method require

A
  1. Data collection
  2. Data analysis
22
Q

Three basic methods of collecting data and explanations

A
  1. Retrospective studies (historical data): the least expensive and quickest way, since the data are readily available (data mining), but the data are less than optimal for research goals and of questionable reliability
  2. Observational studies: monitor the process as it runs (beware the Heisenberg uncertainty principle); they employ simple random sampling (most common), stratified random sampling, or systematic sampling
  3. Designed experiments: intentionally disturb the process and observe the results; manipulate the factors, let the process reach equilibrium, and observe the response
23
Q

Difference between experimental and observational unit

A

The experimental unit is the smallest unit to which we apply a treatment combination.

The observational unit is the unit upon which we make the measurement. (May or may not be the experimental unit)

24
Q

Define experimental error and observational error

A

Experimental error measures the variability among the experimental units. May be thought of as background noise, represents variability from trying to repeat the application of the specific combination of the factor levels

Observational error measures the variability due to the observational units. It is part of the experimental error, but only a part. (Think of baking pies in two different ovens.)

25
Q

Basic idea of local control of error

A

Reduce the random error among the experimental units. Control or account for anything which might affect the response other than the factors.

26
Q

OLS Estimation for simple linear regression (the model, what to minimize, and the different values and their variances)

A
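The answer to this card did not survive in the text. As a hedged reconstruction, the standard textbook results for simple linear regression are summarized below; the symbols S_xx and S_xy are the usual corrected sums of squares and are notation assumed here, not taken from the original card.

```latex
\begin{aligned}
\text{Model: } & y_i = \beta_0 + \beta_1 x_i + \epsilon_i,
  \quad \epsilon_i \sim N(0, \sigma^2) \text{ independent}, \quad i = 1, \dots, n \\
\text{Minimize: } & L(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 \\
\text{Estimates: } & \hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad
  \hat\beta_0 = \bar y - \hat\beta_1 \bar x \\
& S_{xx} = \sum_{i=1}^{n} (x_i - \bar x)^2, \qquad
  S_{xy} = \sum_{i=1}^{n} (x_i - \bar x)(y_i - \bar y) \\
\text{Variances: } & \operatorname{Var}(\hat\beta_1) = \frac{\sigma^2}{S_{xx}}, \qquad
  \operatorname{Var}(\hat\beta_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar x^2}{S_{xx}} \right)
\end{aligned}
```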
27
Q

R^2 formulas

A

RegrSS/CTSS = 1 - (RSS/CTSS)

28
Q

Another way to express RegrSS

A
29
Q

What is another way to express MSE?

A

RSS/(n-p)

sig hat^2 (the estimate of the error variance sig^2)

30
Q

SE(B1 hat)

A

sqrt (MSE/ Sxx)
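The regression cards above (R^2, RegrSS, MSE, SE of the slope) can be tied together in a small numerical sketch. The data are made up for illustration, and the identity RegrSS = b1^2 * Sxx used in the code is the standard simple-regression result, offered here as an assumption about what the blank RegrSS card intended.

```python
from math import sqrt

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, p = len(x), 2  # p = number of regression parameters (intercept and slope)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = s_xy / s_xx           # OLS slope
b0 = y_bar - b1 * x_bar    # OLS intercept

fitted = [b0 + b1 * xi for xi in x]
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
ctss = sum((yi - y_bar) ** 2 for yi in y)                # corrected total sum of squares
regr_ss = b1 ** 2 * s_xx                                 # regression sum of squares

r_squared = regr_ss / ctss      # equals 1 - rss / ctss
mse = rss / (n - p)             # estimate of sig^2
se_b1 = sqrt(mse / s_xx)        # standard error of the slope

print(round(b1, 4), round(r_squared, 4), round(mse, 4), round(se_b1, 4))
```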

31
Q

What does det[(X'X)^(-1)] represent?

A

It is proportional to the square of the volume of the confidence ellipsoid for the estimated coefficients, so a smaller determinant means more precise estimates.

32
Q

Principle of Parsimony

A

Occam’s razor: “entities should not be multiplied beyond necessity”. So choose fewer variables with sufficient explanatory power. This is a desirable modeling strategy.

33
Q

Explain a One-way layout design in words

A

A single-factor experiment with k levels (treatments)

34
Q

Linear model for the one-way layout

A
35
Q

ANOVA for one-way layout estimated model

A
36
Q

Describe over-parameterization

A

When there are k types of observations but the number of regression parameters is greater than k. When fitting the model, (X'X)^(-1) will not exist because X is not of full rank, so X'X is singular. Constraints are needed to make X'X a nonsingular matrix.

37
Q

What are the two types of constraints for an over-parameterized model?

A

1) Requiring the treatment effects to sum to zero (zero-sum constraint)
2) Setting one of the treatment effects to zero (dropping it from the model matrix X; called a baseline constraint)
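To make the rank argument concrete, here is a hedged sketch for a one-way layout with k = 3 treatments and n = 2 replicates (sizes chosen arbitrarily). It builds the over-parameterized model matrix and the two constrained versions, then compares the rank of X'X with the number of columns. The helper names `rank` and `gram` are hypothetical.

```python
from fractions import Fraction


def rank(matrix):
    """Matrix rank via Gaussian elimination with exact rational arithmetic."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


def gram(X):
    """Return X'X for a model matrix given as a list of rows."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


k, n = 3, 2
# Over-parameterized: intercept plus one indicator column per treatment (k + 1 columns).
X_full = [[1] + [1 if j == i else 0 for j in range(k)] for i in range(k) for _ in range(n)]
# Baseline constraint: drop the indicator of the first treatment.
X_base = [[row[0]] + row[2:] for row in X_full]
# Zero-sum constraint: the last effect is minus the sum of the others.
X_zero = [[1] + [-1 if i == k - 1 else (1 if j == i else 0) for j in range(k - 1)]
          for i in range(k) for _ in range(n)]

rank_full = rank(gram(X_full))  # smaller than the 4 columns, so X'X is singular
rank_base = rank(gram(X_base))  # equals the 3 columns, so X'X is invertible
rank_zero = rank(gram(X_zero))  # equals the 3 columns, so X'X is invertible
print(rank_full, rank_base, rank_zero)
```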

38
Q

Describe the purpose of multiple comparisons tests and describe the two methods associated with them

A

After a global F-test of the treatments rejects the null hypothesis, multiple comparisons tests identify which pairs of treatments differ significantly.

1) Bonferroni method: treatments i and j are declared different if |t_ij| exceeds the upper alpha/(2k') quantile of the t distribution with N-k degrees of freedom, where k' = (k choose 2) is the number of pairs
2) Tukey method: treatments i and j are declared different if |t_ij| exceeds 1/sqrt(2) times the upper alpha quantile of the studentized range distribution with parameters k and N-k

39
Q

Process to find B for linear and quadratic effects

A
40
Q

P1(x) and P2(x) and equation for finding y with orthogonal polynomials

A
41
Q

Explanation of one-way random effects model

A

Basically a one-way fixed effects model but now the operators considered come from a pool (population) of operators. This gives rise to the variance components, variance between the operators and within the operators

42
Q

Do you do multiple comparison tests with a random effects model? Why or why not?

A

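No. The levels are a random sample from a population, so comparing particular levels is not of interest. There are two sources of variation (between and within), and the focus is on estimating these variance components, which is done using the expected values of the two mean squares.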

43
Q

What is the variance of the average in a random effects model?

A

MSTr/(nk)

44
Q

Before using hypothesis testing and confidence intervals, what model assumptions must be made?

A

1) Have all important effects been captured?
2) Are the errors independent and normally distributed?
3) Do the errors have constant variance?

45
Q

What are the three major properties of residuals? And what is the variance of any residual?

A

E(r)=0

r and yhat are independent

r ~ Multi Norm (0, sig^2( I - H ) )

Var(ri) = sig^2 (1 - hii)

46
Q

What four residual plots will help show model assumptions?

A

Plot ri vs yhati

Plot ri vs xi

Plot ri vs time sequence, i

Plot ri vs replicates grouped by treatment

47
Q

What should you do if there is a large number of replicates per treatment? What is helpful about this method?

A

Use a Box-whisker plot. It enables the location, dispersion, skewness, and extreme values of the replicated observations to be displayed in a single plot

48
Q

IQR, IQR whiskers, implications for outliers and skewness

A

IQR = Q3 - Q1

Whiskers= [Q1 - 1.5*IQR, Q3 + 1.5*IQR]

Anything outside the whisker bounds is considered an outlier. If Q1 and Q3 are not symmetric about the median, this implies skewness.
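The whisker rule can be sketched in a few lines. The data are hypothetical, and the quartiles use the "inclusive" interpolation method of Python's `statistics.quantiles`; other quartile conventions can give slightly different values.

```python
from statistics import quantiles


def box_plot_summary(values):
    """Quartiles, IQR, whisker bounds, and outliers under the 1.5*IQR rule."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return {"q1": q1, "median": q2, "q3": q3, "iqr": iqr,
            "bounds": (lower, upper), "outliers": outliers}


# Hypothetical replicated observations with one extreme value.
summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(summary)
```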

49
Q

Explain the purpose and process of the normal probability plot

A

Purpose: to test if the residuals follow a normal distribution

Process: Obtain ordered residuals which each have probability pi = (i - .5)/N. Then plot pi vs r(i) which should be relatively S shaped if the residuals are somewhat normally distributed. However, typically there is a transformation of these probabilities that makes the desired shape to be a straight line (think qq-plot).
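As a rough illustration of the process just described, the sketch below computes the plotting positions pi = (i - 0.5)/N and the corresponding standard normal quantiles, so that plotting the ordered residuals against the quantiles should look roughly like a straight line under normality. The residual values and the function name are hypothetical.

```python
from statistics import NormalDist


def normal_plot_coordinates(residuals):
    """Pair each ordered residual r(i) with the normal quantile of (i - 0.5)/N."""
    ordered = sorted(residuals)
    n = len(ordered)
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    z = [NormalDist().inv_cdf(p) for p in probs]
    return list(zip(z, ordered))


# Hypothetical residuals for illustration only.
coords = normal_plot_coordinates([0.3, -1.2, 0.8, -0.1, 0.2])
for z_value, r_value in coords:
    print(round(z_value, 3), r_value)
```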

50
Q

Name some experiments with more than one factor

A

Paired comparison design, randomized block design, two-way and multi-way layout, Latin square and Graeco-Latin square design, balanced incomplete block design (BIBD), split-plot design, ANCOVA

51
Q

Definitions of paired comparison design and unpaired design

A

Paired comparison design: can be viewed as an RBD with block size 2. Each block consists of two homogeneous units, and the two treatments are randomly assigned to the units within each block.

Unpaired design: There are still two treatments, but the units are not grouped into homogeneous pairs, so the analysis has more degrees of freedom. Because its error includes the between-unit variation, this design has lower power than the paired comparison design when pairing is effective.

52
Q

t values for paired design

A
53
Q

t value for unpaired design (two-sampled t-test)

A
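The answers to this card and to the paired-design card before it are missing from the text. As a hedged sketch, the standard statistics are the paired t, the mean difference divided by s_d/sqrt(N) with N-1 degrees of freedom, and the two-sample t with separate sample variances, (ybar2 - ybar1)/sqrt(s2^2/N + s1^2/N) with 2N-2 degrees of freedom. The data and function names below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def paired_t(y1, y2):
    """t statistic from the within-pair differences d_i = y2_i - y1_i."""
    d = [b - a for a, b in zip(y1, y2)]
    return mean(d) / sqrt(variance(d) / len(d))


def unpaired_t(y1, y2):
    """Two-sample t statistic using the two sample variances."""
    return (mean(y2) - mean(y1)) / sqrt(variance(y1) / len(y1) + variance(y2) / len(y2))


# Hypothetical measurements on 4 pairs of homogeneous units.
y1 = [10, 12, 9, 11]
y2 = [12, 15, 10, 14]
t_paired = paired_t(y1, y2)      # compare with t on N - 1 = 3 degrees of freedom
t_unpaired = unpaired_t(y1, y2)  # compare with t on 2N - 2 = 6 degrees of freedom
print(round(t_paired, 3), round(t_unpaired, 3))
```

In this made-up example the pairs differ a lot from one another, so the paired statistic is much larger than the unpaired one even though it has fewer degrees of freedom.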
54
Q

Define (complete) randomized block design

A

k treatments are randomly assigned to the k units within each block, with b blocks and N = bk total runs. For an effective design, the units within each block should be more homogeneous than units in different blocks.

55
Q

Model for randomized block design (mixed effects models)

A
56
Q

F and t stats for RBD

A
57
Q

ANOVA for RBD

A
58
Q

Explain a two-way layout experiment

A

It involves two treatment factors with fixed levels. There is an interest in assessing the interaction effect between the two treatments

59
Q

Show the model and estimation for the two- way layout

A
60
Q

F test for two-way layout with sum of squares formulations

A
61
Q

ANOVA for two way layout

A
62
Q

Describe Multi-way layout designs

A

Like the two-way layout but expanded to three or more factors (treatments), each with two or more levels

63
Q

Using zero-sum constraints for the three-way layout, show the predicted multi-way layout model and the corresponding formulations

A
64
Q

ANOVA for three way layout

A
65
Q

Explain a Latin square design

A

Each of the k Latin letters (i.e., treatments) appears once in each row and once in each column (the rows and columns are the two blocking factors of the experiment)
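A short sketch of the defining property, assuming a simple cyclic construction (only one of many possible Latin squares). In practice the rows, columns, and letter assignments would also be randomized; the function names are hypothetical.

```python
def cyclic_latin_square(treatments):
    """Build a k x k Latin square by cyclically shifting the treatment list."""
    k = len(treatments)
    return [[treatments[(row + col) % k] for col in range(k)] for row in range(k)]


def is_latin_square(square):
    """Check that every symbol appears exactly once in each row and each column."""
    k = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == k and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == k and rows_ok and cols_ok


square = cyclic_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
print(is_latin_square(square))
```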

66
Q

Show the estimated model and formulation for latin square design

A
67
Q

ANOVA for latin square design

A
68
Q

Explain a Graeco-Latin square design

A

Basically a superposition of two Latin square designs. Useful for studying four factors (3 blocking and 1 treatment, or 2 blocking and 2 treatment)

69
Q

Show model and ANOVA for Graeco latin square design

A
70
Q

Explain BIBD (Balanced incomplete block design)

A

The number of treatments, t, is greater than the block size, k, so each block contains only some of the treatments. The design is balanced because each pair of treatments appears together in the same number of blocks (denoted by lambda)

71
Q

What are the two basic relations for BIBD (involving b(number of blocks), k(block size), r(number of treatment replications), t(number of treatments), and lambda(l, the number of times pairs appear))

A

bk=rt

r(k-1)=l(t-1)
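Both relations can be checked on a small design. The sketch below assumes a hypothetical BIBD with t = 4 treatments in blocks of size k = 3, formed by taking every 3-subset of the treatments as a block, and counts b, r, and lambda directly.

```python
from collections import Counter
from itertools import combinations

t, k = 4, 3
treatments = ["A", "B", "C", "D"]

# Every k-subset of the treatments forms one block (illustrative construction).
blocks = list(combinations(treatments, k))
b = len(blocks)

# r: number of blocks containing each treatment; lam: blocks containing each pair.
rep_counts = Counter(trt for block in blocks for trt in block)
pair_counts = Counter(pair for block in blocks for pair in combinations(block, 2))
r = rep_counts["A"]
lam = pair_counts[("A", "B")]

print(b, r, lam)
print(b * k == r * t, r * (k - 1) == lam * (t - 1))
```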

72
Q

Explain split plot design, when and why it should be used and potential advantages/disadvantages.

A

A split-plot design should be used when certain factors are hard to change. These hard-to-change factors are the whole plot factors; within each whole plot, the levels of the easier-to-change subplot factors are applied to the subplots. Advantages include cost and time effectiveness. Disadvantages include a loss of precision in the whole plot treatment comparisons.

73
Q

Explain ANCOVA, when and why it should be used, and potential advantages/disadvantages

A

ANCOVA should be used when auxiliary covariates are available. In an experiment it may be impractical to create blocks (think of continuous variables), so ANCOVA can be used when the correlation between the covariate and the response is high. Essentially, we know the covariate is important, but it is an uncontrollable source of variation. In application it can be viewed as a fusion of one-way treatment comparisons and simple linear regression. Advantages include reducing bias and improving sensitivity by reducing the error of the original model. Disadvantages include fitting two models to check whether the covariate term needed to be accounted for (this might not always be the case).

74
Q

Explain a 2-level Full Factorial Design, when and why it might be used, and potential advantages/disadvantages

A

Used to study how k factors, each at two levels (for example, low/high or absent/present), jointly affect a response by running all 2^k level combinations. It is suited to exploratory analysis where linear trends are expected. Advantages include reproducibility and a wider inductive basis because of the symmetry of the experiments. 2-level full factorial experiments are well suited to preliminary studies and are cost effective when the number of factors is small. They highlight interactions as well as isolated (main) effects. Disadvantages include a run size that grows quickly with the number of factors (10 factors means 2^10 = 1024 runs) and the inability to estimate polynomial (curvature) terms with only two levels.

75
Q

3 key properties of full factorial design

A

Balance- each factor level appears in the same number of runs

Orthogonality- all paired level combinations for factors appear the same number of times

Replication- identical treatments applied to similar experimental units

76
Q

Explain in words a main effect

A

The difference between the average of all observations at the high level of a factor and the average of all observations at its low level

77
Q

Explain in words an interaction

A

The change in average response, when changing the level of one factor, depends on the level setting of another factor. There are synergistic and antagonistic interactions.

78
Q

Equations for an interaction

A
79
Q

Conditional main effect equation

A

ME(B|A+) = ybar(B+|A+) - ybar(B-|A+)
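A small numerical sketch may help connect main effects, conditional main effects, and the interaction. It assumes the usual definition INT(A,B) = [ME(B|A+) - ME(B|A-)]/2, which is offered as a likely reading of the blank interaction card rather than a recovered answer; the four responses are made up.

```python
from statistics import mean

# Hypothetical 2^2 experiment: (level of A, level of B, response).
runs = [(-1, -1, 10.0), (1, -1, 14.0), (-1, 1, 12.0), (1, 1, 22.0)]


def avg(condition):
    """Average response over the runs that satisfy the given condition."""
    return mean(y for a, b, y in runs if condition(a, b))


# Main effects: average at the high level minus average at the low level.
me_a = avg(lambda a, b: a == 1) - avg(lambda a, b: a == -1)
me_b = avg(lambda a, b: b == 1) - avg(lambda a, b: b == -1)

# Conditional main effects of B at each level of A.
me_b_given_a_plus = avg(lambda a, b: a == 1 and b == 1) - avg(lambda a, b: a == 1 and b == -1)
me_b_given_a_minus = avg(lambda a, b: a == -1 and b == 1) - avg(lambda a, b: a == -1 and b == -1)

# Interaction: half the difference between the two conditional main effects.
int_ab = 0.5 * (me_b_given_a_plus - me_b_given_a_minus)

print(me_a, me_b, me_b_given_a_plus, me_b_given_a_minus, int_ab)
```

Note that the main effect of B is the average of its two conditional main effects, while the interaction is half their difference.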

80
Q

Equations for Box Cox Transformations

A
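The answer to this card is missing from the text. As a hedged reconstruction, the commonly used form of the Box-Cox family of power transformations, for a positive response y and transformation parameter lambda, is:

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[1ex]
\ln y, & \lambda = 0.
\end{cases}
```

The lambda = 0 case is the limit of the first expression as lambda tends to zero, which is why the log transformation belongs to the same family.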
81
Q

Split plot SS

A
82
Q

Split plot ANOVA

A
83
Q

Split plot model and hypotheses

A
84
Q

Reasons for power transformation

A

(i) It gives the most parsimonious model, that is, with the fewest terms, particularly the omission of higher-order terms like cubic effects and interactions.
(ii) There are no unusual patterns in the residual plots.
(iii) The transformation has good interpretability.

85
Q

Three fundamental principles for factorial effects

A

1) Effect hierarchy principle: lower-order effects are more likely to be important than higher-order effects, and effects of the same order are equally likely to be important
2) Effect Sparsity principle; the number of relatively important effects in a factorial experiment is small
3) Effect heredity principle: in order for an interaction to be significant, at least one of its parent factors should be significant

86
Q

Steps to construct a normal plot for factorial effects

A

1) Order factorial effect estimates
2) Plot ordered factorial effect estimates against corresponding inverse normal coordinates for (i-.5)/N for i=1,…,N
3) Under H0, all factorial effects = 0, so the normal plot should be a straight line
4) Any point which falls off the line is considered significant

87
Q

Half normal plots and their advantages

A
88
Q

Lenth’s method (Individual Error rate version)

A
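The answer to this card is missing. As a hedged sketch of the usual procedure: compute s0 = 1.5 * median|effect|, then the pseudo standard error PSE = 1.5 * median of the |effect| values below 2.5 * s0, and form t_PSE = effect / PSE for each effect. In the individual error rate version each |t_PSE| is compared with a tabulated IER critical value, which is not reproduced here. The effect estimates below are made up.

```python
from statistics import median


def lenth_pse(effects):
    """Pseudo standard error: a robust scale estimate from the effect estimates."""
    abs_effects = [abs(e) for e in effects]
    s0 = 1.5 * median(abs_effects)
    # Trim effects that look active before re-estimating the scale.
    trimmed = [a for a in abs_effects if a < 2.5 * s0]
    return 1.5 * median(trimmed)


# Hypothetical factorial effect estimates from an unreplicated 2^3 experiment.
effects = {"A": 0.5, "B": -0.3, "C": 8.0, "AB": 0.2, "AC": -0.4, "BC": 0.1, "ABC": 0.3}
pse = lenth_pse(effects.values())
t_pse = {name: estimate / pse for name, estimate in effects.items()}

for name, value in t_pse.items():
    print(name, round(value, 2))
```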
89
Q

Nominal the best

A
90
Q

Deriving the use of log sample variance for dispersion analysis and why this is used

A

Used because the log transformation turns multiplicative relationships into additive ones, making them easier to model statistically. It is also easy to transform back to the original sample variance by exponentiating.

91
Q

Describe procedures for blocking and optimal arrangement of 2^k factorial designs in 2^q blocks and any assumptions made.

A

With 2^q blocks, the block size must divide the run size of the experiment

Usually one of the higher-order factorial effects is used to define the assignment of runs to blocks, because by the effect hierarchy principle it is the least likely to be important

The block effect estimate will be the main effect of the blocks

For more blocks create more blocking equations

One major assumption is that the block-by-treatment interactions are negligible. That is, the mean response for a given treatment does not depend on the block. Without this assumption, factorial effects would not be estimable under the blocking relations.

92
Q

What makes a good blocking scheme? Define the terms confounding, aberration, and estimability.

A

Confounding: Setting up a relation which connects one design factor with another (in our case a block, eg: B=123 is a confounding relation). Literally means “confused”.

Aberration: For any blocking scheme b, let g_i(b) be the number of i-factor interactions that are confounded with block effects. For two blocking schemes b1 and b2, let r be the smallest i such that g_i(b1) does not equal g_i(b2). If g_r(b1) < g_r(b2), then blocking scheme b1 has less aberration than b2.

Estimability: A blocking scheme has estimability of order e if the lowest-order interactions confounded with block effects are of order e+1. Estimability of order e therefore ensures that all factorial effects of order e or lower are estimable in the blocking scheme.

The best blocking schemes are ones that ensure estimability of at least order 1 and have minimum aberration among all blocking schemes.

93
Q

Explain 2 level fractional factorial designs, when and why they should be used, and potential advantages/disadvantages

A

2-level fractional factorial designs are subsets (fractions) of full factorial designs. They have a smaller run size, so some effects are aliased with one another, and aliasing relations are needed to describe which effects cannot be estimated separately. The notation is 2^(k-p), where k is the number of factors and the design is a 2^(-p) fraction of the full 2^k factorial.

Advantages include efficiency in both cost and time. Like a full factorial design, it is reproducible and uses symmetry as the basis of its design.

Disadvantages include the complexity of aliasing and of choosing a scheme, and the fact that the full space of the experiment is not explored.

94
Q

Within the context of 2 level fractional factorial designs, define these terms:

Aliasing relation, word, resolution

Also, how many df, aliasing relations, runs

A

Aliasing relation: Describes which factorial effects are confounded with one another. The defining relation is written, for example, I=ABC=BCD=AD. There are 2^(k-p) - 1 sets of aliased effects, which is also the number of degrees of freedom.

Word: Any term (product of factor letters) in the defining relation, such as ABC

Resolution: The length of the shortest word in the defining contrast subgroup. It is desirable to have maximum resolution for fractional factorial designs.

There are 2^(k-p) runs in a 2^(k-p) experiment.
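To illustrate the run count and aliasing, the sketch below constructs a hypothetical 2^(4-1) design with generator D = ABC (defining relation I = ABCD). It has 2^(4-1) = 8 runs, and the AB and CD contrast columns coincide, so those two interactions are aliased.

```python
from itertools import product

k, p = 4, 1

# Full factorial in the k - p basic factors A, B, C; D is set by the generator D = ABC.
base = list(product([-1, 1], repeat=k - p))
design = [(a, b, c, a * b * c) for a, b, c in base]
n_runs = len(design)

# Defining relation I = ABCD: the product of all four columns is +1 in every run.
defining_word_holds = all(a * b * c * d == 1 for a, b, c, d in design)

# Aliasing: the AB interaction column is identical to the CD interaction column.
ab_column = [a * b for a, b, c, d in design]
cd_column = [c * d for a, b, c, d in design]

print(n_runs, defining_word_holds, ab_column == cd_column)
```

The shortest word in this defining relation has length 4, so this example is a resolution IV design.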

95
Q

Rules for Resolution IV and V Designs for 2 level fractional factorial designs. Define clear and strongly clear.

A

Clear: A main effect or two-factor interaction is clear if none of its aliases are main effects or two-factor interactions

Strongly clear: a factorial effect is strongly clear if none of its aliases are main effects, 2 way, or 3 way interactions

1) In any Res IV design, all main effects are clear
2) In any Res V design, all main effects are strongly clear and 2 factor interactions are clear
3) Among Res IV designs, those with largest number of clear 2-factor interactions are best

96
Q

Variance of a factorial effect

A
97
Q

Steps for analysis of fractional factorial designs using regression

A
98
Q

Describe the problem of aliased ambiguities and briefly state plans to resolve them

A

Aliased ambiguities occur when factorial effects are significant but they cannot be distinguished from the experimental data because they are confounded with one another.

Plans include: using domain knowledge to rule out effects that are unlikely to be significant, using the effect hierarchy principle to assume away higher-order effects, and running follow-up experiments using fold-over techniques or optimal design criteria.

99
Q

What is the fold-over technique? Advantages/disadvantages?

A

The fold-over technique flips the signs in the design matrix and finds the new aliasing relations (this doubles the run size). A new factor represents the two halves of the combined design (+,-). Use the augmented design matrix to de-alias the effects believed to be important, then analyze this design. Starting from a resolution III design, this method is effective for de-aliasing either all the main effects or one main effect and all its interactions. Its drawbacks are that the scope of de-aliasing is limited and that the number of runs must be doubled. There are more efficient ways to accomplish this.

100
Q

Describe an optimal design approach and two criteria for this approach

A

An optimal design approach is a technique for choosing follow-up runs that de-alias ambiguities in the best model identified so far, using a particular optimal design criterion. The model used for the optimal design should contain

1) All effects and their aliases judged significant a priori
2) A block variable that accounts for differences in average value of the response over different time periods from the original experiment and the follow up experiment
3) An intercept

D-optimal criterion: max_d |X_d'X_d| where d=1,…,2*2p where p is the number of regressors in the regression equation

Ds-optimal criterion: max_d |X2'X2 - X2'X1(X1'X1)^(-1)X1'X2|

These can be thought of in terms of regression. |X'X| is inversely proportional to the square of the volume of the confidence ellipsoid for the estimated coefficients, so maximizing |X_d'X_d| over d minimizes the volume of this confidence ellipsoid (i.e., gives more precise estimation).

101
Q

What should be used to evaluate the effectiveness of a fractional factorial design?

A

Minimum aberration criterion supplemented by the number of clear effects

102
Q

Explain larger the better and smaller the better problems

A

Larger the better problems:

1) Find factor settings that maximize E(y)
2) Find other factor settings that minimize Var(y)

Smaller the better problems:

1) Find factor settings that minimize E(y)
2) Find other factor settings that minimize Var(y)

103
Q

What are some practical considerations that make it desirable to study factors
with more than two levels?

A

1) Factors may affect the response in a non-monotone fashion. More levels allow the curvature effect to be understood.
2) If a qualitative factor has multiple levels that need to be understood (eg three separate settings on a machine)
3) If there is an initial setting in an optimization problem, then it would make sense to study the space around that setting. Therefore multiple levels would be needed.

104
Q

Suppose we have a 3^k full factorial experiment for factors A, B, C (k = 3) with three replicates. How many degrees of freedom are there for the terms?

A
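The answer to this card is missing. A hedged sketch of the usual counting rule: an effect involving m three-level factors has (3 - 1)^m degrees of freedom, the total is (number of runs) - 1, and the remainder goes to the residual. The variable names below are illustrative.

```python
from itertools import combinations

factors = ["A", "B", "C"]
levels, replicates = 3, 3

# Each effect involving m factors has (levels - 1)^m degrees of freedom.
df = {}
for order in range(1, len(factors) + 1):
    for term in combinations(factors, order):
        df["".join(term)] = (levels - 1) ** order

n_runs = levels ** len(factors) * replicates   # 27 treatment combinations x 3 replicates
df_total = n_runs - 1
df_residual = df_total - sum(df.values())

for term, value in df.items():
    print(term, value)
print("residual", df_residual, "total", df_total)
```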
105
Q

Explain the linear-quadratic system for 3 level fractional factorial design

A
106
Q

What does it mean for two factors to be partially aliased?

A

The two effects are neither fully aliased nor orthogonal: the angle between them is strictly between 0 and 90 degrees.

107
Q

Explain an orthogonal array

A
108
Q

Why use an orthogonal array?

A

Orthogonal arrays have better run size economy (fewer runs) and more flexibility in the factor level combinations

109
Q

How to determine OA run size

A
110
Q

What is RSM?

A

Response surface methodology uses experimentation, modeling, data analysis, and optimization to understand the surface of the response

111
Q

What are the three types of points for CCD in RSM?

A

Central composite design: Corner points, axial points, center points

112
Q

Explain robust parameter design

A

Choose control factor settings to make the response less sensitive (i.e., more robust) to noise variation, exploiting control-by-noise interactions