Stats Flashcards
z-test
variance is known
z = (ybar - mu)/(sigma/sqrt(n))
z = (ybar1 - ybar2)/sqrt(sigma1^2/n1 + sigma2^2/n2)
t-test
variance is not known
t = (ybar - mu)/(s/sqrt(n))
t = (ybar1 - ybar2)/(sp*sqrt(1/n1 + 1/n2)), sp = pooled std dev
CLT
Zn = (sum(xi) - n*mu)/(sigma*sqrt(n)) -> N(0, 1) as n -> infinity
95% confidence interval for the mean
y +/- z(SE Mean)
SE Mean = s/sqrt(n)
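A minimal Python sketch tying the z statistic, t statistic, and confidence interval together; the sample `y`, hypothesized mean `mu0`, and known `sigma` are hypothetical values:

```python
import numpy as np
from scipy import stats

y = np.array([9.8, 10.2, 10.1, 9.9, 10.4, 10.0])  # hypothetical sample
mu0, sigma = 10.0, 0.2                             # hypothesized mean, known sigma

n = len(y)
ybar = y.mean()

# z statistic (variance known)
z = (ybar - mu0) / (sigma / np.sqrt(n))

# t statistic (variance unknown, s estimated from the data)
s = y.std(ddof=1)
t = (ybar - mu0) / (s / np.sqrt(n))

# 95% CI: ybar +/- z * (SE Mean), SE Mean = s/sqrt(n)
se_mean = s / np.sqrt(n)
zcrit = stats.norm.ppf(0.975)
ci = (ybar - zcrit * se_mean, ybar + zcrit * se_mean)
print(z, t, ci)
```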
ANOVA
SS(Treatments), df = a-1, MS = SS/df, F = MS(Treatments)/MS(E)
SS(E), df = N-a, MS = SS/df
SS(T), df = N-1
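A small Python sketch computing the one-way ANOVA table entries above by hand; the data matrix (a treatments by n replicates) is hypothetical:

```python
import numpy as np

# hypothetical data: rows = treatments (a of them), columns = replicates (n each)
y = np.array([[24., 28., 37., 30.],
              [37., 44., 31., 35.],
              [42., 47., 52., 38.]])
a, n = y.shape
N = a * n

grand = y.sum()
ss_total = (y**2).sum() - grand**2 / N
ss_treat = (y.sum(axis=1)**2).sum() / n - grand**2 / N
ss_error = ss_total - ss_treat

df_treat, df_error, df_total = a - 1, N - a, N - 1
ms_treat, ms_error = ss_treat / df_treat, ss_error / df_error
F = ms_treat / ms_error
print(ss_treat, ss_error, ss_total, F)
```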
a in ANOVA
number of treatments
n in ANOVA
number of blocks
i in ANOVA
treatment
j in ANOVA
block
residual
yij - average(yi)
3 model adequacy checking graphs
(1) normal prob plot
(2) predicted values plot
(3) time series plot
normal prob plot
checks normality of residuals and catches outliers; if violated, transform the response
x = residual
y = normal % probability
predicted values plot
checks homogeneity of variance; address by controlling nuisance factors, randomizing, or transforming
x = predicted yi
y = residual
time series plot
tests independence
x = run order time
y = residual
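A sketch of the three diagnostic plots above, assuming hypothetical `fitted` values and `residuals` from some fit:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# hypothetical residuals and fitted values from an ANOVA/regression fit
rng = np.random.default_rng(1)
fitted = np.repeat([30., 37., 45.], 4)
residuals = rng.normal(0, 2, size=fitted.size)
run_order = np.arange(1, fitted.size + 1)

fig, ax = plt.subplots(1, 3, figsize=(12, 3.5))
stats.probplot(residuals, plot=ax[0])            # (1) normal probability plot
ax[1].scatter(fitted, residuals)                 # (2) residuals vs. predicted values
ax[1].axhline(0); ax[1].set_xlabel("predicted"); ax[1].set_ylabel("residual")
ax[2].plot(run_order, residuals, marker="o")     # (3) residuals vs. run order/time
ax[2].axhline(0); ax[2].set_xlabel("run order"); ax[2].set_ylabel("residual")
plt.tight_layout(); plt.show()
```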
tests for equality of variance
(1) Bartlett's test
(2) modified Levene's test
Box-Cox
selects a power transformation (lambda) for the response
Contrasts
(1) orthogonal
(2) Scheffé - contrasts don't need to be specified in advance
Comparing Means
(1) Fisher LSD - does not control the overall error rate
(2) Tukey's test - controls the overall error rate
(3) Dunnett's test - when comparing treatments to a control
Determining sample size
(1) operating characteristic (OC) curves
(2) specifying a standard deviation increase to detect
Random Effects Model
factor levels are randomly selected from a larger population of levels
Randomized Complete Block Design (RCBD)
- blocks represent a restriction on randomization
- controls a nuisance factor
SS(treat)
(1/b)*sum(yi.^2) - y..^2/N
SS(block)
(1/a)*sum(y.j^2) - y..^2/N
SS(E)
SS(T) - SS(Treatments) - SS(Blocks)
SS(T)
sum(yij^2) - y..^2/N
df for RCBD
SS(Treatments), df = a-1
SS(Blocks), df = b-1
SS(E), df = (a-1)(b-1)
SS(T), df = N-1
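A sketch computing the RCBD sums of squares and F statistic above; the treatments-by-blocks data matrix is hypothetical:

```python
import numpy as np

# hypothetical RCBD data: rows = treatments (a), columns = blocks (b)
y = np.array([[ 9.3,  9.4,  9.6, 10.0],
              [ 9.4,  9.3,  9.8,  9.9],
              [ 9.2,  9.4,  9.5,  9.7],
              [ 9.7,  9.6, 10.0, 10.2]])
a, b = y.shape
N = a * b
correction = y.sum()**2 / N

ss_total = (y**2).sum() - correction
ss_treat = (y.sum(axis=1)**2).sum() / b - correction   # (1/b)*sum(yi.^2) - y..^2/N
ss_block = (y.sum(axis=0)**2).sum() / a - correction   # (1/a)*sum(y.j^2) - y..^2/N
ss_error = ss_total - ss_treat - ss_block

df = {"treat": a - 1, "block": b - 1, "error": (a - 1) * (b - 1), "total": N - 1}
F = (ss_treat / df["treat"]) / (ss_error / df["error"])
print(ss_treat, ss_block, ss_error, ss_total, F)
```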
Latin Square
- blocking in 2 directions
- 2 restrictions on randomization
- disadvantage: small error df; compensate by replicating (e.g., repeating the square with additional operators)
Latin Square setup
SS(Treatments), df = p-1
SS(Rows), df = p-1
SS(Columns), df = p-1
SS(E), df = (p-1)(p-2)
SS(T), df = p^2 - 1
Crossover
- eliminates the time-period effect
- may still have a residual (carryover) effect between periods
Graeco-Latin Square
blocks in 3 directions
Main effect
sum(A+)/2 - sum(A-)/2
Interaction
diff(A’s at B+)/2 - diff(A’s at B-)/2
SS(A)
(1/(bn))*sum(yi..^2) - y...^2/(abn)
SS(int)
(1/n)*sum(yij.^2) - y...^2/(abn) - SS(A) - SS(B)
df for factorial design
A: a-1
B: b-1
AB: (a-1)(b-1)
Error: ab(n-1)
Total: abn - 1
SS(blocks)
(1/(ab))*sum(y..k^2) - y...^2/(abn)
SS(A) for factorial
[a + ab - b - (1)]^2/(4n)
n is number of replicates
4 represents 2^2, would be 8 for 2^3
SS(T) df for factorial
4n - 1
Main effect for factorial
A = (1/(2n))[a + ab - b - (1)]
2 represents 2^2, would be 4 for 2^3
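A sketch showing the contrast, main effect, SS, and coded regression coefficient for a 2^2 design; the treatment-combination totals and replicate count are hypothetical:

```python
import numpy as np

# hypothetical 2^2 totals over n replicates, in standard order (1), a, b, ab
n = 3
one, a, b, ab = 80., 100., 60., 90.    # treatment-combination totals

contrast_A = a + ab - b - one
effect_A = contrast_A / (2 * n)        # main effect: contrast / (2^(k-1) * n)
ss_A = contrast_A**2 / (4 * n)         # SS(A): contrast^2 / (2^k * n)
coef_A = effect_A / 2                  # coded regression coefficient = effect / 2
print(effect_A, ss_A, coef_A)
```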
Coefficient for regression
effect/2 (half the effect estimate, in coded units)
R^2
ss(model)/SS(Total)
Orthogonality
(1) equal number of + and - signs in each column
(2) sum of elements in column = 0
(3) I * col -> unchanged
(4) products of any 2 columns yields a column already on table
VIF
1/(1 - Rj^2), where Rj^2 comes from regressing predictor j on the other predictors
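A sketch computing VIFs from a hypothetical model matrix by regressing each predictor on the others:

```python
import numpy as np

# hypothetical model matrix X (columns = predictors, no intercept column)
rng = np.random.default_rng(0)
x1 = rng.normal(size=20)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=20)   # deliberately correlated with x1
X = np.column_stack([x1, x2])

def vif(X, j):
    # regress column j on the remaining columns, then compute 1/(1 - R^2)
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))
    return 1 / (1 - r2)

print([vif(X, j) for j in range(X.shape[1])])
```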
Types of error
- standard error (for regression coefficient)
- pure (from replication)
- lack of fit (from pooling)
- residual (PE + LOF)
Dispersion effect
look at ranges
Half normal
plot of absolute effect estimates; active effects fall off the straight line
Defining relation
I = …
Design generator
A = BC (aliasing)
Resolution
length of the shortest word in the defining relation
Family
I = +/- ABC
Confirmation Experiment
set the factors at chosen levels, run, and compare the result to the regression model's prediction
Choosing a design
highest resolution
Number of treatment combinations
2^(5-2) = 8
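A sketch generating the runs of a 2^(5-2) fraction, assuming the generators D = AB and E = AC (one possible choice, not the only one):

```python
import itertools
import numpy as np

# base 2^3 design in factors A, B, C
base = np.array(list(itertools.product([-1, 1], repeat=3)))   # 2^3 = 8 runs
A, B, C = base.T

# generators (hypothetical choice): D = AB, E = AC
D, E = A * B, A * C

design = np.column_stack([A, B, C, D, E])
print(design.shape)   # (8, 5): 2^(5-2) = 8 treatment combinations for 5 factors
```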
Folding
change the signs of all factors; words of odd length switch sign in the defining relation
Combined defining relation
copy the + words; the - words drop out, but pairwise products of - words remain
Aliases
1/2([i] + [i]’)
Plackett-Burman
a different class of resolution III designs
- run size must be a multiple of 4
- non-regular (complex aliasing)
- non-geometric
- not flexible - cannot be represented as a cube
Super saturated
construct from a Plackett-Burman design: sort on the last column, then delete all rows with a - (or +) in it
- k > N - 1 (more factors than available df)
k
number of factors
Treatment design
- know how design is confounded
- prevent nuisance variables
- signal what we know and don’t know
Experimental design
- Randomize to prevent bias
- Figure out execution
Estimate correct alias
- prior knowledge of system
- interaction plot
- p-values for each individually
- run the other half-fraction
Empirical vs Mechanistic
model derived from data vs. model from a theoretical law
Regression
describes association only; no statement of effect, not causal
Missing data point
the regression can still be fit, but the estimates change slightly
Standard dev versus Confidence Interval
Variability in raw data versus variability in means
Prediction interval
interval for a single future observation (e.g., a confirmation run); wider than the CI on the mean
Lack of fit
how well the fitted model tracks the data; compares lack-of-fit variation to pure error
2 error terms for regression
pure, lack of fit
Response Surface Methodology
sequential process, method/path of steepest ascent
Procedure for method of steepest ascent/descent
(1) fit a 1st order model
(2) check error, interactions, quadratic effects (curvature)
(3) set a base step, e.g. delta(x1) = 1 for the factor with the largest |coefficient|; step the other factors in proportion to their coefficients
(4) convert the coded steps back to natural units
(5) test with new factor levels and keep stepping until the response stops improving
(6) perform a new factorial with the region of exploration centered around the best point
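A sketch of step (3), computing the coded path of steepest ascent from hypothetical first-order coefficients:

```python
import numpy as np

# hypothetical coded first-order model: yhat = b0 + b1*x1 + b2*x2
b = np.array([0.775, 0.325])          # b1, b2 from the fitted model

# base step of 1 coded unit in the factor with the largest |coefficient|,
# other factors stepped in proportion to their coefficients
base = np.argmax(np.abs(b))
step = b / abs(b[base])               # e.g. delta_x1 = 1, delta_x2 = b2/b1

# centers of the next few runs along the path (in coded units)
path = np.array([k * step for k in range(1, 6)])
print(step)
print(path)
```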
Why use center point?
- provide an error estimate without replicating the factorial runs
- check for curvature
- add df for error
Central composite design
- nF factorial runs, nC center-point runs, 2k axial (star) runs
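A sketch assembling CCD runs for a hypothetical k = 2, using the rotatable axial distance alpha = nF^(1/4):

```python
import itertools
import numpy as np

k = 2                                               # number of factors (hypothetical)
factorial = np.array(list(itertools.product([-1, 1], repeat=k)))  # nF = 2^k runs
alpha = len(factorial) ** 0.25                      # rotatable choice: alpha = nF^(1/4)
axial = np.vstack([s * np.eye(k) for s in (alpha, -alpha)])        # 2k axial runs
center = np.zeros((5, k))                           # nC center-point runs (5 here)

ccd = np.vstack([factorial, axial, center])
print(ccd.shape)                                    # (2^k + 2k + nC) runs
```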
Sequential central composite design
(1) 1st order design shows lack of fit (curvature)
(2) add axial points to allow quadratic terms to be fit
Rotatable CCD
- a desirable property for the fitted second-order model
- prediction variance is the same for all points equidistant from the design center
Box-Behnken
- each run has at least one factor at its center level
- all design points equidistant from the center point, leading to equal variance
- spherical, no points at the vertices (corners)
If you need a “-“ value for time
- don't collect it; treat it as a missing value
- change the other factor levels -> shift the design
- treat it as a constrained region - use a D-optimal design
- inscribed CCD (scaled to fit inside the box)
- face-centered CCD -> axial points placed on the cube faces (alpha = 1)
Evolutionary operation
- constant monitoring and improving
- slight changes
- more data to find smaller differences
- runs over a longer period of time, so lurking variables can creep in
Mixture
- factor levels (component proportions) are not independent; they sum to 1
- lattice simplex
- centroid simplex
Lattice Simplex
{p, m}
p = number of components in the mixture (e.g., sugar, cream)
m = number of equally spaced proportion levels minus 1; each component takes the values 0, 1/m, ..., 1 (e.g., m = 3 gives 0, 1/3, 2/3, 1)
p = 3 means a 2-D simplex (triangle); m = 2 means 3 points along each edge
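A sketch enumerating the {p, m} simplex-lattice points for the {3, 2} case mentioned above:

```python
from itertools import product
from fractions import Fraction

def simplex_lattice(p, m):
    # all p-component mixtures whose proportions are multiples of 1/m and sum to 1
    levels = [Fraction(i, m) for i in range(m + 1)]
    return [pt for pt in product(levels, repeat=p) if sum(pt) == 1]

points = simplex_lattice(3, 2)     # {3, 2} lattice: 6 blends on the triangle
print(len(points), points)
```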
Centroid simplex
2^p - 1 runs
Lattice vs. centroid
lattice is more flexible than centroid
Axial blends
check blends along the component axes, in the interior of the simplex
Model Adequacy
re-checked on the second pass, after the model has been refined