Exam 3 - December 9th Flashcards
What sort of test would you use on a data set with a quantitative response variable and no explanatory variable?
One-Sample T
What sort of tests could potentially be used on data sets with a quantitative response variable and one categorical explanatory variable?
Two-Sample T, Paired T, One-Way ANOVA
What sort of tests could potentially be used on data sets with a quantitative response variable, one categorical explanatory variable, and two groups?
Two-Sample T or Paired T
What sort of test would you use on a data set with a quantitative response variable, one categorical explanatory variable, and two independent groups?
Two-Sample T
What sort of test would you use on a data set with a quantitative response variable, one categorical explanatory variable, and two dependent groups?
Paired T
What sort of test would you use on a data set with a quantitative response variable, one categorical explanatory variable, and more than two groups?
One-Way ANOVA
What sort of test would you use on a data set with a quantitative response variable and two categorical explanatory variables?
Two-Way ANOVA
What sort of test would you use on a data set with a quantitative response variable and one quantitative explanatory variable?
Simple Linear Regression
What sort of test would you use on a data set with a categorical response variable and no explanatory variable?
One-Sample Proportion
What sort of test would you use on a data set with a categorical response variable and one categorical explanatory variable with two groups?
Two-Sample Proportion
What is the factor effects model for the two-way ANOVA?
$Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
What does the factor effects model for the two-way ANOVA mean?
Each observation is equal to the overall mean, plus the effects of Factor A, Factor B, and the interaction between them, plus some error
What are the constraints for the Two-Way ANOVA?
$\sum_i \alpha_i = \sum_j \beta_j = 0$ and $\sum_i (\alpha\beta)_{ij} = \sum_j (\alpha\beta)_{ij} = 0$
Zero-Sum Constraint
How can you check the constraint for the Two-Way ANOVA?
For Main Effects: the deviations of the factor A (or B) level means from the overall mean sum to zero
- in a balanced design with a = b = 2, half of the individuals are at each level (A1, A2, B1, and B2)
For Interaction: for each level of either factor, the quantities (treatment combination mean − factor A level mean − factor B level mean + overall mean) sum to zero
- in a balanced design with a = b = 2, one-quarter of the individuals are in each treatment combination
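The constraint check can be sketched in code. This is a minimal illustration, assuming a balanced design so that marginal means are plain averages of the cell means; the function name `estimate_effects` and the 2x2 cell means are hypothetical.

```python
def estimate_effects(cell_means):
    """Estimate mu, alpha_i, beta_j and (alpha*beta)_ij from cell means.

    Assumes a balanced design (equal n per cell), so each marginal mean
    is a simple average of the cell means.
    """
    a, b = len(cell_means), len(cell_means[0])
    mu = sum(sum(row) for row in cell_means) / (a * b)
    a_means = [sum(row) / b for row in cell_means]
    b_means = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    alpha = [m - mu for m in a_means]
    beta = [m - mu for m in b_means]
    ab = [[cell_means[i][j] - a_means[i] - b_means[j] + mu for j in range(b)]
          for i in range(a)]
    return mu, alpha, beta, ab


# Hypothetical 2x2 cell means: rows are A1, A2; columns are B1, B2
cell_means = [[10.0, 14.0],
              [12.0, 20.0]]
mu, alpha, beta, ab = estimate_effects(cell_means)

# Zero-sum checks: main effects, then interaction rows and columns
print(sum(alpha), sum(beta))           # 0.0 0.0
print([sum(row) for row in ab])        # [0.0, 0.0]
print([sum(col) for col in zip(*ab)])  # [0.0, 0.0]
```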
What are the assumptions for the Two-Way ANOVA?
Optional → Equal Sample Sizes or Balanced Design, same number of individuals at each treatment combination
$\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$ → Errors are normal, independent, and have constant variance
What is an interaction plot for a Two-Way ANOVA?
Plot the mean of each treatment combination against the levels of one factor, with a separate line for each level of the other factor
Parallel lines indicate no interaction; non-parallel (e.g., crossing) lines indicate a possible interaction
An interaction can be antagonistic or reinforcing/synergistic
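Whether lines in an interaction plot are parallel comes down to whether the vertical gap between them stays the same across the x-axis. This is a minimal sketch, assuming factor A is on the x-axis and a two-level factor B is drawn as two lines. The function name `line_gaps` and the cell means are hypothetical. Sample means are rarely exactly parallel, so the formal decision still comes from the interaction F test.

```python
def line_gaps(cell_means):
    """Vertical gap between the B2 line and the B1 line at each level of A.

    cell_means[i][j] is the sample mean for level i of factor A (x-axis)
    and level j of a two-level factor B (one line per level).
    """
    return [row[1] - row[0] for row in cell_means]


# Hypothetical 2x2 cell means: rows are A1, A2; columns are B1, B2
parallel = [[10.0, 14.0], [12.0, 16.0]]
not_parallel = [[10.0, 14.0], [12.0, 20.0]]

print(line_gaps(parallel))      # [4.0, 4.0]: equal gaps, lines are parallel
print(line_gaps(not_parallel))  # [4.0, 8.0]: gap changes, possible interaction
```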
If there is a significant interaction for a Two-Way ANOVA, how does that change the interpretation of the main effects?
You cannot interpret the main effects separately, and must instead conduct a Tukey Pairwise Comparison to test for Factor A effects for each level of Factor B and vice versa
What is the Two-Way ANOVA hypothesis for the main effects?
Main Effect A → $H_0: \alpha_i = 0$ for all $i$ versus $H_A$: not all $\alpha_i$ equal zero
Main Effect B → $H_0: \beta_j = 0$ for all $j$ versus $H_A$: not all $\beta_j$ equal zero
What is the Two-Way ANOVA hypothesis for interaction?
Interaction → $H_0: (\alpha\beta)_{ij} = 0$ for all $i, j$ versus $H_A$: not all $(\alpha\beta)_{ij}$ equal zero
What is the next step for the Two-Way ANOVA if the interaction is found to be significant?
Both factors are important and main effects need to be analyzed using pairwise comparisons (instead of F tests)
What is the next step for the Two-Way ANOVA if the interaction is not significant, but a two-level main effect is?
State level with higher mean
What is the next step for the Two-Way ANOVA if the interaction is not significant, but a multilevel main effect is?
Use Tukey Pairwise Comparisons to identify which levels differ significantly, and state which level has the higher mean
How can you check the assumptions for a Two-Way ANOVA?
Normal Probability Plot of Residuals
Residuals versus Fitted/Factor Levels
Residuals versus Order/Time
Define sample size, effect size, significance level, and power
Sample size (n) - number of subjects in the study
Effect size (Δ/σ) - effect relative to noise
Significance level (α) - probability of false positive
Power (1 - β) - probability of true positive
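The four quantities trade off against each other. Power rises with effect size, with sample size, and with a larger α. The sketch below assumes a two-sided comparison of two group means with σ treated as known. That is a normal approximation, not the one-way ANOVA power curve used in class. The function name `approx_power` is hypothetical.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is delta / sigma; sigma is treated as known (normal
    approximation), and n_per_group subjects are in each group.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)              # rejection cutoff under H0
    shift = effect_size * (n_per_group / 2) ** 0.5  # mean of z under HA
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


print(approx_power(0.0, 10))   # no effect: power equals alpha
print(approx_power(0.5, 64))   # about 0.81
```

With no true effect, "power" collapses to α, the false-positive rate. This shows how the definitions connect.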
Using a power-curve for a One-Way ANOVA, how can you determine the sample sizes for a Two-Way ANOVA?
The sample size on the graph is per group (one treatment combination); choose α and power, and estimate σ as √MSE from a previous study
What is the model for the additive Two-Way ANOVA?
$Y_{ijk} = \mu + \alpha_i + \beta_j + \varepsilon_{ijk}$
What are the constraints for the additive Two-Way ANOVA?
Zero-Sum Constraint: $\sum_i \alpha_i = \sum_j \beta_j = 0$
(The additive model is used when there is only one replicate per treatment combination)
What are the assumptions for the additive Two-Way ANOVA? How can you check the ones that are different from a regular Two-Way ANOVA?
$\varepsilon_{ijk} \overset{iid}{\sim} N(0, \sigma^2)$ → Errors are normal, independent, and have constant variance
Interaction terms are assumed to be zero (check with an interaction plot: lines should be roughly parallel)
Why does the additional assumption for additive Two-Way ANOVAs exist?
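With only one replicate per treatment combination, the full model would use all ab observations to estimate μ, the main effects, and the interaction, leaving zero degrees of freedom for error; dropping the interaction frees (a − 1)(b − 1) degrees of freedom for error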
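The degrees-of-freedom bookkeeping is easy to check directly. This is a minimal sketch for a balanced two-way ANOVA with n replicates per cell. The function name `two_way_df` is hypothetical.

```python
def two_way_df(a, b, n, interaction=True):
    """Degrees of freedom for a balanced two-way ANOVA.

    a, b: number of levels of factors A and B; n: replicates per cell.
    Error df is whatever is left after the model terms.
    """
    total = a * b * n - 1
    df = {"A": a - 1, "B": b - 1}
    if interaction:
        df["AB"] = (a - 1) * (b - 1)
    df["error"] = total - sum(df.values())  # computed before adding "total"
    df["total"] = total
    return df


print(two_way_df(3, 4, 1))                     # full model: error df is 0
print(two_way_df(3, 4, 1, interaction=False))  # additive model: error df is 6
```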
What is a Randomized Complete Block Design? (RCBD)
A specialized form of the additive Two-Way ANOVA used when experimental units are non-homogeneous
Blocks and treatments are assumed not to interact
What is a “block” in a Randomized Complete Block Design?
One block is a complete replication of the set of treatments
In agricultural studies, a block might be one field, with different sections of the field receiving different treatments
What is the model for a Randomized Complete Block Design (RCBD)?
$Y_{ijk} = \mu + \tau_i + \beta_j + \varepsilon_{ijk}$ (τ = treatment effect, β = block effect)
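Because the RCBD is an additive two-way model with one observation per treatment-block combination, its sums of squares split cleanly. This is a minimal sketch under that assumption. The function name `rcbd_anova` and the data are hypothetical. The p-value would still come from an F table with (t − 1, (t − 1)(b − 1)) df.

```python
def rcbd_anova(y):
    """Sums of squares and the treatment F statistic for an RCBD.

    y[i][j] is the single observation for treatment i in block j.
    """
    t, b = len(y), len(y[0])  # number of treatments, number of blocks
    grand = sum(sum(row) for row in y) / (t * b)
    trt_means = [sum(row) / b for row in y]
    blk_means = [sum(y[i][j] for i in range(t)) / t for j in range(b)]
    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = t * sum((m - grand) ** 2 for m in blk_means)
    ss_tot = sum((y[i][j] - grand) ** 2 for i in range(t) for j in range(b))
    ss_err = ss_tot - ss_trt - ss_blk  # what is left after treatments and blocks
    df_trt, df_err = t - 1, (t - 1) * (b - 1)
    f_trt = (ss_trt / df_trt) / (ss_err / df_err)
    return {"SS_trt": ss_trt, "SS_blk": ss_blk, "SS_err": ss_err,
            "SS_tot": ss_tot, "df_err": df_err, "F_trt": f_trt}


# Hypothetical data: 3 treatments (rows) in 3 blocks (columns)
result = rcbd_anova([[8, 10, 12],
                     [9, 12, 15],
                     [7, 8, 9]])
print(result)
```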
How is a multifactor ANOVA different from a Two-Way ANOVA in terms of analysis?
If the three-way interaction or multiple two-way interactions are significant, analyze the three factors jointly (in terms of treatment ABC)
If only one two-way interaction is significant, analyze that interaction and the remaining main effect
If no interactions are significant, analyze main effects separately
What is the model for a simple linear regression?
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
What are the assumptions for a simple linear regression?
$\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$ → Errors are normal, independent, and have constant variance
The relationship between X and Y is linear
How do you check the general assumptions for a simple linear regression?
Check with residuals
Sequence plot of residuals (independence)
Normal probability plot or histogram of residuals (normality)
Residuals vs X or fitted values (constant variance)
How do you check the main assumption of a simple linear regression?
Scatterplot of Y versus X
explanatory variable on the x-axis and the response on the y-axis
Residuals vs X or fitted values (no curved pattern)
What are the “fitted values” and “residuals” for a simple linear regression?
Fitted (predicted) values → points on the regression line, $\hat{Y}_i$, at each observed $X_i$
Residuals → vertical distances between observed and fitted values, $e_i = Y_i - \hat{Y}_i$
What is the meaning of the “scope” of a simple linear regression?
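Scope - the range of the observed X values
Predictions within the range are termed interpolation and are generally reliable
Predictions outside the range are termed extrapolation and are risky, because the linear relationship may not hold there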
What is the significance of the correlation coefficient r for the simple linear regression?
Describes the strength and direction of the linear relationship between the variables
|r| greater than 0.8 is strong, less than 0.5 is weak (rule of thumb)
What is the significance of the coefficient of determination R^2 of a simple linear regression?
R² is the coefficient of determination: the proportion of the variation in Y that is explained by the regression of Y on X (in simple linear regression, R² = r²)
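The link between r and R² can be checked numerically. The sketch below computes r from sums of squares and cross-products. It computes R² as 1 − SSE/SST from the least-squares line, then confirms that R² = r² in simple linear regression. The function names and the five data points are hypothetical.

```python
def correlation(x, y):
    """Pearson correlation coefficient r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / (sxx * syy) ** 0.5


def r_squared(x, y):
    """R^2 = 1 - SSE / SST for the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = correlation(x, y)
print(r, r ** 2, r_squared(x, y))  # r**2 and R^2 agree (both 0.6 up to rounding)
```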
Describe “least squares estimation” in terms of the simple linear regression
Slope and intercept are chosen to minimize the sum of the squared residuals
What is the equation for a fitted regression line of a simple linear regression?
Fitted Line is $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
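Least-squares estimation has closed-form answers. The slope is $\hat\beta_1 = S_{xy}/S_{xx}$ and the intercept is $\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}$. This is a minimal sketch computing the fitted line, fitted values, and residuals. The function name and data are hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx  # the fitted line passes through (x-bar, y-bar)
    return b0, b1


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
fitted = [b0 + b1 * xi for xi in x]                   # points on the line
residuals = [yi - fi for yi, fi in zip(y, fitted)]    # observed minus fitted
print(b0, b1)
print(residuals)
```

With an intercept in the model, the least-squares residuals always sum to zero, which is a quick sanity check.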
Generally, what can a simple linear regression say about the variables that produce its data set?
It can show whether a linear association exists; causation can only be established by a randomized experiment
What data transformations (2) could be used to fix the broken assumption of a linear relationship in a simple linear regression?
Add a quadratic term: $Y = \beta_0 + \beta_1 X + \beta_2 X^2$ for a quadratic relationship
Transformation of Explanatory Variable X (ln X, √X, X^k, 1/X^k)
What data transformations (2) could be used to fix the broken assumption of constant variance in a simple linear regression?
Transformation of Response Variable Y (Box-Cox: λ = 1 means no transformation, λ = 0 means ln Y)
Weighted Least-Squares
How do you analyze Box-Cox output?
The estimated λ is the power to raise the response to (Y^λ); λ = 1 means no transformation and λ = 0 means the natural log
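The standard Box-Cox form is $(Y^\lambda - 1)/\lambda$, with $\ln Y$ at λ = 0. It is a shifted and rescaled version of Y^λ, and it approaches ln Y as λ → 0. This is why 0 corresponds to the natural log. This is a minimal sketch of applying a chosen λ. The function name is hypothetical, and choosing λ itself is left to the software's Box-Cox output.

```python
import math


def box_cox(y, lam):
    """Box-Cox power transformation of one positive response value."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive responses")
    if lam == 0:
        return math.log(y)          # the lambda = 0 case is the natural log
    return (y ** lam - 1) / lam     # a linear rescaling of y ** lam


print(box_cox(5, 1))     # 4.0: only a shift, so effectively no transformation
print(box_cox(5, 0))     # ln 5
print(box_cox(5, 0.5))   # a square-root-type transformation
```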
What data transformations (2) could be used to fix the broken assumption of normal errors in a simple linear regression?
Transformation of Y
Model a different response type (binomial for categorical responses, Poisson for counts)
What method could be used when the assumption of no major outliers is violated in a simple linear regression?
Robust Regression Analysis can reduce the impact of outliers
What are the hypothesis tests and test statistic for a significant linear relationship for a simple linear regression?
$H_0: \beta_1 = 0$ versus $H_A: \beta_1 \ne 0$
T-Statistic: $t = \hat{\beta}_1 / SE(\hat{\beta}_1)$
Follows a t-distribution with n-2 degrees of freedom
What is the equivalence of the F and t tests for a simple linear regression?
The square of the t-statistic, $t = \hat{\beta}_1 / SE(\hat{\beta}_1)$, is exactly equal to the F-statistic, $F = MS_{regression} / MS_{error}$, which in simple linear regression has 1 numerator and n − 2 denominator degrees of freedom; the two tests give the same p-value
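The equivalence is easy to confirm numerically. The sketch computes the slope t-statistic from $SE(\hat\beta_1) = \sqrt{MSE / S_{xx}}$. It computes F from the ANOVA table and checks that F = t². The function name and data are hypothetical. P-values would still come from a t or F table.

```python
def slope_t_and_f(x, y):
    """Return (t, F, error df) for H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    mse = sse / (n - 2)                 # error df is n - 2
    t = b1 / (mse / sxx) ** 0.5         # slope over its standard error
    f = ((sst - sse) / 1) / mse         # MS regression (1 df) over MS error
    return t, f, n - 2


t, f, df = slope_t_and_f([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(t ** 2, f, df)  # t squared matches F
```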
What is the difference between a confidence interval for mean response and a prediction interval for a simple linear regression?
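A confidence interval estimates the mean response at a given X, while a prediction interval covers a single new observation at that X; prediction intervals are always wider because they add the variability of an individual observation around the mean (σ²) to the uncertainty in estimating the mean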
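The two half-widths differ only by an extra "1 +" under the square root. That term carries the variance of a single new observation. This is a minimal sketch. The t critical value is passed in by hand because the Python standard library has no t quantile function. The function name and data are hypothetical.

```python
def interval_half_widths(x, y, x0, t_crit):
    """Half-widths of the CI for mean response and the PI at x0.

    t_crit: t critical value with n - 2 df, looked up from a table.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = (sse / (n - 2)) ** 0.5                # sqrt(MSE), estimate of sigma
    leverage = 1 / n + (x0 - mx) ** 2 / sxx
    ci = t_crit * s * leverage ** 0.5         # uncertainty in the mean only
    pi = t_crit * s * (1 + leverage) ** 0.5   # plus one new observation
    return ci, pi


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
# 3.182 is the two-sided 95% t critical value for n - 2 = 3 df
ci, pi = interval_half_widths(x, y, x0=3, t_crit=3.182)
print(ci, pi)
```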
When is it appropriate to conduct tests and confidence intervals for the intercept of a simple linear regression?
When the intercept (the mean response at X = 0) is practically meaningful and X = 0 is within the scope of the data
When would you conduct a one-sample proportion?
Categorical Response (Proportion), No Explanatory Variable
Estimate true proportion/frequency of one of two possible categorical responses and compare to proposed proportion/frequency
What is the hypothesis test for a one-sample proportion?
$H_0: p = p_0$ versus $H_A: p > p_0$, $p < p_0$, or $p \ne p_0$
How do you go from the test statistic for a one-sample proportion to the p-value using the standard normal table?
Look up the z-score in the table: find the row for its first decimal place in the left column, then the column for its second decimal place in the top row
Area/Proportion below Z = table value
Area/Proportion above Z = 1 - table value
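The table lookup can be mirrored in code. `statistics.NormalDist().cdf(z)` gives the same "area below Z" that the standard normal table reports. This is a minimal sketch, assuming the usual one-sample z statistic with the standard error computed from p0. The function names and data are hypothetical.

```python
from statistics import NormalDist


def one_prop_z(successes, n, p0):
    """z statistic for H0: p = p0; the standard error uses p0, not p-hat."""
    p_hat = successes / n
    se = (p0 * (1 - p0) / n) ** 0.5
    return (p_hat - p0) / se


def p_values(z):
    below = NormalDist().cdf(z)  # area below z, i.e. the table value
    return {
        "HA: p < p0": below,
        "HA: p > p0": 1 - below,                      # area above z
        "HA: p != p0": 2 * min(below, 1 - below),     # both tails
    }


# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5
z = one_prop_z(60, 100, 0.5)
print(round(z, 2))  # 2.0
print(p_values(z))
```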
When would you conduct a two-sample proportion?
Categorical Response (Proportion), One Explanatory Variable
Compare true proportion/frequency of one of two possible categorical responses between two groups
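The test statistic for comparing two proportions can be sketched in code. This assumes the pooled form used for testing H0: p1 = p2. Under that null, both groups share one proportion, so the standard error uses the pooled estimate. Confidence intervals for p1 − p2 usually use the unpooled standard error instead. The function name and counts are hypothetical.

```python
def two_prop_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0
    se = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    return (p1 - p2) / se


# Hypothetical counts: 60 of 100 in group 1 versus 40 of 100 in group 2
z = two_prop_z(60, 100, 40, 100)
print(z)  # about 2.83
```

The p-value then comes from the standard normal table, exactly as in the one-sample case.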
When would you conduct a Chi-Square test for Association?
checking for an association between one categorical variable and another, using a frequency table of the number of observations in each combination of categories
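In the test for association, each expected count assumes no association: row total × column total / grand total. The statistic then sums (observed − expected)² / expected over all cells, with (rows − 1)(columns − 1) degrees of freedom. This is a minimal sketch. The function name and table are hypothetical.

```python
def chi_square_association(table):
    """Chi-square statistic and df for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / grand  # count expected under no association
            stat += (table[i][j] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical 2x2 table of counts
print(chi_square_association([[20, 30], [30, 20]]))  # (4.0, 1)
```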
When would you conduct a Chi-Square test for Goodness of Fit?
checking how well the observed counts for one categorical variable fit a hypothesized (theoretical) distribution
What is the difference between a Chi-Square test for Association and Goodness of Fit?
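In Goodness of Fit, the expected counts come from a hypothesized theoretical distribution; in the test for Association, they come from assuming no association (row total × column total / grand total)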
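For Goodness of Fit, the expected counts are n × (hypothesized probability) for each category. With k categories and no parameters estimated from the data, the statistic has k − 1 degrees of freedom. This is a minimal sketch. The function name and the die-roll counts are hypothetical.

```python
def chi_square_gof(observed, probs):
    """Chi-square goodness-of-fit statistic and df (no estimated parameters)."""
    n = sum(observed)
    expected = [n * p for p in probs]  # from the hypothesized distribution
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, len(observed) - 1


# Hypothetical: 100 rolls of a four-sided die, testing that all faces are equally likely
stat, df = chi_square_gof([18, 22, 20, 40], [0.25, 0.25, 0.25, 0.25])
print(stat, df)
```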