R-Code for Exam, Modules 9-13 Flashcards
query_X = DATASET_NAME$CAT_VARIABLE == "X"
index_X = which(query_X)
quantities_X = DATASET_NAME$QUANT_VARIABLE[index_X]
m_quantities_X = mean(quantities_X)
sum_squares_X = sum((quantities_X - m_quantities_X)^2)
we have a data set whose categorical variable has three groups (X, Y and Z), and we want to find the sum of squares of the quantitative values in group X
query_Y = DATASET_NAME$CAT_VARIABLE == "Y"
index_Y = which(query_Y)
quantities_Y = DATASET_NAME$QUANT_VARIABLE[index_Y]
m_quantities_Y = mean(quantities_Y)
sum_squares_Y = sum((quantities_Y - m_quantities_Y)^2)
we have a data set whose categorical variable has three groups (X, Y and Z), and we want to find the sum of squares of the quantitative values in group Y
query_Z = DATASET_NAME$CAT_VARIABLE == "Z"
index_Z = which(query_Z)
quantities_Z = DATASET_NAME$QUANT_VARIABLE[index_Z]
m_quantities_Z = mean(quantities_Z)
sum_squares_Z = sum((quantities_Z - m_quantities_Z)^2)
we have a data set whose categorical variable has three groups (X, Y and Z), and we want to find the sum of squares of the quantitative values in group Z
within_group_ss = sum_squares_X + sum_squares_Y + sum_squares_Z
calculates the within-group sum of squares ("within_group_ss") for the three groups (X, Y and Z) of the categorical variable, whose individual sums of squares have already been calculated
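The three per-group cards above repeat the same steps, so they can be collapsed into one call. This is a minimal sketch, assuming R's built-in PlantGrowth data set (not part of these cards) stands in for DATASET_NAME, with "weight" as the quantitative variable and "group" as the categorical one.

```r
# Assumption: PlantGrowth (ships with R) stands in for DATASET_NAME.
data(PlantGrowth)

# For each group, sum the squared deviations from that group's own mean.
group_ss = tapply(PlantGrowth$weight, PlantGrowth$group,
                  function(q) sum((q - mean(q))^2))

# The within-group sum of squares is the total over all groups.
within_group_ss = sum(group_ss)

print(group_ss)
print(within_group_ss)
```

The tapply() version gives one sum of squares per group without naming each group by hand, which helps when there are more than three groups.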
grand_mean = mean(DATASET_NAME$QUANT_VARIABLE)
calculates the grand mean of the quantitative variable of the dataset (saved as "grand_mean"), which is the mean over all observations regardless of which group (X, Y, Z, etc.) they belong to
among_group_ss =
length(quantities_X) * (m_quantities_X - grand_mean)^2 +
length(quantities_Y) * (m_quantities_Y - grand_mean)^2 +
length(quantities_Z) * (m_quantities_Z - grand_mean)^2
code to find the among-group sum of squares for the three groups (X, Y and Z): each group's squared distance from the grand mean is weighted by its number of replicates, taken here as the length of that group's quantities vector
total_ss = within_group_ss + among_group_ss
the total sum of squares equals the sum of the within group and the among group sum of squares
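The identity on this card can be checked numerically: the total sum of squares computed directly from the grand mean should match the sum of the two parts. A small sketch, again assuming the built-in PlantGrowth data set as a stand-in for DATASET_NAME:

```r
# Assumption: PlantGrowth (ships with R) stands in for DATASET_NAME.
data(PlantGrowth)
y = PlantGrowth$weight
g = PlantGrowth$group

grand_mean  = mean(y)
group_means = tapply(y, g, mean)    # one mean per group
group_n     = tapply(y, g, length)  # number of replicates per group

within_group_ss = sum(tapply(y, g, function(q) sum((q - mean(q))^2)))
among_group_ss  = sum(group_n * (group_means - grand_mean)^2)
total_ss        = within_group_ss + among_group_ss

# Direct calculation: squared deviations of every observation from the grand mean.
total_ss_direct = sum((y - grand_mean)^2)

print(all.equal(total_ss, total_ss_direct))
```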
table_ss =
as.table(rbind(c("total_ss",
"within_group_ss",
"among_group_ss"),
c(total_ss,
within_group_ss,
among_group_ss)))
code which establishes a table to compare the within, among, and total sum of squares for a set of data
anova_model =
aov(QUANTITATIVE~CATEGORICAL,
data = DATASET_NAME)
code to establish an ANOVA model for the data (“anova_model”) which relates a quantitative and categorical variable (“QUANTITATIVE”; “CATEGORICAL”) to a dataset (“DATASET_NAME”)
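The hand-calculated sums of squares can be compared against the ones aov() reports. The sketch below assumes the built-in PlantGrowth data set in place of DATASET_NAME and reads the "Sum Sq" column from the summary table.

```r
# Assumption: PlantGrowth (ships with R) stands in for DATASET_NAME.
data(PlantGrowth)
anova_model = aov(weight ~ group, data = PlantGrowth)

# summary() returns a list; its first element is the ANOVA table.
anova_table = summary(anova_model)[[1]]
ss = anova_table[["Sum Sq"]]

among_group_ss  = ss[1]  # row for the categorical variable
within_group_ss = ss[2]  # row labelled Residuals

print(anova_table)
print(c(among = among_group_ss, within = within_group_ss))
```

The first row of the table corresponds to the among-group sum of squares and the Residuals row to the within-group sum of squares.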
plot(factored_variable, DATASET_NAME$QUANT_VARIABLE,
ylim = c(LOW, HIGH), ylab = "BLAH BLAH",
xlab = "BLAH BLAH",
col = "red")  # or "blue", "green", etc.
creates a box-plot of the quantitative variable for each level of a categorical variable that has been coded as a factor, which gives a first visual impression of whether the groups differ from one another (the ANOVA itself tests whether the difference is significant)
anova_model_1 = aov(quantitative_variable ~
factored_variable,
data = DATASET_NAME)
first method to establish an ANOVA model of a dataset, assuming we have already created the object "factored_variable" which holds the groups "X", "Y" and "Z" as a factor
anova_model_2 = aov(quantitative_variable ~ factor(DATASET_NAME$CAT_VARIABLE, levels = c("X", "Y", "Z")),
data = DATASET_NAME)
second method to establish an ANOVA model of a dataset, assuming we have not yet created the object "factored_variable" for the groups "X", "Y" and "Z", so the factor is built inside the formula
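The second argument of factor() sets the order of the levels, which in turn controls the order of the boxes in a box-plot and of the groups in later output. A short illustration with a made-up vector (the values are hypothetical, not from any course dataset):

```r
# Hypothetical categorical values, for illustration only.
cat_variable = c("Z", "X", "Y", "X", "Z", "Y")

# Without 'levels', the levels are sorted alphabetically.
default_factor = factor(cat_variable)

# With 'levels', the order is whatever we specify.
explicit_factor = factor(cat_variable, levels = c("Z", "Y", "X"))

print(levels(default_factor))   # "X" "Y" "Z"
print(levels(explicit_factor))  # "Z" "Y" "X"
```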
residuals = quantitative_variable - predict.lm(anova_model)
plot(predict.lm(anova_model), residuals, cex.lab = 1.15,
ylim = c(LOW, HIGH),
xlim = c(LOW, HIGH),
ylab = "BLAH BLAH",
xlab = "YADA YADA")
abline(a = 0, b = 0, lwd = 2, lty = "dashed", col = "red")  # lwd = 2 or 3
code to plot the residuals against the predicted values of the ANOVA model, with a dashed reference line at zero, in order to check for homogeneity of variance (the vertical spread of the residuals should look similar in every group)
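For a one-way ANOVA the predicted value of each observation is simply its group mean, so the residuals on this card are deviations from the group means. A minimal sketch, assuming the built-in PlantGrowth data set in place of DATASET_NAME, checks this and compares the hand-made residuals with R's own residuals() function:

```r
# Assumption: PlantGrowth (ships with R) stands in for DATASET_NAME.
data(PlantGrowth)
anova_model = aov(weight ~ group, data = PlantGrowth)

predicted   = predict(anova_model)
manual_res  = PlantGrowth$weight - predicted
builtin_res = residuals(anova_model)

# ave() replaces each observation by the mean of its own group.
group_means = ave(PlantGrowth$weight, PlantGrowth$group)

print(all.equal(unname(predicted), group_means))
print(all.equal(unname(manual_res), unname(builtin_res)))
```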
stdRes = rstandard(anova_model)
calculates the standardized residuals for the ANOVA model ("anova_model") and saves them as the object "stdRes"
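qqnorm(stdRes,
ylab = "Standardized Residuals",
xlab = "Theoretical Quantiles")
qqline(stdRes, col = "red", lwd = 2)
draws a normal Q-Q plot of the standardized residuals, along with a reference line through their first and third quartiles; points lying close to the line suggest the residuals are approximately normal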
TukeyHSD(anova_model, ordered = TRUE)
conducts a Tukey’s Honestly Significant Difference test on the ANOVA model in order to determine which groups in the ANOVA vary significantly from one another
plot(TukeyHSD(anova_model, ordered = TRUE))
establishes a plot of the Tukey's test of the ANOVA model: pairs of groups whose confidence intervals don't overlap the dashed vertical line at x = 0 are significantly different, while those whose intervals cross the dashed line are not significantly different
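The same reading of the plot can be done on the numbers behind it. The sketch below assumes the built-in PlantGrowth data set in place of DATASET_NAME; the column "excludes_zero" is a name introduced here for illustration.

```r
# Assumption: PlantGrowth (ships with R) stands in for DATASET_NAME.
data(PlantGrowth)
anova_model = aov(weight ~ group, data = PlantGrowth)
tukey = TukeyHSD(anova_model, ordered = TRUE)

# The result has one matrix per factor, named after it ("group" here),
# with columns diff, lwr, upr and "p adj".
tukey_pairs = as.data.frame(tukey$group)

# A pair is significantly different when its interval does not contain 0.
tukey_pairs$excludes_zero = tukey_pairs$lwr > 0 | tukey_pairs$upr < 0

print(tukey_pairs)
```

Rows where "excludes_zero" is TRUE are the pairs whose intervals sit entirely on one side of the dashed line in the plot.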
quant_var = DATASET_NAME$QUANTITATIVE_VARIABLE
fact_var1 = factor(DATASET_NAME$CATEGORY_VARIABLE_1)
fact_var2 = factor(DATASET_NAME$CATEGORY_VARIABLE_2)
par(mfrow = c(1, 2))
plot(fact_var1, quant_var,
ylim = c(LOW, HIGH),
ylab = "BLAH BLAH",
xlab = "YADA YADA",
col = "red")
plot(fact_var2, quant_var,
ylim = c(LOW, HIGH),
ylab = "BLAH BLAH",
xlab = "YADA YADA",
col = "blue")
code which stores the quantitative variable as an object, codes two categorical variables as factors, and then draws two box-plots side by side: one of the quantitative variable against the first factor and one against the second factor