WEEK 7 Flashcards
Write an MLR model when there are continuous and categorical predictors
Yij = u + B1X1ij + B2X2ij + B3X3ij + eij
What do the following mean in an MLR with a continuous and a categorical predictor?
j, i, u
j = single observation from a single object
i = level of the categorical predictor
u = mean of all observations across all levels of all factors
Name all of the following terms
yij = u + B1REDKANGAROOij + B2GREYKANGAROOij
yij = response value at a given i level and j observation
u = mean of all observations across all levels
B1REDKANGAROOij = difference between the mean of REDKANGAROOS and u at a given i level and j observation
B2GREYKANGAROOij = difference between the mean of GREYKANGAROOS and u at a given i level and j observation
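As a rough illustration of the terms above, the sketch below computes u and the two coefficients from made-up measurements. The data values and the names `groups` and `group_effects` are hypothetical, and it assumes a balanced design (the same number of observations in each group).

```python
# Hypothetical response values for each level of the categorical predictor.
groups = {
    "red": [10.0, 12.0, 14.0],
    "grey": [6.0, 8.0, 10.0],
}

def mean(values):
    return sum(values) / len(values)

# u: mean of all observations across all levels.
all_obs = [y for ys in groups.values() for y in ys]
u = mean(all_obs)

# Coefficient for each level: group mean minus u.
group_effects = {level: mean(ys) - u for level, ys in groups.items()}

print(u)              # 10.0
print(group_effects)  # {'red': 2.0, 'grey': -2.0}
```

Here the red group sits 2 units above u and the grey group 2 units below it, which is what B1 and B2 express.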
How do the coefficients differ from those in an SLR or MLR?
The coefficients quantify the “unit change” in the response variable with reference to u (the mean of all observations), rather than a slope against a continuous predictor
When would you use an ANOVA test instead of a t-test?
When we want to check for differences in means among more than two groups (a t-test only compares two groups)
Example:
Null: Ured = Ugrey = Uquokka
Alternative: At least one of the groups differs from the others
In ANOVA, what do the variables n and k stand for, and what are they used for?
n = total number of observations (true replicates)
k = number of levels within the predictor variable (number of groups)
Used for calculating the residual degrees of freedom: DFres = n - k
What is MS in ANOVA and how do you calculate it?
MS (mean square) is a sum of squares divided by its degrees of freedom. MSres is the average squared deviation of the data from the group means
MSres = SSres / DFres
How do you calculate the F-value in an ANOVA table?
F-value = MSgroups / MSres
OR
F-value = signal / noise
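To make the ANOVA arithmetic concrete, here is a minimal sketch that builds the table by hand for three groups. The data and the function name `one_way_anova` are hypothetical; DFgroups = k - 1 is the standard companion to DFres = n - k and is an addition to these notes.

```python
# Hypothetical data: k = 3 groups with 3 observations each, so n = 9.
groups = {
    "red": [10.0, 12.0, 14.0],
    "grey": [6.0, 8.0, 10.0],
    "quokka": [2.0, 4.0, 6.0],
}

def mean(values):
    return sum(values) / len(values)

def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table as a dict."""
    all_obs = [y for ys in groups.values() for y in ys]
    n = len(all_obs)   # total observations
    k = len(groups)    # number of groups
    grand_mean = mean(all_obs)

    # Signal: squared deviations of the group means from the grand mean.
    ss_groups = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
    # Noise: squared deviations of each observation from its own group mean.
    ss_res = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

    df_groups = k - 1
    df_res = n - k
    ms_groups = ss_groups / df_groups
    ms_res = ss_res / df_res
    return {
        "df_groups": df_groups,
        "df_res": df_res,
        "ms_groups": ms_groups,
        "ms_res": ms_res,
        "f": ms_groups / ms_res,
    }

table = one_way_anova(groups)
print(table)
# {'df_groups': 2, 'df_res': 6, 'ms_groups': 48.0, 'ms_res': 4.0, 'f': 12.0}
```

In this made-up example the signal (MSgroups = 48) is twelve times the noise (MSres = 4), so F = 12.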
What are the differences between observational studies and experimental studies?
In observational studies the predictors cannot be manipulated, so drivers cannot be isolated from the effects of confounding variables
Experimental studies can potentially isolate drivers from the effects of confounding variables
Why do we replicate experiments?
Experiments with small sample sizes are vulnerable to the sampling effects of lurking variables. These are variables that have an effect on one or more predictors but have not been accounted for in the model, because we may not even be aware that they are important drivers.
Replication allows us to account for such lurking variables, increasing the model's robustness.
Consider the following experiment:
We are interested in the effects of temperature, which has two levels, on plant growth.
The design is:
- Two temperature cabinets, one at each level of temperature
- Three plants within each temperature cabinet
How many INDEPENDENT replicates are there within each level of the predictor variable temperature?
ONE.
This is because there is only ONE independent application of each temperature treatment.
In other words, the independent replicates are the temperature cabinets themselves, not the individual plants within them. In this case, you have one independent replicate for each temperature level because each cabinet represents a single unit of observation that is independent of the other.
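The counting rule can be sketched in code: independent replicates are counted as distinct cabinets per temperature level, not as plants. The record layout and the level, cabinet and plant labels below are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (temperature level, cabinet id, plant id).
plants = [
    ("low", "cabinet_1", "plant_a"),
    ("low", "cabinet_1", "plant_b"),
    ("low", "cabinet_1", "plant_c"),
    ("high", "cabinet_2", "plant_d"),
    ("high", "cabinet_2", "plant_e"),
    ("high", "cabinet_2", "plant_f"),
]

cabinets_per_level = defaultdict(set)   # distinct cabinets seen at each level
plants_per_level = defaultdict(int)     # plants measured at each level
for level, cabinet, _plant in plants:
    cabinets_per_level[level].add(cabinet)
    plants_per_level[level] += 1

# One independent replicate per distinct cabinet at each level.
independent_replicates = {lv: len(cabs) for lv, cabs in cabinets_per_level.items()}

print(independent_replicates)  # {'low': 1, 'high': 1}
print(dict(plants_per_level))  # {'low': 3, 'high': 3}
```

Three plants are measured at each level, but only one cabinet received each treatment, so the count of independent replicates is one.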
Consider the following experiment:
We are interested in the effects of temperature, which has two levels, on plant growth.
The design is:
- Two temperature cabinets, one at each level of temperature
- Three plants within each temperature cabinet
What are the experimental and measurement units, and why?
Experimental units: These are the entities to which the treatments are applied. Here they are the temperature cabinets (because they receive the temperature treatment).
Measurement units: These are the entities on which the response is measured. Here they are the individual plants (because their growth is being measured).
What are pseudoreplicates? Why are they a problem?
Measurements or observations that are treated as independent replicates when they are not, leading to incorrect conclusions because the statistical analysis assumes more independent data points than actually exist.
In other words, pseudoreplication artificially increases n and the apparent power of the model when it should not.
It creates the illusion that you have more independent data points than you really do.
It can make the results appear more significant than they actually are because the variability within the data is underestimated.
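A small sketch of how pseudoreplication inflates n, using DFres = n - k and the cabinet experiment from the earlier cards (two temperature levels, one cabinet per level, three plants per cabinet). The helper name `residual_df` is hypothetical.

```python
def residual_df(n, k):
    """DFres = n - k, with n observations and k groups."""
    return n - k

k = 2            # two temperature levels
n_plants = 6     # pseudoreplicates: 3 plants in each of 2 cabinets
n_cabinets = 2   # independent replicates: 1 cabinet per level

print(residual_df(n_plants, k))    # 4
print(residual_df(n_cabinets, k))  # 0
```

Treating the plants as replicates suggests 4 residual degrees of freedom. Counting the cabinets gives 0, which means the design leaves nothing to estimate MSres from.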
What is an effective way of dealing with lurking variables?
A randomized design. For example, assigning objects to random locations along a gradient helps the model account for lurking variables by ensuring they affect all replicates evenly and thus do not induce a false result.
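One way such a randomized layout might be generated is sketched below. The function name `randomize_positions`, the treatment labels and the six positions along the gradient are hypothetical. The seed only makes the shuffle repeatable.

```python
import random

def randomize_positions(treatments, n_positions, seed=None):
    """Randomly assign an equal number of each treatment to positions 0..n_positions-1."""
    if n_positions % len(treatments) != 0:
        raise ValueError("n_positions must be a multiple of the number of treatments")
    # Build a balanced list of labels, then shuffle their order along the gradient.
    labels = list(treatments) * (n_positions // len(treatments))
    random.Random(seed).shuffle(labels)
    return labels

# layout[i] is the treatment placed at position i along the gradient.
layout = randomize_positions(["low", "high"], n_positions=6, seed=1)
print(layout)
```

Because position is decided by the shuffle rather than by the experimenter, no treatment is systematically placed at one end of the gradient.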
What happens when we reduce noise and therefore increase the value of our F-ratio?
It increases the power of our test