Topic 4 Flashcards
What are the assumptions of ANOVA?
The residuals are normally distributed; observations are independent both within and between samples; the dependent variable is continuous and approximately normally distributed within each group; variance is homogeneous across groups.
What is a repeated measures ANOVA?
A repeated measures ANOVA extends the paired-samples t-test approach to three or more comparisons. It can test whether a dependent variable changes over time in response to a factor, e.g. annual temperatures.
Which ANOVA assumption is violated by a repeated measures ANOVA?
The assumption that all observations within and between groups are independent. For example, when measuring how gas levels change over time, each observation at a site is directly related to the next observation at that site after a given period of time.
How do you set up a repeated measures ANOVA?
Analyze > General Linear Model > Repeated Measures.
Name the within-subject factor after the repeated measure, e.g. time, and define its number of levels; move the observations for each level into the within-subjects variables box and move any grouping factor into the between-subjects factors box; save the standardised residuals.
What are the key values in the repeated measures ANOVA output?
Descriptive statistics show the mean and variation of each factor level and confirm the number of observations in each. Mauchly's test of sphericity shows whether the variances of the differences between all pairs of levels are equal (significance level 0.05); the result determines which line to read in the next output box. If the test is non-significant, read the first line (Sphericity Assumed); if it is significant, read the second line (Greenhouse-Geisser).
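To make the sphericity idea concrete, the sketch below computes the variance of the paired differences for every pair of factor levels. It uses made-up data and hypothetical names (`scores`, `difference_variances`), and it only shows the quantities that Mauchly's test compares; it does not perform the test itself.

```python
from itertools import combinations
from statistics import variance

# Hypothetical data for illustration only: one list per time point,
# with the same five subjects (e.g. sites) in the same order in each list.
scores = {
    "t1": [10.0, 12.0, 9.0, 11.0, 13.0],
    "t2": [11.0, 14.0, 10.0, 12.0, 15.0],
    "t3": [13.0, 15.0, 13.0, 14.0, 18.0],
}


def difference_variances(levels):
    """Return the sample variance of the paired differences for each pair of levels."""
    result = {}
    for a, b in combinations(levels, 2):
        # Difference within each subject between level a and level b.
        diffs = [x - y for x, y in zip(levels[a], levels[b])]
        result[(a, b)] = variance(diffs)
    return result


diff_vars = difference_variances(scores)
for pair, var in diff_vars.items():
    print(pair, round(var, 3))
```

Sphericity holds when these variances are roughly equal; large discrepancies between them are what a significant Mauchly's test flags.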
When is ANCOVA applicable?
ANCOVA is applicable when another variable that cannot be controlled (a covariate) may be influencing the outcome of the experiment, e.g. sea water temperature.
What is linear regression?
Linear regression is used to determine the form and strength of a relationship between two variables.
The analysis implies cause and effect (x influences y), allowing you to describe the relationship between x and y.
What two things need to be defined when calculating a regression?
The distance from the mean value of y to the fitted line is calculated at each data point; these values are then squared and summed.
The distance from the fitted line to each data point is calculated; these values are then squared and summed.
These define the regression sum of squares (SSregression) and the residual sum of squares (SSresiduals), respectively.
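The two sums of squares can be sketched in a few lines of Python. This is a minimal illustration using made-up data and hypothetical function names (`fit_line`, `sums_of_squares`), with the line fitted by ordinary least squares; it is not a replacement for the statistics package output.

```python
from statistics import mean

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the data."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def sums_of_squares(x, y):
    """Return (SSregression, SSresiduals) for the least-squares line."""
    slope, intercept = fit_line(x, y)
    y_bar = mean(y)
    fitted = [intercept + slope * xi for xi in x]
    # Mean of y to the fitted line, squared and summed.
    ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
    # Fitted line to each data point, squared and summed.
    ss_residuals = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ss_regression, ss_residuals


ss_regression, ss_residuals = sums_of_squares(x, y)
print(ss_regression, ss_residuals)
```

For a least-squares fit the two values add up to the total sum of squares of y around its mean, which is a handy check on the arithmetic.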
What is meant by signal and noise and why is this important in determining the relationship between x and y in a regression analysis?
Signal is the value of SSregression, whereas noise is the value of SSresiduals; e.g. if all the data points lay on the fitted line, the noise would be 0.
These values are used to test whether the relationship is significant: the larger the signal relative to the noise, the stronger the evidence for a relationship. A shallow slope with a lot of noise indicates a non-significant relationship; a steep slope with little noise indicates a significant relationship.
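One common way to compare signal with noise is the F ratio, which divides each sum of squares by its degrees of freedom before taking the ratio. The sketch below assumes simple linear regression with one predictor (1 and n - 2 degrees of freedom); the function names and input values are hypothetical, and the p-value lookup from the F distribution is left to the statistics package.

```python
def f_ratio(ss_regression, ss_residuals, n):
    """Return the signal-to-noise F ratio for a one-predictor regression on n points."""
    if n <= 2:
        raise ValueError("need more than two data points")
    ms_regression = ss_regression / 1        # one predictor, so 1 degree of freedom
    ms_residuals = ss_residuals / (n - 2)    # n - 2 residual degrees of freedom
    if ms_residuals == 0:
        # All points lie exactly on the fitted line: no noise at all.
        return float("inf")
    return ms_regression / ms_residuals


def r_squared(ss_regression, ss_residuals):
    """Return the proportion of the variation in y explained by the fitted line."""
    return ss_regression / (ss_regression + ss_residuals)


# Hypothetical values for illustration only.
f_value = f_ratio(ss_regression=3.6, ss_residuals=2.4, n=5)
r2 = r_squared(ss_regression=3.6, ss_residuals=2.4)
print(f_value, r2)
```

A larger F ratio means more signal per unit of noise; whether it is significant depends on comparing it with the F distribution for the stated degrees of freedom.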
What are the assumptions of regression?
The residuals are normally distributed; the variance of y is homogeneous across the range of x; the relationship is linear; there is no relationship between the residuals and either the predicted y values or the x variable.
How do you decide to perform regression?
First eyeball the data to see if there is a linear relationship; transform the data to see whether another scale for the x-axis is more suitable for regression, e.g. log2. Check the residuals for normality and equal variance.
How do you check for equal variance in Y across the range of X for linear regression?
Levene's test cannot be used because x is continuous rather than grouped, so check for equal variance in y across the range of x with a scatter plot of the residuals against the predicted values. If the residuals show an even spread with no pattern across the range of predicted values, then the variance is homogeneous.
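As a rough numerical companion to the scatter plot, the sketch below compares the spread of the residuals in the lower and upper halves of the x range. It uses made-up data and hypothetical function names, and it is only a crude heuristic under those assumptions, not a formal test of equal variance.

```python
from statistics import mean, pvariance

# Hypothetical data for illustration only: the scatter in y grows with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.8, 5.5, 5.0, 8.0, 6.5]


def residuals(x, y):
    """Return the residuals (observed minus fitted) of the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def spread_by_half(x, y):
    """Return the residual variance in the lower and upper halves of the x range."""
    pairs = sorted(zip(x, residuals(x, y)))  # order residuals by their x value
    half = len(pairs) // 2
    lower = [r for _, r in pairs[:half]]
    upper = [r for _, r in pairs[-half:]]
    return pvariance(lower), pvariance(upper)


lower_var, upper_var = spread_by_half(x, y)
print(round(lower_var, 3), round(upper_var, 3))
```

Two very different values, as with this made-up data, correspond to the funnel shape in a residual plot that suggests the variance is not homogeneous.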