Lec 4/5/6/7 Flashcards
What are confidence intervals (CIs)?
Ranges calculated from samples: 95% of the 95% CIs from repeated samples would be expected to include the true population mean
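A minimal sketch of what "95% of sample CIs include the true mean" looks like in a simulation. The population (normal, mean 50, SD 10), the sample size and the use of a known-SD z interval are all assumptions made for illustration, not part of the lecture.

```python
import random
from statistics import NormalDist, mean

def ci_coverage(true_mean=50.0, sigma=10.0, n=30, n_samples=2000, seed=1):
    """Fraction of 95% CIs (z intervals with known sigma) containing true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.975)        # about 1.96
    half_width = z * sigma / n ** 0.5      # same width for every sample
    hits = 0
    for _ in range(n_samples):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        m = mean(sample)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / n_samples

coverage = ci_coverage()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

The printed share should sit close to 0.95, which is the sense in which the interval is a "95%" CI.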
What do we use for computer-based random sampling?
Monte Carlo methods
sampling without replacement
permutation
sampling with replacement and what is this?
Bootstrapping - values can be picked at random more than once
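A small sketch of the two kinds of sampling using Python's standard `random` module. The data values are made up for illustration.

```python
import random

rng = random.Random(42)
data = [3, 7, 8, 12, 15]

# Permutation: sampling without replacement, every value is used exactly once.
permuted = rng.sample(data, k=len(data))

# Bootstrap: sampling with replacement, values may repeat or be left out.
bootstrapped = rng.choices(data, k=len(data))

print("permuted:    ", permuted)
print("bootstrapped:", bootstrapped)
```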
What is a residual?
The difference between the observed value of the dependent variable (DV) and the value predicted by the model. Each data point has one residual; it is the error, or unexplained variation
What is the line of best fit?
The line whose slope and intercept minimise the variation of the data around the line (in other words, minimise the sum of squared residuals)
How to calculate total variance?
Variation predicted (explained) by x + unexplained variation
If x doesn't predict y, what is the total variance?
(Almost) entirely variation unexplained by x
If x predicts y well, what is the total variance?
(Almost) entirely variation predicted by x
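The residual, best-fit and total-variance cards above can be tied together in one sketch: fit a least-squares line, compute the residuals, and check that total variation = variation predicted by x + unexplained variation. The data and the helper name `fit_line` are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for a straight line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]   # made-up data, roughly y = 2x

slope, intercept = fit_line(x, y)
predicted = [intercept + slope * xi for xi in x]
residuals = [yi - pi for yi, pi in zip(y, predicted)]   # one per data point

mean_y = sum(y) / len(y)
ss_total = sum((yi - mean_y) ** 2 for yi in y)          # total variation
ss_explained = sum((pi - mean_y) ** 2 for pi in predicted)
ss_residual = sum(r ** 2 for r in residuals)            # unexplained
r_squared = ss_explained / ss_total

print(f"total={ss_total:.3f} explained={ss_explained:.3f} "
      f"unexplained={ss_residual:.3f} R^2={r_squared:.4f}")
```

Here x predicts y well, so nearly all of the total variation is explained; with unrelated x and y the unexplained part would dominate.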
When to use major axis or reduced major axis regression?
If both x and y have error and the ‘true’ relationship is of interest (they must be correlated)
Major axis (MA) regression if the variances are similar
Reduced major axis (RMA) regression if the variances are unequal
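A hedged sketch comparing the three slopes, using the standard textbook formulas: OLS slope = cov(x, y) / var(x), RMA slope = sign(cov) × SD(y) / SD(x), and the MA slope from the major axis of the covariance matrix. The function name `slopes` and the data are hypothetical, and x and y are assumed to be correlated (non-zero covariance).

```python
from statistics import mean

def slopes(x, y):
    """Return OLS, major axis (MA) and reduced major axis (RMA) slopes."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)             # var(x)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)             # var(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    ols = sxy / sxx
    rma = (1 if sxy >= 0 else -1) * (syy / sxx) ** 0.5
    diff = syy - sxx
    ma = (diff + (diff ** 2 + 4 * sxy ** 2) ** 0.5) / (2 * sxy)  # needs sxy != 0
    return {"ols": ols, "ma": ma, "rma": rma}

exact = slopes([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])        # points exactly on y = 2x
noisy = slopes([1, 2, 3, 4, 5], [2.3, 3.6, 6.4, 7.5, 10.2])
print(exact)
print(noisy)
```

When the points lie exactly on a line all three slopes agree; with scatter, the OLS slope is pulled towards zero relative to the RMA slope.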
What to use to predict y from x?
Ordinary least squares (OLS): this requires a normal distribution of the residuals, not of x
What to do with a non-linear relationship?
Transform the data to make the relationship linear
Fit functions using maximum likelihood
Or use polynomial regression
Which contrast does this describe? ‘If squared, values will be independent’
polynomial
why is a balanced design good?
it is orthogonal and so increases power
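An assumption of the mixed model is that
repeated measures must be uncorrelated once the random effects (e.g. subject) are accounted for, i.e. the residuals are independent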
Why might you choose to use Monte Carlo methods over non-parametric tests?
When the data are skewed and the two groups have different shapes: Wilcoxon / Mann-Whitney U assume the groups have the same shape
Permutation test
Assumes that if the groups aren’t different, group membership is arbitrary. It therefore relabels the groups lots and lots of times; under the null hypothesis this shouldn’t make a difference to the group means or medians
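What is the p value in a permutation test?
The proportion of relabellings that give a difference at least as extreme as the one observed, i.e. the probability of a result at least as extreme as the observed one if the null hypothesis is true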
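A minimal sketch of a two-group permutation test on the difference in means, to show where the p value comes from. The function name, the number of permutations and the data are assumptions; the `+ 1` in the numerator and denominator is a common convention that counts the observed labelling as one of the permutations.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation p value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                      # relabel the groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:             # tolerance for float rounding
            extreme += 1
    return (extreme + 1) / (n_permutations + 1)

p_separated = permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
p_identical = permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(f"clearly different groups: p = {p_separated:.4f}")
print(f"identical groups:         p = {p_identical:.4f}")
```

Swapping `mean` for `median`, or any other statistic, only changes the two lines that compute the difference, which is the flexibility the following cards describe.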
Why would you use permutation over parametric or non-parametric tests? What are the advantages?
Not limited to measures of location like the mean and median; any measure calculated from the sample can be used as the test statistic
Pros and cons of permutation test
Good - no distribution assumptions, can be customised for any problem, can see where p value comes from
Bad - unfamiliar to referees, sums can’t be readily checked, need to be customised
How to do a permutation test on an ANOVA-type problem (multi-group differences)?
Can use the average difference between group means/medians
Or can use F ((total variance - within-group variance) / within-group variance); F usually requires a normal distribution, but not in this case
Can also use the order of the means and then find the probability that this order occurs by chance
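A sketch of the F-based option for more than two groups: compute F for the real grouping, then shuffle the pooled values into groups of the same sizes and see how often F is at least as large. The function names and data are hypothetical; F here is the usual between-group mean square divided by the within-group mean square.

```python
import random

def f_statistic(groups):
    """Between-group mean square divided by within-group mean square."""
    values = [v for g in groups for v in g]
    grand = sum(values) / len(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def permutation_f_test(groups, n_permutations=2000, seed=0):
    """Observed F and its permutation p value (no normality assumption)."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [v for g in groups for v in g]
    sizes = [len(g) for g in groups]
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:                       # deal values back into groups
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)

groups = [[4.1, 5.0, 5.5, 4.7], [6.9, 7.4, 8.1, 7.7], [5.2, 5.9, 6.1, 5.6]]
f_observed, p_value = permutation_f_test(groups)
print(f"F = {f_observed:.2f}, permutation p = {p_value:.4f}")
```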
Wilcoxon is distribution free but not...
Assumption free
If the data can’t take values below 0, what do we do? E.g. number of frogs in a pond: not sensible to transform
Estimate the mean and CI, as this is more useful for seeing where the population value will lie. Can use maximum likelihood: pick a known distribution that matches the sample and estimate the mean/median and CI
(For a normal distribution, the sample mean is the maximum likelihood estimate of the population mean)
Can still do this if the distribution is not normal! The stats package does it for you
Or, if the distribution isn’t like any off-the-shelf distribution, use bootstrapping (but this requires a large sample and is not so good for hypothesis testing)
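A minimal sketch of a percentile bootstrap CI for the mean. The "frog counts" are simulated, skewed, non-negative integers invented for illustration, and the percentile method is only one of several ways to turn bootstrap resamples into an interval.

```python
import random
from statistics import mean

def bootstrap_ci(sample, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for any statistic."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement, same size as the original sample.
    stats = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical counts for 60 ponds: skewed and never below zero.
data_rng = random.Random(1)
frog_counts = [int(data_rng.expovariate(0.2)) for _ in range(60)]

ci_low, ci_high = bootstrap_ci(frog_counts)
print(f"mean = {mean(frog_counts):.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Because every resampled value is a real count, the interval cannot stray below zero, which is the attraction for data like these.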
What are CIs?
95% of sample 95% CIs would be expected to include the true population mean
Permutation vs bootstrapping
Permutation shuffles the sample (re-samples without replacement)
Bootstrapping re-samples with replacement
How large does sample need to be for bootstrapping?
> 50
What is centering?
Subtracting the mean of x from each x value. In polynomial regression x and x^2 are strongly collinear (multicollinearity), so use (x - mean(x))^2 instead of x^2 to reduce the multicollinearity
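A quick sketch of why centering helps: compare the correlation between x and x^2 with the correlation between centered x and its square. The x values are made up, and because they are symmetric about their mean the centered correlation comes out at zero; for asymmetric x, centering reduces the correlation rather than removing it.

```python
def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

x = list(range(1, 11))                      # 1, 2, ..., 10
x_mean = sum(x) / len(x)
x_centered = [xi - x_mean for xi in x]

r_raw = pearson_r(x, [xi ** 2 for xi in x])
r_centered = pearson_r(x_centered, [xc ** 2 for xc in x_centered])

print(f"corr(x, x^2)                  = {r_raw:.3f}")
print(f"corr(x - mean, (x - mean)^2)  = {r_centered:.3f}")
```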
Why do regression and ANOVA give different results when there are more than 2 groups?
Regression fits a single straight line through all the groups (so the line does not necessarily pass through the group means), whereas ANOVA fits a separate mean to each group.
With three groups, ANOVA tests for significance of the two differences between the three means
Regression tests for significance of a single line fitted through all the data
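A sketch of the difference with three groups coded 1, 2, 3 on the x axis. The data are invented so that the middle group is highest, which a straight line cannot follow; the comparison of residual sums of squares is an illustration, not a significance test.

```python
# Hypothetical data: group code -> observations
groups = {1: [9.0, 10.0, 11.0], 2: [19.0, 20.0, 21.0], 3: [11.0, 12.0, 13.0]}

x = [code for code, vals in groups.items() for _ in vals]
y = [v for vals in groups.values() for v in vals]

# Regression: one least-squares line through all the data.
n = len(x)
mx, my = sum(x) / n, sum(y) / n
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx
line_at_group = {code: intercept + slope * code for code in groups}

# ANOVA: a separate fitted mean for each group.
group_means = {code: sum(vals) / len(vals) for code, vals in groups.items()}

ss_resid_regression = sum((yi - (intercept + slope * xi)) ** 2
                          for xi, yi in zip(x, y))
ss_resid_anova = sum((v - group_means[code]) ** 2
                     for code, vals in groups.items() for v in vals)

print("line at each group:", line_at_group)
print("group means:       ", group_means)
print(f"residual SS: regression = {ss_resid_regression:.1f}, "
      f"ANOVA = {ss_resid_anova:.1f}")
```

The line misses all three group means here, so its residual sum of squares is much larger than ANOVA's.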
What is ordinary least squares?
A method for estimating the unknown parameters of a model, with the goal of minimising the sum of the squares of the differences between the observed responses and the predicted responses
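For a straight line this can be written out as follows. The symbols are assumptions chosen for illustration: a is the intercept, b the slope, and bars denote sample means. OLS picks the a and b that minimise S.

```latex
S(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a + b x_i) \bigr)^2,
\qquad
\hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{a} = \bar{y} - \hat{b}\,\bar{x}
```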