lecture 9 - within-subjects t-tests: additional considerations
Assumptions of a within-participant t-test - Random and independent samples
these assumptions need to be satisfied; otherwise the test is invalid and its conclusions are undermined
Normally distributed “something or other” (distribution of sample means according to null hypothesis) (Field)
Formally it’s that the sampling distribution of the means (the mean difference scores) is approximately normal….
If n is large (n > 30 or so) this is very likely to be reasonably true, thanks to the central limit theorem: with a sample bigger than about 30, the central limit theorem says the sampling distribution of the mean will be approximately normal, so the assumption is likely to be satisfied.
If n is small, then look at the distribution of the data themselves (e.g. a histogram). If it looks fairly normal, you’re probably ok (unless people’s lives are at stake…). But if not (e.g. it’s strongly asymmetric or not very “mound-shaped”), worry, and worry more for really small n. So if the sample is small (less than 20 or so), worry about the assumption and do some checks on normality, e.g. plot a histogram to see whether the data look normally distributed; histograms also make outliers easy to spot. If the data are skewed, you need to ask whether the normality assumption really is satisfied. Outliers are a particular problem for t-tests, because the mean used in a t-test is very sensitive to outliers.
Fortunately there are lots of checks you can do as well as different solutions to address this problem (more on those later).
A final worry/”assumption”:
Check your data for outliers, e.g. extreme data points that are a long way from most of the data. Think hard if you’ve got extreme outliers…. worry…. Talk to Field….
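A minimal sketch (not from the lecture) of this kind of eyeballing, using made-up before/after scores for a paired design; all variable names and data are hypothetical:

```python
# Minimal sketch: eyeballing the difference scores from a small paired sample.
# Data are made up; the last "after" value is a deliberate extreme outlier.
import numpy as np
import matplotlib.pyplot as plt

before = np.array([12.1, 14.3, 11.8, 13.5, 15.0, 12.9, 13.2, 14.1])
after  = np.array([13.0, 15.1, 12.0, 14.2, 15.9, 13.1, 13.8, 30.0])

diffs = after - before           # the within-subjects t-test works on these

plt.hist(diffs, bins=8)          # roughly mound-shaped? any extreme outliers?
plt.xlabel("difference score (after - before)")
plt.ylabel("count")
plt.show()
```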
independence is important
if you look at someone else’s responses during the experiment (e.g. a friend’s answers), your answers are now influenced by theirs, so the observations are no longer independent of each other.
random sampling
if you’re trying to draw a conclusion about a population, then at least conceptually every member of that population needs some chance of being sampled. In practice this is rarely true, because the population is usually something as broad as the whole human race.
samples - in practice the requirement is more constrained: the sample just needs to be representative of the population
practical benefit of the central-limit theorem
Means of samples taken from skewed (or otherwise non-normally distributed) data are themselves approximately normally distributed.
So tests based on means are reasonably robust to departures from normality (as long as the sample size is big enough).
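A minimal simulation sketch of this point (not from the lecture): draw many samples from a strongly skewed population and look at the distribution of their means.

```python
# Minimal sketch: the central limit theorem in action.
# Individual scores come from a skewed (exponential) population, but the
# distribution of sample MEANS looks approximately normal once n is biggish.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 30                                   # size of each sample
means = rng.exponential(scale=1.0, size=(10_000, n)).mean(axis=1)

plt.hist(means, bins=50)                 # roughly symmetric and mound-shaped
plt.xlabel(f"mean of {n} skewed scores")
plt.ylabel("count")
plt.show()
```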
wilcoxon matched-pairs test assumptions
it doesn’t assume normality; it only assumes random and independent sampling. So it can be better to do this non-parametric test, because its assumptions are weaker and it doesn’t require normality.
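A minimal sketch of running both tests side by side, assuming Python/SciPy rather than SPSS and made-up paired data:

```python
# Minimal sketch: paired t-test vs Wilcoxon matched-pairs (signed-rank) test.
import numpy as np
from scipy import stats

cond_a = np.array([4.1, 5.3, 2.8, 6.0, 4.7, 3.9, 5.5, 4.4])
cond_b = np.array([4.8, 5.9, 3.1, 6.4, 5.2, 4.0, 6.1, 4.9])

t_res = stats.ttest_rel(cond_a, cond_b)   # assumes normality of the differences
w_res = stats.wilcoxon(cond_a, cond_b)    # only assumes random, independent sampling
print(t_res)
print(w_res)
```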
assessing normality - QQ plots
examples in notes
systematic deviation from the (blue) diagonal line suggests the data aren’t normally distributed.
a QQ plot plots the quantiles of the data themselves against the quantiles of a normal distribution, so you can form an impression of the ways in which the data are not normal.
you ask whether your data mainly fall on the diagonal line, or whether there are systematic deviations of the data from it. Various kinds of non-normality correspond to various patterns on the QQ plot.
positive skew = a curved line falling below the diagonal
negative skew = the mirror-image pattern: a curved line rising above the diagonal
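A minimal sketch (assuming Python/SciPy rather than SPSS) of producing a normal QQ plot for a positively skewed sample:

```python
# Minimal sketch: a normal QQ plot of positively skewed data.
# The points should curve away from the straight reference line.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
skewed = rng.exponential(scale=1.0, size=40)   # strongly positively skewed sample

stats.probplot(skewed, dist="norm", plot=plt)  # sample quantiles vs normal quantiles
plt.show()
```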
what does field textbook say about normality
SPSS will do a test for normality called the Shapiro-Wilk test: if it is significant, that means the data are “significantly” non-normal…
If the number of data points is large (> 30 or maybe > 60), a “significant violation” probably doesn’t matter as it will very likely be small….
If the number of data points is small (< 30ish), a significant Shapiro-Wilk test is a strong indication that the normality assumption probably is NOT satisfied. So worry: the central limit theorem doesn’t promise normality under those circumstances, and the test itself says that the assumption hasn’t been met.
What about Zhang et al.? Why didn’t they do Wilcoxon’s everywhere?
However, a non-significant Shapiro-Wilk test is not good evidence that there isn’t a problem, because the test is not very powerful for small n. So worry anyway: look at histograms, etc.
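A minimal sketch of the Shapiro-Wilk check, assuming Python/SciPy rather than SPSS and made-up difference scores:

```python
# Minimal sketch: Shapiro-Wilk test for normality on a small, skewed sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
diffs = rng.exponential(scale=1.0, size=15)    # made-up difference scores

stat, p = stats.shapiro(diffs)
print(f"W = {stat:.3f}, p = {p:.3f}")
# p < .05 -> "significantly" non-normal; with small n take this seriously.
# p >= .05 with small n is NOT strong reassurance (the test has little power).
```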
practical research advice for the normality assumption
Always look at histograms (and QQ plots) as they also help spot outliers
If you have a lot of data, you’re probably ok
If you have little data, worry: maybe think about bootstrapping, consider a non-parametric test, maybe get more data….
bootstrapping
Bootstrapping is one possible solution to a normality problem, and Field spells out the details. The intuition: instead of using off-the-shelf distributions like the z distribution or the t distribution, you pretend that the shape of the population distribution is exactly the same as the shape of the sample distribution (e.g. the sample might be positively skewed). You then randomly resample (with replacement, using a computer) from that pretend population, i.e. from the sample itself, and calculate a sample statistic. You do this over and over and plot the distribution of the resampled statistic. You can then assess your original sample statistic relative to this bootstrapped distribution without worrying about normality….
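A minimal sketch of the idea, not Field’s exact procedure: resample the observed difference scores with replacement and build up the bootstrap distribution of the mean (data and names are made up).

```python
# Minimal sketch: bootstrapping the mean difference score.
import numpy as np

rng = np.random.default_rng(3)
diffs = np.array([0.7, 0.6, 0.3, 0.4, 0.5, 0.1, 0.6, 2.9])   # skewed differences

boot_means = np.array([
    rng.choice(diffs, size=len(diffs), replace=True).mean()
    for _ in range(10_000)
])

# A 95% bootstrap (percentile) confidence interval for the mean difference:
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"observed mean = {diffs.mean():.2f}, 95% bootstrap CI = [{lo:.2f}, {hi:.2f}]")
# If 0 lies outside this interval, that corresponds to a significant difference
# without relying on the normality assumption.
```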
The shape of the t distribution is determined by the degrees of freedom (df).
For low df’s the t distribution is a bit like a “squashed” z distribution.
As the dfs get bigger the t distribution looks more and more like the z distribution….
when df is small, the critical value of t has to be further out in the tails to cut off 5% of the distribution than the critical value of the normal distribution does (the tails are heavier). As df gets larger, the t distribution becomes more and more like the normal distribution, and the critical values converge.
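A minimal sketch illustrating this (assuming Python/SciPy): two-tailed 5% critical values of t for various df, compared with the normal (z) value of about 1.96.

```python
# Minimal sketch: critical t shrinks towards critical z as df grows.
from scipy import stats

for df in (4, 9, 29, 99, 1000):
    t_crit = stats.t.ppf(0.975, df)        # upper 2.5% point of the t distribution
    print(f"df = {df:4d}: critical t = {t_crit:.3f}")
print(f"       z: critical z = {stats.norm.ppf(0.975):.3f}")
```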
Null hypothesis for one- and two-tailed tests.
one-tailed / directional - your hypothesis must be that there will be an effect in one particular direction only
two-tailed - the critical value is larger, so it is harder to reach significance
One-tailed vs. two-tailed tests
There are those who say never do a one-tailed test.
If you are in any doubt, follow that advice!
Avoid one-tailed tests unless -
You are worried that you will fail to get a significant result for a small, but real, effect (perhaps N is limited) and the result in the opposite tail you’re ignoring isn’t meaningful.
At minimum, you have decided in advance the expected direction of the effect (which mean will be higher), according to your hypothesis.
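A minimal sketch (assuming Python/SciPy) of how the one-tailed and two-tailed critical values compare at alpha = .05:

```python
# Minimal sketch: one-tailed vs two-tailed critical t, here for df = 20.
from scipy import stats

df = 20
one_tailed = stats.t.ppf(0.95, df)     # all 5% in one tail
two_tailed = stats.t.ppf(0.975, df)    # 2.5% in each tail
print(f"one-tailed critical t = {one_tailed:.3f}")   # about 1.72
print(f"two-tailed critical t = {two_tailed:.3f}")   # about 2.09
```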
confidence intervals (CI)
A confidence interval gives a range of plausible mean differences
Even better, making the error bars confidence intervals means the plot visually implies statistical significance:
The fact that the 95% CI does not include 0 here directly corresponds to a statistically significant difference, p < 0.05.
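A minimal sketch (assuming Python/SciPy, with made-up difference scores) of computing a 95% CI for the mean difference and reading significance off it:

```python
# Minimal sketch: 95% CI for the mean difference, built from D-bar and sM.
import numpy as np
from scipy import stats

diffs = np.array([0.7, 0.6, 0.3, 0.4, 0.5, 0.1, 0.6, 0.5])
n = len(diffs)
d_bar = diffs.mean()
s_m = diffs.std(ddof=1) / np.sqrt(n)            # standard error of the mean difference
t_crit = stats.t.ppf(0.975, n - 1)              # two-tailed 5% critical value

ci_low, ci_high = d_bar - t_crit * s_m, d_bar + t_crit * s_m
print(f"mean difference = {d_bar:.2f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
# Here the CI excludes 0, which corresponds to p < .05 in the paired t-test.
```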
some observations on the formulae
ways to make t big, i.e. increase power, so it is more likely to be significant…
t = D̄ / sM
Make the manipulation stronger: The bigger the difference, 𝐷̄, the larger t is.
Collect more and less noisy data: The smaller the standard error, sM, the larger t is.
sM = s / √N
So the smaller the standard deviation s and the bigger N, the smaller the standard error sM and the larger t.
The tables for t only cover positive values, but negative ones can be significant too. So simply use the absolute value of t, unless you are doing a one-tailed test.
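A minimal sketch (assuming Python/SciPy, made-up data) of computing t by hand from the formulas above and checking it against a packaged paired t-test:

```python
# Minimal sketch: t = D-bar / sM, with sM = s / sqrt(N), checked against SciPy.
import numpy as np
from scipy import stats

cond_a = np.array([4.1, 5.3, 2.8, 6.0, 4.7, 3.9, 5.5, 4.4])
cond_b = np.array([4.8, 5.9, 3.1, 6.4, 5.2, 4.0, 6.1, 4.9])
diffs = cond_b - cond_a

d_bar = diffs.mean()
s_m = diffs.std(ddof=1) / np.sqrt(len(diffs))
t_by_hand = d_bar / s_m
print(f"t by hand = {t_by_hand:.3f}")

print(stats.ttest_rel(cond_b, cond_a))   # same |t|, with df = N - 1 = 7
```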