Lecture 18 - Resampling Statistics Flashcards

1
Q

Why were resampling statistics introduced?

A
  • Most of our statistical tests are based on equations developed between 1800 and 1930
  • They were developed by talented mathematicians calculating probabilities using mathematical models worked out with pen and paper (i.e. the computations had to be simple enough to do by hand)
  • As a result, each test is based on one particular model of the underlying data:
  • Sometimes the model makes a lot of assumptions
  • Sometimes it makes fewer assumptions (but is then usually weaker)
  • Resampling techniques are a newer method that is (more) assumption-free but retains power: they need neither equations at that level nor the assumptions, yet keep the power of parametric tests
2
Q

Why use resampling techniques?

A

Fewer assumptions:
- So more accurate if assumptions not met
Very general:
- A few basic ideas that can be modified and reused
- No equations or tables to look up – the maths is actually easier
- Thinking about the test forces us to think about our data (what the null hypothesis means), and we might realise that the real problem is in the data

3
Q

Why are resampling approaches not more popular?

A
  • They are new (1979 is recent for stats) and assumed (incorrectly?) to be more complex
  • Parametric stats do a reasonably good job, and are discussed in simple(ish) language in textbooks
  • Resampling does require:
  • A computer (not widely available in 1979), e.g. you may need to calculate a mean 10,000 times
  • Some programming (not available in SPSS)
  • A lot of people don’t like thinking about their data
4
Q

What are permutation tests?

A
  • One common use for resampling
  • For comparing groups/conditions (e.g. t-test replacement)
  • Shuffle data according to your conditions
5
Q

How is resampling used for hypothesis testing?

A
  • The point of inferential statistics:
  • To determine the probability that the differences we measured were caused by sampling error (i.e. would the results in our sample hold for the rest of the world?)
  • The principle of resampling techniques is to measure that sampling error by repeating the sampling process a large number of times:
  • We can determine the likely error introduced by the sampling by looking at the variability in the resampling
6
Q

What are between-subject randomisation tests?

A
  • Example question: are Smurfs just dwarfs painted blue?
  • Experiment: measure heights and check for difference
  • Null hypothesis: Smurfs and dwarfs have the same height
  • Would this generalise to a whole population or is it just these seven?
7
Q

What is the process of between-subjects tests?

A
  • We want to determine the likelihood of getting differences this extreme if the data all came from a single ‘population’
  • Simulate running the experiment many times, with the data coming from one population; check what range of values commonly occurs
  • In practice: keep the measured values but shuffle them (randomly assigning them to the two groups); count how often the difference between the new means is bigger than the difference between the measured means (see the sketch below)
  • We assume that these are real and sensible values
  • We do not assume anything about their distribution (we only assume the numbers are valid)
  • Repeat the process a large number of times
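Below is a minimal sketch of this shuffling procedure in Python/NumPy. The height values and group sizes are made up for illustration and are not the lecture's data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heights (cm) for seven smurfs and seven dwarfs -- made-up values
smurfs = np.array([102, 98, 105, 110, 95, 101, 99])
dwarfs = np.array([118, 122, 109, 115, 120, 112, 117])

observed_diff = dwarfs.mean() - smurfs.mean()

# Shuffle the pooled measurements many times, forcing the null hypothesis
# (one population) to be true, and count how often the shuffled difference
# is at least as extreme as the one actually measured
pooled = np.concatenate([smurfs, dwarfs])
n_resamples = 10_000
count_extreme = 0
for _ in range(n_resamples):
    shuffled = rng.permutation(pooled)
    new_diff = shuffled[7:].mean() - shuffled[:7].mean()
    if abs(new_diff) >= abs(observed_diff):
        count_extreme += 1

p_value = count_extreme / n_resamples
print(f"observed difference = {observed_diff:.2f}, p = {p_value:.4f}")
```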
8
Q

Between-subjects tests summary

A
  • Repeat simulated experiment a large number of times, forcing the null hypothesis to be true, and check how extreme the real value was
  • No equation needed, except for the statistic of interest, e.g. mean
  • No table needed: the data themselves give the p value
9
Q

What is the generalisation of between-subject tests?

A
  • If our hypothesis is that the groups differ in diversity (SD) rather than in the mean, we simply use the SD as our statistic instead
  • Randomise (shuffle) exactly as before
  • Repeat the process a large number of times
  • We do not need a whole new test if we change our opinion about what is interesting in the data (see the sketch below)
  • We do not need a parametric and a non-parametric version of the test
  • A very similar approach works for within-subjects designs
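The generalisation is easy to see in code: the same permutation machinery can take the statistic as an argument, so switching from the mean to the SD changes one line. A sketch assuming made-up groups like those above; the function name is illustrative, not from the lecture.

```python
import numpy as np

def permutation_test(group_a, group_b, statistic, n_resamples=10_000, seed=0):
    """Two-group permutation test for any statistic (mean, SD, median, ...)."""
    rng = np.random.default_rng(seed)
    observed = statistic(group_b) - statistic(group_a)
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    count = 0
    for _ in range(n_resamples):
        shuffled = rng.permutation(pooled)      # force H0: group labels are arbitrary
        diff = statistic(shuffled[n_a:]) - statistic(shuffled[:n_a])
        if abs(diff) >= abs(observed):
            count += 1
    return count / n_resamples

# Same test, different hypotheses: compare spread rather than location
# p_mean = permutation_test(smurfs, dwarfs, np.mean)
# p_sd   = permutation_test(smurfs, dwarfs, np.std)
```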
10
Q

What is a within-subjects randomisation test?

A
  • H0: taking steroids has no effect on a dwarf’s height
  • Now the randomisation happens within subjects: in each resample, the values are shuffled within each subject rather than across the whole dataset – in practice we just randomise the sign of the difference for each pair (see the sketch below)
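A minimal sketch of this sign-flipping procedure in Python/NumPy; the before/after heights are made-up illustrative values, not the lecture's dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical paired heights (cm) before and after steroids -- made-up values
before = np.array([112, 118, 109, 121, 115, 117, 110])
after  = np.array([115, 119, 112, 124, 114, 120, 113])
diffs = after - before
observed = diffs.mean()

# Force H0 (no effect) to be true by randomly flipping the sign of each pair's difference
n_resamples = 10_000
count_extreme = 0
for _ in range(n_resamples):
    signs = rng.choice([-1, 1], size=len(diffs))
    if abs((signs * diffs).mean()) >= abs(observed):
        count_extreme += 1

print(f"p = {count_extreme / n_resamples:.4f}")
```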
11
Q

How is number of participants accounted for?

A
  • t-tests use n (number of subjects) in their equation
  • How is that accounted for here?
  • The sample size for the resamples has to be the same as the original data
  • The variance in the mean differences will automatically reflect the number of subjects
  • With 100 people (rather than 10), it’s unlikely that one person will have a big effect (so the null distribution becomes tighter)
12
Q

What is bootstrap resampling?

A
  • For generating confidence intervals (e.g. make error bars)
  • Resample-with-replacement the values in a sample
13
Q

What are bootstrap resamples?

A
  • Bootstrap resamples can be used to calculate confidence intervals such as:
  • Confidence interval of a mean (e.g. 95%)
  • Standard error of the mean
  • They can also determine whether some test value is inside or outside the 95% confidence interval (like a one-sample test)
  • They can be used for confidence of simple values (like mean) or for fitted parameters (like gradient of a line)
14
Q

Bootstrap example

A
  • This needs a different type of resample – we can’t simply shuffle a single sample (shuffling would give exactly the same mean)
  • Each original value appears once, more than once, or not at all in a resample (the resample uses the same numbers, but not necessarily all of them; each value is drawn at random, with replacement)
  • This is resampling with replacement (see the sketch below)
  • The result is a distribution of the means of your samples (not a distribution of individual people)
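A minimal sketch of resampling with replacement in Python/NumPy. The IQ values are made up to stand in for the lecture's sample, so the derived numbers (SEM, confidence interval) will not match the lecture's exact figures on the following cards.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sample of 10 IQ scores -- made-up stand-in for the lecture's example
sample = np.array([96, 103, 108, 110, 112, 114, 115, 118, 120, 116])

n_resamples = 10_000
boot_means = np.empty(n_resamples)
for i in range(n_resamples):
    # Draw a resample of the same size, with replacement: each original value
    # may appear once, more than once, or not at all
    resample = rng.choice(sample, size=len(sample), replace=True)
    boot_means[i] = resample.mean()

# boot_means is now the bootstrap distribution of the sample mean
print(f"original mean = {sample.mean():.1f}, SD of bootstrap means = {boot_means.std(ddof=1):.2f}")
```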
15
Q

How do you calculate SEM?

A
  • The SEM is the SD of the means (i.e. how variable would your mean have been?)
  • More precisely, the SEM is the standard deviation of the means of all possible samples
  • It can be estimated from the standard deviation of the bootstrap means
  • In our example:
  • SEM based on the formula: 4.27
  • SEM based on the bootstrap resamples: 4.10
  • To know the true SEM, we would have to rerun the study on the full population many times
  • The difference arises because both are estimates, calculated in two different ways (see the sketch below)
  • With a very skewed sample, trust the bootstrap estimate
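A sketch showing the two SEM estimates side by side, assuming the same made-up IQ sample as above (the results will not reproduce the lecture's 4.27 and 4.10 exactly).

```python
import numpy as np

rng = np.random.default_rng(0)
sample = np.array([96, 103, 108, 110, 112, 114, 115, 118, 120, 116])  # made-up IQ values

# Formula-based estimate: SD / sqrt(n)
sem_formula = sample.std(ddof=1) / np.sqrt(len(sample))

# Bootstrap estimate: SD of the bootstrap means
boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(10_000)
])
sem_bootstrap = boot_means.std(ddof=1)

print(f"SEM (formula)   = {sem_formula:.2f}")
print(f"SEM (bootstrap) = {sem_bootstrap:.2f}")
```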
16
Q

How do you calculate confidence intervals?

A
  • The 95% confidence interval from the bootstraps represents the range of values within which 95% of the bootstrap means fall
  • It is calculated from the bootstrapped means by ordering them and cutting off the highest and lowest 2.5%
  • In our example (10,000 resamples), the 250th value is 103 and the 9,750th value is 119: so the mean is 111.2 and the 95% confidence interval is 103–119 (see the sketch below)
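A sketch of the percentile cut-off on the bootstrap means, again using the made-up IQ sample rather than the lecture's data.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = np.array([96, 103, 108, 110, 112, 114, 115, 118, 120, 116])  # made-up IQ values

boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(10_000)
])

# Order the bootstrap means and cut off the lowest and highest 2.5%
# (equivalent to taking the 2.5th and 97.5th percentiles)
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.1f}, 95% CI = [{ci_low:.1f}, {ci_high:.1f}]")
```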
17
Q

One-sample test

A
  • Let us consider the hypothesis that this population is ‘above average IQ’. Then we might expect: IQ>100
  • Null hypothesis: IQ<=100, i.e. we want to know whether the mean IQ is significantly greater than 100
  • If H0 is true, how likely was this data? i.e. how likely is a mean of 100 or less for our population?
  • We can simply count how often it occurs within the bootstraps:
  • Order the data and find how many values are <=100
  • In our example: 45 values <= 100, so p = 45/10000 = 0.0045 < 0.05 (see the sketch below)
  • Note: we could have run a one-sample t-test
  • We get: t=2.62, df=9, p=0.014 (SPSS reports 0.028 for a 2-tailed test but ours was 1-tailed)
  • Different values because two different estimates of how likely the null hypothesis is to give this data
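A sketch of the one-sample bootstrap test, counting how many bootstrap means fall at or below 100; the IQ values are the same made-up stand-in sample as above.

```python
import numpy as np

rng = np.random.default_rng(0)
sample = np.array([96, 103, 108, 110, 112, 114, 115, 118, 120, 116])  # made-up IQ values

boot_means = np.array([
    rng.choice(sample, size=len(sample), replace=True).mean()
    for _ in range(10_000)
])

# One-tailed test of H0: mean IQ <= 100
# p is simply the proportion of bootstrap means that fall at or below 100
p = np.mean(boot_means <= 100)
print(f"p = {p:.4f}")
```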
18
Q

How do you bootstrap with a model fit?

A
  • Comparing the mean to a specific value is effectively having a very simple model of the world
  • Bootstrapping generalizes easily to more complex models than just the mean
  • E.g. we could fit a straight line to some data and use bootstrapping on the values of the gradient (see the sketch below)
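A sketch of bootstrapping a fitted parameter: resample the (x, y) pairs with replacement, refit the straight line each time, and take percentiles of the gradients. The data are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical (x, y) data -- made-up values to illustrate bootstrapping a fitted gradient
x = np.arange(1.0, 11.0)
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.3, 8.8, 10.2, 10.9])

n_resamples = 10_000
boot_gradients = np.empty(n_resamples)
for i in range(n_resamples):
    # Resample (x, y) pairs with replacement and refit the line each time
    idx = rng.integers(0, len(x), size=len(x))
    gradient, intercept = np.polyfit(x[idx], y[idx], 1)
    boot_gradients[i] = gradient

ci_low, ci_high = np.percentile(boot_gradients, [2.5, 97.5])
print(f"95% CI for the gradient: [{ci_low:.2f}, {ci_high:.2f}]")
```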
19
Q

What are advantages of bootstrap?

A
  • Very general method: any type of model can be used and confidence intervals on any of its parameters can be estimated
  • Can also be used to perform hypothesis testing (for one-sample tests)
  • Not based on any assumptions about the data
  • No tables, no equations (except for the model)
20
Q

What are other resample approaches?

A

Jack-knife
- Similar to bootstrap, but rather than ‘randomly sampling with replacement’, resampling is done by ‘selecting all data except one’ – this shows how much impact each individual observation has (see the sketch after this list)
- Can be done without a computer (though that is no longer a good reason to prefer it, since a computer can easily run 10,000 bootstrap resamples)
Monte-Carlo method
- Create data based on model simulations and compare these to real data
- E.g. if a neuron has a certain spike rate and a ‘Poisson’ spike generating mechanism what is the chance of seeing a particular pattern of spikes
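A sketch of the jack-knife idea on the made-up IQ sample from earlier: one resample per observation, each leaving one value out, which shows how influential each individual is.

```python
import numpy as np

sample = np.array([96, 103, 108, 110, 112, 114, 115, 118, 120, 116])  # made-up IQ values

# Jack-knife: one resample per observation, each leaving exactly one value out
jackknife_means = np.array([
    np.delete(sample, i).mean() for i in range(len(sample))
])

# How far each leave-one-out mean shifts shows how influential that individual is
influence = jackknife_means - sample.mean()
print(np.round(influence, 2))
```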

21
Q

What are some issues and concerns with resampling?

A
  • How many data samples (participants) do I need? There is no a priori answer: try and see, e.g. run a pilot study to estimate the effect size and then work out how many participants you need
  • How many resamples must I generate? Typically 1,000–10,000, depending on how accurately you want to estimate p (with too few resamples you get a slightly different p value each time you run the test)
  • Which type of resampling should I use? Whatever best simulates the original sampling: force the null hypothesis to be true while keeping as much of the original information as possible
  • What if my data are not representative of the population? Garbage in = garbage out (the same is true of the t-test – none of these tests can fix garbage data)
22
Q

Resampling summary

A
  • (1) Simulate re-collecting the data:
  • With the null hypothesis forced to be true, to build the null distribution (the part that statisticians previously calculated for us)
  • From a single sample, to generate confidence intervals
  • (2) If the original data do not look likely under your null distribution, then the null hypothesis is presumed not to be a good model of your data
  • (3) These tests make very few assumptions about your data (unlike parametric tests)
  • (4) They do not throw away information (unlike rank-based non-parametric tests), so they maintain the original power