Study Design: From group comparisons to individual predictions Flashcards
What are the 2 defining properties of the Gaussian distribution?
- The mean: the middle of the distribution, about which the curve is mirrored from left to right, denoted by the Greek letter mu (μ)
- The standard deviation, which essentially defines the width of the curve, denoted by the Greek letter sigma (σ)
What is statistical significance?
- The Gaussian distribution has tails which go to +/- infinity - there is a finite probability of extremely unlikely events (although the statistical models may break down at the extremes)
- With the Gaussian distribution no result is impossible. Instead, we set a limit of statistical significance, usually at p=0.05
- For a Gaussian distribution, this occurs at z = ±1.96 (two-tailed), i.e. if a result is more than about 2 standard deviations from the mean, it is likely to be significantly different from the mean
- A significant result is one where we observe values that are extreme relative to the distribution
- For the standard normal distribution (mean 0, standard deviation 1), we talk about the z-score
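As a quick check of the numbers above, the 5% two-tailed cut-off can be recovered from the standard normal distribution. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative only and not part of the original notes.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Two-tailed 5% significance puts 2.5% in each tail,
# so the cut-off is the 97.5th percentile of the distribution
z_critical = standard_normal.inv_cdf(0.975)

# Probability of landing more than 2 standard deviations from the mean (both tails)
p_beyond_2sd = 2 * (1 - standard_normal.cdf(2.0))

print(f"critical z for p = 0.05: {z_critical:.3f}")
print(f"P(|z| > 2): {p_beyond_2sd:.4f}")
```

The critical value comes out very close to 1.96, and the probability of being more than 2 standard deviations from the mean is slightly below 5%, which is why "2 standard deviations" is used as a rule of thumb.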
What is the Null Hypothesis?
- Compute the cumulative probability of everything to the left of the observed value
- The p-value is the area under the curve in the tail(s) beyond the observed value (the area under the whole curve is 1)
- The 5% cut-off corresponds to z = 1.96 (two-tailed)
- To test for statistical significance, we set up a Null hypothesis H0. This is the opposite of the hypothesis that we wish to test
- If the value is about 2 standard deviations away from the mean of the distribution, then we can call it statistically significant: essentially, the p-value is too small for the result to be plausible under H0
- The p-value is the probability of obtaining the observed result (or a more extreme one) if the null hypothesis is true
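The definition of the p-value can be sketched in a few lines. This is only an illustration under stated assumptions: the helper `two_sided_p_value` and the example numbers (a null distribution with mean 100 and standard deviation 5, and an observed value of 112) are hypothetical and not from the original notes.

```python
from statistics import NormalDist

def two_sided_p_value(observed, null_mean, null_sd):
    """Probability of a result at least as extreme as `observed` if H0 is true."""
    z = (observed - null_mean) / null_sd       # convert the observation to a z-score
    one_tail = 1 - NormalDist().cdf(abs(z))    # area in one tail beyond |z|
    return 2 * one_tail                        # count both tails

# Hypothetical example values, chosen only for illustration
p = two_sided_p_value(observed=112.0, null_mean=100.0, null_sd=5.0)
print(f"p = {p:.4f}; reject H0 at the 5% level: {p < 0.05}")
```

Here the observation lies 2.4 standard deviations from the null mean, so the p-value falls below 0.05 and H0 would be rejected at the 5% level.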
What does the P-value >0.05 and <0.05 mean?
- If the P-value > 0.05, the probability of obtaining this result if the Null hypothesis is true is greater than 1/20 or 5%, so we cannot reject the Null hypothesis
- If the P-value < 0.05, the probability of obtaining this result is < 1/20 or 5%, so we say that the result is unlikely to have been obtained if the Null hypothesis is true, and we can reject the Null hypothesis
What is an arbitrary threshold with no real meaning?
5%
What can be very misleading?
P-values and the significance test
What is common to ask in medical statistics?
Whether two groups are different
E.g. we might try a new treatment for headache on a group of people and compare the outcome to a group who received the placebo. We want to know if the groups are significantly different (the treatment may make the headache worse)
What is the most commonly used test?
Student’s t-test
What is Student’s t-test based on?
Student’s t-distribution:
- “heavier tails” than the normal distribution
- has one parameter: df (degrees of freedom)
- mean 0
What is Student’s t-test I?
Student’s t-test is commonly used to compare two samples, each described by its mean, standard deviation and number of measurements. We then need an estimate of the combined standard error of the mean
What is Student’s t-test II?
Strictly, both distributions should be Gaussian and have the same standard deviation, but Student’s t-test is fairly robust and often works well for moderately non-Gaussian distributions
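The combined standard error can be made concrete with a short sketch. It assumes the standard pooled-variance form of the two-sample t statistic; the function name `student_t_statistic` and the sample values are hypothetical, and only the standard library is used, so the sketch stops at the t statistic and degrees of freedom rather than a p-value.

```python
from math import sqrt
from statistics import mean, variance

def student_t_statistic(sample_a, sample_b):
    """Pooled-variance (equal standard deviation) two-sample t statistic."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a, var_b = variance(sample_a), variance(sample_b)  # sample variances (n - 1)
    df = n_a + n_b - 2                                     # degrees of freedom
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    # Combined standard error of the difference in means
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(sample_a) - mean(sample_b)) / standard_error
    return t, df

# Hypothetical measurements, for illustration only
t, df = student_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t value is then compared against Student’s t-distribution with `df` degrees of freedom to obtain a p-value.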
Note: Welch’s test is more general, because it does not assume equal standard deviations (and it is implemented in most statistics software)
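For comparison, Welch’s version drops the equal-standard-deviation assumption. The sketch below assumes the usual Welch statistic with the Welch-Satterthwaite approximation for the degrees of freedom; the function name `welch_t_statistic` and the sample values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(sample_a, sample_b):
    """Welch's two-sample t statistic (unequal variances allowed)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical measurements, for illustration only
t_welch, df_welch = welch_t_statistic([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"t = {t_welch:.3f} with about {df_welch:.1f} degrees of freedom")
```

When the two samples have the same size and variance, as in this example, Welch’s result coincides with the pooled Student’s t-test; the two differ when the variances or sample sizes are unequal.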
What is Paired t-test?
Used with matched samples, e.g. participants before and after treatment
- used to compare the means
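A paired t-test works on the within-subject differences rather than on the two groups separately. The following is a minimal sketch assuming the standard paired t statistic; the function name `paired_t_statistic` and the before/after values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(before, after):
    """Paired t statistic computed from per-participant differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    differences = [a - b for b, a in zip(before, after)]
    n = len(differences)
    # Standard error of the mean difference
    standard_error = stdev(differences) / sqrt(n)
    t = mean(differences) / standard_error
    return t, n - 1  # t statistic and degrees of freedom

# Hypothetical measurements for four participants, for illustration only
t_paired, df_paired = paired_t_statistic(before=[10, 12, 14, 16], after=[11, 14, 15, 18])
print(f"t = {t_paired:.3f} with {df_paired} degrees of freedom")
```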
What is longitudinal data?
Observing people over time
Conceptually, longitudinal imaging data allows us to conduct ‘paired tests’
Gain in statistical power
- uses the baseline data as a reference for each subject
- less susceptible to noise between subjects
- can be combined with a reference group or intervention etc
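The gain in statistical power can be illustrated by analysing the same numbers twice, once as two independent groups and once as paired measurements. This is a sketch with made-up values in which subjects differ a lot from each other but change consistently over time; the variable names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical baseline and follow-up measurements for four subjects
baseline = [10, 12, 14, 16]
follow_up = [11, 14, 15, 18]
n = len(baseline)

# Unpaired analysis: pooled-variance t statistic, ignoring who is who
pooled_var = ((n - 1) * variance(baseline) + (n - 1) * variance(follow_up)) / (2 * n - 2)
unpaired_t = (mean(follow_up) - mean(baseline)) / sqrt(pooled_var * (2 / n))

# Paired analysis: each subject's baseline acts as their own reference
differences = [f - b for b, f in zip(baseline, follow_up)]
paired_t = mean(differences) / (stdev(differences) / sqrt(n))

print(f"unpaired t = {unpaired_t:.3f}, paired t = {paired_t:.3f}")
```

The mean change is identical in both analyses, but the paired statistic is much larger because the between-subject spread no longer contributes to the standard error.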
What is longitudinal imaging data?
- Co-registration and subtraction of longitudinal volumetric scans
- Serial hippocampal volumetry
- Boundary shift integral: looks at how the defined boundaries of a structure (here, the hippocampus) shift between scans
- Manual (hand-traced) segmentation
What are the problems with longitudinal imaging data?
- Takes a long time to acquire the data
- The scanners can break
- The study is longer