BIO Statistics Flashcards
Central limit theorem
The sampling distribution of the mean of independent random draws from any population (with finite variance) will be normal, or nearly so, if the sample size is large enough.
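A quick simulation can make the theorem concrete. This is only a sketch: the exponential population, the sample size of 50, the 2000 trials, and the helper name `sample_means` are all illustrative choices, not part of the card.

```python
import random
import statistics

def sample_means(draw, n, trials, seed=0):
    """Return `trials` sample means, each computed from `n` independent draws."""
    rng = random.Random(seed)
    return [statistics.fmean([draw(rng) for _ in range(n)]) for _ in range(trials)]

# Skewed population: exponential with mean 1 and SD 1 (clearly not normal).
means = sample_means(lambda rng: rng.expovariate(1.0), n=50, trials=2000)

center = statistics.fmean(means)   # expected near the population mean, 1
spread = statistics.stdev(means)   # expected near 1 / sqrt(50), about 0.141
# Fraction of sample means within 1 SD of their center (about 0.68 if normal).
within_1sd = sum(abs(m - center) <= spread for m in means) / len(means)
```

Even though single draws are strongly skewed, the sample means cluster symmetrically around 1.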
Gaussian curve: area between μ and 1 SD, 1 SD and 2 SD, 2 SD and 3 SD, 3 SD to infinity (one side of the mean)
μ to 1 SD: 34.1%
1 SD to 2 SD: 13.6%
2 SD to 3 SD: 2.1%
Past 3 SD: 0.1%
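These areas can be checked from the standard normal cumulative distribution. A minimal sketch using Python's `statistics.NormalDist`; the dictionary name `bands` is just for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Area under the curve for each band on ONE side of the mean.
bands = {
    "mean to 1 SD": z.cdf(1) - z.cdf(0),
    "1 SD to 2 SD": z.cdf(2) - z.cdf(1),
    "2 SD to 3 SD": z.cdf(3) - z.cdf(2),
    "past 3 SD": 1 - z.cdf(3),
}
for name, area in bands.items():
    print(f"{name}: {area:.1%}")
```

The four bands sum to 50%, the area on one side of the mean.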
Parametric statistics (definition)
A class of statistical procedures that rely on assumptions about the shape of the distribution in the population (usually assumed normal) and about the form or parameters (μ, SD) of that assumed distribution.
Non parametric statistics (definition)
A class of statistical procedures NOT relying on assumptions about the shape or form of the probability distribution from which the data are drawn.
Descriptive statistics include
Mean, median, mode, range, variance, SD, SE
Range
Difference between largest and smallest sample values
A crude measure of the data set's dispersion: it depends only on the two extreme values, so it is very sensitive to outliers.
Variance
Average of the squared distance of each value from the mean.
Squaring makes negative deviations count as positive, so the variance itself is never negative.
Standard deviation
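Tells you how tightly the sample values are clustered around the mean.
Tight cluster = low SD.
The percentage rules (about 68% within 1 SD, about 95% within 2 SD) hold only under a normal distribution.
Describes the scatter of the data, not the precision of the calculated mean (that is the standard error).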
Standard error
Estimate of how far the sample mean is likely to be from the population mean. SE = SD / √n.
Gets smaller as sample size increases, since the mean of a larger sample is likely to be closer to the population mean.
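The descriptive statistics above can be computed with Python's standard library. A small sketch; the five measurements in `values` are made up for illustration.

```python
import math
import statistics

values = [4.0, 7.0, 6.0, 5.0, 8.0]  # made-up measurements

mean = statistics.fmean(values)
data_range = max(values) - min(values)   # largest minus smallest value
variance = statistics.variance(values)   # sample variance (divides by n - 1)
sd = statistics.stdev(values)            # square root of the sample variance
se = sd / math.sqrt(len(values))         # standard error of the mean

print(mean, data_range, variance, round(sd, 3), round(se, 3))
```

Quadrupling the sample size at the same SD would halve the SE, since SE shrinks with √n.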
Confidence interval (definition)
The estimate of the range that is likely to contain the true population mean. Takes into account the sample size and the scatter of the measurements.
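A rough sketch of a confidence interval as mean ± critical value × SE. It uses the normal approximation, which is an assumption that suits large samples; for small samples the critical value would come from the t distribution instead. The function name `mean_ci` and the data are illustrative.

```python
import math
from statistics import NormalDist, fmean, stdev

def mean_ci(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean."""
    se = stdev(values) / math.sqrt(len(values))           # scatter and sample size
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    m = fmean(values)
    return m - z_crit * se, m + z_crit * se

low, high = mean_ci([4.0, 7.0, 6.0, 5.0, 8.0])  # made-up measurements
```

More scatter or a smaller sample widens the interval; so does asking for higher confidence.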
What constitutes reliable data?
Precise, accurate, repeatable, reproducible.
Random error
Caused by inherently unpredictable fluctuations in the readings of the measurement apparatus or in the experimenter's interpretation of the instrument reading.
Can occur in any direction
Systematic error
Result of bad science. Predictable, one direction. Caused by imperfect calibration of instruments, imperfect methods.
Alpha
Significance level. The threshold for the p value: if p falls below alpha, the H0 is rejected. Equals the accepted probability of a Type I error.
0.05 and 0.01 are the commonly used values.
Type I error
Incorrect rejection of a true H0. (False positive)
Say the experiment worked when it didn't
Type II error
Incorrectly retaining a false H0. (False negative)
The H0 is actually false but you fail to reject it. Usually an issue with low statistical power.
Z Test definition
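Any statistical test for which the distribution of the test statistic under the H0 can be approximated by a normal distribution, typically with n > 30.
Assumes the population is normally distributed (or the sample is large enough for the central limit theorem to apply) and that the population SD is known.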
What does the value of Z mean in a z test?
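Z is the distance between the experimental mean and the population mean, measured in standard errors. It is not itself a probability: the p value derived from Z is the chance of a result at least this extreme, given that the H0 is true. Large Z means there is less chance the result arose by chance alone.
Z score of 2.5 means that the sample mean is 2.5 standard errors away from the population mean.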
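A worked sketch of the Z calculation. The numbers (population mean 100, population SD 12, sample mean 105, n = 36) are invented to reproduce a Z of 2.5, and `z_test` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (Z, two-tailed p) for a one-sample z test."""
    se = pop_sd / math.sqrt(n)             # standard error of the mean
    z = (sample_mean - pop_mean) / se      # distance in standard errors
    p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_tailed

z, p = z_test(sample_mean=105.0, pop_mean=100.0, pop_sd=12.0, n=36)
print(z, round(p, 4))
```

Here Z is 2.5 and the two-tailed p is roughly 0.012, so the H0 would be rejected at alpha = 0.05 but not at alpha = 0.01.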
T test is used when (general)
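You have a normal distribution in the population and the sample, and have a small sample (n < 30, the counterpart of the n > 30 cutoff for the Z test). The population SD is unknown and is estimated from the sample.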
P value: what do large and small p mean?
Large p indicates weak evidence against the H0. Fail to reject it (this does not prove the H0 is true).
Small p indicates strong evidence against the H0. Reject it.
One tailed t test
Tests whether the experimental mean is significantly greater than the population mean, or significantly less than it, but not both.
Requires assuming the direction of the effect in advance, which makes this less robust: an effect in the other direction cannot be detected.
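Two tailed t test
Tests whether the exp. mean is significantly different from the pop. mean in either direction (greater than or less than).
More conservative: at alpha = 0.05 the rejection region is split into a smaller area on each side of the distribution (2.5% on each), so a more extreme result is needed to reject the H0.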
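The effect of splitting alpha between the tails can be sketched with critical values. Note the swap: Python's standard library has no t distribution, so the normal distribution is used here as a large-sample stand-in; the t critical values would be somewhat larger for small n. The statistic 1.8 is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05

# One-tailed: all 5% in one tail. Two-tailed: 2.5% in each tail.
one_tailed_crit = std_normal.inv_cdf(1 - alpha)       # about 1.645
two_tailed_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.960

stat = 1.8  # hypothetical test statistic in the predicted direction
reject_one_tailed = stat > one_tailed_crit
reject_two_tailed = abs(stat) > two_tailed_crit
print(reject_one_tailed, reject_two_tailed)
```

The same statistic is significant one-tailed but not two-tailed, which is why the direction must be chosen before seeing the data.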
Paired t test
The observed data come from the same subject, twins, or otherwise matched subjects, and are drawn from a population with a normal distribution.
Unpaired t test
Observed data are from two independent, random samples from a population with a normal distribution.
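The difference between the two designs shows up in how the t statistic is built. A sketch with made-up before/after data; the unpaired version assumes equal variances (pooled), which is one common choice rather than the only one, and both function names are illustrative. Only the t statistics are computed, not p values.

```python
import math
from statistics import fmean, stdev, variance

def paired_t(before, after):
    """t statistic on the per-subject differences (df = n - 1)."""
    diffs = [a - b for a, b in zip(after, before)]
    return fmean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def unpaired_t(group_a, group_b):
    """Pooled-variance t statistic for two independent samples."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (fmean(group_b) - fmean(group_a)) / se

before = [10.0, 12.0, 9.0, 11.0, 13.0]   # made-up measurements
after = [12.0, 14.0, 10.0, 13.0, 14.0]

t_paired = paired_t(before, after)
t_unpaired = unpaired_t(before, after)
print(round(t_paired, 2), round(t_unpaired, 2))
```

Treating the same numbers as paired gives a much larger t, because subject-to-subject variation cancels out in the differences.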
ANOVA
Compares 3 or more means. Uses sums of squares to partition the variance into between-group and within-group parts.
ANOVA tells you whether any of the means differ from each other, taking scatter and variability into consideration. It does not tell you which ones differ.
One way ANOVA
One measurement variable and one nominal variable are explored.
All the groups are independent, and only one thing is being measured in each group. Assumes the measurements within each group are normally distributed.
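The sum-of-squares idea behind a one-way ANOVA can be sketched as follows. The three groups are invented data and `one_way_anova_f` is an illustrative name; only the F statistic is computed, not its p value.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Scatter of each value around its own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]  # made-up data
f_stat = one_way_anova_f(groups)
print(f_stat)
```

A large F means the group means are spread out much more than the scatter within the groups would explain.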
Two way ANOVA
1 measurement variable and 2 nominal variables.
There are two factors within each group that affect the outcome. Ex: how 3 different drugs affect subjects, both men and women. Drug and gender are the two factors; the drug response is the measurement variable.
Post hoc tests
Follow-up to the ANOVA. Used when ANOVA rejects the H0. Tests which group means differ significantly from each other, correcting for multiple comparisons.
Mann Whitney U test
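For independent measures with 2 groups. It's the non-parametric alternative to the two-sample (unpaired) t test.
Ranks all measurements from highest to lowest, then separates the ranks by group and computes a U value for each sample set. The lower U is compared to the critical value from the table. If Uexp is less than or equal to the critical U, reject the H0.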
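A sketch of the ranking and U calculation. It ranks from lowest to highest, which gives the same smaller U as ranking from highest to lowest, and gives tied values their average rank. The function name and data are illustrative, and the table lookup is left out.

```python
def mann_whitney_u(a, b):
    """Return the smaller of the two U statistics for samples a and b."""
    pooled = sorted(a + b)
    # Assign each distinct value its average rank (ranks start at 1).
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j
    rank_sum_a = sum(rank_of[x] for x in a)
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a                # the two U values sum to n_a * n_b
    return min(u_a, u_b)

u_separated = mann_whitney_u([1, 2, 3, 4], [5, 6, 7, 8])   # no overlap
u_mixed = mann_whitney_u([1, 3, 5], [2, 4, 6])             # interleaved
print(u_separated, u_mixed)
```

Completely separated groups give U = 0, the strongest possible evidence against the H0; interleaved groups give a larger U.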
Correlation
The extent to which two variables have a linear relationship with each other.
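The usual measure of linear relationship is Pearson's correlation coefficient r, which the card does not name explicitly; the sketch below assumes that is what is meant. The helper name `pearson_r` and the data are illustrative.

```python
import math
from statistics import fmean

def pearson_r(x, y):
    """Pearson correlation coefficient, from -1 to +1."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r_line = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])            # perfect straight line
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])   # y = x squared
print(r_line, r_curve)
```

The second example has r = 0 even though y depends entirely on x, because the relationship is not linear.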