Stats Flashcards
What are bar charts good for?
Counts and proportions.
Eg. Left/right/ambidextrous.
What are box plots good for?
Multiple numerical samples in different groups.
Eg. Speed of spiders with one and two palps.
What are histograms good for?
Counts of numerical observations.
Eg. Frequency/forest density.
What are scatter graphs good for?
Showing the relationship between 2 numerical variables.
What are maps good for?
Showing geographical relationships.
What are coxcomb/rose/polar area charts good for?
Showing cyclical changes in the frequency of categorical variables.
Eg. Causes of death for each month.
What is the difference between accuracy and precision?
Accuracy = sample values are close to the actual value. Precision = sample values are tightly grouped and highly repeatable.
What are summary statistics?
Describe averages (central tendency): mean, median, mode. Describe data (proportions): counts, percentages. Describe variation: variance and SD (distribution around the mean); SE and CI (accuracy of the mean).
What are confounding variables?
Unmeasured variables that change in tandem with one or more of the measured variables, giving the appearance of a causal relationship (spurious).
What is standard deviation (S)?
A measure of spread around the mean.
Square root of variance.
What is variance (S^2)?
The sum of squared deviations from the mean divided by n-1 (the df).
How is df calculated?
Sample size - number of parameters.
What is the standard error (SE)?
A measure of how precise the estimate of the mean is.
SE = S/√n.
What is the 95% confidence interval (CI) for the mean?
The range of values around the estimated mean which is likely to contain the true population mean.
Upper 95% CI = x̄ + 1.96 × SE.
Lower 95% CI = x̄ - 1.96 × SE.
x̄ = sample mean.
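A minimal Python/NumPy sketch, using made-up sample values, of how variance, SD, SE and the 95% CI fit together:

```python
import numpy as np

x = np.array([4.1, 5.3, 6.0, 5.5, 4.8, 5.9, 5.2, 4.7])  # hypothetical sample
n = len(x)

variance = np.var(x, ddof=1)       # sum of squared deviations / (n - 1)
sd = np.sqrt(variance)             # spread around the mean
se = sd / np.sqrt(n)               # precision of the estimated mean
mean = x.mean()
ci = (mean - 1.96 * se, mean + 1.96 * se)   # approximate 95% CI for the mean

print(variance, sd, se, ci)
```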
When do the conversion equations between variance, SD, SE and CI not apply?
When the data aren't normally distributed.
Which tests assume normally distributed data?
T test. F test. ANOVA. Pearson's correlation. Linear regression.
What are negative and positive skew?
Negative skew: the tail extends to the left, so the bulk of the graph leans to the right. Positive skew: the tail extends to the right, so the bulk leans to the left.
Which plots show normal distribution?
Histogram and quantile-quantile plots.
What does p<0.05 mean?
By convention, reject the null hypothesis: the probability of data at least this extreme, if the null hypothesis were true, is below 5%.
What is the null hypothesis of the Shapiro-Wilk test?
That the data are normally distributed.
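One way to run it (a sketch in Python/SciPy, with made-up data):

```python
from scipy import stats

x = [4.2, 5.1, 4.8, 5.5, 4.9, 5.3, 4.6, 5.0]  # hypothetical sample

# H0: the data are normally distributed; a small p-value suggests non-normality
w, p = stats.shapiro(x)
print(w, p)
```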
What is the null hypothesis of the T test?
The means of the 2 groups are the same.
What are the null hypothesis and assumptions of a one sample T test?
H0: the mean of the sample is equal to the population mean.
Assumes:
Data are a random sample.
Data are normally distributed.
What is the t statistic?
(Sample mean - population mean) / sample SE.
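A sketch of a one sample T test in Python/SciPy (the data and the hypothesised mean are made up), computing t by hand and checking it against the library:

```python
import numpy as np
from scipy import stats

x = np.array([2.9, 3.4, 3.1, 3.8, 3.3, 2.7, 3.5])  # hypothetical sample
mu0 = 3.0                                           # hypothesised population mean

# t statistic by hand: (sample mean - population mean) / sample SE
se = x.std(ddof=1) / np.sqrt(len(x))
t_by_hand = (x.mean() - mu0) / se

# Same test via SciPy
t, p = stats.ttest_1samp(x, popmean=mu0)
print(t_by_hand, t, p)
```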
When are samples independent?
When the probability of one is unrelated to the probability of the other.
How do you eliminate experimenter bias?
Randomly assign experimental units to treatments.
Blind procedures.
What is pseudoreplication?
Looks like replication but units are not independent.
What is blocking?
The grouping of experimental units that have similar properties. Within each block, treatments are randomly assigned.
What is matching?
When every individual in the treatment group is paired with a control individual with the same/similar values for suspected confounding variables, e.g. twin studies.
What are the null hypothesis and assumptions of Welch’s T test?
H0: The means of the two groups are equal. Assumes: Random samples. Normally distributed.
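A sketch in Python/SciPy (made-up groups); equal_var=False is what makes ttest_ind Welch's version:

```python
from scipy import stats

a = [5.1, 4.9, 5.6, 5.3, 5.8]  # hypothetical group 1
b = [4.2, 4.7, 4.4, 4.9, 4.1]  # hypothetical group 2

# Welch's T test: no equal-variance assumption
t, p = stats.ttest_ind(a, b, equal_var=False)
print(t, p)
```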
What are type 1 errors?
False positives - the proportion of tests where we reject the null hypothesis when it's actually true (conventionally 0.05).
What is power?
The proportion of tests where we reject the null hypothesis when it's false (0.8). Depends on: Sample size. Effect size. Variation in data.
What are type 2 errors?
The proportion of tests where we fail to reject the null hypothesis when it's false (0.2).
What are the null hypothesis and assumptions of a paired T test?
H0: The mean difference between pairs is equal to a specified value. Assumes: Random sample. Differences are normally distributed.
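A sketch in Python/SciPy with made-up before/after pairs:

```python
from scipy import stats

before = [12.1, 11.4, 13.0, 12.7, 11.9]  # hypothetical paired measurements
after = [11.6, 11.0, 12.4, 12.5, 11.2]

# Paired T test: H0 is that the mean difference between pairs is 0
t, p = stats.ttest_rel(before, after)
print(t, p)
```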
What are non-parametric tests?
Used when data aren't normally distributed.
Fewer assumptions but less power.
What is the sign test?
Non-parametric alternative to one sample and paired T tests.
Identical to binomial test (hits = positive differences, trials = sample size).
H0:
The median difference between pairs is 0 (probability of success is 0.5).
Assumes:
Random sample.
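A sketch of the sign test run as a binomial test in Python/SciPy (made-up pairs; ties are dropped from the trial count; binomtest needs a recent SciPy):

```python
import numpy as np
from scipy import stats

before = np.array([7.2, 6.8, 7.5, 6.9, 7.1, 7.4, 6.6])  # hypothetical pairs
after = np.array([7.5, 7.0, 7.4, 7.3, 7.6, 7.8, 6.9])

diffs = after - before
hits = int(np.sum(diffs > 0))      # positive differences = "successes"
trials = int(np.sum(diffs != 0))   # zero differences (ties) are dropped

# Binomial test with probability of success 0.5 under H0
result = stats.binomtest(hits, trials, p=0.5)
print(result.pvalue)
```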
What is the Mann-Whitney U test/Wilcoxon rank-sum test?
Non-parametric alternative to Welch’s T test.
H0:
The distributions of the 2 samples are the same.
Assumes:
Random sample.
Not a test of differences between medians or means.
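A sketch in Python/SciPy with made-up samples:

```python
from scipy import stats

a = [3.1, 2.8, 3.6, 2.9, 3.3, 3.0]  # hypothetical sample 1
b = [2.2, 2.5, 2.7, 2.1, 2.6, 2.4]  # hypothetical sample 2

# Mann-Whitney U / Wilcoxon rank-sum test
u, p = stats.mannwhitneyu(a, b, alternative='two-sided')
print(u, p)
```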
What is the Wilcoxon signed rank test?
H0:
The distributions of the 2 samples are the same.
Assumes:
Random sample.
The distribution is symmetrical around the median.
Not a test of differences between medians/means.
What is ANOVA?
Compares the means of 3 or more groups. H0: there is no difference between the means. Assumes: Random samples. Normally distributed. Equal variance.
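A sketch of a one-way ANOVA in Python/SciPy with three made-up groups (the F value returned is MSgroups/MSerror):

```python
from scipy import stats

g1 = [20.1, 19.8, 21.2, 20.5]  # hypothetical groups
g2 = [22.3, 21.9, 23.0, 22.5]
g3 = [19.0, 18.7, 19.5, 18.9]

# One-way ANOVA: F = MSgroups / MSerror
f, p = stats.f_oneway(g1, g2, g3)
print(f, p)
```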
What is the Kruskal-Wallis rank sum test?
Non-parametric alternative to ANOVA.
H0: the distributions of the samples are the same.
Assumes:
Random samples.
Can test differences between medians/means if distributions are the same.
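A sketch in Python/SciPy (made-up groups):

```python
from scipy import stats

g1 = [3.2, 3.8, 3.5, 3.1]  # hypothetical groups
g2 = [4.1, 4.6, 4.3, 4.8]
g3 = [3.0, 2.8, 3.3, 2.9]

# Kruskal-Wallis rank sum test
h, p = stats.kruskal(g1, g2, g3)
print(h, p)
```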
What are planned comparisons?
Specific comparisons planned during study design, eg comparing treatments to a control group - Dunnett’s test.
What are unplanned comparisons?
Comparisons not planned during study design because no prior reason to compare any particular groups, e.g. comparing all groups to each other - Tukey-Kramer.
Why are multiple tests bad?
They inflate the overall false positive rate - for 1 test it's 0.05, for 2 independent tests about 0.1 (1 - 0.95²), and it keeps growing with more tests.
So 1 ANOVA better than 3 T tests.
What is a 2 way ANOVA?
Assesses the effects of 2 different categorical independent variables, e.g. does mean weight differ with species and sex?
H0: there is no difference between the means.
What is an ANCOVA?
Assesses the effects of a categorical independent variable and a numerical independent variable, e.g. does time to fall asleep differ with experimental treatment and age?
What are the assumptions of a 2 way ANOVA and ANCOVA?
Random samples.
Normal distribution of errors (in a 1-way ANOVA normal samples = normal errors, not the case for other ANOVAs).
Equal variance in all samples.
How is goodness of fit measured?
R^2:
ANOVA (how much variation in the dependent variable is attributable to the independent variable)
and regressions (how much of the variation in one variable is explained by the other).
Chi-squared:
Categorical data.
How do you check if the data meet the assumptions of the model?
Plot residuals (errors) against fitted values (means or predicted values).
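A sketch of that check in Python (statsmodels + matplotlib) on simulated data; in a well-behaved model the residuals show roughly constant spread and no pattern across the fitted values:

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)                 # hypothetical predictor
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)   # hypothetical response

model = sm.OLS(y, sm.add_constant(x)).fit()

# Residuals vs fitted values: look for constant spread and no pattern
plt.scatter(model.fittedvalues, model.resid)
plt.axhline(0, linestyle='--')
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.show()
```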
What are balanced and unbalanced designs?
Balanced = all treatments have the same number of samples.
Unbalanced = treatments have different numbers of samples.
What is the binomial test?
H0: the frequency of successes matches the predicted frequency of successes (eg 0.5). Requires: Number of successes. Number of trials. Predicted probability of success.
What is a Pearson’s chi-squared test?
Test for association between 2 categorical variables, e.g. sex and species.
H0: there is no association between the variables.
Assumes:
Random samples.
No expected values less than 1.
No more than 20% of expected values less than 5.
*Also chi squared goodness of fit test (are data distributed between categories as predicted?).
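A sketch of the test of association in Python/SciPy, using a made-up sex-by-species contingency table; the expected counts it returns can be checked against the assumptions above:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x3 contingency table: rows = sex, columns = species
table = np.array([[12, 30, 18],
                  [25, 22, 16]])

chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p, dof)
print(expected)  # check: none < 1, and no more than 20% < 5
```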
When would you use Fisher’s exact test?
When you have a 2x2 contingency table and don't meet the assumptions of a chi-squared test.
H0: there is no association between 2 variables, each with 2 categories.
Assumes:
Random samples.
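A sketch in Python/SciPy with a made-up 2x2 table whose counts are too small for chi-squared:

```python
from scipy import stats

# Hypothetical 2x2 table with small counts
table = [[2, 7],
         [8, 3]]

odds_ratio, p = stats.fisher_exact(table)
print(odds_ratio, p)
```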
What is the chi-squared goodness of fit test?
H0: individuals are distributed between categories in a specified proportion.
Need categories and expected proportions.
Assumes:
Same as Pearson’s chi-squared.
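A sketch in Python/SciPy, testing made-up counts against a predicted 1:1 ratio:

```python
from scipy import stats

observed = [44, 56]   # hypothetical counts in 2 categories
expected = [50, 50]   # counts implied by a predicted 1:1 ratio

chi2, p = stats.chisquare(observed, f_exp=expected)
print(chi2, p)
```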
What is a correlation coefficient?
Measures the strength and direction of the association between 2 numerical variables.
What is Pearson’s correlation coefficient?
H0: there is no correlation between the 2 variables.
Need 2 continuous variables (e.g. Tail length and body length).
Assumes:
Random samples.
Measurements have a bivariate normal distribution.
What is bivariate normal distribution?
- Relationship between x and y is linear.
- The points in a scatterplot are circular or elliptical.
- Both x and y are normally distributed.
What is Pearson’s product-moment correlation coefficient?
H0: there is no correlation between the 2 variables. Need 2 continuous variables. Assumes: Random samples. Bivariate normal distribution.
How is R^2 calculated in a regression analysis?
The square of Pearson’s correlation coefficient.
What is Spearman’s rank correlation?
H0: there is no correlation between the ranks of 2 variables.
Need 2 continuous variables.
Assumes:
Random samples.
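A sketch of both correlation coefficients in Python/SciPy, on made-up tail/body length data; squaring Pearson's r gives the R² of the corresponding simple regression:

```python
import numpy as np
from scipy import stats

x = np.array([10.2, 11.5, 9.8, 12.1, 10.9, 11.8])   # hypothetical tail lengths
y = np.array([50.1, 54.3, 48.9, 56.0, 52.2, 55.1])  # hypothetical body lengths

r, p_pearson = stats.pearsonr(x, y)      # assumes bivariate normality
rho, p_spearman = stats.spearmanr(x, y)  # rank-based, fewer assumptions

print(r, r**2, p_pearson)   # r**2 = R^2 of the simple regression of y on x
print(rho, p_spearman)
```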
What is the difference between correlation coefficients and linear models?
Correlation coefficients:
1. Can’t make predictions.
2. Observational only.
Linear models:
1. Fits a line.
2. X and y are not interchangeable.
3. It is assumed that x determines y in some way.
4. If we know x, we can make a prediction for y.
5. X doesn’t need to be normally distributed.
6. Experimental and observational.
What is a linear regression?
H0: the slope of the relationship between the 2 variables is 0.
Need 2 continuous variables.
Assumes:
Random samples.
There is a linear relationship between the variables.
Homoscedasticity (variance in errors constant across x values).
Y values are normally distributed for each x value.
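A sketch in Python/SciPy with made-up x and y values; the reported p-value tests H0: slope = 0:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # hypothetical predictor
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])  # hypothetical response

result = stats.linregress(x, y)
print(result.slope, result.intercept, result.rvalue**2, result.pvalue)
```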
What is regression to the mean?
When 2 measurements are taken and 1 is extreme (far from the mean), the other will, on average, be closer to the mean. This movement towards the mean can look like a response to a treatment - the regression fallacy.
What is a random sample?
Where each member of the population has an independent and equal chance of being selected.
What if samples aren’t independent?
- Use an average of related samples.
- Use a mixed effects model - only applicable to ANOVAs/linear models; it splits variables into fixed effects (effects you're interested in) and random effects (variables that affect the dependent variable but whose particular values you're not interested in).
What is the Poisson distribution?
Common for discrete variables - counting things in time/space.
Clumped dispersion = variance > mean.
Random dispersion = variance = mean.
Regular dispersion = variance < mean.
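A sketch in Python/NumPy of the variance-to-mean check on made-up quadrat counts:

```python
import numpy as np

counts = np.array([0, 2, 1, 3, 0, 1, 2, 1, 0, 2])  # hypothetical counts per quadrat

ratio = counts.var(ddof=1) / counts.mean()
# ratio ~ 1 -> random (Poisson-like); > 1 -> clumped; < 1 -> regular
print(counts.mean(), counts.var(ddof=1), ratio)
```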
What is the ANOVA F statistic?
MSgroups/MSerror.