Statistics Flashcards
Three types of statistics
1) Descriptive
2) Tests for differences
3) Tests for relationships
2 + 3 are subdivided into parametric + non-parametric
Parametric - assumes a normal distribution
Non-parametric - no such assumption
Descriptive statistics
Measures of central tendency - what's a typical value for this population?
Mean, mode, median
Measures of spread - how variable is the collected data? / How much do individual observations differ from the 'typical' value?
SD, SE, range, variance (sample, not population)
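A minimal Python sketch of these measures using the standard library (the data values are invented for illustration):

    import statistics

    data = [12, 15, 11, 15, 14, 18, 13]  # invented sample

    # Central tendency: what's a typical value?
    print(statistics.mean(data))    # arithmetic mean
    print(statistics.median(data))  # middle value when sorted
    print(statistics.mode(data))    # most frequent value (15)

    # Spread: how variable are the observations?
    print(statistics.stdev(data))     # sample SD (n - 1 denominator)
    print(statistics.variance(data))  # sample variance
    print(max(data) - min(data))      # range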
Variance
Population variance - not often used
Sample variance - s^2 - uses n - 1 instead of n in the denominator
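In NumPy this distinction is the ddof argument - a small sketch with invented data:

    import numpy as np

    x = np.array([4.0, 7.0, 6.0, 5.0, 8.0])  # invented sample

    pop_var = np.var(x, ddof=0)   # population variance: divides by n
    samp_var = np.var(x, ddof=1)  # sample variance s^2: divides by n - 1
    print(pop_var, samp_var)      # sample variance is slightly larger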
Standard Deviation
Variance has square units
SD is the square root of variance
Measures variation between individual sample values
e.g. variance of 100 mm^2 -> SD = 10 mm
Standard Error
SD of sample means - rather than sample values themselves
measures how good the estimate of the population mean is
Higher sample size = lower SE
SE = SD / sqrt(n), where n = sample size
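Sketch of the formula, cross-checked against scipy.stats.sem (data invented):

    import numpy as np
    from scipy import stats

    x = np.array([10.2, 9.8, 11.1, 10.5, 9.9, 10.7])  # invented sample

    se_manual = np.std(x, ddof=1) / np.sqrt(len(x))  # SE = SD / sqrt(n)
    se_scipy = stats.sem(x)                          # same calculation
    print(se_manual, se_scipy)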
Parametric statistics
Some tests need to assume the variable measured has a normal distribution in the population
+/- 1.96 SD from mean = 95% of observations in a normal distribution
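Where 1.96 comes from - the 97.5th percentile of the standard normal, leaving 2.5% in each tail:

    from scipy import stats

    print(stats.norm.ppf(0.975))  # ~1.96: z below which 97.5% of values lie
    # fraction of observations within +/- 1.96 SD of the mean
    print(stats.norm.cdf(1.96) - stats.norm.cdf(-1.96))  # ~0.95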
Assumptions:
- normality
- Independence
- Homogeneity of variances
Tests:
- T-test (independent samples)
- Paired T-test (repeated measures data)
- Analysis of variance (ANOVA)
- 2 way ANOVA
- Repeated measures (ANOVA)
- ANCOVA (ANOVA with covariates)
- Linear Regression
- Pearson product-moment correlation
- Spearman's rank correlation (non-parametric alternative)
T-test
Parametric
Independent samples
Does the mean differ between two populations?
Gives a t statistic - look it up against the degrees of freedom (DF = n1 + n2 - 2) in a table for the p-value
Incorporates all variation
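A minimal sketch with scipy.stats.ttest_ind (the two group samples are invented):

    from scipy import stats

    group1 = [5.1, 4.9, 6.0, 5.5, 5.8, 5.2]
    group2 = [6.2, 6.8, 5.9, 6.5, 7.0, 6.4]

    t, p = stats.ttest_ind(group1, group2)  # assumes equal variances by default
    df = len(group1) + len(group2) - 2      # DF = n1 + n2 - 2
    print(t, p, df)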
Paired T-test
Parametric
Repeated measures data
For paired values (before + after study)
Concerned with the paired differences rather than group means
Only incorporates variance between each pair of data, not inter-site
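A minimal sketch with scipy.stats.ttest_rel (before/after values invented):

    from scipy import stats

    # same subjects measured before and after treatment
    before = [72, 75, 68, 80, 77, 74]
    after = [70, 73, 69, 76, 74, 71]

    t, p = stats.ttest_rel(before, after)  # tests mean paired difference vs 0
    print(t, p)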
ANOVA
Parametric
analysis of variance
Is there a difference between two or more sample means?
What is the chance the samples belong to the same population?
Better than multiple individual t-tests (decreases the chance of a false positive)
- Between-groups variance: variance of the group means around the overall (grand) mean
- Within-groups variance: average variance within each individual sample
F statistic = between-groups / within-groups variance ratio; large = real difference, small = random variation
F statistic compared against between-groups DF (groups - 1) and within-groups DF (total DF - between-groups DF) to obtain the p-value
R^2 value - proportion of variance in the dependent variable explained by the model - how closely the data fit the fitted regression line
Check residuals show normality on Q-Q plot (assumption of ANOVA)
Levene’s test can be applied to check homogeneity of variances (assumption of ANOVA)
Tukey post-hoc test - shows which groups (e.g. varieties) were different
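A sketch of the whole workflow in SciPy (stats.tukey_hsd needs SciPy >= 1.8; the three group samples are invented):

    from scipy import stats

    a = [20.1, 21.5, 19.8, 22.0, 20.7]
    b = [23.2, 24.0, 22.8, 23.5, 24.1]
    c = [20.5, 21.0, 19.9, 20.8, 21.2]

    print(stats.levene(a, b, c))  # check homogeneity of variances first

    f, p = stats.f_oneway(a, b, c)  # F = between-groups / within-groups
    print(f, p)

    print(stats.tukey_hsd(a, b, c))  # post hoc: which groups differ?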
2-way ANOVA
Parametric
ANOVA for studies with two independent variables
3 null hypotheses:
- The population means of the first factor are equal
- The population means of the second factor are equal
- There is no interaction between the two factors
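A sketch with statsmodels, testing all three null hypotheses at once (factor names and data invented):

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # invented 2x2 factorial data, 3 replicates per cell
    df = pd.DataFrame({
        "fert": ["low"] * 6 + ["high"] * 6,
        "water": (["dry"] * 3 + ["wet"] * 3) * 2,
        "y": [4.1, 3.9, 4.3, 5.0, 5.2, 4.8,
              5.5, 5.9, 5.7, 7.2, 7.0, 7.4],
    })

    # 'C(fert) * C(water)' fits both main effects plus the interaction
    model = smf.ols("y ~ C(fert) * C(water)", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))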
Repeated measures ANOVA
Parametric
Paired T-test for 2 or more repeated measures - are levels changing over time?
Sphericity - equal variances of the differences between all pairs of time points - tested with Mauchly's W
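A sketch using statsmodels' AnovaRM (subjects and scores invented; Mauchly's test itself isn't in statsmodels - a package such as pingouin would be needed for the sphericity check):

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    # invented: 4 subjects each measured at 3 time points
    df = pd.DataFrame({
        "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
        "time": ["t1", "t2", "t3"] * 4,
        "score": [5.0, 6.1, 7.2, 4.8, 5.9, 6.8,
                  5.2, 6.0, 7.0, 4.9, 6.2, 7.1],
    })

    res = AnovaRM(df, depvar="score", subject="subject", within=["time"]).fit()
    print(res)  # F test: are levels changing over time?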
ANCOVA
Parametric
ANOVA with covariates
Is there a difference between populations with or without taking a specific factor (covariate) into account? (usually used when the factor cannot be controlled)
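A sketch in statsmodels - the group effect adjusted for a covariate (all names and data invented):

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # invented: does y differ by group once initial size is accounted for?
    df = pd.DataFrame({
        "group": ["ctrl"] * 5 + ["treat"] * 5,
        "size0": [2.0, 2.5, 3.0, 3.5, 4.0, 2.1, 2.6, 3.1, 3.4, 4.1],
        "y": [5.0, 5.6, 6.1, 6.8, 7.2, 6.0, 6.7, 7.1, 7.6, 8.3],
    })

    # the covariate enters as an additive term alongside the factor
    model = smf.ols("y ~ C(group) + size0", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))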
Linear Regression
Parametric
Assumes causation; allows prediction and extrapolation
Allows description of relationship between x and y as well as strength and significance
y=mx+c
Regression sum of squares (SSregression) - measure vertically from the mean of y to the best-fit line at each data point -> square and sum these
Residual sum of squares (SSresidual, noise) - measure the distance from each data point to the fitted line -> square and sum these
- if all points lay on the line, SSresidual would = 0 - the further away points are, the greater the noise
Small slope + large residuals = non-significance
Large slope + small residuals = significance
Assumptions:
- Residuals are normal
- Equal variances in Y across the range of X
- Relationship is linear
- No relationship between residuals and x or y
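A minimal sketch with scipy.stats.linregress, which returns m, c, r, and the p-value for the slope (x/y invented):

    from scipy import stats

    x = [1, 2, 3, 4, 5, 6]
    y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1]

    res = stats.linregress(x, y)
    print(res.slope, res.intercept)  # m and c in y = mx + c
    print(res.rvalue ** 2)           # R^2: proportion of variance explained
    print(res.pvalue)                # significance of the slope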
Pearson product-moment correlation
Parametric
Doesn’t assume causation - doesn’t allow same as regression
Quantified by the correlation coefficient (r) - based on concept of covariance (multiply rather than squaring in variance)
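A minimal sketch with scipy.stats.pearsonr (data invented):

    from scipy import stats

    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.3, 2.9, 4.1, 4.8, 6.2]

    r, p = stats.pearsonr(x, y)  # r = cov(x, y) / (SD_x * SD_y)
    print(r, p)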
Spearman's rank correlation
Non-parametric
Tests statistical dependence between the rankings of two variables - avoids the assumptions of normality and homoscedasticity (variance around the regression line the same for all values of the predictor variable, x), making it more robust
Decreased test power (sensitivity - less chance of getting p < 0.05 when a real effect exists)
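A minimal sketch with scipy.stats.spearmanr (data invented to be monotonic but not linear):

    from scipy import stats

    x = [1, 2, 3, 4, 5, 6]
    y = [1.2, 1.8, 2.0, 8.0, 9.5, 40.0]  # monotonic, not linear

    rho, p = stats.spearmanr(x, y)  # correlates the ranks, not raw values
    print(rho, p)                   # rho = 1.0: perfectly monotonic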