Statistical analyses Flashcards
Under what circumstances are data transformations important for multivariate analyses?
If data do not have a uniform scale
How do you deal with qualitative variables in multivariate analyses?
Give them a numerical value
Example: for seasonality, use four separate variables and code the absence or presence of each season as 0 or 1
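A minimal sketch of the seasonality example, assuming four season labels and a few made-up observations (the names `SEASONS` and `dummy_code` are illustrative only):

```python
# Hypothetical category labels for the qualitative variable "season".
SEASONS = ["spring", "summer", "autumn", "winter"]

def dummy_code(season):
    """Return one 0/1 indicator variable per season for a single observation."""
    return {name: int(name == season) for name in SEASONS}

# Made-up observations, one season label per sample.
samples = ["summer", "winter", "summer"]
encoded = [dummy_code(s) for s in samples]

for label, row in zip(samples, encoded):
    print(label, row)
```

Each observation gets exactly one 1 across the four new variables.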
What does standardization do? (2)
used to remove influences of magnitude difference
results in dimensionless variables
What is a z-score and how is it calculated? (2 steps)
Used to standardize data;
- take difference between the value and mean of the variable
- divide by the stdev of the variable
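The two steps can be sketched as follows. The data are made up, and using the sample standard deviation (n - 1 denominator) is an assumption; some texts use the population form instead.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize one variable: subtract its mean, divide by its standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1); an assumption
    return [(v - m) / s for v in values]

temperatures = [12.0, 15.0, 18.0, 21.0, 24.0]  # made-up data
z = z_scores(temperatures)
print(z)
```

The standardized variable has mean 0 and standard deviation 1, and no units.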
What is the difference between an object and a variable?
Object: the units being compared (samples, sites, time periods, etc.)
Variable: an attribute measured on each object
What does normalization do?
Corrects the distribution shapes of variables that depart from normality and tries to obtain homogeneous variances across variables for better multivariate analyses
What transformation can be done on data with a lot of zeroes?
Hellinger transformation
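As a rough sketch, the Hellinger transformation takes the square root of each value's share of its row (object) total. The abundance table is made up, and returning zeros for an empty row is an assumption of this example.

```python
from math import sqrt

def hellinger(row):
    """Square root of each value divided by the row total."""
    total = sum(row)
    if total == 0:
        return [0.0 for _ in row]  # assumption: keep empty samples as all zeros
    return [sqrt(v / total) for v in row]

# Made-up abundances: rows are objects (sites), columns are variables (species).
abundances = [
    [10, 0, 0, 30],
    [0, 5, 0, 5],
]
transformed = [hellinger(r) for r in abundances]
print(transformed)
```

Zeros stay zero, and every non-empty row ends up with a sum of squares equal to 1.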
What are exploratory multivariate analyses?
multivariate analyses that are used to reveal patterns in large datasets, but do not explain why those patterns exist
What does a cluster analysis do?
Minimizes w/in group variation, maximizes between group variation (reduces the dimensionality of the dataset to a few groups of objects)
Under what circumstances might a cluster analysis be useful?
When distinct discontinuities are expected
What are the two steps for a cluster analysis?
- use a relevant association coefficient to calculate a dissimilarity/similarity matrix between or among objects/variables.
- represent the association matrix as a tree (hierarchical clustering) or as groups of objects (k-means clustering)
What types of linkage rules are generally used to form matrices for hierarchical clustering? (3)
- nearest neighbor (single linkage): distance between 2 clusters is equal to the distance between their two CLOSEST objects
- furthest neighbor (complete linkage): distance between 2 clusters is equal to the distance between their two FURTHEST objects
- UPGMA (average linkage): distance between 2 clusters is equal to the avg. distance between all inter-cluster pairs
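The three rules differ only in how the inter-cluster pair distances are summarized. A small sketch with two made-up clusters, assuming Euclidean distance between objects:

```python
from math import dist  # Euclidean distance, Python 3.8+

def pair_distances(a, b):
    """All distances between one object in cluster a and one object in cluster b."""
    return [dist(p, q) for p in a for q in b]

def nearest_neighbor(a, b):
    return min(pair_distances(a, b))

def furthest_neighbor(a, b):
    return max(pair_distances(a, b))

def upgma(a, b):
    d = pair_distances(a, b)
    return sum(d) / len(d)

# Made-up clusters of 2-D objects.
cluster_a = [(0.0, 0.0), (0.0, 1.0)]
cluster_b = [(3.0, 0.0), (4.0, 0.0)]

print(nearest_neighbor(cluster_a, cluster_b))
print(furthest_neighbor(cluster_a, cluster_b))
print(upgma(cluster_a, cluster_b))
```

The UPGMA distance always falls between the nearest-neighbor and furthest-neighbor distances.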
How does k-means clustering work?
objects are assigned to one of k clusters (k is defined in advance) according to which cluster mean is nearest in Euclidean distance
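A bare-bones sketch of the idea, not a production implementation. Starting from the first k objects as cluster means is an assumption made to keep the example deterministic; real software usually uses random or smarter starts.

```python
from math import dist

def assign(points, centers):
    """Label each object with the index of its nearest cluster mean."""
    return [min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            for p in points]

def k_means(points, k, iterations=20):
    centers = list(points[:k])  # assumption: naive start from the first k objects
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # New mean: average each coordinate over the cluster members.
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep the old mean for an empty cluster
        if new_centers == centers:
            break  # means stopped moving
        centers = new_centers
    return centers, assign(points, centers)

points = [(1.0, 1.0), (5.0, 5.0), (1.0, 2.0), (5.0, 6.0)]  # made-up objects
centers, labels = k_means(points, k=2)
print(centers, labels)
```

Note that only distances from objects to cluster means are computed; no object-by-object matrix is built.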
What is one advantage and one disadvantage of using k-means clustering?
advantage: don’t need a similarity matrix
disadvantage: sensitive to outliers
What is a PCA?
Principal component analysis
What does a PCA do?
Calculates new synthetic variables (principal components) using linear combinations of the original variables to account for as much variability as possible
What kind of matrix is used for PCA when all data points have the same units (ex: species abundance)?
Variance-covariance matrix
What kind of matrix is used for PCA when data points have different units?
correlation matrix, variables must be standardized so that distances are independent of original scales
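To see why the choice of matrix matters, here is a two-variable sketch with made-up data in different units. It uses the closed-form eigenvalues of a symmetric 2x2 matrix, so it illustrates the idea only and is not a general PCA routine.

```python
from math import sqrt
from statistics import mean

def covariance(x, y):
    """Sample covariance (n - 1 denominator)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def eigenvalues_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    mid = (a + d) / 2
    half_gap = sqrt(((a - d) / 2) ** 2 + b ** 2)
    return mid + half_gap, mid - half_gap

depth_m = [1.0, 2.0, 3.0, 4.0, 5.0]             # made-up, metres
biomass_g = [120.0, 90.0, 150.0, 60.0, 180.0]   # made-up, grams

var_x = covariance(depth_m, depth_m)
var_y = covariance(biomass_g, biomass_g)
cov_xy = covariance(depth_m, biomass_g)
r = cov_xy / sqrt(var_x * var_y)  # correlation = covariance of standardized variables

cov_eigs = eigenvalues_2x2(var_x, cov_xy, var_y)  # variance-covariance matrix
cor_eigs = eigenvalues_2x2(1.0, r, 1.0)           # correlation matrix

cov_share = cov_eigs[0] / sum(cov_eigs)
cor_share = cor_eigs[0] / sum(cor_eigs)
print(cov_share, cor_share)
```

With the variance-covariance matrix, the first component is almost entirely the variable with the larger numbers (biomass in grams); with the correlation matrix, both variables count equally.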
What are the dots on a PCA ordination?
Objects
What are the vectors on a PCA ordination and what do they mean?
Variables
Vector direction indicates the direction of greatest change; vector length may indicate the rate of change
Under what conditions should a PCA be used?
Good when looking at linear responses across short gradients (otherwise CA, NMDS are better)
What is an eigenvalue?
A value denoting how much variance is explained by a given principal component.
When is an eigenvalue considered significant?
If its value is greater than the average of all eigenvalues
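A quick sketch of this rule of thumb (often called the Kaiser-Guttman criterion), using made-up eigenvalues:

```python
# Made-up eigenvalues from a hypothetical PCA, largest first.
eigenvalues = [4.2, 2.1, 0.9, 0.5, 0.3]

total = sum(eigenvalues)
mean_eigenvalue = total / len(eigenvalues)

# Share of the total variance explained by each principal component.
explained = [ev / total for ev in eigenvalues]

# Keep the components whose eigenvalue exceeds the average eigenvalue.
retained = [ev for ev in eigenvalues if ev > mean_eigenvalue]

print(mean_eigenvalue, retained)
print(explained)
```

Here only the first two components pass the threshold.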
Why are correlations between principal components and original variables not statistically valid in terms of describing which variables contribute most to variation observed in a PCA ordination?
components and variables are already linearly correlated and are not independent of one another
What does PCoA stand for?
Principal coordinate analysis
How is PCoA different from PCA?
Works with any dissimilarity measure; can pick the association coefficient that works best for your data
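A partial sketch of the first half of a PCoA, assuming the classical approach of double-centering the squared dissimilarities. Bray-Curtis is used only as one example of an interchangeable measure; the site data and function names are made up, and the final eigen-decomposition that yields the coordinates is not shown.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))

def dissimilarity_matrix(objects, measure):
    """Object-by-object matrix for any dissimilarity function passed in."""
    return [[measure(a, b) for b in objects] for a in objects]

def gower_center(d):
    """Double-center the squared dissimilarities (assumes d is symmetric)."""
    n = len(d)
    sq = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_means = [sum(row) / n for row in sq]
    grand_mean = sum(row_means) / n
    return [[-0.5 * (sq[i][j] - row_means[i] - row_means[j] + grand_mean)
             for j in range(n)] for i in range(n)]

# Made-up abundances: rows are sites, columns are species.
sites = [[10, 0, 5], [8, 2, 5], [0, 9, 1]]

d = dissimilarity_matrix(sites, bray_curtis)  # swap in another measure here
b = gower_center(d)
print(d)
print(b)
```

The eigenvectors of the centered matrix, scaled by the square roots of their eigenvalues, would give the principal coordinates.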
Why are components more difficult to interpret for PCoA than for PCA?
There is no direct link between the components and the original variables: PCoA components are complex functions of the variables that depend on the association coefficient used to form the matrix. Variables can still be correlated with the axes, but the correlations are not statistically valid