Statistical analyses Flashcards by Kelsey Jesser

Under what circumstances are data transformations important for multivariate analyses?

If data do not have a uniform scale

How do you deal with qualitative variables in multivariate analyses?

Give them a numerical value

example: for seasonality, use four separate variables and code the absence or presence of each season as 0 or 1
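
A minimal Python sketch of this 0/1 (dummy) coding. The season names, sample labels and the function name `dummy_code` are hypothetical, chosen only for illustration.

```python
SEASONS = ["winter", "spring", "summer", "fall"]  # hypothetical category levels


def dummy_code(labels, levels=SEASONS):
    """Return one row per label, with 1 in the column of its level and 0 elsewhere."""
    return [[1 if label == level else 0 for level in levels] for label in labels]


samples = ["summer", "winter", "summer", "fall", "spring"]  # hypothetical sampling dates
for label, row in zip(samples, dummy_code(samples)):
    print(label, row)
```

Each sample gets exactly one 1, so the four columns together carry the same information as the original label.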

What does standardization do? (2)

used to remove influences of magnitude difference

results in dimensionless variables

What is a z-score and how is it calculated? (2 steps)

Used to standardize data;

take difference between the value and mean of the variable
divide by the stdev of the variable
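
The two steps amount to z = (x - mean) / stdev, applied column by column. The sketch below assumes NumPy and uses the sample standard deviation (`ddof=1`); some software divides by n instead, which changes the values slightly. The data are made up.

```python
import numpy as np


def z_scores(x):
    """Standardize each column: subtract its mean, then divide by its standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)  # ddof=1: sample stdev (assumption)


# Two hypothetical variables on very different scales (rows = objects, columns = variables)
data = np.array([[10.0, 30.1], [20.0, 32.5], [35.0, 35.0], [50.0, 31.4]])
z = z_scores(data)
print(z.round(3))
print(z.mean(axis=0).round(10))          # each column now has mean 0
print(z.std(axis=0, ddof=1).round(10))   # and standard deviation 1
```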

What is the difference between an object and a variable?

Object: the entities on which data are collected (samples, sites, time periods, etc.)

Variable: a descriptor measured on each object

What does normalization do?

Corrects distribution shapes of variables that depart from normality and tries to obtain homogeneous variances among variables, for better multivariate analyses

What transformation can be done on data with a lot of zeroes?

Hellinger transformation
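
The Hellinger transformation divides each count by the total of its row (site) and takes the square root. A minimal NumPy sketch, assuming rows are sites, columns are species, and no row is entirely zero (that would divide by zero); the abundance table is made up.

```python
import numpy as np


def hellinger(counts):
    """Square root of each count divided by its row (site) total."""
    counts = np.asarray(counts, dtype=float)
    row_totals = counts.sum(axis=1, keepdims=True)
    return np.sqrt(counts / row_totals)


# Hypothetical species-abundance table with many zeros: rows = sites, columns = species
abund = np.array([[0, 4, 0, 12],
                  [1, 0, 0, 3],
                  [0, 0, 9, 0]])
h = hellinger(abund)
print(h.round(3))
```

Zeros stay zero, and Euclidean distances computed on the transformed rows are Hellinger distances, so double absences do not make two sites look similar.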

What are exploratory multivariate analyses?

multivariate analyses that are used to reveal patterns in large datasets, but do not explain why those patterns exist

What does a cluster analysis do?

Minimizes within-group variation and maximizes between-group variation (reduces the dimensionality of the dataset to a few groups of objects)

Under what circumstances might a cluster analysis be useful?

When distinct discontinuities are expected

What are the two steps for a cluster analysis?

use a relevant association coefficient to calculate a dissimilarity/similarity matrix between or among objects/variables.
represent the association matrix as a tree (hierarchical clustering) or as groups of objects (k-means clustering)

What linkage rules are generally used to merge clusters in hierarchical clustering? (3)

nearest neighbor (single linkage): distance between two clusters is equal to the distance between their CLOSEST neighboring points
furthest neighbor (complete linkage): distance between two clusters is equal to the distance between their two furthest objects
UPGMA (average linkage): distance between two clusters is equal to the average distance between all inter-cluster pairs
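
A small NumPy sketch of the two clustering steps: build a Euclidean distance matrix, then agglomerate with one of the three linkage rules. The function names, the choice of Euclidean distance, and the data are assumptions for illustration; the loop is deliberately naive and only suitable for tiny examples.

```python
import numpy as np


def distance_matrix(x):
    """Step 1: Euclidean distances between every pair of objects (rows of x)."""
    x = np.asarray(x, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Step 2 linkage rules: how the distance between two clusters is summarized
LINKAGE = {
    "single": np.min,    # nearest neighbor: closest pair of objects
    "complete": np.max,  # furthest neighbor: most distant pair of objects
    "average": np.mean,  # UPGMA: mean over all inter-cluster pairs
}


def agglomerate(d, method="average"):
    """Step 2: repeatedly merge the two closest clusters; return (cluster, cluster, height) merges."""
    rule = LINKAGE[method]
    clusters = [[i] for i in range(len(d))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = rule(d[np.ix_(clusters[a], clusters[b])])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        dist, a, b = best
        merges.append((clusters[a], clusters[b], float(dist)))
        clusters[a] = clusters[a] + clusters[b]  # merged cluster replaces cluster a
        del clusters[b]
    return merges


# Hypothetical objects: two tight pairs far apart
objs = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 7.0]])
for left, right, height in agglomerate(distance_matrix(objs), "complete"):
    print(left, right, round(height, 3))
```

The merge heights are what a dendrogram plots; switching between "single", "complete" and "average" changes only how cluster-to-cluster distances are summarized. For real analyses, SciPy's `scipy.cluster.hierarchy.linkage` implements these rules.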

How does k-means clustering work?

objects are clustered into k (defined in advance) clusters, each object being assigned to the cluster whose mean is nearest in Euclidean distance
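
A minimal sketch of k-means using Lloyd's algorithm, one common way to implement it: start from k objects picked at random as means, assign every object to the nearest mean, recompute the means, and repeat until they stop moving. NumPy, the random start, the iteration cap and the data are assumptions.

```python
import numpy as np


def k_means(x, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate nearest-mean assignment and mean updates."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]  # k objects as starting means
    for _ in range(n_iter):
        # Euclidean distance from every object to every centroid
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # keep the old centroid if a cluster ends up empty
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


# Hypothetical objects forming two obvious groups
pts = np.array([[0, 0], [0.5, 0], [0, 0.5], [10, 10], [10.5, 10], [10, 10.5]])
labels, cents = k_means(pts, k=2)
print(labels)
print(cents.round(3))
```

Because the result depends on the starting means, k-means is usually rerun from several random starts; a single far outlier can also drag a mean away from its group.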

What is one advantage and one disadvantage of using k-means clustering?

advantage: don’t need a similarity matrix
disadvantage: sensitive to outliers

What is a PCA?

Principal component analysis

What does a PCA do?

Calculates new synthetic variables (principal components) as linear combinations of the original variables that account for as much of the variability as possible
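
A minimal NumPy sketch of PCA by eigen-decomposition of either the variance-covariance matrix (variables in the same units) or the correlation matrix (variables standardized first). The function name, the `use_correlation` flag and the data are assumptions for illustration.

```python
import numpy as np


def pca(x, use_correlation=False):
    """PCA via eigen-decomposition of the covariance (or correlation) matrix.

    Returns eigenvalues (variance along each component, largest first),
    eigenvectors (loadings of the original variables) and object scores.
    """
    x = np.asarray(x, dtype=float)
    centred = x - x.mean(axis=0)
    if use_correlation:
        centred = centred / x.std(axis=0, ddof=1)   # standardize: unit-free variables
    cov = np.cov(centred, rowvar=False)             # correlation matrix if standardized
    eigvals, eigvecs = np.linalg.eigh(cov)          # symmetric matrix, ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centred @ eigvecs                      # coordinates of objects on the components
    return eigvals, eigvecs, scores


# Hypothetical data: 6 objects (rows) x 3 variables (columns)
data = np.array([[2.5, 2.4, 1.0], [0.5, 0.7, 3.0], [2.2, 2.9, 0.8],
                 [1.9, 2.2, 1.5], [3.1, 3.0, 0.2], [2.3, 2.7, 1.1]])
vals, vecs, sc = pca(data)
print(vals.round(4))
```

In an ordination plot, the rows of `sc` are the object dots and the rows of `vecs` give the variable arrows. The eigenvalues sum to the total variance, or to the number of variables when `use_correlation=True`.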

What kind of matrix is used for PCA when all data points have the same units (ex: species abundance)?

Variance-covariance matrix

What kind of matrix is used for PCA when data points have different units?

correlation matrix, variables must be standardized so that distances are independent of original scales

What are the dots on a PCA ordination?

Objects

What are the vectors on a PCA ordination and what do they mean?

Variables

Vector direction indicates the direction of greatest change in that variable; vector length may indicate the rate of change

Under what conditions should a PCA be used?

Good when looking at linear responses across short gradients (otherwise CA, NMDS are better)

What is an eigenvalue?

value denoting how much variance is explained by a given principal component.

When is an eigenvalue considered significant?

If its value is greater than the average of all eigenvalues (Kaiser-Guttman criterion)
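
A short sketch turning eigenvalues into the proportion of variance each axis explains and applying the "greater than the average eigenvalue" rule, commonly called the Kaiser-Guttman criterion. NumPy, the function name and the eigenvalues are assumptions.

```python
import numpy as np


def explained_and_kaiser(eigvals):
    """Proportion of total variance per axis, and which axes exceed the mean eigenvalue."""
    eigvals = np.asarray(eigvals, dtype=float)
    proportion = eigvals / eigvals.sum()
    keep = eigvals > eigvals.mean()      # Kaiser-Guttman criterion
    return proportion, keep


# Hypothetical eigenvalues from a PCA on four standardized variables
lam = np.array([2.6, 0.9, 0.35, 0.15])
prop, keep = explained_and_kaiser(lam)
print(prop.round(4))
print(keep)
```

With a correlation-matrix PCA the eigenvalues sum to the number of variables, so their mean is 1 and the rule reduces to "keep axes with eigenvalue > 1".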

Why are correlations between principal components and original variables not statistically valid in terms of describing which variables contribute most to variation observed in a PCA ordination?

components and variables are already linearly correlated and are not independent of one another

What does PCoA stand for?

Principal coordinate analysis

How is PCoA different from PCA?

Works with any dissimilarity measure: you can pick the association coefficient that works best for your data
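
A minimal NumPy sketch of PCoA (classical scaling): square the dissimilarities, double-centre them (Gower centring), and take eigenvectors scaled by the square roots of the positive eigenvalues. The function name and points are assumptions; any dissimilarity matrix could be passed in, and Euclidean input is used here only because the result is easy to check.

```python
import numpy as np


def pcoa(d):
    """Principal coordinate analysis of a square, symmetric dissimilarity matrix."""
    d = np.asarray(d, dtype=float)
    n = len(d)
    a = -0.5 * d ** 2
    j = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    g = j @ a @ j                                 # Gower-centred matrix
    eigvals, eigvecs = np.linalg.eigh(g)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    pos = eigvals > 1e-10                         # non-Euclidean measures can give negative eigenvalues
    coords = eigvecs[:, pos] * np.sqrt(eigvals[pos])
    return eigvals, coords


# Hypothetical objects; with Euclidean input PCoA reproduces the original distances
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [1.0, 1.0]])
d = np.linalg.norm(pts[:, None] - pts[None], axis=-1)
eigvals, coords = pcoa(d)
print(eigvals.round(3))
print(coords.round(3))
```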

Why are components more difficult to interpret for PCoA than for PCA?

There is no direct link between the components and the original variables, because PCoA components are complex functions of the variables that depend on the association coefficient used to build the matrix; variables can still be correlated with the axes (but not in a statistically valid way)

What does NMDS stand for?

non-metric multidimensional scaling

What analysis is good for identifying underlying gradients and representing relationships based on various distance measures?

NMDS

How does NMDS work?

Ranks distances between objects (matrix), uses those ranks to map them non-linearly in ordination so as to preserve their ranks with the least amount of "stress"; proximity between objects corresponds to their similarity.

How does NMDS calculate stress?

Goes through several iterations of ordinations to identify the one with the lowest stress, where stress measures how poorly the distances in the ordination match the original (ranked) distances between samples
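
A sketch of one common stress measure, Kruskal's stress-1: sqrt(sum (d - dhat)^2 / sum d^2), where d are distances in the ordination and dhat are "disparities", the best monotone (non-decreasing) fit of d to the rank order of the original dissimilarities. It scores one given configuration; an NMDS program repeats this while moving points to lower the stress. NumPy, the helper names, and the handling of ties (kept in input order) are assumptions; programs differ in these details.

```python
import numpy as np


def isotonic(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y (in the given order)."""
    blocks = []  # each block: [mean, weight]
    for v in y:
        blocks.append([float(v), 1])
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    return np.concatenate([np.full(w, m) for m, w in blocks])


def kruskal_stress(dissim, coords):
    """Stress-1: mismatch between ordination distances and their monotone fit to the ranks."""
    dissim = np.asarray(dissim, dtype=float)
    coords = np.asarray(coords, dtype=float)
    iu = np.triu_indices(len(dissim), k=1)
    delta = dissim[iu]                                                  # original dissimilarities
    dist = np.linalg.norm(coords[:, None] - coords[None], axis=-1)[iu]  # ordination distances
    order = np.argsort(delta, kind="mergesort")                         # rank pairs by dissimilarity
    disparities = np.empty_like(dist)
    disparities[order] = isotonic(dist[order])
    return np.sqrt(((dist - disparities) ** 2).sum() / (dist ** 2).sum())


pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
dissim = np.linalg.norm(pts[:, None] - pts[None], axis=-1) ** 2  # hypothetical dissimilarities
print(kruskal_stress(dissim, pts))        # configuration preserves every rank
print(kruskal_stress(dissim, pts[::-1]))  # same points with object identities scrambled
```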

How do you interpret stress values for NMDS?

Stress > 0.3 indicates a poor representation of the data; lower stress means a better fit

Which two statistical tests are usually used for multivariate data?

NPMANOVA, ANOSIM

What is NPMANOVA used for?

used to test for significant differences between two or more groups of multivariate, quantitative data

What is the null hypothesis for NPMANOVA for multivariate data and how is it tested?

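The null hypothesis of no difference between groups is tested with a pseudo-F statistic whose significance is assessed by permuting the group labels (Wilks' lambda belongs to parametric MANOVA); another post-hoc test can then assess the significance of pairwise comparisons.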
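
A minimal sketch of the permutation approach: a pseudo-F is computed from squared distances (among-group vs within-group sums of squares), then group labels are shuffled to see how often a pseudo-F at least as large arises by chance. NumPy, the function names, 999 permutations and the data are assumptions.

```python
import numpy as np


def pseudo_f(d, groups):
    """Pseudo-F computed directly from a distance matrix."""
    d2 = np.asarray(d, dtype=float) ** 2
    groups = np.asarray(groups)
    n = len(groups)
    levels = np.unique(groups)
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in levels:
        idx = np.flatnonzero(groups == g)
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    a = len(levels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def npmanova(d, groups, n_perm=999, seed=0):
    """Permutation p-value: share of shuffled pseudo-F values >= the observed one."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    observed = pseudo_f(d, groups)
    hits = sum(pseudo_f(d, rng.permutation(groups)) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical sites (rows) described by two variables, in two groups
sites = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [4.0, 5.0], [4.2, 4.8], [3.9, 5.3]])
grp = np.array(["A", "A", "A", "B", "B", "B"])
dist = np.linalg.norm(sites[:, None] - sites[None], axis=-1)
f, p = npmanova(dist, grp)
print(round(f, 2), p)
```

With Euclidean distances on a single variable, the pseudo-F equals the ordinary one-way ANOVA F. With only three objects per group there are just 20 ways to split six objects into two groups of three, so the p-value here cannot fall much below 0.1.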

What is ANOSIM used for?

Can test for significance based on any distance measure; compares the ranks of distances within and between groups

What statistic do you get after running an ANOSIM and how is it interpreted?

R statistic: measures whether separation is found (R = 1) or no separation occurs (R = 0)
R > 0.75: considered well separated
R > 0.5: considered separated but overlapping
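
A NumPy sketch of the R statistic: rank all pairwise distances, then R = (mean between-group rank - mean within-group rank) / (M / 2), where M = n(n - 1) / 2 is the number of pairs. Function names and data are assumptions, and the permutation test that gives R its p-value is left out.

```python
import numpy as np


def average_ranks(values):
    """Rank values 1..m, giving tied values the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(d, groups):
    """ANOSIM R from a distance matrix and group labels."""
    d = np.asarray(d, dtype=float)
    groups = np.asarray(groups)
    iu = np.triu_indices(len(groups), k=1)
    ranks = average_ranks(d[iu])
    same = groups[iu[0]] == groups[iu[1]]   # True for within-group pairs
    m = len(ranks)                          # number of pairs, n(n - 1) / 2
    return (ranks[~same].mean() - ranks[same].mean()) / (m / 2)


# Hypothetical sites in two groups
sites = np.array([[1.0, 2.0], [1.5, 1.8], [1.2, 2.2], [4.0, 5.0], [4.2, 4.8], [3.9, 5.3]])
grp = np.array(["A", "A", "A", "B", "B", "B"])
dist = np.linalg.norm(sites[:, None] - sites[None], axis=-1)
print(anosim_r(dist, grp))
```

Here every within-group distance is smaller than every between-group distance, which is exactly the case that gives R = 1.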

What "indirect" gradient analyses can be used to assess environmental gradients for PCA and CA?

ANOVA

What "indirect" gradient analyses can be used to assess environmental gradients for PCoA and NMDS?

Spearman Rank Correlations

What are some other "indirect" gradient analyses? (2)

Run a linear regression of the variables onto an existing ordination (done in R)
Use site symbols whose size is proportional to the environmental value (good for NMDS)

What is a constrained (canonical) ordination?

A direct gradient analysis where only the variation that can be explained by the environmental variables (provided in a separate table) is displayed in the ordination

In Constrained (canonical) ordination, species abundance is usually considered a ______.

response

Constrained (canonical) ordination is usually based on _______ ______ _______ that relate axes to environmental variables

multivariate linear models

Redundancy analysis (RDA) is another _________ analysis for environmental gradients.

Direct

RDA is an extension of _____, where components are constrained to be linear combinations of environmental variables

PCA

How does RDA "explain" variation between independent and dependent variables?

uses multiple linear regression of each species on the environmental variables, giving coefficients that relate each species to each environmental variable; the fitted values are then ordinated as in a PCA
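
A minimal NumPy sketch of RDA: regress every (centred) species column on the (centred) environmental variables, then run a PCA on the fitted values. The function name and the small species and environment tables are hypothetical; packages such as R's vegan add scaling options and permutation tests not shown here.

```python
import numpy as np


def rda(y, x):
    """Multiple regression of each response on X, then PCA of the fitted values."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    yc = y - y.mean(axis=0)                          # centred responses (species)
    xc = x - x.mean(axis=0)                          # centred environmental variables
    coef, *_ = np.linalg.lstsq(xc, yc, rcond=None)   # one column of coefficients per species
    fitted = xc @ coef
    eigvals, eigvecs = np.linalg.eigh(np.cov(fitted, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # share of total response variance explained by the environmental variables
    constrained_share = np.var(fitted, axis=0, ddof=1).sum() / np.var(yc, axis=0, ddof=1).sum()
    return coef, eigvals, eigvecs, constrained_share


# Hypothetical data: 6 sites, 2 environmental variables, 3 species
env = np.array([[5.0, 1.0], [7.0, 0.5], [6.5, 2.0], [9.0, 1.5], [11.0, 3.0], [12.0, 2.5]])
spp = np.array([[10, 2, 0], [12, 3, 1], [9, 5, 2], [6, 7, 4], [3, 9, 8], [2, 10, 9]], dtype=float)
coef, ev, vecs, share = rda(spp, env)
print(coef.round(3))
print(ev.round(3), round(share, 3))
```

There are at most as many constrained axes as environmental variables, since the fitted values are linear combinations of them.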

______ is similar to RDA, but uses unimodal species-environment relationships.

Canonical correspondence analysis

How is a Mantel test used?

To compare two matrices; calculates a correlation coefficient between corresponding matrix positions.
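
A sketch of a simple Mantel test: Pearson correlation between the upper triangles of two distance matrices, with a one-sided p-value from permuting the rows and columns of one matrix together. NumPy, the function name, the permutation count and the site data are assumptions.

```python
import numpy as np


def mantel(d1, d2, n_perm=999, seed=0):
    """Correlation of corresponding matrix positions plus a permutation p-value."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    iu = np.triu_indices(len(d1), k=1)
    r_obs = np.corrcoef(d1[iu], d2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(len(d1))
        r = np.corrcoef(d1[np.ix_(p, p)][iu], d2[iu])[0, 1]  # same object order for rows and columns
        hits += r >= r_obs
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical site coordinates (km) and one environmental variable per site
xy = np.array([[0.0, 0.0], [1.0, 0.5], [3.0, 2.0], [6.0, 1.0], [8.0, 4.0]])
temp = np.array([12.0, 12.5, 14.0, 17.5, 19.0])
geo = np.linalg.norm(xy[:, None] - xy[None], axis=-1)
env = np.abs(temp[:, None] - temp[None])
r, p = mantel(geo, env)
print(round(r, 3), p)
```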

What are two common diversity indices?

Shannon and Simpson
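
A plain-Python sketch of both indices from species counts. Shannon is H' = -sum(p_i ln p_i). Simpson's index is reported in several forms (sum p_i^2, 1 - sum p_i^2, or 1 / sum p_i^2); this sketch assumes the 1 - sum p_i^2 form. The counts are made up.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i), skipping species with zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """1 - sum(p_i^2): probability two individuals drawn with replacement are different species."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


even = [5, 5, 5, 5]     # hypothetical community with four equally common species
skewed = [17, 1, 1, 1]  # same richness, one dominant species
print(round(shannon(even), 3), round(simpson(even), 3))
print(round(shannon(skewed), 3), round(simpson(skewed), 3))
```

Both indices fall when one species dominates, even though richness is unchanged.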

What types of diversity can be calculated based on Simpson diversity? (3)

alpha: within-sample diversity
beta: between-sample diversity
gamma: landscape-scale diversity

What is a key difference between CA and PCA?

CA is used for categorical data (e.g., counts in a contingency table) rather than continuous data

For CA, all data should be on the ____ scale and ________.

same; non-negative

CA decomposes the chi-squared statistic associated with a table into __________ __________.

Orthogonal factors
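
One standard way to compute CA is a singular value decomposition of the matrix of standardized (chi-square) residuals. Each singular vector pair is one orthogonal axis, and the squared singular values (principal inertias) sum to chi-squared / n. A minimal NumPy sketch with a made-up table; the function name is an assumption.

```python
import numpy as np


def correspondence_analysis(table):
    """CA via SVD of the standardized residuals of a contingency table."""
    n = np.asarray(table, dtype=float)
    total = n.sum()
    p = n / total
    r = p.sum(axis=1)                                # row masses
    c = p.sum(axis=0)                                # column masses
    s = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    u, sv, vt = np.linalg.svd(s, full_matrices=False)
    inertia = sv ** 2                                # one principal inertia per orthogonal axis
    row_coords = (u * sv) / np.sqrt(r)[:, None]      # principal coordinates of the rows
    chi2 = total * (s ** 2).sum()                    # Pearson chi-square of the table
    return inertia, row_coords, chi2


# Hypothetical counts: rows = sites, columns = species
table = np.array([[20, 5, 10], [3, 15, 12], [8, 9, 30]])
inertia, rows, chi2 = correspondence_analysis(table)
print(inertia.round(4), round(chi2, 3))
```

The data must be non-negative (they are treated as frequencies), and the last inertia is essentially zero because the table's margins remove one dimension.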

What is Euclidean distance?

Derived from the Pythagorean theorem; it is the "ordinary" straight-line distance between two points in Euclidean space (2D, 3D, or any number of dimensions)
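
The Pythagorean formula extends to any number of variables: take the square root of the summed squared differences. A plain-Python sketch; the function name is illustrative, and both points are assumed to have the same length (`zip` would silently truncate otherwise).

```python
import math


def euclidean(a, b):
    """Square root of summed squared differences between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean([0, 0], [3, 4]))              # 5.0, the 3-4-5 right triangle
print(euclidean([1, 2, 3, 4], [2, 4, 6, 8]))  # four dimensions work the same way
```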

Bray-Curtis dissimilarity is used to:

quantify the compositional dissimilarity between sites based on counts at each site.
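
Bray-Curtis dissimilarity is the sum of absolute count differences divided by the sum of all counts at both sites; similarity is 1 minus that value, or 100 minus it when expressed as a percentage. A plain-Python sketch with hypothetical counts.

```python
def bray_curtis(site_a, site_b):
    """Sum of |a_i - b_i| over species divided by the sum of all counts at both sites."""
    num = sum(abs(x - y) for x, y in zip(site_a, site_b))
    den = sum(x + y for x, y in zip(site_a, site_b))
    return num / den


a = [6, 7, 4, 0]   # hypothetical species counts at site A
b = [10, 0, 6, 3]  # hypothetical species counts at site B
bc = bray_curtis(a, b)
print(round(bc, 3), round(100 * (1 - bc), 1))  # dissimilarity, then similarity as a percentage
```

The value runs from 0 (identical composition) to 1 (no species shared).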

ANOVA stands for:

Analysis of variance

What are three assumptions of an ANOVA?

normality, independence, homogeneity of variance

What is an ANOVA used for?

to analyze differences among group means by partitioning the total variance into among-group and within-group components
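
A minimal one-way ANOVA sketch: split the total sum of squares into among-group and within-group parts and form F = (SS_among / df_among) / (SS_within / df_within). NumPy, the function name and the groups are assumptions; the p-value lookup in the F distribution is left out.

```python
import numpy as np


def one_way_anova(*groups):
    """Return the F statistic and its (among, within) degrees of freedom."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    pooled = np.concatenate(groups)
    grand = pooled.mean()
    ss_among = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_among, df_within = len(groups) - 1, len(pooled) - len(groups)
    f = (ss_among / df_among) / (ss_within / df_within)
    return f, df_among, df_within


f, df1, df2 = one_way_anova([1, 2, 3], [6, 7, 8])  # hypothetical measurements in two groups
print(round(f, 2), df1, df2)
```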

What is a MANOVA?

A multivariate ANOVA (an ANOVA with several dependent variables)

What dissimilarity metric is chi-squared distance based on?

Euclidean distance

What statistic is used by MANOVA?

Wilks' lambda, whose distribution is a multivariate generalization of the F-distribution used in univariate analyses
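
Wilks' lambda compares within-group scatter with total scatter: lambda = det(W) / det(W + B), where W and B are the within-group and among-group sums-of-squares-and-cross-products matrices. Values near 0 mean strong group separation; values near 1 mean little. A NumPy sketch with hypothetical data; the function name is an assumption.

```python
import numpy as np


def wilks_lambda(x, groups):
    """det(W) / det(W + B) from within- and among-group SSCP matrices."""
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    grand = x.mean(axis=0)
    p = x.shape[1]
    w = np.zeros((p, p))
    b = np.zeros((p, p))
    for g in np.unique(groups):
        xg = x[groups == g]
        dev = xg - xg.mean(axis=0)
        w += dev.T @ dev                          # within-group scatter
        m = (xg.mean(axis=0) - grand)[:, None]
        b += len(xg) * (m @ m.T)                  # among-group scatter
    return np.linalg.det(w) / np.linalg.det(w + b)


# Hypothetical data: 9 objects, 2 dependent variables, 3 groups
data = np.array([[2.0, 3.1], [2.4, 2.9], [1.8, 3.4], [4.1, 1.0], [3.9, 1.5],
                 [4.4, 0.8], [3.0, 4.8], [3.2, 5.1], [2.7, 4.6]])
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
print(round(wilks_lambda(data, labels), 4))
```

With a single dependent variable, lambda reduces to SS_within / SS_total.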

What statistic is used by ANOVA?

F statistic; when the null hypothesis is true (all group means equal), it follows an F distribution

What is Euclidean distance?

Distance between samples in Euclidean space (any number of dimensions), also called Pythagorean distance

What is Bray-Curtis dissimilarity? How can the Bray-Curtis similarity be calculated?

A common ecological metric for determining the dissimilarity between two sites based on counts at both sites; Bray-Curtis similarity can be calculated by subtracting the Bray-Curtis dissimilarity from 100 (when expressed as a percentage) or from 1 (when expressed as a proportion).

When should a Kruskal-Wallis test be performed/used?

When you have one nominal variable and one ranked or measurement variable; it is non-parametric, so it can be used in place of a one-way ANOVA when data are not normally distributed
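
A sketch of the Kruskal-Wallis H statistic: rank all observations together, then H = 12 / (N(N + 1)) * sum(R_g^2 / n_g) - 3(N + 1), where R_g is the rank sum of group g. NumPy, the function name and data are assumptions; the sketch assumes no tied values (so no tie correction) and leaves out the chi-squared p-value with k - 1 degrees of freedom.

```python
import numpy as np


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H from pooled ranks (assumes no ties)."""
    sizes = [len(g) for g in groups]
    pooled = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    n = len(pooled)
    ranks = np.argsort(np.argsort(pooled)) + 1   # rank 1 = smallest value
    h, start = 0.0, 0
    for size in sizes:
        rank_sum = ranks[start:start + size].sum()
        h += rank_sum ** 2 / size
        start += size
    return 12 / (n * (n + 1)) * h - 3 * (n + 1)


# Hypothetical measurements in three groups
print(round(kruskal_wallis_h([1.2, 3.4, 2.2], [6.1, 7.3, 5.5], [4.0, 8.8, 9.1]), 3))
```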

What type of data can a Mann-Whitney U test be performed on?

Two independent samples of ranked or continuous data; it does not assume normality, so it suits data that are not normally distributed

What does the Mann-Whitney U test test?

Whether two independent groups of samples come from the same distribution; non-parametric version of the t-test, based on ranks
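
A sketch of the U statistic from ranks: U1 = R1 - n1(n1 + 1) / 2, where R1 is the rank sum of the first sample; U1 also equals the number of (x, y) pairs in which x is larger. NumPy, the function name and data are assumptions; ties are assumed absent and the p-value step is left out.

```python
import numpy as np


def mann_whitney_u(x, y):
    """Return (U1, U2) from pooled ranks (assumes no tied values)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    ranks = np.argsort(np.argsort(pooled)) + 1   # rank 1 = smallest value
    n1, n2 = len(x), len(y)
    u1 = ranks[:n1].sum() - n1 * (n1 + 1) / 2    # number of pairs where an x value exceeds a y value
    u2 = n1 * n2 - u1
    return u1, u2


x = [3.1, 0.4, 5.9, 2.2]  # hypothetical sample 1
y = [1.7, 4.8, 0.9]       # hypothetical sample 2
print(mann_whitney_u(x, y))
```

The smaller of U1 and U2 is usually compared with tables or a normal approximation.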

What is the null hypothesis of a Kruskal-Wallis test?

That the mean ranks of all groups are the same (i.e., the samples come from the same distribution)

(68 cards)