Statistical analyses Flashcards

1
Q

Under what circumstances are data transformations important for multivariate analyses?

A

If data do not have a uniform scale

2
Q

How do you deal with qualitative variables in multivariate analyses?

A

Give them a numerical value

example: for seasonality, use four separate variables and code the absence or presence of each season as 0 or 1

3
Q

What does standardization do? (2)

A

used to remove influences of magnitude difference

results in dimensionless variables

4
Q

What is a z-score and how is it calculated? (2 steps)

A

Used to standardize data;

  1. take the difference between the value and the mean of the variable
  2. divide by the standard deviation of the variable
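
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `z_scores` and the sample data are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumption, since the card does not say which form to use.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values: subtract the mean, then divide by the standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator), an assumption
    return [(v - m) / s for v in values]

depths = [2.0, 4.0, 6.0, 8.0]  # hypothetical measurements of one variable
scaled = z_scores(depths)
print(scaled)
```

After standardization the values have a mean of 0 and a standard deviation of 1, and they no longer carry the original units.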
5
Q

What is the difference between an object and a variable?

A

Object: the units being described (samples, sites, time periods, etc.)

Variable: a value measured for each object

6
Q

What does normalization do?

A

Corrects the distribution shapes of variables that depart from normality and tries to obtain homogeneous variances across variables, which improves multivariate analyses

7
Q

What transformation can be done on data with a lot of zeroes?

A

Hellinger transformation
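
As a rough sketch, the Hellinger transformation takes the square root of each value divided by its row (site) total. The function name `hellinger` and the counts are hypothetical, and returning zeros for an empty site is an assumption made only to avoid dividing by zero.

```python
import math

def hellinger(row):
    """Square root of each count's share of the row total."""
    total = sum(row)
    if total == 0:
        # Assumption: an all-zero site stays all zero
        return [0.0 for _ in row]
    return [math.sqrt(x / total) for x in row]

site = [0, 9, 0, 16]  # hypothetical species counts at one site
h = hellinger(site)
print(h)
```

Zeros stay zero, and the transformed values of a non-empty site have squared values that sum to 1.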

8
Q

What are exploratory multivariate analyses?

A

multivariate analyses that are used to reveal patterns in large datasets, but do not explain why those patterns exist

9
Q

What does a cluster analysis do?

A

Minimizes within-group variation and maximizes between-group variation (reduces the dimensionality of the dataset to a few groups of objects)

10
Q

Under what circumstances might a cluster analysis be useful?

A

When distinct discontinuities are expected

11
Q

What are the two steps for a cluster analysis?

A
  1. use a relevant association coefficient to calculate a dissimilarity/similarity matrix between or among objects/variables.
  2. represent the association matrix as a tree (hierarchical clustering) or as groups of objects (k-means clustering)
12
Q

What types of linkage rules are generally used to merge clusters in hierarchical clustering? (3)

A
  1. nearest neighbor (single linkage): distance between two clusters is equal to the distance between their two closest objects
  2. furthest neighbor (complete linkage): distance between two clusters is equal to the distance between their two furthest objects
  3. UPGMA (average linkage): distance between two clusters is equal to the average distance between all inter-cluster pairs
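
The three rules differ only in how they summarize the distances between all inter-cluster pairs. A minimal sketch, assuming Euclidean distance between objects; the function name `cluster_distance`, the rule labels, and the points are hypothetical.

```python
import math
from itertools import product

def cluster_distance(a, b, rule):
    """Distance between clusters a and b under one of three linkage rules."""
    pair_dists = [math.dist(p, q) for p, q in product(a, b)]
    if rule == "nearest":    # closest pair of objects
        return min(pair_dists)
    if rule == "furthest":   # most distant pair of objects
        return max(pair_dists)
    if rule == "upgma":      # average over all inter-cluster pairs
        return sum(pair_dists) / len(pair_dists)
    raise ValueError(f"unknown linkage rule: {rule}")

cluster_a = [(0, 0), (1, 0)]  # hypothetical objects
cluster_b = [(3, 0), (5, 0)]
for rule in ("nearest", "furthest", "upgma"):
    print(rule, cluster_distance(cluster_a, cluster_b, rule))
```

For these points the inter-cluster distances are 3, 5, 2 and 4, so the three rules give 2, 5 and 3.5 respectively.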
13
Q

How does k-means clustering work?

A

objects are assigned to k clusters (k is defined in advance) based on the shortest Euclidean distance to each cluster's mean
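
A minimal sketch of the idea, alternating between assigning each object to its nearest cluster mean and recomputing the means. The fixed number of iterations, the hand-picked starting centers, and the names `k_means`, `points` and `start` are assumptions for illustration; real implementations usually pick starting centers at random and stop when assignments no longer change.

```python
import math

def k_means(points, centers, iterations=10):
    """Assign points to the nearest center, then move each center to its group mean."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # An empty group keeps its previous center
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

points = [(1, 1), (1, 2), (8, 8), (9, 8)]  # hypothetical objects
start = [(0, 0), (10, 10)]                 # k = 2, starting centers chosen by hand
centers, groups = k_means(points, start)
print(centers)
print(groups)
```

Because every object pulls its cluster mean toward itself, a single extreme value can drag a center a long way.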

14
Q

What is one advantage and one disadvantage of using k-means clustering?

A

advantage: don’t need a similarity matrix
disadvantage: sensitive to outliers

15
Q

What is a PCA?

A

Principal component analysis

16
Q

What does a PCA do?

A

Calculates new synthetic variables (principal components) using linear combinations of the original variables to account for as much variability as possible

17
Q

What kind of matrix is used for PCA when all data points have the same units (ex: species abundance)?

A

Variance-covariance matrix

18
Q

What kind of matrix is used for PCA when data points have different units?

A

Correlation matrix; variables must be standardized so that distances are independent of the original scales

19
Q

What are the dots on a PCA ordination?

A

Objects

20
Q

What are the vectors on a PCA ordination and what do they mean?

A

Variables

Vector direction indicates the direction of greatest change; vector length may indicate the rate of change

21
Q

Under what conditions should a PCA be used?

A

Good when looking at linear responses across short gradients (otherwise CA, NMDS are better)

22
Q

What is an eigenvalue?

A

value denoting how much variance is explained by a given principal component.

23
Q

When is an eigenvalue considered significant?

A

If its value is greater than the average of all eigenvalues
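
The "greater than average" rule is easy to apply once the eigenvalues are known. A minimal sketch with made-up eigenvalues; the variable names are hypothetical, and in practice the eigenvalues would come from a PCA routine.

```python
eigenvalues = [4.2, 2.1, 0.9, 0.5, 0.3]  # hypothetical eigenvalues, one per component

average = sum(eigenvalues) / len(eigenvalues)
# Keep only components whose eigenvalue exceeds the average
retained = [ev for ev in eigenvalues if ev > average]
# Share of the total variance explained by each component
explained = [ev / sum(eigenvalues) for ev in eigenvalues]

print(retained)
print([round(share, 3) for share in explained])
```

Here the average is about 1.6, so only the first two components would be retained.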

24
Q

Why are correlations between principal components and original variables not statistically valid in terms of describing which variables contribute most to variation observed in a PCA ordination?

A

components and variables are already linearly correlated and are not independent of one another

25
Q

What does PCoA stand for?

A

Principal coordinates analysis

26
Q

How is PCoA different from PCA?

A

Works with any dissimilarity measure; you can pick the association coefficient that works best for your data

27
Q

Why are components more difficult to interpret for PCoA than for PCA?

A

There is no direct link between components and the original variables, because PCoA components are complex functions of the variables that depend on the association coefficient used to form the matrix; variables can still be correlated with the axes (but not with statistical significance)

28
Q

What does NMDS stand for?

A

non-metric multidimensional scaling

29
Q

What analysis is good for identifying underlying gradients and representing relationships based on various distance measures?

A

NMDS

30
Q

How does NMDS work?

A

Ranks the distances between objects (from the matrix) and uses those ranks to map the objects non-linearly in ordination space, preserving the rank order with the least amount of “stress”; proximity between objects corresponds to their similarity.

31
Q

How does NMDS calculate stress?

A

Goes through several iterations of ordinations to identify the lowest stress, based on comparisons to the original distances between samples

32
Q

How do you interpret stress values for NMDS?

A

A stress value > 0.3 indicates a poor representation of the data

33
Q

Which two statistical tests are usually used for multivariate data?

A

NPMANOVA, ANOSIM

34
Q

What is NPMANOVA used for?

A

used to test for significant differences between at least two groups of multivariate, quantitative data

35
Q

What is the null hypothesis for NPMANOVA for multivariate data and how is it tested?

A

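the null hypothesis of no difference between groups is tested with a pseudo-F statistic whose significance is assessed by permutation (Wilks’ lambda is the statistic used by parametric MANOVA); a post-hoc test can then be used to assess the significance of pairwise comparisons.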

36
Q

What is ANOSIM used for?

A

Tests for significant differences between groups using any distance measure; compares the ranks of distances within and between groups
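
As a rough sketch of the rank comparison, the R statistic can be computed as the difference between the mean between-group rank and the mean within-group rank, divided by n(n - 1)/4 for n objects. The function names, the distance matrix and the group labels are hypothetical, and the permutation step that gives a p-value is left out.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def anosim_r(dist, groups):
    """R = (mean between-group rank - mean within-group rank) / (n(n - 1) / 4)."""
    n = len(groups)
    pairs = list(combinations(range(n), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (n * (n - 1) / 4)

# Hypothetical symmetric distance matrix for four objects in two groups
dist = [
    [0, 1, 5, 6],
    [1, 0, 7, 8],
    [5, 7, 0, 2],
    [6, 8, 2, 0],
]
groups = ["a", "a", "b", "b"]
print(anosim_r(dist, groups))
```

In this example every within-group distance is smaller than every between-group distance, so R comes out as 1.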

37
Q

What statistic do you get after running an ANOSIM and how is it interpreted?

A

R statistic; measures whether the groups are completely separated (R = 1) or not separated at all (R = 0)

R > 0.75 considered well-separated statistically
R > 0.5 considered separated but overlapping
R

38
Q

What “indirect” gradient analyses can be used to assess environmental gradients for PCA and CA?

A

ANOVA

39
Q

What “indirect” gradient analyses can be used to assess environmental gradients for PCoA and NMDS?

A

Spearman Rank Correlations

40
Q

What are some other “indirect” gradient analyses? (2)

A

Run linear regression of variables onto existing ordination (done in R)

Can also use site symbols where size is proportional to the environmental value (good for NMDS)

41
Q

What is a constrained (canonical) ordination?

A

A direct gradient analysis where only the variation that can be explained by the environmental variables (provided in a separate table) is displayed in the ordination

42
Q

In Constrained (canonical) ordination, species abundance is usually considered a ______.

A

response

43
Q

Constrained (canonical) ordination is usually based on _______ ______ _______ that relate axes to environmental variables

A

multivariate linear models

44
Q

Redundancy analysis (RDA) is another _________ analysis for environmental gradients.

A

Direct

45
Q

RDA is an extension of _____, where components are constrained to be linear combinations of environmental variables

A

PCA

46
Q

How does RDA “explain” variation between independent and dependent variables?

A

uses multiple linear regression to obtain correlation coefficients between each species and each environmental variable

47
Q

______ is similar to RDA, but uses unimodal species-environment relationships.

A

Canonical correspondence analysis

48
Q

How is a Mantel test used?

A

To compare two matrices; calculates the correlation coefficient between corresponding matrix positions.
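
A minimal sketch of the statistic itself, assuming two symmetric distance matrices of the same size and a Pearson correlation over the entries above the diagonal. The function name `mantel_r` and both matrices are hypothetical; significance is normally assessed by permuting the rows and columns of one matrix, which is omitted here.

```python
import math
from itertools import combinations

def mantel_r(a, b):
    """Pearson correlation between corresponding upper-triangle entries of a and b."""
    pairs = list(combinations(range(len(a)), 2))
    x = [a[i][j] for i, j in pairs]
    y = [b[i][j] for i, j in pairs]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    spread_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    spread_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (spread_x * spread_y)

# Hypothetical matrices: the second is exactly twice the first
species_dist = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
environment_dist = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]
print(mantel_r(species_dist, environment_dist))
```

Because the second matrix is a simple multiple of the first, the correlation here is 1 (up to floating-point rounding).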

49
Q

What are two common diversity indices?

A

Shannon and Simpson
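
A minimal sketch of both indices from species counts, using common textbook forms: Shannon as the negative sum of p * ln(p), and Simpson as 1 minus the sum of p squared, where p is each species' proportion. The function names and counts are hypothetical, and other variants exist (different log bases, or the inverse form of Simpson).

```python
import math

def shannon(counts):
    """Shannon index: -sum(p * ln p) over species that are present."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)

def simpson(counts):
    """Simpson diversity in the 1 - sum(p^2) form."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

even = [10, 10, 10, 10]  # hypothetical counts, four equally common species
uneven = [37, 1, 1, 1]   # same total, dominated by one species
print(shannon(even), simpson(even))
print(shannon(uneven), simpson(uneven))
```

Both indices are higher for the even community than for the one dominated by a single species.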

50
Q

What types of diversity can be calculated based on Simpson diversity? (3)

A

alpha: within sample diversity
beta: between sample diversity
gamma: landscape scale diversity

51
Q

What is a key difference between CA and PCA?

A

CA is used for categorical rather than continuous data

52
Q

For CA, all data should be on the ____ scale and ________.

A

same; non-negative

53
Q

CA decomposes the chi-squared statistic associated with a table into __________ __________.

A

Orthogonal factors

54
Q

What is Euclidean distance?

A

Derived from the Pythagorean theorem; it is the “ordinary” straight-line distance between two points in Euclidean space (with any number of dimensions, one per variable)
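
A minimal sketch of the calculation: square the difference on each variable, add the squares, and take the square root. The function name `euclidean` and the two sites are hypothetical; Python's built-in `math.dist` computes the same quantity.

```python
import math

def euclidean(p, q):
    """Square root of the summed squared differences across all variables."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

site_1 = (1, 2, 3, 4)  # hypothetical values of four variables
site_2 = (2, 4, 5, 8)
print(euclidean(site_1, site_2))
print(math.dist(site_1, site_2))  # same calculation from the standard library
```

The squared differences are 1, 4, 4 and 16, which sum to 25, so the distance is 5.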

55
Q

Bray-Curtis dissimilarity is used to:

A

quantify the compositional dissimilarity between sites based on counts at each site.

56
Q

ANOVA stands for:

A

Analysis of variance

57
Q

What are three assumptions of an ANOVA?

A

normality, independence, homogeneity of variance

58
Q

What is an ANOVA used for?

A

to analyze differences among group means, along with the associated procedures (such as partitioning variance within and between groups)

59
Q

What is a MANOVA?

A

A multivariate ANOVA (ANOVA with several dependent variables)

60
Q

What dissimilarity metric is chi-squared distance based on?

A

Euclidean distance

61
Q

What statistic is used by MANOVA?

A

Wilks’ lambda, a multivariate generalization of the F-distribution used in univariate analyses

62
Q

What statistic is used by ANOVA?

A

F statistic; the F distribution describes the distribution of the test statistic when the null hypothesis is true

63
Q

What is Euclidean distance?

A

Straight-line distance between samples in multidimensional space (one dimension per variable), also called Pythagorean distance

64
Q

What is Bray-Curtis dissimilarity? How can the Bray-Curtis similarity be calculated?

A

A common ecological metric for determining the dissimilarity between two sites based on counts at both sites; Bray-Curtis similarity can be calculated by subtracting the Bray-Curtis dissimilarity from 100 when it is expressed as a percentage (or from 1 on a 0 to 1 scale).
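
A minimal sketch, using the usual form of the index: the sum of absolute differences in counts divided by the sum of all counts at both sites. The function name `bray_curtis` and the counts are hypothetical.

```python
def bray_curtis(site_a, site_b):
    """Bray-Curtis dissimilarity on a 0 to 1 scale."""
    differences = sum(abs(a - b) for a, b in zip(site_a, site_b))
    total = sum(site_a) + sum(site_b)
    return differences / total

site_a = [6, 7, 4]   # hypothetical counts of three species
site_b = [10, 0, 6]
dissimilarity = bray_curtis(site_a, site_b)
similarity_percent = 100 - 100 * dissimilarity
print(dissimilarity, similarity_percent)
```

Here the absolute differences sum to 13 and the counts sum to 33, so the dissimilarity is 13/33 (about 0.39) and the similarity is about 61%.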

65
Q

When should a Kruskal-Wallis test be performed/used?

A

When you have one nominal variable and one ranked or scaled variable; is non-parametric so can be used in the place of a one-way ANOVA when data is not normally distributed

66
Q

What type of data can a Mann-Whitney U test be performed on?

A

Two independent samples that do not meet parametric assumptions (not normally distributed, unequal variance)

67
Q

What does the Mann-Whitney U test test?

A

Whether two independent groups of samples come from the same distribution; non-parametric version of the t-test, based on ranks
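
A minimal sketch of the U statistic, counted directly: for every pair made of one value from each group, score 1 when the first group's value is larger and 0.5 for a tie. This pairwise count gives the same U as the rank-sum formula. The function name and data are hypothetical, and the p-value lookup is omitted.

```python
def mann_whitney_u(x, y):
    """Return (U for x, U for y) by counting pairwise wins, with ties worth 0.5."""
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x  # the two U values always sum to n_x * n_y
    return u_x, u_y

group_1 = [1, 2, 3]  # hypothetical measurements
group_2 = [4, 5, 6]
u_1, u_2 = mann_whitney_u(group_1, group_2)
print(u_1, u_2, min(u_1, u_2))
```

With no overlap between the groups, one U is 0 and the other is 9; the smaller value is the one usually compared against a critical value.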

68
Q

What is the null hypothesis of a Kruskal-Wallis test?

A

That the mean ranks of the groups (two or more) are the same, i.e., the groups come from the same distribution