Statistical analyses Flashcards

1
Q

Under what circumstances are data transformations important for multivariate analyses?

A

When the variables do not share a uniform scale (e.g., different units or very different magnitudes)

2
Q

How do you deal with qualitative variables in multivariate analyses

A

Give them numerical values

Example: for seasonality, use four separate variables (one per season) and code the absence or presence of each season as 0 or 1
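A minimal sketch of the 0/1 coding described above, assuming each sample's season is stored as a text label; the function name `dummy_code` and the sample data are hypothetical.

```python
# Hypothetical example: code a qualitative variable (season) as four 0/1 variables
SEASONS = ["winter", "spring", "summer", "autumn"]

def dummy_code(labels, levels=SEASONS):
    """Return one row per sample: 1 in the column of the sample's season, 0 elsewhere."""
    return [[1 if label == level else 0 for level in levels] for label in labels]

samples = ["summer", "winter", "summer", "autumn"]
coded = dummy_code(samples)
for label, row in zip(samples, coded):
    print(label, row)
# summer [0, 0, 1, 0]
# winter [1, 0, 0, 0]
# summer [0, 0, 1, 0]
# autumn [0, 0, 0, 1]
```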

3
Q

What does standardization do? (2)

A

Removes the influence of differences in magnitude among variables

Results in dimensionless variables

4
Q

What is a z-score and how is it calculated? (2 steps)

A

A standardized value (used to standardize data), calculated in two steps:

  1. take difference between the value and mean of the variable
  2. divide by the stdev of the variable
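The two steps above can be sketched in a few lines of standard-library Python. Whether to use the sample (n - 1) or population standard deviation is an assumption here (sample); the measurements are hypothetical. Because the units cancel in the division, the result is dimensionless.

```python
import statistics

def z_scores(values):
    """Standardize a variable: subtract its mean, then divide by its standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1); an assumption
    return [(v - mean) / sd for v in values]

depth_m = [2.0, 4.0, 6.0, 8.0]  # hypothetical measurements in metres
z = z_scores(depth_m)
print([round(v, 3) for v in z])  # [-1.162, -0.387, 0.387, 1.162]
```

After standardization every variable has mean 0 and standard deviation 1, whatever its original units.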
5
Q

What is the difference between an object and a variable?

A

Object: the units on which data are collected (samples, sites, time periods, etc.)

Variable: a quantity measured on each object

6
Q

What does normalization do?

A

Corrects the distribution shapes of variables that depart from normality and tries to make variances homogeneous across variables, which improves multivariate analyses

7
Q

What transformation can be done on data with a lot of zeroes?

A

Hellinger transformation
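The card does not give the formula; the sketch below assumes the standard definition, where each abundance is divided by its site (row) total and then square-rooted. Zeros stay zero, which is why the transformation suits tables with many zeros. The function name and counts are hypothetical.

```python
import math

def hellinger(table):
    """Hellinger-transform a site-by-species abundance table (rows = sites).

    Each abundance is divided by its row (site) total, then square-rooted.
    Rows whose total is 0 are returned as all zeros.
    """
    out = []
    for row in table:
        total = sum(row)
        out.append([math.sqrt(y / total) if total > 0 else 0.0 for y in row])
    return out

abundances = [
    [0, 4, 0, 12],   # site 1 (hypothetical counts)
    [1, 0, 0, 0],    # site 2
    [0, 0, 9, 16],   # site 3
]
for row in hellinger(abundances):
    print([round(v, 3) for v in row])
# [0.0, 0.5, 0.0, 0.866]
# [1.0, 0.0, 0.0, 0.0]
# [0.0, 0.0, 0.6, 0.8]
```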

8
Q

What are exploratory multivariate analyses?

A

multivariate analyses that are used to reveal patterns in large datasets, but do not explain why those patterns exist

9
Q

What does a cluster analysis do?

A

Groups objects so that within-group variation is minimized and between-group variation is maximized (reducing the dataset to a few groups of objects)

10
Q

Under what circumstances might a cluster analysis be useful?

A

When distinct discontinuities are expected

11
Q

What are the two steps for a cluster analysis?

A
  1. Use a relevant association coefficient to calculate a dissimilarity/similarity matrix among objects (or variables).
  2. Represent the association matrix as a tree (hierarchical clustering) or as groups of objects (k-means clustering).
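The two steps can be sketched in plain Python for a handful of objects. Assumptions: Euclidean distance as the association coefficient and UPGMA (average linkage) for building the tree; the objects are hypothetical, and this naive loop is for illustration, not efficiency.

```python
import math
from itertools import combinations

def dissimilarity_matrix(objects):
    """Step 1: pairwise Euclidean distances between objects (one coefficient among many)."""
    n = len(objects)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = math.dist(objects[i], objects[j])
    return d

def agglomerate(d):
    """Step 2: naive hierarchical clustering with UPGMA (average linkage).

    Starts with every object in its own cluster and repeatedly merges the two
    closest clusters, recording (cluster_a, cluster_b, merge_distance).
    """
    clusters = [frozenset([i]) for i in range(len(d))]
    merges = []

    def upgma(a, b):
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: upgma(*pair))
        merges.append((sorted(a), sorted(b), upgma(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]  # hypothetical objects
for a, b, dist in agglomerate(dissimilarity_matrix(points)):
    print(a, b, round(dist, 3))  # the two tight pairs merge first; object 4 joins last
```

The list of merges and their distances is exactly the information a dendrogram draws.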
12
Q

What linkage rules are generally used to calculate distances between clusters in hierarchical clustering? (3)

A
  1. Nearest neighbor (single linkage): the distance between two clusters equals the distance between their closest points
  2. Farthest neighbor (complete linkage): the distance between two clusters equals the distance between their two most distant objects
  3. UPGMA (average linkage): the distance between two clusters equals the average distance over all inter-cluster pairs of objects
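The three rules differ only in how they summarize the pairwise distances between members of two clusters. A sketch, assuming Euclidean distance between points and two small hypothetical clusters:

```python
import math

def single_linkage(a, b):
    """Nearest neighbor: distance between the two closest points of the clusters."""
    return min(math.dist(p, q) for p in a for q in b)

def complete_linkage(a, b):
    """Farthest neighbor: distance between the two most distant points."""
    return max(math.dist(p, q) for p in a for q in b)

def average_linkage(a, b):
    """UPGMA: mean distance over all inter-cluster pairs of points."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

cluster_a = [(0, 0), (0, 2)]   # hypothetical clusters
cluster_b = [(3, 0), (6, 0)]
print(single_linkage(cluster_a, cluster_b))    # 3.0, from (0, 0) to (3, 0)
print(complete_linkage(cluster_a, cluster_b))  # sqrt(40) ≈ 6.325, from (0, 2) to (6, 0)
print(average_linkage(cluster_a, cluster_b))   # mean of all four pairwise distances
```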
13
Q

How does k-means clustering work?

A

Objects are grouped into k clusters (k defined in advance): each object is assigned to the cluster whose mean is nearest in Euclidean distance, and the cluster means are recalculated until the assignments stop changing
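A sketch of the usual iterative procedure (often called Lloyd's algorithm). Assumption: the first k objects serve as starting means so the example is reproducible; real implementations normally use random or k-means++ starts, and results can depend on the start. The points are hypothetical.

```python
import math

def k_means(points, k, iterations=100):
    """Assign each point to the nearest mean, recompute the means, repeat until stable."""
    centroids = [list(p) for p in points[:k]]  # starting means: first k points (assumption)
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of the points assigned to it
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return labels, centroids

points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]  # hypothetical objects
labels, centroids = k_means(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```

No full similarity matrix is built: only distances from objects to the k current means are computed, and a single extreme point can pull a mean far from its cluster.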

14
Q

What is one advantage and one disadvantage of using k-means clustering?

A

Advantage: does not require a similarity matrix
Disadvantage: sensitive to outliers

15
Q

What is a PCA?

A

Principal component analysis

16
Q

What does a PCA do?

A

Calculates new synthetic variables (principal components) as linear combinations of the original variables that account for as much of the variability as possible
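A minimal NumPy sketch of the idea, assuming objects are rows and variables are columns; the function name `pca` and the simulated data are illustrative only. It eigendecomposes the variance-covariance matrix, so each eigenvalue is the variance captured by one component and the object scores are the dots of an ordination plot.

```python
import numpy as np

def pca(data):
    """PCA by eigendecomposition of the variance-covariance matrix.

    Returns eigenvalues (variance per component, largest first),
    eigenvectors (loadings, one column per component) and object scores.
    """
    centered = data - data.mean(axis=0)              # center each variable on its mean
    cov = np.cov(centered, rowvar=False)             # variables are columns
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh: for symmetric matrices
    order = np.argsort(eigenvalues)[::-1]            # sort by variance explained
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    scores = centered @ eigenvectors                 # object coordinates on the components
    return eigenvalues, eigenvectors, scores

rng = np.random.default_rng(0)
x = rng.normal(size=50)
# Hypothetical data: two strongly correlated variables plus one independent variable
data = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=50), rng.normal(size=50)])
eigenvalues, loadings, scores = pca(data)
print(eigenvalues / eigenvalues.sum())  # proportion of variance explained per component
```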

17
Q

What kind of matrix is used for PCA when all variables have the same units (e.g., species abundances)?

A

Variance-covariance matrix

18
Q

What kind of matrix is used for PCA when the variables have different units?

A

Correlation matrix; variables must be standardized so that distances are independent of the original scales
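To illustrate why standardization matters, this NumPy sketch uses hypothetical temperature and conductivity values. It shows that the covariance matrix of z-scores equals the correlation matrix, and that a covariance-based PCA of the raw variables is dominated by the variable with the largest numbers.

```python
import numpy as np

# Hypothetical variables in very different units: temperature (°C) and conductivity (µS/cm)
rng = np.random.default_rng(1)
temperature = rng.normal(15, 2, size=40)
conductivity = 300 + 40 * temperature + rng.normal(0, 50, size=40)
data = np.column_stack([temperature, conductivity])

# Covariance of standardized variables (z-scores) is the correlation matrix
z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
corr = np.corrcoef(data, rowvar=False)
print(np.allclose(np.cov(z, rowvar=False), corr))  # True

eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # largest first
print(eigenvalues, eigenvalues.sum())          # correlation-matrix eigenvalues sum to 2 (number of variables)

# Without standardization, the first component is essentially just conductivity
cov_eigenvalues = np.linalg.eigvalsh(np.cov(data, rowvar=False))[::-1]
print(cov_eigenvalues[0] / cov_eigenvalues.sum())  # close to 1
```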

19
Q

What are the dots on a PCA ordination?

A

Objects

20
Q

What are the vectors on a PCA ordination and what do they mean?

A

Variables

Vector direction indicates the direction of greatest change in the variable; vector length may indicate the rate of change

21
Q

Under what conditions should a PCA be used?

A

Best suited to linear responses along short gradients (for longer gradients, CA or NMDS are better)

22
Q

What is an eigenvalue?

A

A value denoting how much variance is explained by a given principal component

23
Q

When is an eigenvalue considered significant?

A

If its value is greater than the average of all eigenvalues
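This rule (often called the Kaiser-Guttman criterion) takes only a few lines; the eigenvalues below are hypothetical. For a correlation-matrix PCA the mean eigenvalue is 1, so the rule reduces to "eigenvalue > 1".

```python
def significant_components(eigenvalues):
    """Return the (1-based) components whose eigenvalue exceeds the mean eigenvalue."""
    mean_eigenvalue = sum(eigenvalues) / len(eigenvalues)
    return [i for i, ev in enumerate(eigenvalues, start=1) if ev > mean_eigenvalue]

eigenvalues = [4.2, 2.1, 1.1, 0.4, 0.2]  # hypothetical, largest first; mean = 1.6
print(significant_components(eigenvalues))  # [1, 2]
```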

24
Q

Why are correlations between principal components and original variables not statistically valid in terms of describing which variables contribute most to variation observed in a PCA ordination?

A

The components are linear combinations of the original variables, so components and variables are already correlated and are not independent of one another

25
What does PCoA stand for?
Principal coordinate analysis
26
How is PCoA different from PCA?
Works with any dissimilarity measure, so you can pick the association coefficient that works best for your data
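A NumPy sketch of classical PCoA (Gower's double-centering), which accepts any square dissimilarity matrix. Assumptions: the function name `pcoa` is hypothetical, and negative eigenvalues (which non-Euclidean measures can produce) are simply discarded here rather than corrected. With Euclidean input, PCoA recovers the original distances, which the usage example checks.

```python
import numpy as np

def pcoa(d):
    """Principal coordinate analysis of a symmetric n x n dissimilarity matrix d.

    Square the dissimilarities, double-center the matrix, eigendecompose it,
    and scale each eigenvector by the square root of its eigenvalue.
    """
    n = d.shape[0]
    a = -0.5 * d ** 2
    j = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    g = j @ a @ j                               # Gower-centered matrix
    eigenvalues, eigenvectors = np.linalg.eigh(g)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    keep = eigenvalues > 1e-10                  # drop null and negative axes (simplification)
    coords = eigenvectors[:, keep] * np.sqrt(eigenvalues[keep])
    return eigenvalues, coords

points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0], [1.0, 1.0]])  # hypothetical
d = np.sqrt(((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1))
eigenvalues, coords = pcoa(d)
recovered = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))
print(np.allclose(recovered, d))  # True: Euclidean input is reproduced exactly
```

Any other coefficient (e.g., Bray-Curtis) can be passed in as `d`; the axes then no longer map directly onto the original variables.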
27
Why are components more difficult to interpret for PCoA than for PCA?
There is no direct link between components and the original variables, because PCoA components are complex functions of the variables that depend on the association coefficient used to form the matrix; variables can still be correlated with the axes (but such correlations are not statistically valid)
28
What does NMDS stand for?
Non-metric multidimensional scaling
29
What analysis is good for identifying underlying gradients and representing relationships based on various distance measures?
NMDS
30
How does NMDS work?
Ranks the distances between objects (from a distance matrix) and uses those ranks to map the objects non-linearly into an ordination that preserves their rank order with the least possible "stress"; proximity between objects corresponds to their similarity.
31
How does NMDS calculate stress?
Runs several iterations of the ordination and keeps the configuration with the lowest stress, where stress measures how poorly the ordination distances match the rank order of the original distances between samples
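The sketch below evaluates stress for one given configuration, assuming Kruskal's stress-1 formula (other stress variants exist). The ordination distances are compared with "disparities", the closest values that respect the rank order of the original dissimilarities, found by monotone (isotonic) regression. An NMDS program repeats this while moving the points to lower the value; that search is not shown. All names are hypothetical.

```python
import math

def monotone_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # merge backwards while neighboring block means are out of order
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def kruskal_stress(original, ordination):
    """Stress-1 for paired lists of original dissimilarities and ordination distances."""
    pairs = sorted(zip(original, ordination))  # order by original dissimilarity (ties by ordination distance)
    d = [dist for _, dist in pairs]
    d_hat = monotone_fit(d)                    # rank-preserving disparities
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d, d_hat)) / sum(x ** 2 for x in d))

print(kruskal_stress([1, 2, 3], [0.5, 1.0, 2.0]))  # 0.0: rank order fully preserved
print(kruskal_stress([1, 2, 3], [2.0, 1.0, 3.0]))  # about 0.189: first two pairs out of order
```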
32
How do you interpret stress values for NMDS?
Stress > 0.3 indicates a poor representation of the data
33
Which two statistical tests are usually used for multivariate data?
NPMANOVA, ANOSIM
34
What is NPMANOVA used for?
Used to test for significant differences between two or more groups of multivariate, quantitative data
35
What is the null hypothesis for NPMANOVA for multivariate data and how is it tested?
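The null hypothesis is equality (no difference) among groups; it is tested by permuting group membership and comparing a pseudo-F statistic (Wilks' lambda is the statistic of parametric MANOVA); a post-hoc test can then assess the significance of pairwise comparisons.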
36
What is ANOSIM used for?
Tests for significant differences between groups using any distance measure, by comparing the ranks of distances within and between groups
37
What statistic do you get after running an ANOSIM and how is it interpreted?
R statistic: R = 1 indicates complete separation between groups and R = 0 indicates no separation; R > 0.75 is considered well separated; R > 0.5 is considered separated but overlapping
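A sketch of how R is computed, assuming the standard formula R = (mean between-group rank - mean within-group rank) / (M / 2), where M = n(n - 1) / 2 is the number of pairwise distances and tied distances share average ranks. The permutation test that gives R its p-value is not shown; the matrix and labels are hypothetical.

```python
from itertools import combinations

def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def anosim_r(d, groups):
    """ANOSIM R statistic from a square dissimilarity matrix d and group labels."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([d[i][j] for i, j in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if groups[i] == groups[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if groups[i] != groups[j]]
    m = len(pairs)  # number of pairwise distances, n(n - 1) / 2
    return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)

d = [[0, 1, 5, 6],
     [1, 0, 7, 8],
     [5, 7, 0, 1],
     [6, 8, 1, 0]]   # hypothetical dissimilarities: two tight, well-separated groups
print(anosim_r(d, ["A", "A", "B", "B"]))  # 1.0: every between-group distance outranks every within-group one
```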
38
What "indirect" gradient analyses can be used to assess environmental gradients for PCA and CA?
ANOVA
39
What "indirect" gradient analyses can be used to assess environmental gradients for PCoA and NMDS?
Spearman Rank Correlations
40
What are some other "indirect" gradient analyses? (2)
(1) Fit the environmental variables onto an existing ordination using linear regression (done in R); (2) plot site symbols whose size is proportional to the environmental value (good for NMDS)
41
What is a constrained (canonical) ordination?
A direct gradient analysis in which only the variation that can be explained by the environmental variables (provided in a separate table) is displayed in the ordination
42
In Constrained (canonical) ordination, species abundance is usually considered a ______.
response
43
Constrained (canonical) ordination is usually based on _______ ______ _______ that relate axes to environmental variables
multivariate linear models
44
Redundancy analysis (RDA) is another _________ analysis for environmental gradients.
Direct
45
RDA is an extension of _____, where components are constrained to be linear combinations of environmental variables
PCA
46
How does RDA "explain" variation between independent and dependent variables?
Uses multiple linear regression, yielding correlation coefficients between each species and each environmental variable
47
______ is similar to RDA, but uses unimodal species-environment relationships.
Canonical correspondence analysis
48
How is a Mantel test used?
To compare two matrices (e.g., two distance matrices): calculates the correlation between corresponding positions of the two matrices, with significance usually assessed by permutation
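A standard-library sketch of a Mantel test, assuming Pearson correlation (Spearman is also common) and a one-tailed permutation p-value. Rows and columns of the second matrix are permuted together so it stays a valid distance matrix. The transect positions and salinity values are hypothetical.

```python
import math
import random

def upper_triangle(m):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mantel(a, b, permutations=999, seed=0):
    """Correlation between corresponding cells of two distance matrices, plus permutation p-value."""
    x = upper_triangle(a)
    r_obs = pearson(x, upper_triangle(b))
    rng = random.Random(seed)
    n = len(b)
    at_least_as_large = 0
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        # permute rows and columns of b together
        b_perm = [[b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(b_perm)) >= r_obs:
            at_least_as_large += 1
    return r_obs, (at_least_as_large + 1) / (permutations + 1)

positions = [0, 1, 3, 6, 10]     # hypothetical site positions along a transect (km)
salinity = [30, 29, 27, 22, 15]  # hypothetical salinity at each site
geo = [[abs(p - q) for q in positions] for p in positions]
env = [[abs(s - t) for t in salinity] for s in salinity]
r, p = mantel(geo, env)
print(round(r, 3), p)
```

With only five sites there are 120 distinct orderings, so the smallest attainable p-value is limited no matter how many permutations are drawn.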
49
What are two common diversity indices?
Shannon and Simpson
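A sketch of both indices from species counts. Simpson's index has several variants; this one assumes the Gini-Simpson form 1 - Σ p², the probability that two individuals drawn at random (with replacement) belong to different species. Shannon uses the natural logarithm here (an assumption; other bases are also used). The samples are hypothetical.

```python
import math

def proportions(counts):
    """Relative abundances p_i; absent species (count 0) contribute nothing."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i)."""
    return -sum(p * math.log(p) for p in proportions(counts))

def simpson(counts):
    """Gini-Simpson index 1 - sum(p_i ** 2)."""
    return 1 - sum(p * p for p in proportions(counts))

even = [25, 25, 25, 25]   # hypothetical sample: four equally abundant species
uneven = [97, 1, 1, 1]    # same richness, one dominant species
print(shannon(even), simpson(even))      # ln(4) ≈ 1.386 and 0.75
print(shannon(uneven), simpson(uneven))  # both lower than for the even sample
```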
50
What types of diversity can be calculated based on Simpson diversity? (3)
alpha: within-sample diversity; beta: between-sample diversity; gamma: landscape-scale diversity
51
What is a key difference between CA and PCA?
CA is used for categorical rather than continuous data
52
For CA, all data should be on the ____ scale and ________.
same; non-negative
53
CA decomposes the chi-squared statistic associated with a table into __________ __________.
Orthogonal factors
54
What is Euclidean distance?
Derived from the Pythagorean theorem; it is the "ordinary" straight-line distance between two points in Euclidean space (of any number of dimensions, not only 3D)
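The Pythagorean formula extends directly to any number of variables: square the differences, sum them, take the square root. A short sketch with hypothetical points; `math.dist` is the standard-library equivalent (Python 3.8+).

```python
import math

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean((0, 0), (3, 4)))              # 5.0, the 3-4-5 right triangle
print(euclidean((1, 2, 3, 4), (2, 4, 6, 8)))  # four variables: sqrt(1 + 4 + 9 + 16) = sqrt(30)
print(math.dist((1, 2, 3, 4), (2, 4, 6, 8)))  # same value from the standard library
```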
55
Bray-Curtis dissimilarity is used to:
quantify the compositional dissimilarity between sites based on counts at each site.
56
ANOVA stands for:
Analysis of variance
57
What are three assumptions of an ANOVA?
normality, independence, homogeneity of variance
58
What is an ANOVA used for?
To analyze differences among group means by partitioning the total variance into between-group and within-group components
59
What is a MANOVA?
A multivariate ANOVA (ANOVA with several dependent variables)
60
What dissimilarity metric is chi-squared distance based on?
Euclidean distance (chi-squared distance is a weighted Euclidean distance)
61
What statistic is used by MANOVA?
Wilks' lambda, whose distribution is a multivariate generalization of the F distribution used in univariate analyses
62
What statistic is used by ANOVA?
The F statistic (ratio of between-group to within-group variance), which follows the F distribution when the null hypothesis is true
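A standard-library sketch of the one-way ANOVA F statistic with hypothetical groups. Turning F into a p-value requires the F distribution with the returned degrees of freedom (e.g., SciPy's `scipy.stats.f_oneway` does both steps); that part is left out to keep the example dependency-free.

```python
def one_way_anova_f(*groups):
    """One-way ANOVA: F = between-group mean square / within-group mean square."""
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand_mean = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, (k - 1, n - k)  # F and its (numerator, denominator) degrees of freedom

f, df = one_way_anova_f([4, 5, 6], [6, 7, 8], [9, 10, 11])  # hypothetical groups
print(round(f, 6), df)  # 19.0 (2, 6)
```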
63
What is Euclidean distance?
The straight-line distance between samples in multidimensional space, also called Pythagorean distance
64
What is Bray-Curtis dissimilarity? How can the Bray-Curtis similarity be calculated?
A common ecological metric for the dissimilarity between two sites based on species counts at both sites; Bray-Curtis similarity is 1 minus the Bray-Curtis dissimilarity (or 100 minus it when the dissimilarity is expressed as a percentage).
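A sketch assuming the usual formula: the sum of absolute count differences divided by the sum of all counts at both sites. It gives 0 for identical sites and 1 for sites that share no species. The counts are hypothetical.

```python
def bray_curtis(site_a, site_b):
    """Bray-Curtis dissimilarity between two vectors of species counts."""
    numerator = sum(abs(a - b) for a, b in zip(site_a, site_b))
    denominator = sum(a + b for a, b in zip(site_a, site_b))
    return numerator / denominator

site_1 = [6, 7, 4, 0]   # hypothetical counts of four species
site_2 = [10, 0, 6, 3]
bc = bray_curtis(site_1, site_2)             # 16 / 36
print(round(bc, 3))                          # 0.444 dissimilarity
print(round(1 - bc, 3), round(100 * (1 - bc), 1))  # similarity as a proportion and as a percentage
```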
65
When should a Kruskal-Wallis test be performed/used?
When you have one nominal variable and one ranked or measurement variable; it is non-parametric, so it can be used in place of a one-way ANOVA when the data are not normally distributed
66
What type of data can a Mann-Whitney U test be performed on?
Two independent samples whose data do not meet parametric assumptions (e.g., not normally distributed, unequal variances)
67
What does the Mann-Whitney U test test?
Whether two independent groups of samples come from the same distribution; it is the non-parametric alternative to the two-sample t-test and is based on ranks
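A sketch of the U statistic: pool both samples, rank them (ties get average ranks), and compare the first sample's rank sum with its minimum possible value. Reporting the smaller of U1 and U2 is one common convention (an assumption); p-values are not computed here (SciPy's `scipy.stats.mannwhitneyu` provides them). The samples are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def mann_whitney_u(x, y):
    """Mann-Whitney U statistic (the smaller of U1 and U2) for two independent samples."""
    ranks = average_ranks(list(x) + list(y))
    r1 = sum(ranks[:len(x)])                  # rank sum of the first sample
    u1 = r1 - len(x) * (len(x) + 1) / 2
    u2 = len(x) * len(y) - u1
    return min(u1, u2)

print(mann_whitney_u([1.2, 2.3, 3.1], [4.0, 5.5, 6.7]))  # 0.0: the groups do not overlap
print(mann_whitney_u([1, 4], [2, 3]))                    # 2.0: fully interleaved, U = n1 * n2 / 2
```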
68
What is the null hypothesis of a Kruskal-Wallis test?
That the mean ranks of all groups (two or more) are the same
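A sketch of the Kruskal-Wallis H statistic, which measures how far the groups' rank sums depart from what equal mean ranks would give. Assumptions: no correction for ties, and the p-value step is omitted (H is usually compared with a chi-squared distribution on k - 1 degrees of freedom; SciPy's `scipy.stats.kruskal` does this with a tie correction). The groups are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they occupy."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction) for two or more groups."""
    values = [v for g in groups for v in g]
    ranks = average_ranks(values)
    n_total = len(values)
    weighted, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # ranks belonging to this group
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 * weighted / (n_total * (n_total + 1)) - 3 * (n_total + 1)

print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # about 7.2: mean ranks differ strongly
print(kruskal_wallis_h([1, 2], [1, 2]))                   # about 0: identical mean ranks
```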