Compositional Data Flashcards
Compositional Data
- multivariate sets of non-negative components
- measured directly as proportions that sum to one or measured in absolute terms with different totals
- interest is in the size of components relative to the total and relative to each other
Historic Compositional Data
- non-negative vectors that are subject to a unit-sum constraint
- proportions that sum to 1
- sample space - simplex
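A minimal sketch of how absolute measurements are mapped onto the simplex, assuming NumPy is available. The helper name `closure` and the example counts are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def closure(x):
    """Rescale each row of non-negative parts so that it sums to 1."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    if np.any(x < 0):
        raise ValueError("components must be non-negative")
    totals = x.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("each row needs at least one positive component")
    return x / totals

# Two made-up samples measured in absolute terms with different totals.
counts = np.array([[30, 50, 20],
                   [5, 5, 10]])
props = closure(counts)
print(props)              # each row is now a point on the simplex
print(props.sum(axis=1))  # row sums equal 1
```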
Spurious Correlation
- the focus of early work on compositional data
- arises due to sum constraint
- an increase in one component relative to the total reduces the share of the other components, which induces a negative correlation in the relative values
Spurious Correlation - outside simplex
outside the simplex (absolute values) this induced correlation does not arise
the number of Ford crashes is not negatively correlated with the number of VW crashes. However, when the proportions are considered, a negative correlation can emerge. An increase in the proportion of Ford crashes relative to the total automatically reduces the proportion of VW crashes, even if the actual number of VW crashes remains unchanged.
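The crash example can be checked with a small simulation. This is a sketch on made-up data: the counts are drawn independently from a Poisson distribution (an assumption chosen only for illustration), with columns standing for Ford, VW and all other makes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated crash counts, drawn independently for each make.
# Columns: Ford, VW, other (all values are invented for illustration).
counts = rng.poisson(lam=100, size=(5000, 3)).astype(float)

# Close each row to proportions of the total number of crashes.
props = counts / counts.sum(axis=1, keepdims=True)

r_counts = np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]
r_props = np.corrcoef(props[:, 0], props[:, 1])[0, 1]

print(f"Ford vs VW, raw counts:  {r_counts:+.3f}")
print(f"Ford vs VW, proportions: {r_props:+.3f}")
```

The raw counts should be close to uncorrelated, while the proportions come out clearly negative. With three exchangeable parts, the sum constraint forces the correlation between any two proportions to be -0.5.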
Scale Invariance
- ratios between components unchanged under rescaling
multiplying by constant - ratios stay the same
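A short sketch of scale invariance, assuming NumPy. The vector `x` and the scaling constant are arbitrary illustrative values.

```python
import numpy as np

x = np.array([2.0, 6.0, 12.0])   # arbitrary positive components
scaled = 10.0 * x                # same composition on a different scale

# Matrix of all pairwise ratios x_i / x_j.
ratios = x[:, None] / x[None, :]
ratios_scaled = scaled[:, None] / scaled[None, :]

# Proportions after closing to a unit sum.
props = x / x.sum()
props_scaled = scaled / scaled.sum()

print(np.allclose(ratios, ratios_scaled))  # ratios are unchanged
print(np.allclose(props, props_scaled))    # so are the proportions
```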
Subcompositional Coherence
- relationships between parts remain valid even when analysing a subset of components
A,B,C - analysing just A,B should not give conflicting conclusions
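A sketch of why ratio-based statements are coherent across subcompositions, assuming NumPy. The composition values are illustrative. The proportions of A and B change when C is dropped and the parts are re-closed, but the ratio A/B does not.

```python
import numpy as np

full = np.array([0.2, 0.3, 0.5])     # parts A, B, C (illustrative values)
sub = full[:2] / full[:2].sum()      # subcomposition A, B, re-closed to sum to 1

ratio_full = full[0] / full[1]       # A/B in the full composition
ratio_sub = sub[0] / sub[1]          # A/B in the subcomposition

print(sub)                           # proportions differ from the full composition
print(np.isclose(ratio_full, ratio_sub))
```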
Subcompositional Dominance
- if one component dominates the full composition, it should dominate in any subcomposition
if A always greater than B in A,B,C - then should remain true in A,B
Permutation Invariance
- order of the components should not affect the analysis
A,B,C should give the same results as B,A,C
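One way to see permutation invariance, sketched with the centred log-ratio (defined in the CLR card) and NumPy. Reordering the parts before transforming gives the same values as transforming first and reordering afterwards. The helper `clr` and the example values are assumptions for illustration.

```python
import numpy as np

def clr(x):
    """Centred log-ratio: log of each part relative to the geometric mean."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

x = np.array([0.2, 0.3, 0.5])   # parts A, B, C
perm = [1, 0, 2]                # reorder to B, A, C

# Transform-then-permute agrees with permute-then-transform.
same = np.allclose(clr(x[perm]), clr(x)[perm])
print(same)
```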
Ternary Diagram
- graphical visualisation of three components
- near vertex - high concentration of that component
- near centre - equal proportions of all components
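A sketch of how a three-part composition is placed on a ternary diagram. The vertex layout is an assumption: A at (0, 0), B at (1, 0) and C at (0.5, sqrt(3)/2). Other layouts are equally valid. The function name `ternary_to_xy` is hypothetical.

```python
import math

def ternary_to_xy(a, b, c):
    """Map parts (a, b, c) to 2-D coordinates inside an equilateral triangle.

    Assumed vertices: A = (0, 0), B = (1, 0), C = (0.5, sqrt(3)/2).
    """
    total = a + b + c
    a, b, c = a / total, b / total, c / total
    # The point is the weighted average of the three vertices.
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y

print(ternary_to_xy(1, 0, 0))        # pure A lands on vertex A
print(ternary_to_xy(1, 1, 1))        # equal parts land at the centre
print(ternary_to_xy(0.1, 0.1, 0.8))  # mostly C lands near vertex C
```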
ALR
additive log-ratio
* takes the log of the ratio of each component to one reference component
* dependent on choice of divisor
* asymmetric
issue if no single component is non-zero in every observation (the divisor cannot be zero)
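A minimal sketch of the additive log-ratio, assuming NumPy and strictly positive parts. The function name `alr` and its `ref` argument are illustrative. The two calls show that the result depends on which component is chosen as the divisor.

```python
import numpy as np

def alr(x, ref=-1):
    """Additive log-ratio: log(x_i / x_ref) for every part except the reference."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("alr needs strictly positive parts")
    others = np.delete(x, ref, axis=-1)   # drop the reference part
    return np.log(others / x[..., [ref]])

comp = np.array([0.2, 0.3, 0.5])
print(alr(comp))         # last part as divisor: log(0.2/0.5), log(0.3/0.5)
print(alr(comp, ref=0))  # first part as divisor: log(0.3/0.2), log(0.5/0.2)
```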
CLR
centered log-ratio
* takes log of components to the geometric mean
* covariance matrix is singular as all components are retained (the CLR values sum to zero), so its determinant equals 0
robustness issue due to singularity - some techniques not suitable (e.g. discriminant analysis)
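A sketch of the centred log-ratio and its singular covariance, assuming NumPy. The data are simulated from a Dirichlet distribution purely for illustration, and the helper name `clr` is an assumption.

```python
import numpy as np

def clr(x):
    """Centred log-ratio: log of each part relative to the row's geometric mean."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
data = rng.dirichlet([2.0, 3.0, 5.0], size=200)  # simulated 3-part compositions

z = clr(data)
cov = np.cov(z, rowvar=False)

print(np.allclose(z.sum(axis=1), 0.0))         # every transformed row sums to zero
print(np.linalg.det(cov))                      # numerically zero
print(np.linalg.matrix_rank(cov, tol=1e-10))   # rank 2 rather than 3
```

Because the covariance matrix cannot be inverted, methods that need its inverse cannot be applied directly to CLR coordinates.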
ILR
isometric log-ratio
* uses orthonormal coordinates to transform components
* creates D-1 non-redundant, orthogonal coordinates from D components (covariance is non-singular)
harder to interpret and complex to construct
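A sketch of one possible isometric log-ratio construction, assuming NumPy. The orthonormal basis used here is a Helmert-type contrast matrix, which is only one of many valid choices. The function names `ilr_basis` and `ilr` are illustrative.

```python
import numpy as np

def clr(x):
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def ilr_basis(d):
    """Helmert-type orthonormal basis with d rows and d - 1 columns."""
    v = np.zeros((d, d - 1))
    for j in range(1, d):
        v[:j, j - 1] = 1.0 / j               # first j parts share equal weight
        v[j, j - 1] = -1.0                   # contrasted against part j + 1
        v[:, j - 1] *= np.sqrt(j / (j + 1))  # scale the column to unit length
    return v

def ilr(x):
    """Project the clr coordinates onto the orthonormal basis."""
    x = np.asarray(x, dtype=float)
    return clr(x) @ ilr_basis(x.shape[-1])

comp = np.array([0.2, 0.3, 0.5])
basis = ilr_basis(3)
coords = ilr(comp)

print(np.allclose(basis.T @ basis, np.eye(2)))  # columns are orthonormal
print(coords.shape)                             # D - 1 coordinates
print(np.isclose(np.linalg.norm(coords), np.linalg.norm(clr(comp))))  # lengths preserved
```

Each coordinate here contrasts one part against the geometric mean of the parts before it, which is why interpretation depends on the chosen basis.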
Rounded Zeros
- represent values that fall below some detection limit
- not true zero values
- arise from measurement error or values below the detection limit
- treated by replacing the zero values with small positive values
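A sketch of one common replacement strategy, multiplicative replacement, assuming NumPy. Each zero is replaced by a small value `delta` and the non-zero parts are shrunk proportionally so the total stays at 1 and their ratios are preserved. The value of `delta` is an assumption and would normally be chosen with the detection limit in mind.

```python
import numpy as np

def multiplicative_replacement(x, delta):
    """Replace zeros with delta and rescale non-zero parts to keep a unit sum."""
    x = np.asarray(x, dtype=float)
    x = x / x.sum()                       # close to proportions first
    zeros = x == 0
    shrink = 1.0 - delta * zeros.sum()    # mass left for the non-zero parts
    return np.where(zeros, delta, x * shrink)

comp = np.array([0.6, 0.4, 0.0])          # third part fell below detection
fixed = multiplicative_replacement(comp, delta=0.01)

print(fixed)
print(fixed.sum())                        # still sums to 1
print(fixed[0] / fixed[1])                # ratio of non-zero parts unchanged
```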
Structural Zeros
- true zeros
- actual zero or absence of component
- informative
- carry important information
- handled with model-based approaches rather than replacement
Restrictive Definition of Compositional Data
compositional data in broader context
- confining compositional data to the simplex is too restrictive
- assumes compositions must be analysed through relative terms rather than absolute values
- compositional data can often originate from counts / absolute values
- absolute values can carry important information as they can influence the variance and overall dynamics of the data