This seems like a must-read for anyone working on microbiomes: "Microbiome Datasets Are Compositional: And This Is Not Optional" (Frontiers in Microbiology).
Gloor GB, Macklaim JM, Pawlowsky-Glahn V and Egozcue JJ (2017) Microbiome Datasets Are Compositional: And This Is Not Optional. Front. Microbiol. 8:2224. doi: 10.3389/fmicb.2017.02224
Summary from the paper:
Datasets collected by high-throughput sequencing (HTS) of 16S rRNA gene amplimers, metagenomes or metatranscriptomes are commonplace and being used to study human disease states, ecological differences between sites, and the built environment. There is increasing awareness that microbiome datasets generated by HTS are compositional because they have an arbitrary total imposed by the instrument. However, many investigators are either unaware of this or assume specific properties of the compositional data. The purpose of this review is to alert investigators to the dangers inherent in ignoring the compositional nature of the data, and point out that HTS datasets derived from microbiome studies can and should be treated as compositions at all stages of analysis. We briefly introduce compositional data, illustrate the pathologies that occur when compositional data are analyzed inappropriately, and finally give guidance and point to resources and examples for the analysis of microbiome datasets using compositional data analysis.
Of course, many people are already aware of this, and others are probably partially aware of it. Still, it is a good reminder for them and very important for those who have not thought about it. The paper also has some useful discussion of the challenges of analyzing data with the compositional aspect in mind.
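For readers who have not run into the term before, here is a minimal sketch of what "compositional" means in practice. It is my own illustration, not code from the paper: the read counts are made up, and the helper names `closure` and `clr` are just labels I chose. It uses the centered log-ratio (CLR) transform, one of the standard log-ratio transforms in compositional data analysis, and assumes no zero counts (zeros need separate handling that this sketch does not attempt).

```python
import math

def closure(counts, total=1.0):
    """Rescale a vector of counts so that it sums to `total`."""
    s = sum(counts)
    return [total * c / s for c in counts]

def clr(composition):
    """Centered log-ratio: log of each part relative to the geometric mean.

    Assumes every part is strictly positive.
    """
    logs = [math.log(x) for x in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical read counts for four taxa in one sample
sample = [120, 30, 600, 50]
# The same community sequenced 10x deeper: the counts change, the ratios do not
deeper = [10 * c for c in sample]

clr_a = clr(closure(sample))
clr_b = clr(closure(deeper))

for a, b in zip(clr_a, clr_b):
    print(f"{a:+.4f}  {b:+.4f}")
```

The two columns should match, because the total number of reads is set by the instrument and carries no information about the community; only the ratios between taxa do.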
UPDATE 11/16 – some comments from Twitter may be of interest
Compositional data is particularly problematic for correlation!
We use SparCC from Alm lab (& wrote a speed-up -> https://t.co/Gso2BXsYeV)
— Kat Holt (@DrKatHolt) November 15, 2017
Hi Jonathan – Check out https://t.co/5iMWSZU2uG on the importance of compositionality in gut microbiome data
— Jeroen Raes (@jeroenraes) November 16, 2017
Sweet! Absolute abundances were a matter of time, and a real treat to have.
Ratios and contrasts are still useful tools (e.g. PCA loadings simplified as contrasts), but opening the door to standard multivariate analyses will improve the reliability of microbiome data analysis. https://t.co/3VzRz9qb5g
— Alex Washburne (@alex_washburne) November 16, 2017
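To make the correlation point from these comments concrete, here is a small simulation sketch. It is not SparCC and not anyone's published method, just a toy with made-up numbers: three hypothetical taxa whose absolute abundances are drawn independently, then converted to proportions the way sequencing data effectively are.

```python
import math
import random

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical absolute abundances: three taxa drawn independently per sample
absolute = [[random.uniform(50, 150) for _ in range(3)] for _ in range(2000)]

# What sequencing reports: each sample rescaled to proportions summing to 1
relative = [[x / sum(row) for x in row] for row in absolute]

r_abs = pearson([row[0] for row in absolute], [row[1] for row in absolute])
r_rel = pearson([row[0] for row in relative], [row[1] for row in relative])

print(f"taxon 1 vs taxon 2, absolute abundances: r = {r_abs:.2f}")
print(f"taxon 1 vs taxon 2, relative abundances: r = {r_rel:.2f}")
```

The taxa are independent by construction, so the absolute abundances should show a correlation near zero. The proportions should show a clearly negative one, simply because they are forced to sum to 1: when one taxon's share goes up, the others must go down.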