Careful Consideration of Bioinformatic Pipeline Choices Based on Study Goals

Sequencing of PCR-amplified marker regions (e.g. 16S, ITS) to characterize the microbial ecology of samples is a widely used tool in Microbiology of the Built Environment (MoBE) investigations. Because these methods produce large amounts of data, sequences are typically clustered into operational taxonomic units (OTUs) based on sequence similarity to simplify downstream processing. However, investigators too often do not carefully consider how sequence clustering and other bioinformatics pipeline decisions affect study results.

Naomichi Yamamoto (Seoul National University) and I recently published a manuscript entitled “Clustering of fungal community internal transcribed spacer (ITS) sequence data obscures taxonomic diversity” (doi: 10.1111/1462-2920.12390), investigating the impact of clustering fungal community ITS sequence data on taxonomic coverage (number of taxa observed). This work demonstrates a small but statistically significant loss in taxonomic coverage when applying typically used clustering techniques, while also demonstrating that clustering does not produce a statistically significant improvement in taxonomic assignment. Ambiguous sequences (i.e. sequences with divergent taxonomic assignment) were excluded, alleviating obvious issues with dual nomenclature or pleomorphs.
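To make the trade-off concrete, here is a toy Python sketch of how clustering before taxonomic assignment can reduce the number of taxa observed. It is not the method used in the manuscript: the reads, the taxon labels, the 0.90 threshold, and the helper names (`identity`, `greedy_cluster`) are all made up for illustration, and identity is computed as a simple position-by-position match on equal-length strings rather than by the alignment-based methods real pipelines use.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences (toy measure)."""
    if len(a) != len(b):
        raise ValueError("this toy identity measure needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs, threshold):
    """Assign each sequence to the first centroid it matches at >= threshold identity.

    A sequence that matches no existing centroid becomes a new centroid.
    Returns the list of centroids and the centroid index for each input sequence.
    """
    centroids = []
    assignment = []
    for s in seqs:
        for i, c in enumerate(centroids):
            if identity(s, c) >= threshold:
                assignment.append(i)
                break
        else:
            centroids.append(s)
            assignment.append(len(centroids) - 1)
    return centroids, assignment


# Hypothetical reads paired with hypothetical per-read taxonomic assignments.
reads = [
    ("ACGTACGTACGTACGTACGT", "Taxon A"),
    ("ACGTACGTACGTACGTACGA", "Taxon B"),  # one mismatch against the first read
    ("TTTTGGGGCCCCAAAATTTT", "Taxon C"),
]
taxon_of = dict(reads)
seqs = [seq for seq, _ in reads]

# Option 1: assign taxonomy to every read individually.
taxa_per_read = {taxon_of[s] for s in seqs}

# Option 2: cluster first, then assign taxonomy only to each cluster's centroid.
centroids, assignment = greedy_cluster(seqs, threshold=0.90)
taxa_per_otu = {taxon_of[c] for c in centroids}

print("taxa without clustering:", sorted(taxa_per_read))
print("taxa with clustering:   ", sorted(taxa_per_otu))
```

In this contrived case the second read is absorbed into the first read's cluster, so its taxon never appears in the clustered result. Whether and how much this happens with real data depends on the marker region, the clustering algorithm, and the threshold chosen.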

I am not arguing that sequence clustering prior to taxonomic assignment is inappropriate in all (or even most) studies; however, investigators should be aware of the trade-offs associated with every decision made in a bioinformatics pipeline to ensure robust study conclusions. This work points to the necessity of carefully considering all choices made in bioinformatics pipelines and not accepting default settings if they conflict with our study goals. The MoBE program represents an interesting case, where many investigators are relatively inexperienced in bioinformatics, yet seek to apply bioinformatics tools to draw study conclusions. To reiterate a consistent theme throughout the MoBE program: not every investigator needs to be a bioinformatics expert (or a building science expert, or an architectural expert…), but everybody needs to know enough to be dangerous.

Lab Blog:      Twitter: @Bibby_Lab



Kyle Bibby

Kyle Bibby is an assistant professor at the University of Pittsburgh.