Workshop to produce curated 18S rDNA reference database

Just got this announcement by email and I thought it might be of interest:

Join the curation effort & enroll in our first workshop!

We would like you to join us this summer for our first workshop, which will take place in Vancouver, Canada, from July 19 to 25. This effort brings together taxonomists with expertise in individual lineages that span the eukaryotic tree of life to curate reference 18S rDNA sequences of these lineages by incorporating knowledge of phylogenetic, morphological, and/or environmental contextual data. During the workshop, the working group will integrate the curation efforts on individual eukaryotic groups into a biological data warehouse consisting of curated sequences, a flexible taxonomy, and phylogenetic trees with their underlying sequence alignments. We will further use this 18S rDNA reference database to investigate the environmental distribution of eukaryotic microbes from large-scale high-throughput environmental sequencing (HTES) datasets. Each curator is encouraged to use their curated data to address research questions of interest. We will discuss how to tackle such projects and encourage participants to publish their results. If you are a PhD student or a postdoc, we encourage you to apply before May 8th.

Application deadline: May 8th 2015

Apply here!




Phylogenetically informed curation of Eukaryotic 18S rDNA

The diversity of eukaryotes extends far beyond the familiar plants, animals, and fungi. In fact, the vast majority of eukaryotic lineages are microbial. Eukaryotic microbes (protists) are important players in ecological processes and also directly influence the biology and health of animals and plants as parasites, commensals and symbionts. However, the extent of their diversity is still largely unknown because most eukaryotes have not yet been or cannot be cultured.

High-throughput environmental sequencing (HTES) has greatly expanded our understanding of microbial biodiversity and its ecological role. HTES enables characterization of microbial communities rapidly and from hundreds of samples at the same time. The depth of sampling has also revealed novel diversity in all ecosystems examined to date. However, the value of HTES data for cataloging the extent and distribution of protistan biodiversity is critically dependent on the quality of the reference databases used to annotate these sequences.

There is a growing and urgent need for well-curated reference databases to annotate the flood of incoming environmental sequences. Because there is no successful method for cleaning up mislabeled sequences, manual curation by experts is a necessary task. Ribosomal DNA is the marker most frequently used to characterize diversity because it is universally present and has been sequenced for the most comprehensive array of known taxa (microscopically identified and/or cultured organisms). Curated eukaryotic databases of ribosomal DNA have greatly improved analysis capacity in recent years. However, they struggle to keep pace with rapidly changing views on eukaryotic taxonomy, the influx of new data, and the computational challenges of assembling the high-quality alignments and trees that are necessary for accurate characterization of lineage diversity. As new environmental sequence data continue to reveal novel lineages, these data should ideally inform refinements in taxonomy and be incorporated into reference databases. This rarely happens in practice because 1) the communities building taxonomic frameworks for eukaryotes are distinct from those conducting environmental sequence analysis, and 2) curating the vast amounts of existing data in a phylogenetic framework is beyond the scope of individual research groups. Investing now in a curated reference database with high-quality alignments and phylogenetic trees will pay dividends both immediately, through the diversity research it will enable, and in the future, because it will facilitate easier and more reliable maintenance and automatic growth that can keep pace with developments in the field.

Generating a phylogenetically and taxonomically informed reference database in and of itself leads to novel insights into microbial diversity and ecology, in addition to producing a community resource that adds value to further investigations. Many sequences across the eukaryotic tree of life are poorly annotated or misannotated and have not been assembled into a comprehensive and coherent phylogenetic framework. As a result, the curation process can uncover additional novel lineages, refine understanding of the relationships among environmental clades and previously described lineages, and offer new glimpses into the diversity contained within these clades.

Jonathan Eisen

I am an evolutionary biologist and a Professor at UC Davis. My lab is in the UC Davis Genome Center and I hold appointments in the Department of Medical Microbiology and Immunology in the School of Medicine and the Department of Evolution and Ecology in the College of Biological Sciences. My research focuses on the origin of novelty (how new processes and functions originate). To study this, I focus on sequencing and analyzing genomes of organisms, especially microbes, and on using phylogenomic analysis (my lab site has more information on lab activities). In addition to research, I am heavily involved in the Open Access publishing and Open Science movements.