microBEnet journal club: MetaBAT for reconstructing single genomes from complex microbial communities



There is an interesting paper out a few days ago in PeerJ: "MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities" by Dongwan D. Kang, Jeff Froula, Rob Egan and Zhong Wang. The key to what they do in the paper is summarized in Figure 1:


The legend is below:

There are three preprocessing steps before MetaBAT is applied:

  1. A typical metagenome experiment may contain many spatial or time-series samples, each consisting of many different genomes (different color circles).
  2. Each sample is sequenced by next-generation sequencing technology to form a sequencing library with many short reads.
  3. The libraries may be combined before de novo assembly. After assembly, the reads from each sample must be aligned to the assembly and stored in separate BAM files. MetaBAT then automatically performs the remaining steps:
  4. For each contig pair, a tetranucleotide frequency distance probability (TDP) is calculated from a distribution modelled from 1,414 reference genomes.
  5. For each contig pair, an abundance distance probability (ADP) across all the samples is calculated.
  6. The TDP and ADP of each contig pair are then combined, and the resulting distances for all pairs form a distance matrix.
  7. Each bin will be formed iteratively and exhaustively from the distance matrix.
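To get a feel for steps 4 to 6, here is a rough Python sketch of building a pairwise contig distance matrix from sequence composition and per-sample coverage. It is only an illustration under stated assumptions, not MetaBAT's implementation. MetaBAT turns tetranucleotide distances into probabilities (TDP) using a distribution fitted to 1,414 reference genomes, and its ADP is its own model. Here the composition distance is a plain Euclidean distance between canonical tetranucleotide frequency vectors. The abundance distance (`abundance_distance`) and the combining weight `w_tnf` are hypothetical stand-ins invented for this example.

```python
from __future__ import annotations

import itertools
import math

# ASSUMPTION: simplified stand-ins for MetaBAT's TDP/ADP, for illustration only.
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


# Each tetramer is merged with its reverse complement, leaving 136 features.
CANONICAL = sorted(
    {min(k, revcomp(k)) for k in ("".join(p) for p in itertools.product(BASES, repeat=4))}
)
INDEX = {kmer: i for i, kmer in enumerate(CANONICAL)}


def tnf(seq: str) -> list[float]:
    """Normalized canonical tetranucleotide frequency vector of one contig."""
    counts = [0] * len(CANONICAL)
    seq = seq.upper()
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if all(base in BASES for base in kmer):  # skip windows containing N etc.
            counts[INDEX[min(kmer, revcomp(kmer))]] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def tnf_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between TNF vectors, scaled to [0, 1]."""
    # Two frequency vectors (non-negative, summing to 1) are at most sqrt(2) apart.
    return math.dist(a, b) / math.sqrt(2)


def abundance_distance(cov_a: list[float], cov_b: list[float]) -> float:
    """HYPOTHETICAL abundance distance in [0, 1): based on the mean |log1p difference|
    of coverage across samples. Coverage lists must be non-empty and equal length."""
    diffs = [abs(math.log1p(x) - math.log1p(y)) for x, y in zip(cov_a, cov_b)]
    return 1.0 - math.exp(-sum(diffs) / len(diffs))


def combined_distance(tnf_a, cov_a, tnf_b, cov_b, w_tnf: float = 0.5) -> float:
    """Weighted average of the two distances; w_tnf is an invented parameter."""
    return w_tnf * tnf_distance(tnf_a, tnf_b) + (1 - w_tnf) * abundance_distance(cov_a, cov_b)


def distance_matrix(contigs: list[tuple[str, list[float]]], w_tnf: float = 0.5) -> list[list[float]]:
    """contigs: (sequence, per-sample mean coverage) pairs -> symmetric n x n matrix."""
    profiles = [(tnf(seq), cov) for seq, cov in contigs]
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in itertools.combinations(range(n), 2):
        d = combined_distance(*profiles[i], *profiles[j], w_tnf=w_tnf)
        dist[i][j] = dist[j][i] = d
    return dist
```

For example, with `contigs = [("ACGTACGTAAAA", [10.0, 5.0]), ("TTTTACGTACGT", [10.0, 5.0]), ("GGGGCCCCGGGG", [50.0, 0.5])]`, the first two contigs are reverse complements of each other with identical coverage, so `distance_matrix(contigs)[0][1]` is `0.0`, while the third contig sits at a positive distance from both. Counting canonical tetramers means a contig and its reverse complement get the same composition profile, which matters because assemblers can output either strand.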

So, basically, what MetaBAT does is carry out post-assembly analysis of metagenomic data sets and then bin the contigs from the assemblies using several pieces of information about the contigs (their tetranucleotide frequencies and their abundance across samples). I'm not 100% sure how useful this is or will be, but it seems worth trying out for those trying to assemble and bin metagenomic data.
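To make the binning idea a bit more concrete, here is a toy greedy binner that groups contigs given a precomputed pairwise distance matrix. This is a minimal sketch under its own assumptions, not the procedure MetaBAT uses. `greedy_bins`, `max_dist` and `min_size` are hypothetical names, and the seed-and-grow strategy is just a simple stand-in for the iterative, exhaustive bin formation described in the figure legend.

```python
from __future__ import annotations


def greedy_bins(dist: list[list[float]], max_dist: float = 0.2, min_size: int = 1) -> list[list[int]]:
    """Greedy, seed-and-grow binning of contig indices from a distance matrix.

    ASSUMPTION: an illustrative stand-in, not MetaBAT's algorithm; max_dist and
    min_size are hypothetical parameters, not MetaBAT options.
    """
    unbinned = set(range(len(dist)))
    bins = []
    while unbinned:
        candidates = sorted(unbinned)  # sorted so ties go to the lowest index
        # Seed each bin with the unbinned contig that has the most close neighbours.
        seed = max(candidates, key=lambda i: sum(dist[i][j] <= max_dist for j in candidates))
        # Grow the bin with every unbinned contig close to the seed (seed always included).
        members = [j for j in candidates if j == seed or dist[seed][j] <= max_dist]
        unbinned.difference_update(members)
        if len(members) >= min_size:  # groups smaller than min_size are dropped
            bins.append(members)
    return bins
```

On the three-contig matrix `[[0.0, 0.1, 0.9], [0.1, 0.0, 0.8], [0.9, 0.8, 0.0]]`, `greedy_bins(dist, max_dist=0.2)` returns `[[0, 1], [2]]`, and with `min_size=2` the singleton is dropped, leaving `[[0, 1]]`. Seeding from the contig with the most close neighbours tends to start bins in dense regions of the distance matrix, but the result depends heavily on the threshold; real binners use more careful clustering.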


Jonathan Eisen

I am an evolutionary biologist and a Professor at U. C. Davis. My lab is in the UC Davis Genome Center and I hold appointments in the Department of Medical Microbiology and Immunology in the School of Medicine and the Department of Evolution and Ecology in the College of Biological Sciences. My research focuses on the origin of novelty (how new processes and functions originate). To study this I focus on sequencing and analyzing genomes of organisms, especially microbes, and on using phylogenomic analysis (my lab site has more information on lab activities). In addition to research, I am heavily involved in the Open Access publishing and Open Science movements.