Estimating average genome size from metagenomes

We recently submitted a paper to bioRxiv on the estimation of average genome size from shotgun metagenomics data and its application to the human microbiome. It is currently undergoing peer review.

This study was motivated by the troubling observation that many universal single-copy genes appear to vary significantly in abundance across metagenomes. How can this be? It turns out that differences in genome size between microbial communities can lead to this strange pattern. When genomes are larger, any one gene makes up a smaller share of the sequenced DNA. As a result, these single-copy genes appear less abundant in communities with larger genomes, even though they are actually present at the same copy number in all communities.
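The effect can be illustrated with a little arithmetic. The sketch below is only an idealized illustration, not the method from the paper: it assumes reads are sampled uniformly along every genome, and the function name and the 1,000 bp gene length are made up for the example. The two genome sizes are the ends of the gut range reported further down.

```python
def expected_read_fraction(gene_length_bp, avg_genome_size_bp, copies_per_genome=1.0):
    """Expected fraction of sequenced bases that fall in a gene.

    Assumes uniform sampling along genomes (an idealization).
    """
    return copies_per_genome * gene_length_bp / avg_genome_size_bp


# Hypothetical single-copy gene, 1,000 bp long.
gene_length_bp = 1_000

# Two communities with different average genome sizes (in base pairs).
small_ags_bp = 2_500_000
large_ags_bp = 5_800_000

frac_small = expected_read_fraction(gene_length_bp, small_ags_bp)
frac_large = expected_read_fraction(gene_length_bp, large_ags_bp)

# Same copy number per genome, yet the gene looks more abundant
# in the community with the smaller genomes.
apparent_fold_difference = frac_small / frac_large
print(f"apparent fold difference: {apparent_fold_difference:.2f}")
```

Under these assumptions the apparent fold difference equals the ratio of the two average genome sizes, so a gene with one copy per genome in both communities would look more than twice as abundant in the small-genome community.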

The challenge is how to correct for this bias. We all know that microbial genomes come in many sizes, but how can one figure out the average size of genomes in a sample from just the shotgun data? While several methods attempt to address this question, we found that they were not sufficiently accurate or fast. To address this problem, we developed MicrobeCensus, which can rapidly and accurately estimate average genome size (AGS) from metagenomics data.

Applying our tool to over 1,300 shotgun metagenomes from the human microbiome, we found that AGS varies significantly both within and between body sites. For example, in the gut AGS ranges from 2.5 to 5.8 megabases. After correcting for this bias, we found that AGS is positively correlated with the abundance of Bacteroides and the copy number of genes related to many metabolic pathways; in contrast, communities with small AGS had a greater abundance of Firmicutes and a higher copy number of genes related to membrane transport. This finding highlights different adaptive strategies of bacteria in the gut and would have been missed without proper normalization.

Estimating and correcting for AGS is not limited to studies of the human microbiome, and should improve the detection of differentially abundant genes in other metagenomics projects.
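As a rough sketch of what such a correction can look like, the snippet below converts a gene's read coverage into copies per genome by dividing by the number of genome equivalents sequenced (total bases divided by AGS). This is a simplified illustration under the assumption of uniform coverage, not necessarily the exact normalization used in the paper; the function names and all of the numbers are hypothetical.

```python
def genome_equivalents(total_bp_sequenced, avg_genome_size_bp):
    """Number of average-sized genomes' worth of DNA in the library."""
    return total_bp_sequenced / avg_genome_size_bp


def copies_per_genome(gene_mapped_bp, gene_length_bp,
                      total_bp_sequenced, avg_genome_size_bp):
    """Gene coverage divided by genome equivalents sequenced."""
    gene_coverage = gene_mapped_bp / gene_length_bp
    return gene_coverage / genome_equivalents(total_bp_sequenced, avg_genome_size_bp)


# Two hypothetical samples sequenced to the same depth (1 Gbp each).
total_bp = 1_000_000_000
gene_length_bp = 1_000

# Sample A: AGS 2.5 Mbp, 400,000 bp mapped to the gene.
# Sample B: AGS 5.0 Mbp, 200,000 bp mapped to the gene.
raw_a = 400_000 / total_bp   # uncorrected relative abundance
raw_b = 200_000 / total_bp

corrected_a = copies_per_genome(400_000, gene_length_bp, total_bp, 2_500_000)
corrected_b = copies_per_genome(200_000, gene_length_bp, total_bp, 5_000_000)

print(f"raw: {raw_a:.1e} vs {raw_b:.1e}")
print(f"corrected copies per genome: {corrected_a:.2f} vs {corrected_b:.2f}")
```

In this made-up example the raw abundances differ two-fold, while the corrected values agree at about one copy per genome in both samples.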

Stephen Nayfach

Stephen Nayfach is a bioinformatics graduate student in Katherine Pollard's lab at the University of California, San Francisco.