Who are the contaminants in your sequencing project?

Well, I have been having many discussions recently about PCR amplification from “negative” controls, where no sample DNA was added. Such amplification is, alas, pretty common, due to contamination in some other material added to the PCR reaction. Obviously it would be best to eliminate all DNA contamination from all reagents and all PCRs. But when that does not happen, it is possible to try to detect contamination after the fact. Below I post some papers related to post-sequencing detection of contamination:

Any other suggestions or comments would be welcome.
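As a rough illustration of what post-sequencing detection can look like, here is a minimal Python sketch of one crude heuristic, not taken from any particular paper: flag a taxon if it turns up in at least as large a fraction of negative controls as of real samples. The input format (dicts mapping a sample ID to per-taxon read counts), the function names, and the toy counts are all hypothetical, and a real analysis would want a proper statistical test rather than a bare comparison of fractions.

```python
def prevalence(tables, taxon, min_reads=1):
    """Fraction of tables (samples or controls) with >= min_reads reads of taxon."""
    if not tables:
        return 0.0
    hits = sum(1 for counts in tables.values() if counts.get(taxon, 0) >= min_reads)
    return hits / len(tables)


def flag_contaminants(samples, controls, min_reads=1):
    """Flag taxa seen in negative controls at least as often as in real samples.

    samples, controls: hypothetical {sample_id: {taxon: read_count}} dicts.
    """
    taxa = set()
    for counts in list(samples.values()) + list(controls.values()):
        taxa.update(counts)  # collect every taxon name seen anywhere
    flagged = []
    for taxon in sorted(taxa):
        in_controls = prevalence(controls, taxon, min_reads)
        in_samples = prevalence(samples, taxon, min_reads)
        # Crude rule: present in some controls, and no rarer there than in samples.
        if in_controls > 0 and in_controls >= in_samples:
            flagged.append(taxon)
    return flagged


if __name__ == "__main__":
    # Made-up counts for illustration only.
    samples = {
        "gut1": {"Bacteroides": 900, "Ralstonia": 5},
        "gut2": {"Bacteroides": 750, "Faecalibacterium": 200},
        "gut3": {"Faecalibacterium": 300, "Bacteroides": 400},
    }
    controls = {
        "ntc_pcr": {"Ralstonia": 40},
        "ntc_extraction": {"Ralstonia": 25, "Bacteroides": 2},
    }
    print(flag_contaminants(samples, controls))  # ['Ralstonia']
```

Note that Bacteroides is not flagged even though it appears in one control, because it is present in every real sample; a rule like this only works when there are enough controls and samples to make the prevalence comparison meaningful.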

UPDATE 10:30 AM 7/25 –

I was reminded on Twitter of a new and critically relevant publication on this issue: “Reagent contamination can critically impact sequence-based microbiome analyses”

10 thoughts on “Who are the contaminants in your sequencing project?”

    1. Do you have a database that one could include in a QIIME workflow, containing sequences from known reagent contaminants? Then you could run SourceTracker to see whether samples appear to have reagent contamination.

      1. We are currently discussing publishing a standard database that could be used for this, but in the meantime I typically use the data from my PNAS 2011 paper, which contains human gut, skin, and mouth samples, plus soil and other environmental samples. These give you a good range of potential contaminating environments, and were sequenced with the widely used 515F/806R primers on Illumina.

        1. Greg, do you exclude sequences that “look like” contamination, or re-process the sample? If you exclude the sequences, how can you be sure they are contamination? Isn’t this (somewhat) assuming the answer to your sequencing effort?
          I am interested in how other groups handle this. We currently run negative PCR controls for every reaction and a negative control for every DNA extraction batch. If we could do this well post-sequencing, it would save a lot of effort.

          1. I’m also a bit confused about how you can use a database to screen out contamination. If you’re looking at a new environment, particularly a human-associated one, how do you decide which taxa don’t actually belong in those samples? It seems like you need actual wet-lab controls, as Kyle described, though it’s not clear what the best way to deal with those is either.
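The idea discussed in this thread, comparing a sample against reference environments such as gut, skin, mouth, and soil, can be illustrated with a simplified Python sketch. This is not SourceTracker, which uses a Bayesian Gibbs sampler and also models an unknown source; it is a swapped-in expectation-maximization (EM) estimate of mixing proportions, assuming each source is represented by a fixed per-taxon count profile. All function names, source names, and counts are hypothetical. As the question above points out, the answer can only be as good as the reference sources: a taxon that genuinely belongs in a new environment but also occurs in a "contaminant" source will be partly attributed to that source.

```python
def normalize(counts, taxa, pseudocount=1e-6):
    """Turn raw counts into a probability profile over `taxa` (small pseudocount avoids zeros)."""
    total = sum(counts.get(t, 0) + pseudocount for t in taxa)
    return {t: (counts.get(t, 0) + pseudocount) / total for t in taxa}


def estimate_mixture(sink, sources, iterations=200):
    """Estimate the share of `sink` reads explained by each fixed source profile.

    sink: {taxon: reads}; sources: {source_name: {taxon: reads}} (hypothetical format).
    Returns {source_name: proportion}, proportions summing to 1.
    """
    taxa = sorted(set(sink).union(*sources.values()))
    profiles = {name: normalize(c, taxa) for name, c in sources.items()}
    names = sorted(profiles)
    weights = {n: 1.0 / len(names) for n in names}  # start from equal proportions
    total_reads = sum(sink.values())
    for _ in range(iterations):
        expected = {n: 0.0 for n in names}
        for taxon, reads in sink.items():
            # E-step: posterior probability that a read of this taxon came from each source
            joint = {n: weights[n] * profiles[n][taxon] for n in names}
            z = sum(joint.values())
            for n in names:
                expected[n] += reads * joint[n] / z
        # M-step: new mixing proportions are the expected shares of reads
        weights = {n: expected[n] / total_reads for n in names}
    return weights


if __name__ == "__main__":
    # Made-up reference profiles and sample counts, for illustration only.
    sources = {
        "gut": {"Bacteroides": 900, "Faecalibacterium": 100},
        "reagent": {"Ralstonia": 500, "Bradyrhizobium": 500},
    }
    sink = {"Bacteroides": 270, "Faecalibacterium": 30, "Ralstonia": 50, "Bradyrhizobium": 50}
    mix = estimate_mixture(sink, sources)
    print({n: round(w, 3) for n, w in mix.items()})  # {'gut': 0.75, 'reagent': 0.25}
```

Because every read must be assigned to one of the listed sources, reads from a taxon found in none of them are simply spread across the sources; this is one reason real source-tracking tools add an explicit "unknown" source.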

  1. I need advice on how to “correct for contamination”. We currently include non-template controls in both our extraction process and our library prep process. My question now is: how do you correct for the “contamination signal”?
    1. Do you remove the total number of reads present in the non-template controls for specific taxa from all your samples in the run? Or do you calculate the average number of reads sequenced for the extraction non-template controls and for the library prep non-template controls, and remove those numbers of reads for the respective taxa from all samples in the run?
    2. Would you correct for contamination at the read level or the OTU level?

  2. I’d be very hesitant to rely solely on any ‘bioinformatic’ solution for removing contaminants. If the data are not trustworthy, removing contaminants with SourceTracker (or something equivalent) would make me nervous. The key to avoiding problems with contamination is good lab work on the front end; otherwise it is garbage in, garbage out.

    For example, http://americangut.org/?page_id=277 makes me very nervous, as it assumes that only a handful of bacteria are associated with ‘blooms’ in improperly stored samples, and that the abundances of other taxa will not be unduly affected.
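The two subtraction strategies asked about in the question on correcting for contamination can be sketched in Python on an OTU-level count table (read counts per taxon per sample). This is only a hypothetical illustration of the arithmetic, not a recommendation: the function names, control labels, and counts are made up, and corrected counts are floored at zero. Subtracting the same fixed number of reads from every sample ignores differences in sequencing depth between samples, which is one reason to share the wariness expressed above about purely bioinformatic fixes.

```python
from collections import defaultdict
from statistics import mean


def control_totals(controls):
    """Option 1: sum the reads for each taxon over every negative control."""
    totals = defaultdict(float)
    for counts in controls.values():
        for taxon, reads in counts.items():
            totals[taxon] += reads
    return dict(totals)


def control_means(controls_by_type):
    """Option 2: for each control type (e.g. extraction, library prep), average the
    reads per taxon across that type's controls, then add the averages across types."""
    combined = defaultdict(float)
    for controls in controls_by_type.values():
        taxa = set().union(*controls.values())
        for taxon in taxa:
            # Controls with no reads for this taxon count as zero.
            combined[taxon] += mean(c.get(taxon, 0) for c in controls.values())
    return dict(combined)


def subtract(samples, background):
    """Subtract background reads per taxon from every sample, flooring at zero."""
    return {
        name: {t: max(0.0, r - background.get(t, 0.0)) for t, r in counts.items()}
        for name, counts in samples.items()
    }


if __name__ == "__main__":
    # Made-up counts for illustration only.
    extraction = {"e1": {"Ralstonia": 20}, "e2": {"Ralstonia": 10}}
    library = {"l1": {"Ralstonia": 4}, "l2": {}}
    samples = {"s1": {"Bacteroides": 900, "Ralstonia": 30}}

    all_controls = {**extraction, **library}
    print(subtract(samples, control_totals(all_controls)))
    # {'s1': {'Bacteroides': 900.0, 'Ralstonia': 0.0}}
    print(subtract(samples, control_means({"extraction": extraction, "library": library})))
    # {'s1': {'Bacteroides': 900.0, 'Ralstonia': 13.0}}
```

The two options can give quite different answers for the same data (here 0 versus 13 remaining reads), so whichever is chosen should be reported alongside the uncorrected counts.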


Jonathan Eisen

I am an evolutionary biologist and a Professor at U. C. Davis. My lab is in the UC Davis Genome Center and I hold appointments in the Department of Medical Microbiology and Immunology in the School of Medicine and the Department of Evolution and Ecology in the College of Biological Sciences. My research focuses on the origin of novelty (how new processes and functions originate). To study this I focus on sequencing and analyzing genomes of organisms, especially microbes and using phylogenomic analysis (see my lab site here which has more information on lab activities).  In addition to research, I am heavily involved in the Open Access publishing and Open Science movements.