Swabs to Genomes: Week 5 (Phylogenetic trees and taxonomy)

(This blog post was written by group #1 as a writing assignment. See their first post here. Note that much of what the students did in this class is taken directly from the Swabs to Genomes paper, which contains relevant links and references.)

This week in class, our goal was to create a phylogenetic tree using the consensus sequence we obtained in last week’s class. To do this, we needed sequences from bacterial species closely related to our bacteria of interest. We uploaded our sequences to the Ribosomal Database Project (RDP) website to find bacterial species with sequences similar to ours. After selecting similar species and picking an outgroup for our phylogenetic tree, we downloaded a FASTA file from RDP and, through the power and magic of computer programming, cleaned it up. Finally, with our new and improved FASTA file, we used a program called FastTree to create a phylogenetic tree. An interesting part of this week’s class was learning that phylogenetic trees are about more than tracing back through evolutionary history; they are also a tool for analyzing DNA sequences.

Generally, phylogenetic trees are useful because we can make inferences about the characteristics of a species based on its position on the tree. Before DNA sequencing existed, phylogenetic trees were made by comparing the observable characteristics of species and inferring their positions on a tree from those. This method is particularly difficult with bacteria, one reason being that bacteria participate in lateral gene transfer, which shares characteristics between species and thus obscures the results. DNA sequencing allows for better categorization of phylogenetic relationships because we can observe and quantify the differences between species based on how many nucleotides have changed.
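To make "how many nucleotides have changed" concrete, here is a minimal Python sketch that counts mismatches between two aligned sequences and reports the fraction that differ. The function name and the toy sequences are made up for illustration; this is not the calculation FastTree performs, which uses a more sophisticated model of sequence change.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of compared alignment columns at which two sequences differ.

    Both sequences must already be aligned, so they have the same length.
    Columns where either sequence has a gap ('-' or '.') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-." or b in "-.":
            continue  # a gap tells us nothing about a substitution
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared


# Toy, made-up sequences (not real 16S data): one difference in four columns.
print(p_distance("ACGT", "ACGA"))
```

A smaller value means the two sequences are more similar. Tree-building programs start from this kind of comparison across every pair of sequences in the alignment.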

The gene that we are using to delineate one species of bacteria from another is the 16S ribosomal RNA gene. This is an ideal gene because it is highly conserved across bacteria and not typically affected by lateral gene transfer. It is also the gene used in RDP. We wanted to create a phylogenetic tree of the species most closely related to the ones we had sampled. We loaded our sample sequence into RDP, which returned a list of species very similar to our own. We searched this list for the species that matched our top three candidates from our BLAST results. We created a shopping cart of sorts (RDP allows the user to assemble a set of ribosomal DNA sequences, and this set is stored in a “shopping cart”, the contents of which can then be downloaded) containing these species, our own species, and one species from a different genus to serve as our outgroup. The RDP website supplied a FASTA file containing all of these sequences, which we would eventually use to make our phylogenetic tree.

To build the phylogenetic tree, we used a program called FastTree. Through the power and magic of a pre-made Perl script, we cleaned up our FASTA file so that it could be read by FastTree. We used our outgroup to root the tree. This tells the software which species is the most distant and allows the rest to fall into a branching pattern starting at that outgroup. Without the outgroup, the tree has no root, so the branching order cannot be read chronologically. Using the generated phylogeny, we were able to see which species was most closely related to ours. For many students, their species fell into a clade with a single other species, letting them know which species they were working with. Other students had ambiguous results, where their species did not group with just one species.
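We do not reproduce the class's Perl script here, and this post does not say exactly what it changed. As a rough sketch, assuming the cleanup mainly means turning each FASTA header into a simple label without spaces or the punctuation that tree files treat specially (parentheses, commas, colons, semicolons), something like the following Python would do a similar job. The function name and the example record are hypothetical.

```python
import re


def clean_fasta(text: str) -> str:
    """Return FASTA text with simplified header lines.

    Assumption: "cleaning" means replacing every run of characters other
    than letters, digits, '_', '.', and '-' in a header with a single
    underscore, and dropping blank lines. Sequence lines are kept as-is.
    """
    cleaned = []
    for line in text.splitlines():
        if line.startswith(">"):
            label = re.sub(r"[^A-Za-z0-9_.-]+", "_", line[1:].strip())
            cleaned.append(">" + label.strip("_"))
        elif line.strip():
            cleaned.append(line.strip())
    return "\n".join(cleaned) + "\n"


# Hypothetical record, made up for illustration (not real RDP output).
raw = ">S000001 Vibrio sp.; strain (A:1)\nACGT\n\nacgu\n"
print(clean_fasta(raw), end="")
```

The cleaned file could then be handed to FastTree, which has a `-nt` option for nucleotide alignments. The exact command used in class is not given in this post.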

Our species are on their way to having their entire genomes sequenced. For each species, we will be assembling millions of pieces of DNA, together called the library, in order to build the genome.


We are curious to know from the scientific community: if you were going to sequence an entire genome, what markers would be interesting to find in bacteria cultured from Abalone feces? What markers could be useful in the understanding of Abalone Withering Syndrome?

2 thoughts on “Swabs to Genomes: Week 5 (Phylogenetic trees and taxonomy)”

  1. All

    Glad to see this is progressing. Can you provide some more details that would help in answering some of the questions?

    Apologies if I missed this in the posts but I did not see answers to these questions:

    Can you say something more about what Abalone Withering Syndrome is? (I sort of know, but for other readers it might help to have more detail). Maybe some links to articles about it or a summary of what it is would be really helpful.

    What kind of organism causes the withering disease?

    Are you trying to culture the organism that causes the disease or just any organisms from the abalone feces?

    Knowing more about the disease would help provide some guidance as to what to look for in the genome.

    1. Abalone Withering Syndrome is caused by a bacterium that infects the digestive epithelia and affects the digestive gland, leading to anorexia, absorption of pedal musculature, lethargy, and eventually death. However, some species seem to be immune to or less affected by the disease.

      See Review paper for more information on Abalone Withering Syndrome:


      As for what we are attempting to culture: we are not attempting to culture the organism that causes the disease, but are hoping to find a significant difference in the bacteria cultured from healthy and diseased Abalone (with Withering Syndrome). We are wondering how to go about analyzing the genetic data to try to find some correlation between the bacteria cultured and Withering Syndrome.


David Coil

David Coil is a Project Scientist in the lab of Jonathan Eisen at UC Davis. David works at the intersection between research, education, and outreach in the areas of the microbiology of the built environment, microbial ecology, and bacterial genomics.