(This blog post was written by group #1 as a writing assignment. See their first post here. Note that much of what the students did in this class is taken directly from the Swabs to Genomes paper, which contains relevant links and references.)
This week in class, our goal was to build a phylogenetic tree using the consensus sequence we obtained in last week’s class. To do this, we needed sequences from bacterial species closely related to our bacteria of interest. We uploaded our sequences to the Ribosomal Database Project (RDP) to find species of bacteria with sequences similar to ours. After selecting similar species and picking an outgroup for our phylogenetic tree, we downloaded a FASTA file from RDP and, through the power and magic of computer programming, cleaned it up. Finally, with our new and improved FASTA file, we used a program called FastTree to create a phylogenetic tree. An interesting part of this week’s class was learning that phylogenetic trees are about more than tracing back through evolutionary history; they are also used in analyzing DNA.
Generally, phylogenetic trees are useful because we can make inferences about the characteristics of a species based on its position on the tree. Before DNA sequencing existed, phylogenetic trees were made by comparing the observable characteristics of species and inferring their positions on a tree. This method is particularly difficult with bacteria, one reason being that bacteria participate in lateral gene transfer, which shares characteristics between species and thus obfuscates results. DNA sequencing allows for better categorization of phylogenetic relationships because we can observe and quantify the differences between species based on how many nucleotides have changed.
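To make "how many nucleotides have changed" concrete, here is a minimal Python sketch that computes the fraction of differing positions between two aligned sequences (sometimes called the p-distance). This is only an illustration and not what FastTree does internally; the function name `p_distance` and the choice to skip gaps and ambiguous bases are our own assumptions.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of comparable aligned positions at which two sequences differ.

    Illustrative helper (hypothetical name). Assumes both sequences come
    from the same alignment and therefore have the same length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    # Treat RNA-style U as T so rRNA and DNA spellings compare equal.
    a_norm = seq_a.upper().replace("U", "T")
    b_norm = seq_b.upper().replace("U", "T")

    compared = 0
    differences = 0
    for a, b in zip(a_norm, b_norm):
        # Assumption: skip any column where either sequence has a gap
        # or an ambiguous base (anything other than A, C, G, T).
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a != b:
            differences += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared
```

For example, `p_distance("ACGTACGT", "ACGTTCGA")` compares eight positions, finds two mismatches, and returns 0.25. The smaller this number, the more similar the two sequences are.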
The gene that we are using to delineate one species of bacteria from another is the 16S ribosomal RNA gene. This is an ideal gene because it is highly conserved across bacterial species and not typically affected by lateral gene transfer. This is the gene used in RDP. We wanted to create a phylogenetic tree of the species most closely related to the ones we had sampled. We uploaded our sequence to RDP, which returned a list of species very similar to our own. We were able to search this list for the species that matched our top three candidates from our BLAST results. We created a shopping cart of sorts (RDP allows the user to assemble a set of ribosomal DNA sequences, and this set is stored in a “shopping cart”, the contents of which can then be downloaded) of these species, our own species, and one species from a different genus to serve as our outgroup. The RDP website supplied a FASTA file containing all of these species, which we would eventually use to make our phylogenetic tree.
To build the phylogenetic tree, we used a program called FastTree. Through the power and magic of a pre-made Perl script, we cleaned up our FASTA file so that it could be read by FastTree. We used our outgroup to root the tree. This tells the software which species is the most distant and allows the rest to fall into a branching pattern starting at that outgroup. Without the outgroup, the tree has no root, so the branching appears random instead of chronological. Using the generated phylogeny, we were able to see which species was most closely related to ours. For many students, their species fell into a clade with a single other species, which told them which species they were working with. Other students had ambiguous results, where their species could not be placed in a clade with just one other species.
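We did not look inside the Perl script, so the following is only a rough Python sketch of the kind of cleanup such a script might do, not the script itself. It assumes the goal is to simplify the sequence names: characters such as parentheses, colons, commas and semicolons have special meaning in the Newick tree format, so names containing them can produce a tree file that is hard to read back in. The function name `clean_fasta` and the renaming rules are hypothetical.

```python
import re


def clean_fasta(text: str) -> str:
    """Return FASTA text with simplified, unique sequence names.

    Hypothetical helper: every run of characters other than letters and
    digits in a header is replaced by a single underscore, and repeated
    names get a numeric suffix. Sequence lines are passed through as-is.
    """
    seen = {}   # cleaned name -> number of times seen so far
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            name = re.sub(r"[^A-Za-z0-9]+", "_", line[1:]).strip("_")
            name = name or "seq"  # fall back if nothing usable remains
            count = seen.get(name, 0) + 1
            seen[name] = count
            if count > 1:
                name = f"{name}_{count}"  # keep duplicate names distinct
            out.append(">" + name)
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

With this sketch, a header such as `>S000001 Vibrio sp.; strain X (T)` (a made-up example) becomes `>S000001_Vibrio_sp_strain_X_T`. The cleaned, aligned nucleotide file could then be given to FastTree with something like `FastTree -nt cleaned.fasta > tree.nwk`, where both file names are placeholders.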
Our species are on their way to having their entire genomes sequenced. For each species, we will be assembling millions of pieces of DNA, together called the library, in order to build the genome.
We are curious to know from the scientific community: if you were going to sequence an entire genome, what markers would be interesting to find in bacteria cultured from abalone feces? What markers could be useful in understanding Abalone Withering Syndrome?