Teaching bioinformatics using IPython Notebooks

I’ve been teaching undergraduate and graduate bioinformatics for three years at Northern Arizona University. This semester I tried an experiment in my undergraduate course: I decided to ditch slides and instead use IPython Notebooks to present my course materials. This turned out to be hugely successful, and resulted in an early version of an online bioinformatics textbook called An Introduction to Applied Bioinformatics.

In the next few paragraphs I’ll describe the goals of my undergraduate bioinformatics course and how I’ve taught it in the past, and then the motivations for trying this experiment, how I did it, and why I judge it a success. I’ll wrap up by talking about what you’d need to try this out on your own.

BIO/CS 299: Introduction to Bioinformatics

Since joining the faculty of NAU in Fall 2011, I’ve taught an Introduction to Bioinformatics course three times (course websites for Fall 2011, Spring 2013, and Spring 2014). This course has always been cross-listed in the Computer Science and Biology departments, meaning that I have sophomore- through senior-level undergraduates from both disciplines, most of whom have never been trained in the other discipline. That’s tough, but (as readers here probably know well) bioinformatics is interdisciplinary by necessity: whether you fall more on the biologist or the computer scientist end of the spectrum, you’re always communicating with someone who speaks a language that only partially overlaps with yours. I think it’s important to make this a part of bioinformatics education.

The focus of my course has been applied bioinformatics. We explore algorithms and theory, but in the context of real-world applications, such as source tracking of microbes in the built environment. In the past, my teaching materials have been a combination of slides and static web pages containing code (example).

Motivations for switching to IPython Notebooks

This year I wanted to try re-writing all (ok, most… there are only so many hours in the week) of my lecture materials as IPython Notebooks. I had a few motivations for this.

First, in my experience developing material for and teaching QIIME workshops, I’ve found IPython Notebooks to be a very effective tool for education. If the notebooks are well designed, students can learn a computational concept in the context of its implementation and application. In bioinformatics, this allows students to explore things like how changing a sequence can affect the significance of a pairwise alignment, or how modifying the order of inputs in an ordination calculation can affect the directionality of an axis in an ordination plot. Because the notebooks are executable (students can download them and run them locally), it’s easy for students to explore whatever interests them about the algorithms being presented. If the notebooks also contain discussion of the ideas being presented, they double as detailed study materials.

Next, I was inspired by some of the IPython Notebook-based books that have been popping up lately, for example Bayesian Methods for Hackers and many others. I thought that this would be an ideal format for a book on introductory bioinformatics, and that I could take some big strides toward creating a proof-of-concept of that book in a semester by adapting the materials that I’d developed over the past few years.

Finally, it seemed like fun. I’d get to work with tools that I love, like Python, IPython, scikit-bio, matplotlib, GitHub, OmniGraffle, and a good ol’ pencil and paper, rather than tools that I loathe, like PowerPoint.

How it worked

I immediately put the framework for my materials on GitHub here. As I developed materials, I’d commit the changes and push to GitHub. I could then share links to static versions of the notebooks via nbviewer.

In each class, I’d clear the output of the notebook that we were working with that day. Then, running it from an IPython Notebook server on my laptop, I’d present the material, executing and experimenting with code modifications as we went.

Students who wanted to follow along and execute the code themselves could either run the IPython Notebook locally (the most reliable option), or use a class IPython Notebook server that we set up for this purpose (this was a little bumpier: we had some wifi connectivity issues, and at times the server would go down because too many notebooks were running at the same time). Students who didn’t want to run the notebooks could follow along with the static versions from nbviewer. In the future, I might invest one class period in getting everyone set up with a Linux virtual machine on their own laptop in which they can run the IPython Notebook.

Why you should care: interactive lecture materials are really effective for teaching bioinformatics

Because the IPython Notebooks are interactive, students can (and actually do!) work with the notebooks in class and at home to experiment with the code, which drives active learning of the concepts. For example, one student this semester told us about his experiments with Smith-Waterman gap open penalties of 1000, and about what happens when you make the gap penalties negative (so they effectively become gap rewards; hint: you get a lot of gaps, but if you want to see for yourself, install the notebook and try it out).
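
To give a flavor of that kind of experiment, here is a minimal sketch in plain Python. It is not the code from the course notebooks: the function name smith_waterman, the scoring values, and the example sequences are all made up for illustration, and it uses a single linear gap penalty rather than separate gap open and gap extend penalties.

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap_penalty=2):
    """Return (best local score, number of gap positions in one optimal alignment).

    Illustrative simplification: one linear gap_penalty is subtracted for
    every gap position (no separate gap-open / gap-extend penalties).
    """
    rows, cols = len(seq1) + 1, len(seq2) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)

    # Fill the dynamic programming matrix, remembering the best-scoring cell.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,       # align two characters
                          H[i - 1][j] - gap_penalty,   # gap in seq2
                          H[i][j - 1] - gap_penalty)   # gap in seq1
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero is reached, counting gaps.
    i, j = best_pos
    gaps = 0
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if seq1[i - 1] == seq2[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - gap_penalty:
            i -= 1
            gaps += 1
        else:
            j -= 1
            gaps += 1
    return best, gaps


# Try a typical penalty, a prohibitive one, and a negative one (a gap "reward").
for penalty in (2, 1000, -3):
    score, gaps = smith_waterman("GATTACA", "GATCTACA", gap_penalty=penalty)
    print(f"gap_penalty={penalty}: score={score}, gaps={gaps}")
```

With a very large penalty the alignment avoids gaps entirely and settles for a shorter ungapped match; with a negative penalty, opening a gap always pays better than aligning two characters, so the traceback is gaps all the way down.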

This interactivity is also very convenient while teaching. For example, when a question would come up in class (e.g., “would Smith-Waterman alignment scale better if we just ran it on multiple processors?”), rather than only giving a theoretical description of why it would run faster but the complexity of the algorithm would stay the same, we could easily try it. I’d write a few lines of code in the notebook, and in addition to explaining the answer I could actually show it. In cases like this, I could integrate the question and answer into the notebook after class, which improves the book for future readers.
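
The few lines I’d write in class might look something like the sketch below. This is a reconstruction under assumptions, not the actual classroom code: sw_score, random_dna, and the worker count of 4 are hypothetical, and dividing the cell count evenly among workers is an idealization that ignores the dependencies between cells and any communication overhead.

```python
import random
import time


def sw_score(seq1, seq2, match=2, mismatch=-1, gap_penalty=2):
    """Fill a Smith-Waterman matrix row by row (linear gap penalty).

    Returns (best local score, number of matrix cells filled).
    """
    prev = [0] * (len(seq2) + 1)
    best = 0
    cells = 0
    for c1 in seq1:
        curr = [0]
        for j, c2 in enumerate(seq2, start=1):
            sub = match if c1 == c2 else mismatch
            curr.append(max(0,
                            prev[j - 1] + sub,
                            prev[j] - gap_penalty,
                            curr[j - 1] - gap_penalty))
            cells += 1
        best = max(best, max(curr))
        prev = curr
    return best, cells


def random_dna(length, rng):
    """Return a random DNA string of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


rng = random.Random(0)  # fixed seed so the sequences are reproducible
workers = 4             # hypothetical number of processors

for length in (100, 200, 400):
    a, b = random_dna(length, rng), random_dna(length, rng)
    start = time.perf_counter()
    _, cells = sw_score(a, b)
    elapsed = time.perf_counter() - start
    # Idealized: assume the cells could be split evenly among the workers.
    print(f"length={length}: cells={cells}, "
          f"cells per worker={cells // workers}, time={elapsed:.3f}s")
```

Each time the sequence length doubles, the number of cells quadruples, whether you look at the total or at the idealized per-worker share. More processors shrink the constant, but the work still grows with the product of the two sequence lengths.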

From my perspective and the perspective of my TA John Chase, who took the class the first time I taught it and has TA’ed every time I’ve taught it since, this year’s class was much better than previous years’ classes (and I’ve had consistently good anonymous student reviews of the course, so it was already pretty good). We felt that the students were much more engaged and interested while in class, and that their performance on quizzes and assignments suggested they understood the material better.

Over the course of the semester I converted the course materials that I was developing into an early version of an online bioinformatics textbook called An Introduction to Applied Bioinformatics, which is free and open source. I plan to continue developing it over the next couple of years, and will be expanding it to include more Applications chapters as well as more exercises that can be assigned as homework.

How to start teaching with IPython Notebooks

If you want to use An Introduction to Applied Bioinformatics in your own classes, it’s easy to get started. The install instructions are on the website, and as long as you and your students can run the IPython Notebook (on local hardware such as laptops, or on a server that your academic computing support department might be able to help you set up) you should be all set.

If you’re interested in developing your own materials as IPython Notebooks, that’s also fairly straightforward. The core tools that I used for developing An Introduction to Applied Bioinformatics are IPython, GitHub and nbviewer. These are all very accessible (the hardest is GitHub, which can have a steep learning curve if you’ve never used a revision control system, but there is a lot of documentation online).

If An Introduction to Applied Bioinformatics is useful to you, either for studying on your own or for teaching courses on bioinformatics, I’d love to hear about it! You can reach me at gregcaporaso@gmail.com, or visit my lab website or teaching website.



Greg Caporaso

Greg Caporaso is a professor of bioinformatics at Northern Arizona University.