ANGUS: Analyzing High Throughput Sequencing Data

June 26 - July 8, 2017

2016 materials:

This intensive two-week summer course will introduce attendees with a strong biology background to the practice of analyzing high-throughput sequencing data (Illumina, PacBio, and Nanopore). The first week will introduce students to computational thinking and large-scale data analysis on UNIX platforms. The second week will focus on genome and transcriptome assembly, transcript quantitation, variant calling, and other topics.

No prior programming experience is required, although familiarity with some programming concepts will be helpful, and bravery in the face of the unknown is necessary. A year or more of graduate school in a biological science is strongly suggested. Faculty, postdocs, and research staff are more than welcome, as are researchers from industry.

Note that this year we have much more room for attendees!

A draft schedule of hours for this year is available.

We plan to run multiple workshops of 20-30 participants each.

What will I learn if I attend?

Our goal for these two weeks is to get students to the point where they are ready to begin analyzing their own data on a computer cluster, and can work with help forums and online tutorials to advance their own skills.

Students will gain practical experience in:

  • Python and bash shell scripting
  • Cloud computing/Amazon EC2
  • Basic software installation on UNIX
  • Installing and running Trinity, BWA, Salmon, SPAdes, ABySS, Prokka, and other bioinformatics tools
  • Querying mappings and evaluating assemblies

Materials from previous courses are available under a Creative Commons/full use+reuse license.

You can read a blog post about the 2015 course here:

Applications for housing are closed, but we may still have space in the workshop; please contact us if you are interested in attending.

The course fee will be $500 for this workshop.

Computer requirements

You will need to bring a computer that can connect to Wi-Fi and has a modern browser (Google Chrome, Safari, or Firefox) installed. No specific operating system is required.

We will use XSEDE Jetstream academic cloud computing to execute data analysis for the workshop; all analysis will be done remotely.


This workshop was run at Michigan State University’s Kellogg Biological Station from 2010 to 2016, with support from the USDA and NIH (see Funders). Dr. Brown is the founding course director and ran the workshop from 2010 to 2015; Dr. Staton (UTK) and Dr. MacManes (UNH) were the 2016 course directors.

There are now almost 200 alumni of the first 7 years!

With Dr. Brown’s move to UC Davis, the workshop has expanded to serve more learners, and to include more activities.

If you have questions, please contact us via e-mail at