One-week workshops at DIBSI 2017

We are running nine topic-specific workshops this summer! There will be two weeks of workshops: July 10-14 and July 17-21. See below for more information.

Basic workshop information

All workshops will take place at UC Davis; please see the venue information for details.

Workshops may extend into the evening hours; please plan on devoting the entire time to the workshop. Workshops are $350/wk.

On-campus housing is available for approximately $400/wk, which includes breakfast and dinner. Housing registration currently closes April 26th.

Registration links for each workshop are under the workshop description; housing is linked there as well, and must be booked separately. Attendees of both weeks of workshops may book housing for both weeks, and attendees of the two-week introductory bioinformatics workshop, ANGUS, may book a full four weeks of housing.

For questions about registration, travel, invitation letters, or other general topics, please contact For workshop-specific questions, contact the instructors (e-mail links are under each workshop).

Week 1: July 10-14

Week 2: July 17-21

Week 1 - July 10-14

These workshops will start on Monday, July 10th at 9am, and finish by Friday, July 14th, at 5pm. On-campus housing is available Sunday through Saturday.

Genome Wide Association Study Workshop

Dates: July 10-14

Instructors: Tamer Mansour, Erica Scott

The genome-wide association study (GWAS) is an increasingly popular approach in modern genetics. However, for many years, GWAS was limited by the coverage and sensitivity of SNP arrays. In the era of next-generation sequencing (NGS), whole-genome sequencing (WGS) is liberating GWAS from these limitations and is now providing new solutions for chronic problems like pedigree validation and population stratification. In this course we will use the GATK pipeline for variant calling on WGS data from several dogs, and we will then use the PLINK software for GWAS analysis to identify one of the important coat color genes. Basic bash scripting skills and a reasonable understanding of genetic association studies are required.

Undergraduate Curriculum Hackathon

Dates: July 10-14

Organizers: David Still, Andreas Madlung, Amy Runck, Phillip Brooks, Karen Word, Sara Edge, Lisa Cohen, Jessica Mizzi, Alexandra Colón-Rodríguez, Jon Badalementi

Contact: Karen Word

Do you wish there was an undergrad-friendly version of your favorite part of the two-week intro bioinformatics workshop, ANGUS? Help us make one! We’re looking to bring together data experts with teaching interests and teaching experts with data interests for a week of collaborative conversion of these materials into smoothed-out tutorials for use in undergraduate classrooms. Depending on the number of people attending and the interests they bring, we will work on one or more of the following topic areas: genome assembly, RNAseq analysis, and/or 16S rRNA microbial community analysis.

Attendees should have some familiarity with the NGS workshop materials (perfect for those who have just taken it!) and attendees with professional expertise in the topic areas are particularly welcome. We will provide basic training in the use of GitHub for collaborative work. Funding options and strategies to broaden opportunities for bioinformatics in undergraduate settings will be discussed.

Introduction to Python

Dates: July 10-14

Instructor: Emily Dolson

This workshop will introduce students to the general-purpose programming language Python. Attendees will be researchers with problems that could be solved with programming, such as simple automated text-mining tasks, visualization of complex data, or pipeline scripting across a large-scale data set. As time permits, the Python scientific ecosystem (pandas, numpy, scipy, seaborn, matplotlib, etc.) will be introduced to get learners up to speed on the ins and outs of using the tools that are currently most popular.

Before the workshop begins, each learner may identify a problem that they would like to be able to solve with programming and run it by the instructor, Emily Dolson (Michigan State University), who will then focus the workshop around teaching the appropriate skills and coming up with challenge problems to meet the needs of the attendees.

Reproducible research with R/Data Hackathon

Dates: July 10-14

Instructor: Chris Hamm

This workshop will be a hybrid: a two-day Software Carpentry-style workshop followed by a hackathon. Participants will first learn the fundamentals of reproducible research in the R language, then spend three days in instructor-led group working sessions applying the skills they have just acquired to their own research projects.

Target audience: everyone is welcome, including grad students, post-docs, and faculty. Ideally, participants will have a working knowledge of R and a research project with some analyses ready to go.

At the conclusion of this workshop, participants will be able to:

  • Describe reproducible research.
  • Characterize the benefits of using knitr and RStudio for reproducible research.
  • Explain the importance of version control.
  • Apply version control to a research project using RStudio.
  • Explain the importance of consistent file names and proper dates.
  • Describe a directory organization and work flow that lends itself to reproducible research.
  • Explain why documenting and commenting is important for reproducible research.
  • Demonstrate use of knitr and Rmarkdown to integrate code and text.
  • Describe what automation is.
  • Recognize circumstances where automation would be beneficial.
  • Summarize the benefits of publishing code and data.
  • Describe commonly used software licenses.

Cloud Training Materials Development

Dates: July 10-14

Organizers: Daniel Standage, Luiz Irber

Contact: Daniel Standage

The demand for skills in cloud computing has steadily grown in recent years as data collection and computing needs outstrip campus computing capacity. For a researcher making their first foray into cloud computing, it can be daunting to navigate the available options for generic computing infrastructure (AWS, Google Compute, Jetstream, etc.), software application configuration (pre-configured VMs, Docker containers, etc.), data storage/archival, workflow execution, and domain-specific platforms (Seven Bridges, DNA Nexus, etc.). The lack of training resources in this area presents a significant opportunity to do it right. As part of the DIBSI Summer Institute, we are running a one-week workshop (July 10th-14th) to develop training materials for these topics. Motivated by common genomics use cases, we will brainstorm to identify the critical competencies needed to make informed decisions about computing resources for data analysis in the cloud. The key deliverable of the workshop will be a set of training materials that we will pilot in a cloud computing workshop in the following months.

Week 2 - July 17-21

These workshops will start on Monday, July 17th at 9am, and finish by Friday, July 21st, at noon.

Note that on-campus housing is available from Sunday, July 16th, through July 21st.

Introduction to Transposon Insertion Sequencing Analysis

Dates: July 17-21

Instructor: Mark Mandel

Transposon insertion sequencing is a method to conduct high-throughput forward genetic experiments in bacterial mutant populations. Most of the workshop will focus on experimental design and hands-on data analysis using the Insertion Sequencing (INSeq) technology described by Goodman et al. (Cell Host & Microbe, 2009; Nature Protocols, 2011). We will briefly discuss other variations in mutant library generation and in data analysis workflows.

This workshop is intended for beginners, although it would be helpful for participants to have basic familiarity with command-line usage. Material will cover the history of transposon sequencing as well as its role in modern genetic approaches. Attendees will learn how to use the Unix command line to install and run a Python software package on their computer and on remote computers. Sample data will be provided for attendees to demultiplex the transposon sequencing samples, map the data to the genome, and analyze transposon library dynamics across samples. Though not required, if you have your own data there will likely be time to begin to analyze it during the workshop.

Environmental Metagenomics (DIBSI-EM)

Dates: July 17-21

Instructor: Harriet Alexander

Microorganisms live in complex mixed communities, and many of them cannot be cultured. Metagenomics, or the untargeted (whole metagenome) sequencing of genetic material (DNA) from the environment, provides a means of assessing the genetic diversity and functional potential of these organisms, whilst eliminating the need for isolating these difficult-to-culture organisms.

We will be offering a five-day workshop on Environmental Metagenomics (July 17-21) as part of DIBSI 2017. This workshop is geared towards those who are new to metagenomic analyses but have data in hand, as well as those interested in gaining a better understanding of some of the approaches and learning new techniques. The workshop will be broken into two main parts. The first two to three days will focus on introducing and familiarizing participants with analytical tools and pipelines common to metagenomics through a series of hands-on practical tutorials using a practice dataset. Topics covered will include: short-read quality control and trimming, assembly, binning, annotation, abundance estimation, and data visualization. The remaining two to three days will offer participants the opportunity to apply the topics covered during the first part of the workshop to their own data with the support of other participants and the instructors.

This workshop will not cover 16S data analysis.

Non-model RNAseq, bring your own data

Dates: July 17-21

Instructors: Tessa Pierce, Jane Khudyakov, Lisa Cohen

Contact: Lisa Cohen

The focus of this hands-on tutorial will be RNAseq de novo assembly and quantification. It is intended for participants with Illumina poly(A)-selected RNA sequencing data from a non-model organism with no closely related reference genome who would like assistance analyzing their data and learning more about the software tools commonly used in this type of analysis. Time will be spent working on data brought by attendees, with the aim of getting through all steps of a typical pipeline. We will provide scripts, example sets of data to work with, and cloud computing resources. Attendees should already have some familiarity with using command-line software tools and beginning-level next-generation sequence analysis materials. This workshop is ideal for alumni of previous years of the ANGUS workshop at MSU Kellogg Biological Station or attendees from this year’s 2-week workshop at UC Davis. We will provide basic training in the use of GitHub for collaborative work. If you would like assistance adapting our materials to run on the computing resources at your home institution, please let us know.

For example materials, see

Introduction to R

Dates: July 17-21

Instructor: Michael Koontz

Join us for an interactive, week-long introduction to the programming language R!

R is powerful, cross-platform, open-source, free software that has been widely adopted across a number of scientific fields. While incredibly useful, it can also be daunting to learn. This course doesn’t require any prior programming experience. We’ll teach you the basics of R by writing code together and setting up our computers the same way you will when working on your own data after the workshop. By the end of the week, you’ll be able to input, organize, and summarize data in R. You’ll also learn how to visualize and present data using publication-quality plots and dynamic documents that combine descriptive writing with the results of your code.

The course will focus on laying a groundwork of basic R skills to enable future self-teaching of specific use cases. However, enrollees are encouraged to reach out to the instructor if there are particular topics that they think would be especially valuable to cover, and we’ll try to work them into the curriculum.

Housing registration

If you have questions, please contact us via e-mail at