Abstract:
Large datasets have become routine in biology. However, performing a
computational analysis of a large dataset can be overwhelming,
especially for novices. From June 18 to July 21, 2017 (30 days), the
Lab for Data Intensive Biology will be running several different
computational training events at the University of California, Davis
for 100 people and 25 instructors. In addition, there will be a
week-long instructor training in how to reuse our materials, and
focused workshops on topics such as GWAS for veterinary animals, shotgun
environmental -omics, Binder, non-model RNAseq, introduction to
Python, and lesson development for undergraduates. The materials for
the workshops were previously developed and tested with approximately 200
students on Amazon Web Services cloud compute services at Michigan
State University's Kellogg Biological Station from 2010 to 2016, with
support from the USDA and NIH. Materials are, and will continue to be,
licensed CC-BY, with scripts and associated code under BSD; the materials
will be adapted for Jetstream cloud usage and made available for future
use.
Keywords: Sequencing, Bioinformatics, Training
Principal investigator: C. Titus Brown
Field of science: Genomics
Resource Justification:
We are requesting 100 m.medium instances, each with 6 cores, 16 GB RAM,
and 130 GB of VM space, one per instructor and student, for 4 weeks. The
total request is for 432,000 service units (6 cores * 24 hrs/day * 30
days * 100 people). To accommodate large data files, an
additional 100 GB storage volume is requested for each
person. Persistent storage beyond the duration of the training
workshop is not necessary.
These calculations are based on running the course on AWS cloud
services from 2010 to 2016, with approximately 200 students in
total.
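The service-unit arithmetic in this justification can be checked with a short sketch. It assumes, as the formula in the request implies, that one service unit corresponds to one core-hour and that every instance runs continuously for the full 30 days. The function and variable names are illustrative only, and the aggregate volume figure is derived here from the per-person number rather than stated in the request.

```python
# Figures taken from the request text; names are illustrative.
CORES_PER_INSTANCE = 6
HOURS_PER_DAY = 24
DAYS = 30
PEOPLE = 100
EXTRA_VOLUME_GB_PER_PERSON = 100


def service_units(cores, hours_per_day, days, people):
    """Return total core-hours, assuming 1 service unit = 1 core-hour
    and that every instance runs around the clock."""
    return cores * hours_per_day * days * people


total_su = service_units(CORES_PER_INSTANCE, HOURS_PER_DAY, DAYS, PEOPLE)

# Derived aggregate: additional volume storage across all participants.
total_volume_gb = PEOPLE * EXTRA_VOLUME_GB_PER_PERSON

print(f"Service units requested: {total_su:,}")        # 432,000
print(f"Additional volume storage: {total_volume_gb:,} GB")  # 10,000 GB
```

Because the estimate assumes continuous operation, it is an upper bound; suspending instances outside of teaching hours would consume fewer service units.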
Syllabus:
http://ivory.idyll.org/dibsi/
http://angus.readthedocs.io/en/2016/
Resources: IU/TACC (Jetstream)