(Please contact us at firstname.lastname@example.org if you are interested in access to any of this data. We're still working out how and when to make it public.)
The tule elk (Cervus elaphus nannodes) is a California-endemic subspecies that underwent a major genetic bottleneck when its numbers were reduced to as few as 3 individuals in the 1870s (McCullough 1969; Meredith et al. 2007). Since then, the population has grown to an estimated 4,300 individuals, which currently occur in 22 distinct herds (Hobbs 2014). Despite these higher numbers, the historical loss of genetic diversity, combined with the increasing fragmentation of remaining habitat, poses a significant threat to the health and management of contemporary populations. As populations become increasingly fragmented by highways, reservoirs, and other forms of human development, the risk of genetic impacts associated with inbreeding intensifies. By some estimates, up to 44% of remaining genetic variation could be lost in small isolated herds in just a few generations (Williams et al. 2004). For this reason, the Draft Elk Conservation and Management Plan and California Wildlife Action Plan prioritize research aimed at facilitating habitat connectivity, as well as stemming genetic diversity loss and habitat fragmentation (Hobbs 2014; CDFW 2015).
We obtained 377,980,276 raw reads (i.e., 300 bp sequences from random points in the genome), containing a total of 113.394 Gbp of sequence, or approximately 40X coverage of the tule elk genome. More than 98% of these data passed quality filtering. The reads (and therefore coverage) were distributed approximately equally among the 4 elk, resulting in approximately 10X coverage per individual.
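The read and coverage figures above follow from simple arithmetic, sketched below in Python. The read count, read length, and number of elk come from the text. The genome size used for the 40X figure is not stated here, so `ASSUMED_GENOME_SIZE_BP` is a hypothetical value chosen only because it is consistent with roughly 40X; it is not a reported measurement.

```python
# Back-of-the-envelope sequencing coverage, using figures quoted in the text.

def total_bases(n_reads: int, read_length_bp: int) -> int:
    """Total sequenced bases, assuming every read has the same length."""
    return n_reads * read_length_bp


def coverage(total_bp: int, genome_size_bp: float) -> float:
    """Average depth of coverage: sequenced bases divided by genome size."""
    return total_bp / genome_size_bp


N_READS = 377_980_276          # from the text
READ_LENGTH_BP = 300           # from the text
N_ELK = 4                      # from the text
ASSUMED_GENOME_SIZE_BP = 2.8e9 # ASSUMPTION: not stated in the text

total_bp = total_bases(N_READS, READ_LENGTH_BP)   # 113,394,082,800 bp
pooled_coverage = coverage(total_bp, ASSUMED_GENOME_SIZE_BP)
per_elk_coverage = pooled_coverage / N_ELK        # assumes an equal split

print(f"total sequence: {total_bp / 1e9:.3f} Gbp")
print(f"pooled coverage: {pooled_coverage:.1f}X")
print(f"per-elk coverage: {per_elk_coverage:.1f}X")
```

Because coverage scales inversely with genome size, a different assumed genome size would shift both the pooled and per-individual estimates proportionally.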
The tule elk reads were de novo assembled into 602,862 contiguous sequences ("contigs") averaging 3,973 bp in length (N50 = 6,885 bp, maximum contig length = 72,391 bp), for a total genome sequence size of 2.395 billion bp (Gbp). All scaffolds and raw reads will be made publicly available on GenBank or a similar public database pending publication. Alignment of all elk reads back to these contigs revealed 3,571,069 polymorphic sites (0.15% of sites). Assuming a ratio of heterozygous sites (within individuals) to polymorphic sites (among the 4 elk) similar to the one we observed in the subsample aligned to the sheep genome, this would translate to a genome-wide heterozygosity of approximately 5e-4, which is about 5 times higher than that observed in the 25% of the genome mapping to the sheep genome. This magnitude of heterozygosity is in line with other bottlenecked mammal populations, including several of the island foxes (Urocyon littoralis), cheetah (Acinonyx jubatus), Tasmanian devil (Sarcophilus harrisii), and mountain gorilla (Gorilla beringei beringei; Robinson et al. 2016 and references therein). Although these interspecific comparisons provide a general reference, heterozygosity can vary substantially according to life history, as well as demographic history, and does not necessarily imply a direct relationship to genetic health. Therefore, sequencing the closely related Rocky Mountain (C. elaphus nelsoni) and Roosevelt (C. elaphus roosevelti) elk in the future is necessary to provide the most meaningful comparison to the tule elk heterozygosity reported here.
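For readers unfamiliar with the assembly statistics quoted above, the following Python sketch shows how N50 is conventionally computed and checks the reported totals for internal consistency. It is an illustration only, not the pipeline used for this assembly; the contig lengths in `toy_contigs` are made up, and the polymorphic-site fraction assumes the denominator is the total assembly length.

```python
# Illustrative assembly statistics; not the actual assembly pipeline.

def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


# Hypothetical toy contig lengths (bp), for demonstration only.
toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]
toy_n50 = n50(toy_contigs)

# Consistency checks on figures quoted in the text.
N_CONTIGS = 602_862
MEAN_CONTIG_BP = 3_973
ASSEMBLY_BP = 2.395e9
POLYMORPHIC_SITES = 3_571_069

approx_assembly_bp = N_CONTIGS * MEAN_CONTIG_BP            # count x mean length
polymorphic_fraction = POLYMORPHIC_SITES / ASSEMBLY_BP     # assumed denominator

print(f"toy N50: {toy_n50} bp")
print(f"count x mean length: {approx_assembly_bp / 1e9:.3f} Gbp")
print(f"polymorphic sites: {polymorphic_fraction:.2%} of assembly")
```

N50 is weighted by sequence length rather than contig count, which is why it (6,885 bp) exceeds the mean contig length (3,973 bp) in an assembly with many short contigs.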
Note: assembly method details are available on GitHub.