91. Lazer: A Memory-Efficient Framework for Large-Scale Genome Assembly
Authors: Sayan Goswami (Louisiana State University), Arghya Kusum Das (Louisiana State University), Richard Platania (Louisiana State University), Kisung Lee (Louisiana State University), Seung-Jong Park (Louisiana State University)
Best Poster Finalist
Abstract: Genome sequencing technology has made tremendous progress in both throughput and cost per base pair. Sequence assembly, however, still faces a dilemma in the state of the art. On one hand, a number of distributed assemblers can utilize several nodes but require massive amounts of memory. On the other hand, a few assemblers can assemble mammalian genomes on a single node but cannot scale to multiple nodes. In this paper, we present a distributed assembler that is both scalable and memory-efficient. Using partitioned de Bruijn graphs, we enhance memory-to-disk swapping and reduce network communication in the cluster. Experimental results show that our framework can assemble a human genome dataset (452 GB) in 14.5 hours using two nodes and in 23 minutes using 128 nodes.
Two-page extended abstract: pdf
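The abstract does not describe how the de Bruijn graph is partitioned, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that k-mers (the graph's nodes) are assigned to partitions by a stable hash of their canonical form, so that each partition can be held in memory, spilled to disk, or placed on a different node independently. The function names, the CRC32 hash, and the in-memory `Counter` buckets are all illustrative assumptions.

```python
import zlib
from collections import Counter, defaultdict

# Translation table for complementing DNA bases (assumes reads contain only A, C, G, T).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def partition_of(kmer: str, num_partitions: int) -> int:
    """Map a k-mer to a partition with a stable hash (CRC32 is an illustrative choice)."""
    return zlib.crc32(kmer.encode("ascii")) % num_partitions


def partition_kmers(reads, k, num_partitions):
    """Count canonical k-mers, bucketed by partition id.

    Hypothetical helper: a real assembler would stream each bucket to disk
    or to a remote node instead of keeping every bucket in memory.
    """
    partitions = defaultdict(Counter)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = canonical(read[i:i + k])
            partitions[partition_of(kmer, num_partitions)][kmer] += 1
    return partitions


if __name__ == "__main__":
    buckets = partition_kmers(["ACGTAC", "GTACGT"], k=3, num_partitions=4)
    for pid in sorted(buckets):
        print(pid, dict(buckets[pid]))
```

Because the hash depends only on the k-mer itself, every occurrence of a given k-mer lands in the same partition regardless of which read or node produced it, which is what lets partitions be processed one at a time under a fixed memory budget.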