91. Lazer: A Memory-Efficient Framework for Large-Scale Genome Assembly
Event Type
Poster
Location
Lower Lobby Concourse
Description
Genome sequencing technology has made tremendous progress in both throughput and cost per base pair. However, sequence assembly still faces a dilemma in the state of the art. On one hand, a number of distributed assemblers can utilize several nodes but require massive amounts of memory. On the other hand, a few assemblers can assemble mammalian genomes on a single node but cannot scale up. In this paper, we present a distributed assembler that is both scalable and memory efficient. Using partitioned De Bruijn graphs, we enhance memory-to-disk swapping and reduce network communication in the cluster. Experimental results show that our framework can assemble a human genome dataset (452 GB) in 14.5 hours using two nodes and in 23 minutes using 128 nodes.
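The two reported runtimes can be put in perspective with a short calculation. The sketch below is only arithmetic on the figures quoted in the description. The function name `scaling_summary` is hypothetical, and the speedup and efficiency are measured relative to the two-node run, not a single-node baseline. The abstract itself does not state these derived values.

```python
def scaling_summary(base_nodes, base_minutes, scaled_nodes, scaled_minutes):
    """Return (speedup, relative efficiency) of the scaled run vs. the base run."""
    speedup = base_minutes / scaled_minutes   # how many times faster the larger run is
    node_ratio = scaled_nodes / base_nodes    # how many times more nodes it uses
    efficiency = speedup / node_ratio         # 1.0 would mean perfectly linear scaling
    return speedup, efficiency


# Figures quoted in the description: 14.5 hours on 2 nodes, 23 minutes on 128 nodes.
speedup, efficiency = scaling_summary(2, 14.5 * 60, 128, 23)
print(f"speedup: {speedup:.1f}x, relative efficiency: {efficiency:.0%}")
```

Under these assumptions, going from 2 to 128 nodes (64 times the nodes) gives roughly a 38-fold speedup, or about 59% of linear scaling relative to the two-node run.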