Presentation

· Presenter IndexPresenters · Organization IndexOrganizations · Search Program · Flagged · Happening Now · QRCode Reader

Paper

: A Parallel Algorithm for Finding All Pairs k-Mismatch Maximal Common Substrings

ask a question

give feedback

SessionCombinatorial and Multigrid Algorithms

Session ChairJed Brown

Authors

Sriram P. Chockalingam

Sharma V. Thankachan

Srinivas Aluru

Event Type

Paper

Event Tags

Algorithms

Intermediate

Introductory

TimeThursday, November 17th2pm - 2:30pm

Location355-E

DescriptionWe present an efficient parallel algorithm for the following problem: Given an input collection D of n strings of total length N, a length threshold H and a mismatch threshold k, report all k-mismatch maximal common substrings of length at least H over all pairs of strings in D. This problem is motivated by various applications in computational biology, where D is a collection of millions of short DNA sequences. Sequencing errors and massive size of these datasets necessitate efficient parallel approximate sequence matching algorithms. We present a novel distributed memory parallel algorithm that solves this approximate sequence matching problem in O((N/p + occ) log^{k} N) expected time and takes O(log^{k+1} N) expected rounds of global communications, under some realistic assumptions, where p is the number of processors and occ is the output size. We demonstrate the performance and scalability of our algorithm using large high throughput sequencing data sets.

Download PDF

Paper provided by the IEEE Computer Society
Paper also available from the ACM Digital Library

Authors

Sriram P. Chockalingam (presenting)

Indian Institute of Technology Bombay

Sharma V. Thankachan

Georgia Institute of Technology

Srinivas Aluru

Georgia Institute of Technology

Navigation