Kundeti, V. K., Rajasekaran, S., Dinh, H., Vaughn, M. W., Thapar, V. (November 2010) Efficient parallel and out of core algorithms for constructing large bidirected de Bruijn graphs. BMC Bioinformatics, 11. p. 560.

Abstract
Background: Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories based on the data structures they employ: the first class uses an overlap/string graph, and the second uses a de Bruijn graph. With the recent advances in short-read sequencing technology, however, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm was given for this problem, where n is the size of the input and p is the number of processors. That algorithm enumerates all possible bidirected edges that can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet).

Results: In this paper we present a Θ(n/p) time parallel algorithm whose communication complexity equals that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it easy to extend even to the out-of-core model, where it has an optimal I/O complexity of Θ(n log(n/B) / (B log(M/B))) (M being the main memory size and B the size of a disk block). We demonstrate the scalability of our parallel algorithm on an SGI Altix computer. A comparison with previous approaches reveals that our algorithm is faster both asymptotically and in practice. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bidirected de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, clearly outperforming VELVET. We also provide efficient algorithms for the bidirected chain compaction problem.

Conclusions: The bidirected de Bruijn graph is a fundamental data structure for any sequence assembly program based on the Eulerian approach. Our algorithms for constructing bidirected de Bruijn graphs are efficient in parallel and out-of-core settings and can be used to build large-scale bidirected de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communication in a parallel setting and perform better than prior algorithms. Finally, our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.

© 2010 Kundeti et al; licensee BioMed Central Ltd.
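The abstract contrasts enumerating all Σ candidate neighbours of every node with an approach whose cost is that of sorting. The Python sketch below illustrates the general idea of sort-based construction on a single machine. It is an assumption made for illustration, not the authors' algorithm. Nodes are canonical k-mers (the lexicographically smaller of a k-mer and its reverse complement), edges come from the (k+1)-mers observed in the reads, and duplicates are removed by sorting rather than by probing neighbours. The function names, the `+`/`-` orientation encoding, and the restriction to odd k over {A, C, G, T} are all hypothetical choices.

```python
from itertools import groupby

COMPLEMENT = str.maketrans("ACGT", "TGCA")
FLIP = {"+": "-", "-": "+"}


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string over {A, C, G, T}."""
    return s.translate(COMPLEMENT)[::-1]


def canonical(kmer: str) -> tuple[str, str]:
    """Return (canonical k-mer, orientation): '+' if kmer is already the
    smaller strand, '-' if its reverse complement is smaller."""
    rc = revcomp(kmer)
    return (kmer, "+") if kmer <= rc else (rc, "-")


def sorted_unique(items):
    """Deduplicate by sorting and collapsing adjacent equal items."""
    return [key for key, _ in groupby(sorted(items))]


def build_bidirected_dbg(reads, k):
    """Hypothetical sketch of a bidirected de Bruijn graph built by sorting.

    Assumes odd k, so that no k-mer equals its own reverse complement.
    Returns (node_list, edges), where each edge is
    (node_id_u, orientation_u, node_id_v, orientation_v).
    """
    nodes, edges = [], []
    for read in reads:
        for i in range(len(read) - k):
            window = read[i : i + k + 1]  # one (k+1)-mer = one edge
            u, ou = canonical(window[:k])
            v, ov = canonical(window[1:])
            nodes.extend([u, v])
            # Reading the same (k+1)-mer on the opposite strand yields the
            # reversed edge with both orientations flipped; keep one form.
            fwd = (u, ou, v, ov)
            rev = (v, FLIP[ov], u, FLIP[ou])
            edges.append(min(fwd, rev))
    # In a parallel or out-of-core setting these two steps would be a
    # parallel sort or an external-memory sort, respectively.
    node_list = sorted_unique(nodes)
    node_id = {kmer: idx for idx, kmer in enumerate(node_list)}
    edge_list = [
        (node_id[u], ou, node_id[v], ov) for u, ou, v, ov in sorted_unique(edges)
    ]
    return node_list, edge_list
```

For example, `build_bidirected_dbg(["AACCG"], 3)` produces the nodes `AAC`, `ACC`, `CCG` joined by two edges. Passing the reverse-complement read `"CGGTT"` yields the identical graph, which is the strand-invariance a bidirected graph is meant to capture. Because the work is dominated by the two sorts, each node is never asked to probe its Σ possible neighbours. This mirrors the trade-off the abstract describes.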
Item Type:  Paper 

Subjects:  bioinformatics > genomics and proteomics > annotation > sequence annotation; bioinformatics > genomics and proteomics > analysis and processing > Sequence Data Processing; bioinformatics > genomics and proteomics > Mapping and Rendering > Sequence Rendering; bioinformatics > computational biology 
Communities:  CSHL labs > Martienssen lab 
Depositing User:  CSHL Librarian 
Date:  15 November 2010 
Date Deposited:  19 Oct 2011 17:46 
Last Modified:  08 Mar 2018 17:05 
PMCID:  PMC2996408 
URI:  http://repository.cshl.edu/id/eprint/15460 