Kundeti, V. K., Rajasekaran, S., Dinh, H., Vaughn, M. W., Thapar, V.
(November 2010)
*Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.*
BMC Bioinformatics, 11.
p. 560.


## Abstract

Background: Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories, based on the data structures they employ. The first class uses an overlap/string graph and the second uses a de Bruijn graph. However, with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet).

Results: In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model, and in this case it has an optimal I/O complexity of Θ(n log(n/B) / (B log(M/B))) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on an SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster, both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem.

Conclusions: The bi-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on the Eulerian approach. Our algorithms for constructing bi-directed de Bruijn graphs are efficient in parallel and out-of-core settings. These algorithms can be used in building large scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally, our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET. © 2010 Kundeti et al; licensee BioMed Central Ltd.
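To illustrate the idea that graph construction can be reduced to sorting, the following is a minimal in-memory sketch, not the paper's parallel or out-of-core algorithm. It emits one record per (k+1)-mer occurrence in the reads, normalizes each record so that both strands produce the same bi-directed edge, and then sorts and groups the records. All function names are hypothetical, and the sketch assumes an odd k (so that no k-mer equals its own reverse complement) and reads over the alphabet ACGT.

```python
from itertools import groupby

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string over ACGT."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Return (node, sign): the smaller of kmer and its reverse
    complement, and '+' if the k-mer itself was chosen, else '-'."""
    rc = reverse_complement(kmer)
    return (kmer, "+") if kmer <= rc else (rc, "-")


def flip(sign):
    return "-" if sign == "+" else "+"


def edge_records(reads, k):
    """Yield one normalized bi-directed edge record per (k+1)-mer.

    Assumption: k is odd, so flipping a sign always corresponds to
    reading the node from the opposite strand.
    """
    for read in reads:
        for i in range(len(read) - k):
            u, u_sign = canonical(read[i:i + k])
            v, v_sign = canonical(read[i + 1:i + k + 1])
            forward = (u, u_sign, v, v_sign)
            # The same edge seen from the opposite strand runs v -> u
            # with both orientations flipped.
            backward = (v, flip(v_sign), u, flip(u_sign))
            yield min(forward, backward)


def build_graph(reads, k):
    """Sort the edge records and collapse duplicates into counts."""
    records = sorted(edge_records(reads, k))
    return [(edge, sum(1 for _ in group)) for edge, group in groupby(records)]
```

In this sketch the only global operation is the sort, which is the step that a parallel or external-memory sort could stand in for; the number of records does not depend on the alphabet size.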

| Item Type: | Paper |
|---|---|
| Subjects: | bioinformatics > genomics and proteomics > annotation > sequence annotation; bioinformatics > genomics and proteomics > analysis and processing > Sequence Data Processing; bioinformatics > genomics and proteomics > Mapping and Rendering > Sequence Rendering; bioinformatics > computational biology |
| CSHL Authors: | |
| Communities: | CSHL labs > Martienssen lab |
| Depositing User: | CSHL Librarian |
| Date: | 15 November 2010 |
| Date Deposited: | 19 Oct 2011 17:46 |
| Last Modified: | 08 Mar 2018 17:05 |
| PMCID: | PMC2996408 |
| Related URLs: | |
| URI: | https://repository.cshl.edu/id/eprint/15460 |
