OLego: fast and sensitive mapping of spliced mRNA-Seq reads using small seeds

Wu, J., Anczukow, O., Krainer, A. R., Zhang, M. Q., Zhang, C. (2013) OLego: fast and sensitive mapping of spliced mRNA-Seq reads using small seeds. Nucleic Acids Research, 41 (10). pp. 5149-5163. ISSN 0305-1048

Krainer Nucleic Acids Research 2013.pdf - Published Version
URL: http://www.ncbi.nlm.nih.gov/pubmed/23571760
DOI: 10.1093/nar/gkt216

Abstract

A crucial step in analyzing mRNA-Seq data is to accurately and efficiently map hundreds of millions of reads to the reference genome and exon junctions. Here we present OLego, an algorithm specifically designed for de novo mapping of spliced mRNA-Seq reads. OLego adopts a multiple-seed-and-extend scheme, and does not rely on a separate external aligner. It achieves high sensitivity of junction detection by strategic searches with small seeds (approximately 14 nt for mammalian genomes). To improve accuracy and resolve ambiguous mapping at junctions, OLego uses a built-in statistical model to score exon junctions by splice-site strength and intron size. Burrows-Wheeler transform is used in multiple steps of the algorithm to efficiently map seeds, locate junctions and identify small exons. OLego is implemented in C++ with fully multithreaded execution, and allows fast processing of large-scale data. We systematically evaluated the performance of OLego in comparison with published tools using both simulated and real data. OLego demonstrated better sensitivity, higher or comparable accuracy and substantially improved speed. OLego also identified hundreds of novel micro-exons (<30 nt) in the mouse transcriptome, many of which are phylogenetically conserved and can be validated experimentally in vivo. OLego is freely available at http://zhanglab.c2b2.columbia.edu/index.php/OLego.
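
The abstract notes that the Burrows-Wheeler transform is used to map seeds efficiently. As a rough illustration of that general idea only, the sketch below builds a toy FM-index over a short sequence and finds exact seed matches by backward search. It is not OLego's implementation (which is in C++): the function names `build_index` and `find_seed`, the naive suffix-array construction, and the example sequence are all assumptions made for this example.

```python
def build_index(text):
    """Build a toy FM-index for `text` (hypothetical helper, not OLego code)."""
    text += "$"  # sentinel; sorts before A, C, G, T
    # Naive suffix array: fine for a demo, far too slow for a real genome.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to "$" for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text))
    # first[c]: number of characters in text that are strictly smaller than c.
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    # occ[c][i]: number of occurrences of c in bwt[:i].
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return sa, first, occ


def find_seed(index, seed):
    """Return sorted 0-based positions of exact matches of `seed`."""
    sa, first, occ = index
    lo, hi = 0, len(sa)  # half-open range of suffix-array rows
    # Backward search: extend the match one character at a time, right to left.
    for ch in reversed(seed):
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


if __name__ == "__main__":
    index = build_index("ACGTACGTTAGACGT")
    print(find_seed(index, "ACGT"))  # prints [0, 4, 11]
```

Each backward-search step costs constant time per seed character regardless of genome length, which is why an index of this kind suits looking up many short seeds. A production index would use a compressed occurrence table and a sampled suffix array rather than the full tables shown here.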

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > annotation > map annotation
bioinformatics > genomics and proteomics > computers > computer software
CSHL Authors:
Communities: CSHL Post Doctoral Fellows
CSHL labs > Krainer lab
CSHL Cancer Center Program > Gene Regulation and Cell Proliferation
Depositing User: Matt Covey
Date: 2013
Date Deposited: 22 May 2013 19:31
Last Modified: 13 Oct 2015 18:40
PMCID: PMC3664805
Related URLs:
URI: https://repository.cshl.edu/id/eprint/28311
