Error correction and assembly complexity of single molecule sequencing reads

Lee, Hayan, Gurtowski, James, Yoo, Shinjae, Marcus, Shoshana, McCombie, W Richard, Schatz, Michael (June 2014) Error correction and assembly complexity of single molecule sequencing reads. bioRxiv. ISSN 2692-8205 (Submitted)

10.1101.006395.pdf - Submitted Version
Available under License Creative Commons Attribution Non-commercial.

Abstract

Third-generation single-molecule sequencing technology is poised to revolutionize genomics by enabling the sequencing of long, individual molecules of DNA and RNA. These technologies now routinely produce reads exceeding 5,000 basepairs, and can achieve reads as long as 50,000 basepairs. Here we evaluate the limits of single-molecule sequencing by assessing the impact of long-read sequencing on the assembly of the human genome and 25 other important genomes across the tree of life. From this, we develop a new data-driven model using support vector regression that can accurately predict assembly performance. We also present a novel hybrid error correction algorithm for long PacBio sequencing reads that uses pre-assembled Illumina sequences for the error correction. We apply it to several prokaryotic and eukaryotic genomes, and show that it can achieve near-perfect assemblies of small genomes (< 100 Mbp) and substantially improved assemblies of larger ones. All source code and the assembly model are available open-source.

Item Type: Paper
Subjects: bioinformatics > genomics and proteomics > analysis and processing
bioinformatics
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > analysis and processing > Sequence Data Processing
Communities: CSHL labs > McCombie lab
CSHL labs > Schatz lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 18 June 2014
Date Deposited: 24 Mar 2026 18:38
Last Modified: 24 Mar 2026 18:38
URI: https://repository.cshl.edu/id/eprint/42111
