Gapless assembly of complete human and plant chromosomes using only nanopore sequencing

Koren, Sergey, Bao, Zhigui, Guarracino, Andrea, Ou, Shujun, Goodwin, Sara, Jenike, Katharine M, Lucas, Julian, McNulty, Brandy, Park, Jimin, Rautiainen, Mikko, Rhie, Arang, Roelofs, Dick, Schneiders, Harrie, Vrijenhoek, Ilse, Nijbroek, Koen, Nordesjo, Olle, Nurk, Sergey, Vella, Mike, Lawrence, Katherine R, Ware, Doreen, Schatz, Michael C, Garrison, Erik, Huang, Sanwen, McCombie, William Richard, Miga, Karen H, Wittenberg, Alexander HJ, Phillippy, Adam M (November 2024) Gapless assembly of complete human and plant chromosomes using only nanopore sequencing. Genome Research. ISSN 1088-9051 (Public Dataset)

10.1101.gr.279334.124.pdf - Published Version
Restricted to Repository staff only until 11 May 2025.
Available under License Creative Commons Attribution.

Abstract

The combination of ultra-long (UL) Oxford Nanopore Technologies (ONT) sequencing reads with long, accurate Pacific Biosciences (PacBio) High Fidelity (HiFi) reads has enabled the completion of a human genome and spurred similar efforts to complete the genomes of many other species. However, this approach to complete, "telomere-to-telomere" genome assembly relies on multiple sequencing platforms, limiting its accessibility. ONT "Duplex" sequencing reads, where both strands of the DNA are read to improve quality, promise high per-base accuracy. To evaluate this new data type, we generated ONT Duplex data for three widely studied genomes: human HG002, Solanum lycopersicum Heinz 1706 (tomato), and Zea mays B73 (maize). For the diploid, heterozygous HG002 genome, we also used "Pore-C" chromatin contact mapping to completely phase the haplotypes. We found the accuracy of Duplex data to be similar to that of HiFi sequencing, but with read lengths tens of kilobases longer, and the Pore-C data to be compatible with existing diploid assembly algorithms. This combination of read length and accuracy enables the construction of a high-quality initial assembly, which can then be further resolved using the UL reads, and finally phased into chromosome-scale haplotypes with Pore-C. The resulting assemblies have a base accuracy exceeding 99.999% (Q50) and near-perfect continuity, with most chromosomes assembled as single contigs. We conclude that ONT sequencing is a viable alternative to HiFi sequencing for de novo genome assembly, and provides a multirun single-instrument solution for the reconstruction of complete genomes.
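The abstract equates a base accuracy of 99.999% with Q50. The short sketch below illustrates that correspondence using the standard Phred scale, Q = -10 * log10(error rate). It is not taken from the paper; the function names are hypothetical and chosen only for illustration.

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Convert a per-base accuracy (a fraction in [0, 1)) to a Phred-scaled quality."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def phred_to_accuracy(q: float) -> float:
    """Convert a Phred-scaled quality to a per-base accuracy (a fraction)."""
    return 1.0 - 10.0 ** (-q / 10.0)


def expected_errors(q: float, n_bases: int) -> float:
    """Expected number of erroneous bases among n_bases at quality q."""
    return n_bases * 10.0 ** (-q / 10.0)


if __name__ == "__main__":
    # 99.999% accuracy corresponds to roughly Q50 (up to floating-point rounding).
    print(round(accuracy_to_phred(0.99999), 3))
    # At Q50, about one error is expected per 100,000 bases.
    print(round(expected_errors(50, 1_000_000), 3))
```

Under this scale, every additional 10 Q points corresponds to a tenfold reduction in the expected error rate.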

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
Communities: CSHL Cancer Center Program > Cancer Genetics and Genomics Program
CSHL labs > McCombie lab
CSHL labs > Ware lab
CSHL labs > Goodwin lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 6 November 2024
Date Deposited: 11 Nov 2024 13:46
Last Modified: 11 Nov 2024 13:46
URI: https://repository.cshl.edu/id/eprint/41730
