Effect of sequence depth and length in long-read assembly of the maize inbred NC358

Ou, S. J., Liu, J. N., Chougule, K. M., Fungtammasan, A., Seetharam, A. S., Stein, J. C., Llaca, V., Manchanda, N., Gilbert, A. M., Wei, S. R., Chin, C. S., Hufnagel, D. E., Pedersen, S., Snodgrass, S. J., Fengler, K., Woodhouse, M., Walenz, B. P., Koren, S., Phillippy, A. M., Hannigan, B. T., Dawe, R. K., Hirsch, C. N., Hufford, M. B., Ware, D. (May 2020) Effect of sequence depth and length in long-read assembly of the maize inbred NC358. Nature Communications, 11 (1). p. 2288. ISSN 2041-1723

URL: https://pubmed.ncbi.nlm.nih.gov/32385271/
DOI: 10.1038/s41467-020-16037-7

Abstract

Improvements in long-read data and scaffolding technologies have enabled rapid generation of reference-quality assemblies for complex genomes. Still, an assessment of critical sequence depth and read length is important for allocating limited resources. To this end, we have generated eight assemblies for the complex genome of the maize inbred line NC358 using PacBio datasets ranging from 20 to 75x genomic depth and with N50 subread lengths of 11-21 kb. Assemblies with ≤30x depth and N50 subread length of 11 kb are highly fragmented, with even low-copy genic regions showing degradation at 20x depth. Distinct sequence-quality thresholds are observed for complete assembly of genes, transposable elements, and highly repetitive genomic features such as telomeres, heterochromatic knobs, and centromeres. In addition, we show high-quality optical maps can dramatically improve contiguity in even our most fragmented base assembly. This study provides a useful resource allocation reference to the community as long-read technologies continue to mature.

Sequence depth and read length determine the quality of genome assembly. Here, the authors leverage a set of PacBio reads to develop guidelines for sequencing and assembly of complex plant genomes in order to allocate finite resources using maize as an example.
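
The two quantities the abstract varies, genomic depth and N50 subread length, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `n50` and `depth`, the toy read lengths, and the toy genome size below are all hypothetical, chosen only to show how the metrics are commonly defined (N50 as the length L such that reads of length L or longer hold at least half of all sequenced bases; depth as total sequenced bases divided by genome size).

```python
def n50(lengths):
    """Return the N50 of a collection of read (or contig) lengths.

    Reads are visited from longest to shortest; the N50 is the length of
    the read at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


def depth(lengths, genome_size):
    """Return average genomic depth: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return sum(lengths) / genome_size


# Toy values only (hypothetical, not from the study).
toy_reads = [2, 3, 4, 5, 6]      # read lengths, arbitrary units
toy_genome_size = 10             # same units as the reads

toy_n50 = n50(toy_reads)
toy_depth = depth(toy_reads, toy_genome_size)
```

Under these definitions, a dataset described as "20x depth with an N50 subread length of 11 kb" would carry roughly twenty genome-equivalents of sequence, with half of those bases sitting in subreads of 11 kb or longer.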

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > DNA expression
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
organism description > plant > maize
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
organism description > plant
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > DNA expression > transposable elements
Communities: CSHL labs > Ware lab
Depositing User: Matthew Dunn
Date: 8 May 2020
Date Deposited: 06 Jul 2020 18:33
Last Modified: 01 Feb 2024 16:43
PMCID: PMC7211024
URI: https://repository.cshl.edu/id/eprint/39506
