Highly accurate long-read HiFi sequencing data for five complex genomes

Hon, T., Mars, K., Young, G., Tsai, Y. C., Karalius, J. W., Landolin, J. M., Maurer, N., Kudrna, D., Hardigan, M. A., Steiner, C. C., Knapp, S. J., Ware, D., Shapiro, B., Peluso, P., Rank, D. R. (November 2020) Highly accurate long-read HiFi sequencing data for five complex genomes. Sci Data, 7 (1). p. 399. ISSN 2052-4463

URL: https://pubmed.ncbi.nlm.nih.gov/33203859/
DOI: 10.1038/s41597-020-00743-4


The PacBio® HiFi sequencing method yields highly accurate long-read sequencing datasets with read lengths averaging 10-25 kb and accuracies greater than 99.5%. These accurate long reads can be used to improve results for complex applications such as single nucleotide and structural variant detection, genome assembly, assembly of difficult polyploid or highly repetitive genomes, and assembly of metagenomes. Currently, there is a need for sample datasets both to evaluate the benefits of these long accurate reads and to develop bioinformatic tools including genome assemblers, variant callers, and haplotyping algorithms. We present deep coverage HiFi datasets for five complex samples, including the two inbred model genomes Mus musculus and Zea mays, as well as two complex genomes, octoploid Fragaria × ananassa and the diploid anuran Rana muscosa. Additionally, we release sequence data from a mock metagenome community. The datasets reported here can be used without restriction to develop new algorithms and explore complex genome structure and evolution. Data were generated on the PacBio Sequel II System.

Item Type: Paper
Additional Information: 2052-4463. Hon, Ting; Mars, Kristin; Young, Greg; Tsai, Yu-Chih (ORCID: 0000-0002-2958-0278); Karalius, Joseph W (ORCID: 0000-0003-3592-1339); Landolin, Jane M; Maurer, Nicholas; Kudrna, David; Hardigan, Michael A; Steiner, Cynthia C; Knapp, Steven J (ORCID: 0000-0001-6498-5409); Ware, Doreen (ORCID: 0000-0002-8125-3821); Shapiro, Beth (ORCID: 0000-0002-2733-7776); Peluso, Paul; Rank, David R (ORCID: 0000-0001-9213-6965). 2017-51181-26833/United States Department of Agriculture | National Institute of Food and Agriculture (NIFA)/International; 8062-21000-041/United States Department of Agriculture | Agricultural Research Service (USDA Agricultural Research Service)/International; IOS-1744001/National Science Foundation (NSF)/International. Journal Article; Research Support, Non-U.S. Gov't; Research Support, U.S. Gov't, Non-P.H.S. Sci Data. 2020 Nov 17;7(1):399. doi: 10.1038/s41597-020-00743-4.
Subjects: bioinformatics > genomics and proteomics > analysis and processing
bioinformatics > genomics and proteomics
organism description > plant > maize
organism description > animal
organism description > animal > mammal
organism description > animal > mammal > rodent > mouse
organism description > plant
organism description > animal > mammal > rodent
Communities: CSHL labs > Ware lab
Depositing User: Matthew Dunn
Date: 17 November 2020
Date Deposited: 19 Apr 2021 17:33
Last Modified: 30 Jan 2024 20:51
PMCID: PMC7673114
URI: https://repository.cshl.edu/id/eprint/39862
