A complete reference genome improves analysis of human genetic variation

Aganezov, Sergey, Yan, Stephanie M, Soto, Daniela C, Kirsche, Melanie, Zarate, Samantha, Avdeyev, Pavel, Taylor, Dylan J, Shafin, Kishwar, Shumate, Alaina, Xiao, Chunlin, Wagner, Justin, McDaniel, Jennifer, Olson, Nathan D, Sauria, Michael EG, Vollger, Mitchell R, Rhie, Arang, Meredith, Melissa, Martin, Skylar, Lee, Joyce, Koren, Sergey, Rosenfeld, Jeffrey A, Paten, Benedict, Layer, Ryan, Chin, Chen-Shan, Sedlazeck, Fritz J, Hansen, Nancy F, Miller, Danny E, Phillippy, Adam M, Miga, Karen H, McCoy, Rajiv C, Dennis, Megan Y, Zook, Justin M, Schatz, Michael C (April 2022) A complete reference genome improves analysis of human genetic variation. Science, 376 (6588). eabl3533. ISSN 0036-8075

Abstract

Compared to its predecessors, the Telomere-to-Telomere CHM13 genome adds nearly 200 million base pairs of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for clinical and functional study. We show how this reference universally improves read mapping and variant calling for 3202 and 17 globally diverse samples sequenced with short and long reads, respectively. We identify hundreds of thousands of variants per sample in previously unresolved regions, showcasing the promise of the T2T-CHM13 reference for evolutionary and biomedical discovery. Simultaneously, this reference eliminates tens of thousands of spurious variants per sample, including reduction of false positives in 269 medically relevant genes by up to a factor of 12. Because of these improvements in variant discovery coupled with population and functional genomic resources, T2T-CHM13 is positioned to replace GRCh38 as the prevailing reference for human genetics.

Item Type: Paper
Subjects: bioinformatics > genomics and proteomics > analysis and processing
bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
Investigative techniques and equipment
Investigative techniques and equipment > assays
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
organism description > animal > mammal > primates > hominids > human
Investigative techniques and equipment > assays > whole genome sequencing
CSHL Authors:
Communities: CSHL labs > Schatz lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 1 April 2022
Date Deposited: 04 Apr 2022 14:35
Last Modified: 11 Jan 2024 15:27
PMCID: PMC9336181
URI: https://repository.cshl.edu/id/eprint/40563
