Familial long-read sequencing increases yield of de novo mutations

Noyes, Michelle D; Harvey, William T; Porubsky, David; Sulovari, Arvis; Li, Ruiyang; Rose, Nicholas R; Audano, Peter A; Munson, Katherine M; Lewis, Alexandra P; Hoekzema, Kendra; Mantere, Tuomo; Graves-Lindsay, Tina A; Sanders, Ashley D; Goodwin, Sara; Kramer, Melissa; Mokrab, Younes; Zody, Michael C; Hoischen, Alexander; Korbel, Jan O; McCombie, W Richard; Eichler, Evan E (March 2022) Familial long-read sequencing increases yield of de novo mutations. American Journal of Human Genetics. ISSN 0002-9297

URL: https://www.ncbi.nlm.nih.gov/pubmed/35290762

DOI: 10.1016/j.ajhg.2022.02.014

Abstract

Studies of de novo mutation (DNM) have typically excluded some of the most repetitive and complex regions of the genome because these regions cannot be unambiguously mapped with short-read sequencing data. To better understand the genome-wide pattern of DNM, we generated long-read sequence data from an autism parent-child quad with an affected female where no pathogenic variant had been discovered in short-read Illumina sequence data. We deeply sequenced all four individuals by using three sequencing platforms (Illumina, Oxford Nanopore, and Pacific Biosciences) and three complementary technologies (Strand-seq, optical mapping, and 10X Genomics). Using long-read sequencing, we initially discovered and validated 171 DNMs across two children, a 20% increase in the number of de novo single-nucleotide variants (SNVs) and indels when compared to short-read callsets. The number of DNMs further increased by 5% when considering a more complete human reference (T2T-CHM13) because of the recovery of events in regions absent from GRCh38 (e.g., three DNMs in heterochromatic satellites). In total, we validated 195 de novo germline mutations and 23 potential post-zygotic mosaic mutations across both children; the overall true substitution rate based on this integrated callset is at least 1.41 × 10⁻⁸ substitutions per nucleotide per generation. We also identified six de novo insertions and deletions in tandem repeats, two of which represent structural variants. We demonstrate that long-read sequencing and assembly, especially when combined with a more complete reference genome, increases the number of DNMs by >25% compared to previous studies, providing a more complete catalog of DNM compared to short-read data alone.
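The abstract reports the mutation rate in substitutions per nucleotide per generation. A common way to express such a rate, sketched here under stated assumptions and not necessarily the authors' exact procedure, divides the number of de novo SNVs observed in a child by twice the number of callable bases, because each child carries two haploid genomes (one from each parent). The function names and the input numbers in the usage example are hypothetical illustrations and do not come from the study.

```python
def per_generation_snv_rate(n_dnm_snvs: int, callable_bases: int, ploidy: int = 2) -> float:
    """Hypothetical helper: substitutions per nucleotide per generation for one child.

    Assumes the rate is the de novo SNV count divided by the number of
    callable haploid sites (ploidy * callable_bases).
    """
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_dnm_snvs / (ploidy * callable_bases)


def pooled_snv_rate(children: list[tuple[int, int]], ploidy: int = 2) -> float:
    """Hypothetical helper: pool (n_dnm_snvs, callable_bases) pairs across children.

    Sums DNMs and callable haploid sites separately before dividing, so a child
    with more callable sequence contributes proportionally more weight.
    """
    total_dnms = sum(n for n, _ in children)
    total_sites = sum(ploidy * bases for _, bases in children)
    if total_sites <= 0:
        raise ValueError("total callable sites must be positive")
    return total_dnms / total_sites


if __name__ == "__main__":
    # Illustrative inputs only, not values reported in the paper.
    rate = per_generation_snv_rate(80, 2_800_000_000)
    print(f"{rate:.3e}")  # prints 1.429e-08
```

Under this framing, any true DNMs missed by the callset lower the numerator, so the estimate moves toward a lower bound on the true rate for a fixed denominator.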

Item Type:	Paper
Subjects:	bioinformatics; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification; bioinformatics > genomics and proteomics > genetics & nucleic acid processing; bioinformatics > genomics and proteomics; Investigative techniques and equipment; organism description > animal; Investigative techniques and equipment > assays; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > de novo mutation; organism description > animal > gender > female; organism description > animal > gender; Investigative techniques and equipment > assays > long-read sequencing; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > mutations
CSHL Authors:	Goodwin, Sara; Kramer, Melissa R.; McCombie, W. Richard
Communities:	CSHL Cancer Center Program; CSHL Cancer Center Program > Cancer Genetics and Genomics Program; CSHL Cancer Center Shared Resources > Next Generation Sequencing Service; CSHL Cancer Center Shared Resources > Sequencing Technology & Analysis Service; CSHL labs > McCombie lab
SWORD Depositor:	CSHL Elements
Depositing User:	CSHL Elements
Date:	9 March 2022
Date Deposited:	17 Mar 2022 15:08
Last Modified:	02 May 2024 19:18
PMCID:	PMC9069071
URI:	https://repository.cshl.edu/id/eprint/40553
