Targeted de novo phasing and long-range assembly by template mutagenesis

Li, Siran, Park, Sarah, Ye, Catherine, Danyko, Cassidy, Wroten, Matthew, Andrews, Peter, Wigler, Michael, Levy, Dan (July 2022) Targeted de novo phasing and long-range assembly by template mutagenesis. Nucleic Acids Research. ISSN 0305-1048

2022-Li-Siran-Targeted-de-novo-Phasing-Long-Range-Assembly.pdf
Available under License Creative Commons Attribution.

Abstract

Short-read sequencers provide highly accurate reads at very low cost. Unfortunately, short reads are often inadequate for important applications such as assembly in complex regions or phasing across distant heterozygous sites. In this study, we describe novel bench protocols and algorithms to obtain haplotype-phased sequence assemblies with ultra-low error for regions 10 kb and longer using short reads only. We accomplish this by imprinting each template strand from a target region with a dense and unique mutation pattern. The mutation process randomly and independently converts ∼50% of cytosines to uracils. Sequencing libraries are made from both mutated and unmutated templates. Using de Bruijn graphs and paired-end read information, we assemble each mutated template and use the unmutated library to correct the mutated bases. Templates are partitioned into two or more haplotypes, and the final haplotypes are assembled and corrected for residual template mutations and PCR errors. With sufficient template coverage, the final assemblies have per-base error rates below 10⁻⁹. We demonstrate this method on a four-member nuclear family, correctly assembling and phasing three genomic intervals, including the highly polymorphic HLA-B gene.
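
The imprinting and correction steps described in the abstract can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not the authors' pipeline: it models a single strand, treats each converted uracil as a T in the sequenced read, and assumes the unmutated sequence is already known rather than assembled from the unmutated library. The function names (mutate_template, mutation_pattern, correct_mutations) and the rate parameter are hypothetical.

```python
import random


def mutate_template(seq, rate=0.5, rng=None):
    """Convert each C to T (a uracil read as T) independently with probability `rate`."""
    rng = rng or random.Random()
    return "".join(
        "T" if base == "C" and rng.random() < rate else base
        for base in seq
    )


def mutation_pattern(mutated, original):
    """Return the positions where an original C now reads as T."""
    return tuple(
        i for i, (m, o) in enumerate(zip(mutated, original))
        if o == "C" and m == "T"
    )


def correct_mutations(mutated, unmutated):
    """Restore C wherever the mutated copy reads T and the unmutated copy reads C."""
    return "".join(
        u if (m == "T" and u == "C") else m
        for m, u in zip(mutated, unmutated)
    )


if __name__ == "__main__":
    rng = random.Random(0)                # fixed seed so the run is repeatable
    target = "ACGTCCGATCCA" * 10          # toy target region (assumption)
    template_a = mutate_template(target, rng=rng)
    template_b = mutate_template(target, rng=rng)
    # Each template carries its own pattern of converted cytosines.
    print(mutation_pattern(template_a, target)[:5])
    print(mutation_pattern(template_b, target)[:5])
    # Comparing against the unmutated sequence recovers the original bases.
    print(correct_mutations(template_a, target) == target)
```

Because every cytosine is converted independently, two templates copied from the same region almost never share a pattern, which is what allows short reads to be assigned back to the template they came from.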

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > computational biology > algorithms
bioinformatics > computational biology
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes > de novo assembly
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > single nucleotide polymorphism > haplotype
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > mutations > mutagenesis
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > mutations
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > single nucleotide polymorphism
CSHL Authors:
Communities: CSHL labs > Levy lab
CSHL labs > Wigler lab
CSHL Cancer Center Program
CSHL Cancer Center Program > Cancer Genetics and Genomics Program
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 13 July 2022
Date Deposited: 14 Jul 2022 15:18
Last Modified: 09 Feb 2024 19:05
PMCID: PMC9561374
URI: https://repository.cshl.edu/id/eprint/40674
