Hybrid assembly with long and short reads improves discovery of gene family expansions

Miller, J. R., Zhou, P., Mudge, J., Gurtowski, J., Lee, H., Ramaraj, T., Walenz, B. P., Liu, J., Stupar, R. M., Denny, R., Song, L., Singh, N., Maron, L. G., McCouch, S. R., McCombie, W. R., Schatz, M. C., Tiffin, P., Young, N. D., Silverstein, K. A. T. (July 2017) Hybrid assembly with long and short reads improves discovery of gene family expansions. BMC Genomics, 18 (1). p. 541. ISSN 1471-2164

URL: https://www.ncbi.nlm.nih.gov/pubmed/28724409
DOI: 10.1186/s12864-017-3927-8

Abstract

BACKGROUND: Long-read and short-read sequencing technologies offer competing advantages for eukaryotic genome sequencing projects. Combinations of both may be appropriate for surveys of within-species genomic variation.

METHODS: We developed a hybrid assembly pipeline called "Alpaca" that can operate on 20X long-read coverage plus about 50X short-insert and 50X long-insert short-read coverage. To preclude collapse of tandem repeats, Alpaca relies on base-call-corrected long reads for contig formation.

RESULTS: Compared to two other assembly protocols, Alpaca demonstrated the most reference agreement and repeat capture on the rice genome. On three accessions of the model legume Medicago truncatula, Alpaca generated the most agreement to a conspecific reference and predicted tandemly repeated genes absent from the other assemblies.

CONCLUSION: Our results suggest Alpaca is a useful tool for investigating structural and copy number variation within de novo assemblies of sampled populations.

Item Type: Paper
Uncontrolled Keywords: Genome assembly; Hybrid assembly pipeline; Medicago truncatula; Tandem repeats
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes > de novo assembly
Investigative techniques and equipment > assays > next generation sequencing
bioinformatics > genomics and proteomics > analysis and processing > reference assembly
Communities: CSHL Cancer Center Program > Cancer Genetics
CSHL labs > McCombie lab
CSHL labs > Schatz lab
CSHL Cancer Center Program > Cancer Genetics and Genomics Program
Depositing User: Matt Covey
Date: 19 July 2017
Date Deposited: 21 Jul 2017 20:57
Last Modified: 05 Nov 2020 14:53
PMCID: PMC5518131
URI: https://repository.cshl.edu/id/eprint/35071
