A diploid assembly-based benchmark for variants in the major histocompatibility complex

Chin, C. S., Wagner, J., Zeng, Q., Garrison, E., Garg, S., Fungtammasan, A., Rautiainen, M., Aganezov, S., Kirsche, M., Zarate, S., Schatz, M. C., Xiao, C., Rowell, W. J., Markello, C., Farek, J., Sedlazeck, F. J., Bansal, V., Yoo, B., Miller, N., Zhou, X., Carroll, A., Barrio, A. M., Salit, M., Marschall, T., Dilthey, A. T., Zook, J. M. (September 2020) A diploid assembly-based benchmark for variants in the major histocompatibility complex. Nat Commun, 11 (1). p. 4794. ISSN 2041-1723

URL: https://pubmed.ncbi.nlm.nih.gov/32963235/
DOI: 10.1038/s41467-020-18564-9


Most human genomes are characterized by aligning individual reads to the reference genome, but accurate long reads and linked reads now enable us to construct accurate, phased de novo assemblies. We focus on a medically important, highly variable, 5 million base-pair (bp) region where diploid assembly is particularly useful: the Major Histocompatibility Complex (MHC). Here, we develop a human genome benchmark derived from a diploid assembly for the openly-consented Genome in a Bottle sample HG002. We assemble a single contig for each haplotype, align them to the reference, call phased small and structural variants, and define a small variant benchmark for the MHC. This benchmark covers 94% of the MHC and includes 22,368 variants smaller than 50 bp, 49% more variants than a mapping-based benchmark. It reliably identifies errors in mapping-based callsets and enables performance assessment in regions with much denser, more complex variation than regions covered by previous benchmarks.

Item Type: Paper
Additional Information: ISSN 2041-1723. Journal Article; Research Support, N.I.H., Intramural. Nat Commun. 2020 Sep 22;11(1):4794. doi: 10.1038/s41467-020-18564-9. Authors: Chin, Chen-Shan; Wagner, Justin; Zeng, Qiandong; Garrison, Erik; Garg, Shilpa; Fungtammasan, Arkarachai; Rautiainen, Mikko; Aganezov, Sergey (ORCID: 0000-0003-2458-8323); Kirsche, Melanie (ORCID: 0000-0002-6631-4761); Zarate, Samantha; Schatz, Michael C; Xiao, Chunlin (ORCID: 0000-0001-8702-4889); Rowell, William J (ORCID: 0000-0002-7422-1194); Markello, Charles; Farek, Jesse; Sedlazeck, Fritz J (ORCID: 0000-0001-6040-2691); Bansal, Vikas; Yoo, Byunggil (ORCID: 0000-0002-7912-1862); Miller, Neil (ORCID: 0000-0002-6151-4780); Zhou, Xin; Carroll, Andrew; Barrio, Alvaro Martinez (ORCID: 0000-0001-5064-2093); Salit, Marc; Marschall, Tobias (ORCID: 0000-0002-9376-1030); Dilthey, Alexander T (ORCID: 0000-0002-6394-4581); Zook, Justin M (ORCID: 0000-0003-2309-8402).
Uncontrolled Keywords: Benchmarking; Cell Line; *Diploidy; Genetic Variation; Genome, Human; Haplotypes; Humans; Major Histocompatibility Complex/*genetics
Communities: CSHL labs > Schatz lab
Depositing User: Matthew Dunn
Date: 22 September 2020
Date Deposited: 19 Apr 2021 19:07
Last Modified: 19 Apr 2021 19:07
PMCID: PMC7508831
URI: https://repository.cshl.edu/id/eprint/39850