Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

Delaneau, O., Marchini, J., McVean, G. A., Donnelly, P., Lunter, G., Marchini, J. L., Myers, S., Gupta-Hinch, A., Iqbal, Z., Mathieson, I., Rimmer, A., Xifara, D. K., Kerasidou, A., Churchhouse, C., Altshuler, D. M., Gabriel, S. B., Lander, E. S., Gupta, N., Daly, M. J., DePristo, M. A., Banks, E., Bhatia, G., Carneiro, M. O., Del Angel, G., Genovese, G., Handsaker, R. E., Hartl, C., McCarroll, S. A., Nemesh, J. C., Poplin, R. E., Schaffner, S. F., Shakir, K., Sabeti, P. C., Grossman, S. R., Tabrizi, S., Tariyal, R., Li, H., Reich, D., Durbin, R. M., Hurles, M. E., Balasubramaniam, S., Burton, J., Danecek, P., Keane, T. M., Kolb-Kokocinski, A., McCarthy, S., Stalker, J., Quail, M., Ayub, Q., Chen, Y., Coffey, A. J., Colonna, V., Huang, N., Jostins, L., Scally, A., Walter, K., Xue, Y., Zhang, Y., Blackburne, B., Lindsay, S. J., Ning, Z., Frankish, A., Harrow, J., Chris, T. S., Abecasis, G. R., Kang, H. M., Anderson, P., Blackwell, T., Busonero, F., Fuchsberger, C., Jun, G., Maschio, A., Porcu, E., Sidore, C., Tan, A., Trost, M. K., Bentley, D. R., Grocock, R., Humphray, S., James, T., Kingsbury, Z., Bauer, M., Cheetham, R. K., Cox, T., Eberle, M., Murray, L., Shaw, R., Chakravarti, A., Clark, A. G., Keinan, A., Rodriguez-Flores, J. L., De La Vega, F. M., Degenhardt, J., Eichler, E. E., Flicek, P., Clarke, L., Leinonen, R., Smith, R. E., Zheng-Bradley, X., Beal, K. (June 2014) Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel. Nature Communications, 5. p. 3934. ISSN 2041-1723

PDF: Yoon_NatComm_2014.pdf - Published Version (282kB)

URL: https://www.ncbi.nlm.nih.gov/pubmed/25653097

DOI: 10.1038/ncomms4934

Abstract

A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First, the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetic community. Using a set of validation genotypes at SNPs and bi-allelic indels, we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved.

Item Type:	Paper
Subjects:	bioinformatics > genomics and proteomics > Mapping and Rendering > Micro Array Data Rendering; Investigative techniques and equipment > assays > whole genome sequencing
CSHL Authors:	Yoon, Seungtai; Lihm, Jayon
Communities:	CSHL labs > McCombie lab; CSHL labs > Yoon lab
Depositing User:	Matt Covey
Date:	13 June 2014
Date Deposited:	11 Jul 2014 16:49
Last Modified:	10 Sep 2019 16:11
PMCID:	PMC4338501
Related URLs:	Publisher
URI:	https://repository.cshl.edu/id/eprint/30486
