Pattern discovery and cancer gene identification in integrated cancer genomic data

Mo, Q., Wang, S., Seshan, V. E., Olshen, A. B., Schultz, N., Sander, C., Powers, R. S., Ladanyi, M., Shen, R. (March 2013) Pattern discovery and cancer gene identification in integrated cancer genomic data. Proceedings of the National Academy of Sciences of the United States of America, 110 (11). pp. 4245-4250. ISSN 0027-8424

URL: http://www.ncbi.nlm.nih.gov/pubmed/23431203

DOI: 10.1073/pnas.1208949110

Abstract

Large-scale integrated cancer genome characterization efforts, including The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia, have created unprecedented opportunities to study cancer biology in the context of knowing the entire catalog of genetic alterations. A clinically important challenge is to discover cancer subtypes and their molecular drivers in a comprehensive genetic context. Curtis et al. [Nature (2012) 486(7403):346-352] have recently shown that integrative clustering of copy number and gene expression in 2,000 breast tumors reveals novel subgroups beyond the classic expression subtypes that show distinct clinical outcomes. To extend the scope of integrative analysis to include somatic mutation data from massively parallel sequencing, we propose a framework for joint modeling of discrete and continuous variables that arise from integrated genomic, epigenomic, and transcriptomic profiling. The core idea is motivated by the hypothesis that diverse molecular phenotypes can be predicted by a set of orthogonal latent variables that represent distinct molecular drivers, and thus can reveal tumor subgroups of biological and clinical importance. Using the Cancer Cell Line Encyclopedia dataset, we demonstrate that our method can accurately group cell lines by their cell of origin for several cancer types and precisely pinpoint their known and potential cancer driver genes. Our integrative analysis also demonstrates the power to reveal subgroups that are not lineage-dependent but consist of different cancer types driven by a common genetic alteration. Application to The Cancer Genome Atlas colorectal cancer data reveals distinct integrated tumor subtypes, suggesting different genetic pathways in colon cancer progression.
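
The joint-modeling idea in the abstract can be illustrated with a minimal sketch. It assumes, purely for illustration, a Gaussian model for continuous features (such as expression), a logistic model for binary features (such as mutation status), and an L1 penalty on the loadings that tie each feature to the shared latent variables. The function names, the unit variance, and the parameter layout are all hypothetical; this is not the authors' implementation, and fitting the latent variables is out of scope here.

```python
import math


def sigmoid(t):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def linear_predictor(intercept, loadings, z):
    """intercept + loadings . z for one feature and one sample's latent vector."""
    return intercept + sum(b * zk for b, zk in zip(loadings, z))


def gaussian_loglik(x, mean, sd=1.0):
    """Log-density of a continuous value (unit variance is an assumption)."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def bernoulli_loglik(y, p):
    """Log-probability of a binary value y in {0, 1}."""
    return math.log(p) if y == 1 else math.log(1.0 - p)


def penalized_joint_loglik(latents, expr, mut, expr_params, mut_params, lam):
    """Joint log-likelihood of continuous and binary data sharing the same
    latent variables, minus an L1 penalty on all loadings.

    latents:     one latent vector per sample
    expr, mut:   per-sample lists of continuous and 0/1 feature values
    *_params:    one (intercept, loadings) pair per feature
    lam:         L1 penalty weight (hypothetical tuning parameter)
    """
    loglik = 0.0
    for z, xs, ys in zip(latents, expr, mut):
        for x, (a, b) in zip(xs, expr_params):
            loglik += gaussian_loglik(x, linear_predictor(a, b, z))
        for y, (a, b) in zip(ys, mut_params):
            loglik += bernoulli_loglik(y, sigmoid(linear_predictor(a, b, z)))
    l1 = sum(abs(bk) for _, b in expr_params + mut_params for bk in b)
    return loglik - lam * l1


if __name__ == "__main__":
    # Two samples, two latent variables, one continuous and one binary feature.
    latents = [[1.0, 0.0], [-1.0, 0.5]]
    expr = [[0.8], [-1.1]]
    mut = [[1], [0]]
    expr_params = [(0.0, [1.0, 0.0])]
    mut_params = [(0.0, [2.0, 0.0])]
    print(penalized_joint_loglik(latents, expr, mut, expr_params, mut_params, lam=0.1))
```

In this sketch both data types are explained by the same latent vector per sample, which is what allows samples to be grouped on the latent scale, while the L1 term shrinks loadings toward zero so that only a subset of features remains associated with each latent variable.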

Item Type:	Paper
Uncontrolled Keywords:	Multidimensional data; Multivariate generalized linear model; Penalized regression
Subjects:	bioinformatics; diseases & disorders > cancer; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification; diseases & disorders; bioinformatics > genomics and proteomics > genetics & nucleic acid processing; bioinformatics > genomics and proteomics; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
CSHL Authors:	Powers, Scott
Communities:	CSHL labs > Powers lab; CSHL Cancer Center Program > Cancer Genetics
Depositing User:	Matt Covey
Date:	12 March 2013
Date Deposited:	29 Mar 2013 13:13
Last Modified:	22 Dec 2017 17:02
PMCID:	PMC3600490
Related URLs:	Publisher
URI:	https://repository.cshl.edu/id/eprint/28060
