Pattern discovery and cancer gene identification in integrated cancer genomic data

Mo, Q., Wang, S., Seshan, V. E., Olshen, A. B., Schultz, N., Sander, C., Powers, R. S., Ladanyi, M., Shen, R. (March 2013) Pattern discovery and cancer gene identification in integrated cancer genomic data. Proceedings of the National Academy of Sciences of the United States of America, 110 (11). pp. 4245-4250. ISSN 0027-8424

Abstract

Large-scale integrated cancer genome characterization efforts, including The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia, have created unprecedented opportunities to study cancer biology with knowledge of the entire catalog of genetic alterations. A clinically important challenge is to discover cancer subtypes and their molecular drivers in a comprehensive genetic context. Curtis et al. [Nature (2012) 486(7403):346-352] have recently shown that integrative clustering of copy number and gene expression in 2,000 breast tumors reveals novel subgroups beyond the classic expression subtypes that show distinct clinical outcomes. To extend the scope of integrative analysis to include somatic mutation data from massively parallel sequencing, we propose a framework for joint modeling of discrete and continuous variables that arise from integrated genomic, epigenomic, and transcriptomic profiling. The core idea is motivated by the hypothesis that diverse molecular phenotypes can be predicted by a set of orthogonal latent variables that represent distinct molecular drivers, and thus can reveal tumor subgroups of biological and clinical importance. Using the Cancer Cell Line Encyclopedia dataset, we demonstrate that our method can accurately group cell lines by their cell-of-origin for several cancer types, and precisely pinpoint their known and potential cancer driver genes. Our integrative analysis also demonstrates the power to reveal subgroups that are not lineage-dependent, but consist of different cancer types driven by a common genetic alteration. Application to The Cancer Genome Atlas colorectal cancer data reveals distinct integrated tumor subtypes, suggesting different genetic pathways in colon cancer progression.

Item Type: Paper
Uncontrolled Keywords: Multidimensional data; Multivariate generalized linear model; Penalized regression
Subjects: bioinformatics
diseases & disorders > cancer
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification
diseases & disorders
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
CSHL Authors:
Communities: CSHL labs > Powers lab
CSHL Cancer Center Program > Cancer Genetics
Depositing User: Matt Covey
Date: 12 March 2013
Date Deposited: 29 Mar 2013 13:13
Last Modified: 22 Dec 2017 17:02
PMCID: PMC3600490
Related URLs:
URI: https://repository.cshl.edu/id/eprint/28060