The impact of multifunctional genes on "guilt by association" analysis

Gillis, J., Pavlidis, P. (2011) The impact of multifunctional genes on "guilt by association" analysis. PLoS One, 6 (2). ISSN 1932-6203

Abstract

Many previous studies have shown that by using variants of "guilt-by-association", gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the "associations" in the data (e.g., protein interaction partners) of a gene are necessary in establishing "guilt". In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies. © 2011 Gillis, Pavlidis.
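The claim that the degree of multifunctionality alone can act as a predictor can be illustrated with a small sketch. It is not the authors' implementation: the gene names, the toy annotation table, and the helper names (`multifunctionality_scores`, `auroc`, `per_function_auroc`) are all hypothetical, and scoring a gene simply by its annotation count is an assumption made for illustration. Every gene gets one score, the same ranking is reused for every function, and performance is measured as the area under the ROC curve.

```python
def multifunctionality_scores(annotations):
    """Score each gene by the number of functions it is annotated with.

    Assumption: a plain annotation count stands in for "degree of
    multifunctionality"; the paper may weight functions differently.
    """
    return {gene: len(funcs) for gene, funcs in annotations.items()}


def auroc(scores, positives):
    """Probability that a random positive gene outranks a random negative one.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def per_function_auroc(annotations):
    """Evaluate the single multifunctionality ranking against every function."""
    scores = multifunctionality_scores(annotations)
    functions = set().union(*annotations.values())
    results = {}
    for func in sorted(functions):
        positives = {g for g, funcs in annotations.items() if func in funcs}
        results[func] = auroc(scores, positives)
    return results


# Hypothetical toy data: five genes, three functions.
toy_annotations = {
    "g1": {"A", "B", "C"},
    "g2": {"A", "B"},
    "g3": {"A"},
    "g4": {"C"},
    "g5": set(),
}

if __name__ == "__main__":
    for func, value in per_function_auroc(toy_annotations).items():
        print(f"function {func}: AUROC = {value:.3f}")
```

Note that this toy ranking contains no information about which gene is associated with which, yet it scores above 0.5 for every function in the example. The sketch also counts the annotation being evaluated when it scores a gene, so the numbers are optimistic; a held-out evaluation would be needed to estimate real performance.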

Item Type: Paper
Uncontrolled Keywords: accuracy; article; calculation; controlled study; data analysis; gene function; gene interaction; genetic algorithm; genetic database; information processing; information retrieval; prediction; probability; protein expression; protein interaction; algorithm; Alzheimer disease; animal; autism; biology; evaluation; genetic association; genetics; genomics; human; metabolism; methodology; mouse; Parkinson disease; physiology; protein analysis; protein binding; Saccharomyces cerevisiae; schizophrenia; sequence analysis; statistics; validation study; protein; Algorithms; Animals; Autistic Disorder; Computational Biology; Databases, Genetic; Genetic Association Studies; Humans; Mice; Protein Interaction Mapping; Proteins; Sequence Analysis, Protein
Subjects: bioinformatics
bioinformatics > genomics and proteomics > computers
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > computers > computer software
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function
Communities: CSHL labs > Gillis Lab
Depositing User: Matt Covey
Date: 2011
Date Deposited: 04 Apr 2013 13:45
Last Modified: 04 Apr 2013 13:45
PMCID: PMC3041792
URI: https://repository.cshl.edu/id/eprint/28080