Positive and negative forms of replicability in gene network analysis

Verleyen, W., Ballouz, S., Gillis, J. (2016) Positive and negative forms of replicability in gene network analysis. Bioinformatics, 32 (7). pp. 1065-73. ISSN 1367-4811 (Electronic), 1367-4803 (Linking)

URL: http://www.ncbi.nlm.nih.gov/pubmed/26668004
DOI: 10.1093/bioinformatics/btv734

Abstract

MOTIVATION: Gene networks have become a central tool in the analysis of genomic data but are widely regarded as hard to interpret. This has motivated a great deal of comparative evaluation and research into best practices. We explore the possibility that this may lead to overfitting in the field as a whole.

RESULTS: We construct a model of 'research communities' sampling from real gene network data and machine learning methods to characterize performance trends. Our analysis reveals an important principle limiting the value of replication, namely that targeting it directly causes 'easy' or uninformative replication to dominate analyses. We find that when sampling across network data and algorithms with similar variability, the relationship between replicability and accuracy is positive (Spearman's correlation, rs ~ 0.33), but where no such constraint is imposed, the relationship becomes negative for a given gene function (rs ~ -0.13). We predict factors driving replicability in some prior analyses of gene networks and show that they are unconnected with the correctness of the original result, instead reflecting replicable biases. Without these biases, the original results also vanish replicably. We show these effects can occur quite far upstream in network data, and that there is a strong tendency within protein-protein interaction data for highly replicable interactions to be associated with poor quality control.

AVAILABILITY AND IMPLEMENTATION: Algorithms, network data, and a guide to the code are available at: https://github.com/wimverleyen/AggregateGeneFunctionPrediction.

CONTACT: jgillis@cshl.edu.
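The abstract summarizes the replicability–accuracy relationship with Spearman's rank correlation (rs). As a minimal sketch, assuming one replicability score and one accuracy score per sampled network/algorithm configuration, rs is the Pearson correlation of the two rank vectors, with tied values given the mean of their ranks. The function names and the example values below are hypothetical and illustrative only. They are not the paper's data or its code; for real analyses, `scipy.stats.spearmanr` computes the same statistic.

```python
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / sqrt(va * vb)


def spearman(x, y):
    """Spearman's rs: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-configuration scores (illustrative, not from the paper).
replicability = [0.10, 0.40, 0.35, 0.80]
accuracy = [0.55, 0.60, 0.72, 0.65]
print(round(spearman(replicability, accuracy), 3))  # prints 0.4
```

Because rs depends only on ranks, it captures whether more replicable configurations tend to be more accurate without assuming a linear relationship between the two scores.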

Item Type: Paper
Subjects: bioinformatics > genomics and proteomics
bioinformatics > computational biology
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function > gene network
Communities: CSHL labs > Gillis Lab
Depositing User: Matt Covey
Date Deposited: 23 Dec 2015 18:10
Last Modified: 09 May 2016 19:06
URI: http://repository.cshl.edu/id/eprint/32198
