Bias tradeoffs in the creation and analysis of protein-protein interaction networks

Gillis, J., Ballouz, S., Pavlidis, P. (January 2014) Bias tradeoffs in the creation and analysis of protein-protein interaction networks. Journal of Proteomics, 100. pp. 44-54.

Abstract

Networks constructed from aggregated protein-protein interaction data are commonplace in biology. But the studies these data are derived from were conducted with their own hypotheses and foci. Focusing on data from budding yeast present in BioGRID, we determine that many of the downstream signals present in network data are significantly impacted by biases in the original data. We determine the degree to which selection bias in favor of biologically interesting bait proteins goes down with study size, while we also find that promiscuity in prey contributes more substantially in larger studies. We analyze interaction studies over time with respect to data in the Gene Ontology and find that reproducibly observed interactions are less likely to favor multifunctional proteins. We find that strong alignment between co-expression and protein-protein interaction data occurs only for extreme co-expression values, and use this data to suggest candidates for targets likely to reveal novel biology in follow-up studies.

BIOLOGICAL SIGNIFICANCE: Protein-protein interaction data finds particularly heavy use in the interpretation of disease-causal variants. In principle, network data allows researchers to find novel commonalities among candidate genes. In this study, we detail several of the most salient biases contributing to aggregated protein-protein interaction databases. We find strong evidence for the role of selection and laboratory biases. Many of these effects contribute to the commonalities researchers find for disease genes. In order for characterization of disease genes and their interactions to not simply be an artifact of researcher preference, it is imperative to identify data biases explicitly. Based on this, we also suggest ways to move forward in producing candidates less influenced by prior knowledge.

This article is part of a Special Issue entitled: SI: CNPN 2013.

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > design > protein network design
bioinformatics > computational biology
Communities: CSHL labs > Gillis Lab
Stanley Institute for Cognitive Genomics
Depositing User: Matt Covey
Date: 27 January 2014
Date Deposited: 07 Feb 2014 21:44
Last Modified: 06 Nov 2015 20:05
PMCID: PMC3972268
URI: https://repository.cshl.edu/id/eprint/29493
