Of truth and pathways: chasing bits of information through myriads of articles

Krauthammer, M., Kra, P., Iossifov, I., Gomez, S. M., Hripcsak, G., Hatzivassiloglou, V., Friedman, C., Rzhetsky, A. (2002) Of truth and pathways: chasing bits of information through myriads of articles. Bioinformatics, 18 Suppl. 1, S249-57. ISSN 1367-4803 (Print), 1367-4803 (Linking)

URL: http://www.ncbi.nlm.nih.gov/pubmed/12169554
DOI: 10.1093/bioinformatics/18.suppl_1.S249


Knowledge on interactions between molecules in living cells is indispensable for theoretical analysis and practical applications in modern genomics and molecular biology. Building such networks relies on the assumption that the correct molecular interactions are known or can be identified by reading a few research articles. However, this assumption does not necessarily hold, as truth is rather an emerging property based on many potentially conflicting facts. This paper explores the processes of knowledge generation and publishing in the molecular biology literature using modelling and analysis of real molecular interaction data. The data analysed in this article were automatically extracted from 50000 research articles in molecular biology using a computer system called GeneWays containing a natural language processing module. The paper indicates that truthfulness of statements is associated in the minds of scientists with the relative importance (connectedness) of substances under study, revealing a potential selection bias in the reporting of research results. Aiming at understanding the statistical properties of the life cycle of biological facts reported in research articles, we formulate a stochastic model describing generation and propagation of knowledge about molecular interactions through scientific publications. We hope that in the future such a model can be useful for automatically producing consensus views of molecular interaction data.

Item Type: Paper
Uncontrolled Keywords: Algorithms; Artificial Intelligence; Database Management Systems; Databases, Bibliographic; Gene Expression Regulation/physiology; Information Storage and Retrieval/methods; Models, Biological; Models, Statistical; Natural Language Processing; Periodicals as Topic; Protein Interaction Mapping/methods; Reproducibility of Results; Sensitivity and Specificity; Signal Transduction/physiology; Software; Vocabulary, Controlled
Subjects: bioinformatics
bioinformatics > genomics and proteomics > databases
bioinformatics > computational biology > algorithms
bioinformatics > computational biology
Communities: CSHL labs > Iossifov lab
Depositing User: Matt Covey
Date: 2002
Date Deposited: 01 Apr 2015 20:36
Last Modified: 01 Apr 2015 20:36
URI: https://repository.cshl.edu/id/eprint/31297
