Link prediction for annotation graphs using graph summarization

Thor, A., Anderson, P., Raschid, L., Navlakha, S., Saha, B., Khuller, S., Zhang, X. N. (October 2011) Link prediction for annotation graphs using graph summarization. Lecture Notes in Computer Science, 7031 L (Part 1). Springer, pp. 714-729. ISSN 0302-9743; ISBN 978-3-642-25072-9

URL: https://www.scopus.com/inward/record.uri?eid=2-s2....
DOI: 10.1007/978-3-642-25073-6_45

Abstract

Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences where genes or proteins are annotated with controlled vocabulary terms (CV terms) from ontologies. The W3C Linking Open Data (LOD) initiative and semantic Web technologies are playing a leading role in making such datasets widely available. Scientists can mine these datasets to discover patterns of annotation. While ontology alignment and integration across datasets have been explored in the context of the semantic Web, there is no current approach to mine such patterns in annotation graph datasets. In this paper, we propose a novel approach for link prediction; it is a preliminary task when discovering more complex patterns. Our prediction is based on a complementary methodology of graph summarization (GS) and dense subgraphs (DSG). GS can exploit and summarize knowledge captured within the ontologies and in the annotation patterns. DSG uses the ontology structure, in particular the distance between CV terms, to filter the graph, and to find promising subgraphs. We develop a scoring function based on multiple heuristics to rank the predictions. We perform an extensive evaluation on Arabidopsis thaliana genes. © 2011 Springer-Verlag.
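
Illustrative note: the idea of predicting links from a summarized annotation graph can be sketched on a toy example. The sketch below is not the authors' algorithm. It assumes the gene groups and CV-term groups are already given (a real summarization step would compute them), and it treats any gene-term pair missing from a mostly complete group-by-group block as a candidate link, ranked by block density. The function name predict_links, the min_density parameter, and the identifiers g1..g4 and t1..t3 are all hypothetical.

```python
from itertools import product


def predict_links(annotations, gene_groups, term_groups, min_density=0.5):
    """Toy link prediction over a pre-grouped gene/CV-term annotation graph.

    annotations: set of (gene, term) pairs that are already annotated.
    gene_groups, term_groups: lists of groups (assumed given, not computed here).
    Returns (gene, term, density) triples for pairs missing from dense blocks,
    sorted by decreasing block density.
    """
    predictions = []
    for genes, terms in product(gene_groups, term_groups):
        pairs = [(g, t) for g in genes for t in terms]
        if not pairs:
            continue  # skip empty groups to avoid dividing by zero
        present = sum(1 for p in pairs if p in annotations)
        density = present / len(pairs)
        if density >= min_density:
            # Pairs absent from an otherwise dense block are the candidates.
            for gene, term in pairs:
                if (gene, term) not in annotations:
                    predictions.append((gene, term, density))
    predictions.sort(key=lambda x: (-x[2], x[0], x[1]))
    return predictions


# Hypothetical toy data: four genes, three CV terms.
annotations = {
    ("g1", "t1"), ("g1", "t2"),
    ("g2", "t1"), ("g2", "t2"),
    ("g3", "t1"),
    ("g4", "t3"),
}
gene_groups = [["g1", "g2", "g3"], ["g4"]]
term_groups = [["t1", "t2"], ["t3"]]

result = predict_links(annotations, gene_groups, term_groups)
for gene, term, density in result:
    print(f"candidate link {gene} -> {term} (block density {density:.2f})")
```

In this toy data the only candidate is the pair (g3, t2), since g3 shares a group with g1 and g2, which both carry t1 and t2. The ontology-distance filtering and multi-heuristic scoring described in the abstract are not modeled here.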

Item Type: Book
Additional Information: Conference
Subjects: organism description > plant > Arabidopsis
bioinformatics > computational biology > algorithms
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function > gene expression
bioinformatics > genomics and proteomics > Mapping and Rendering > ontology
Communities: CSHL labs > Navlakha lab
Depositing User: Matthew Dunn
Date: October 2011
Date Deposited: 08 Nov 2019 19:49
Last Modified: 08 Nov 2019 19:49
URI: https://repository.cshl.edu/id/eprint/38686
