Design patterns for efficient graph algorithms in MapReduce

Lin, J., Schatz, M. C. (2010) Design patterns for efficient graph algorithms in MapReduce. Proceedings of the Eighth Workshop on Mining and Learning with Graphs. pp. 78-85.

URL: https://dl.acm.org/citation.cfm?doid=1830252.18302...
DOI: 10.1145/1830252.1830263

Abstract

Graphs are analyzed in many important contexts, including ranking search results based on the hyperlink structure of the world wide web, module detection of protein-protein interaction networks, and privacy analysis of social networks. Many graphs of interest are difficult to analyze because of their large size, often spanning millions of vertices and billions of edges. As such, researchers have increasingly turned to distributed solutions. In particular, MapReduce has emerged as an enabling technology for large-scale graph processing. However, existing best practices for MapReduce graph algorithms have significant shortcomings that limit performance, especially with respect to partitioning, serializing, and distributing the graph. In this paper, we present three design patterns that address these issues and can be used to accelerate a large class of graph algorithms based on message passing, exemplified by PageRank. Experiments show that the application of our design patterns reduces the running time of PageRank on a web graph with 1.4 billion edges by 69%. © 2010 ACM.

Item Type: Paper
Additional Information:
Uncontrolled Keywords: Best-practices; Design Patterns; Distributed solutions; Enabling technologies; Graph algorithms; Graph processing; Hyperlink structure; Large class; Large sizes; Map-reduce; PageRank; Privacy analysis; Protein-protein interaction networks; Running time; Search results; Social Networks; Web graphs; Algorithms; Design; Electronic equipment manufacture; Hypertext systems; Message passing; World Wide Web; Graphic methods
Subjects: bioinformatics > genomics and proteomics > annotation > map annotation
bioinformatics > quantitative biology
bioinformatics > computational biology
bioinformatics > genomics and proteomics > computers > computer software
Investigative techniques and equipment > interface method
CSHL Authors:
Communities: CSHL labs > Schatz lab
Depositing User: CSHL Librarian
Date: 2010
Date Deposited: 16 Mar 2012 15:10
Last Modified: 08 Mar 2018 19:35
URI: http://repository.cshl.edu/id/eprint/25367
