Data mining with iPlant: A meeting report from the 2013 GARNet workshop, Data mining with iPlant

Martin, L., Cook, C., Matasci, N., Williams, J., Bastow, R. (2015) Data mining with iPlant: A meeting report from the 2013 GARNet workshop, Data mining with iPlant. Journal of Experimental Botany, 66 (1). pp. 1-6. ISSN 0022-0957

Abstract

High-throughput sequencing technologies have rapidly moved from large international sequencing centres to individual laboratory benchtops. These changes have driven the 'data deluge' of modern biology. Submissions of nucleotide sequences to GenBank, for example, have doubled in size every year since 1982, and individual data sets now frequently reach terabytes in size. While 'big data' present exciting opportunities for scientific discovery, data analysis skills are not part of the typical wet bench biologist's experience. Knowing what to do with data, how to visualize and analyse them, make predictions, and test hypotheses are important barriers to success. Many researchers also lack adequate capacity to store and share these data, creating further bottlenecks to effective collaboration between groups and institutes. The US National Science Foundation-funded iPlant Collaborative was established in 2008 to form part of the data collection and analysis pipeline and help alleviate the bottlenecks associated with the big data challenge in plant science. Leveraging the power of high-performance computing facilities, iPlant provides free-to-use cyberinfrastructure to enable terabytes of data storage, improve analysis, and facilitate collaborations. To help train UK plant science researchers to use the iPlant platform and understand how it can be exploited to further research, GARNet organized a four-day Data mining with iPlant workshop at Warwick University in September 2013. This report provides an overview of the workshop, and highlights the power of the iPlant environment for lowering barriers to using complex bioinformatics resources, furthering discoveries in plant science research and providing a platform for education and outreach programmes.

Item Type: Paper
Subjects: bioinformatics > genomics and proteomics > databases > database optimization
bioinformatics > genomics and proteomics > databases > database search and retrieval
bioinformatics > genomics and proteomics > datasets
Investigative techniques and equipment > assays > next generation sequencing
Communities: Dolan DNA Learning Center
Depositing User: Matt Covey
Date: 2015
Date Deposited: 24 Oct 2014 16:55
Last Modified: 24 Apr 2015 19:11
URI: https://repository.cshl.edu/id/eprint/30868
