Comprehensive data infrastructure for plant bioinformatics

Jordan, C., Stanzione, D., Ware, D. H., Lu, J., Noutsos, C. (2010) Comprehensive data infrastructure for plant bioinformatics. 2010 IEEE International Conference on Cluster Computing Workshops and Posters.


Abstract

The iPlant Collaborative is a 5-year, National Science Foundation-funded effort to develop cyberinfrastructure to address a series of grand challenges in plant science. The second of these grand challenges is the Genotype-to-Phenotype project, which seeks to provide tools, in the form of a web-based Discovery Environment, for understanding the developmental process from DNA to a full-grown plant. Addressing this challenge requires integrating multiple data types that may be stored in multiple formats with varying levels of standardization. Providing for reproducibility requires capturing detailed information that documents the experimental provenance of data and the computational transformations applied to data once they are brought into the iPlant environment. Handling the large quantities of data produced by high-throughput sequencing and other experimental sources of bioinformatics data requires a robust infrastructure for storing and reusing large data objects. We describe the workflows currently planned for the Genotype-to-Phenotype Discovery Environment, the data types and formats that must be imported and manipulated within the environment, the data model developed to express and exchange data within the Discovery Environment, and the provenance model defined for capturing descriptions of experimental sources and digital transformations. We also address capabilities for interacting with reference databases, focusing not only on retrieving data from such sources but also on using the iPlant Discovery Environment to further populate these important resources. Finally, we describe future activities and the challenges they will present to the data infrastructure of the iPlant Collaborative. © 2010 IEEE.

Item Type: Paper
Uncontrolled Keywords: Bioinformatics Component Data Gateways Metadata Provenance Standards Bioinformatics data Cyber infrastructures Data infrastructure Data models Data source Data type Digital transformation Grand Challenge High-throughput Large data Multiple data types National Science Foundations Plant science Reference database Reproducibilities Work-flows Cluster computing
Subjects: bioinformatics > genomics and proteomics > databases > database construction
bioinformatics > genomics and proteomics > databases > database optimization
Publication Type > Meeting Abstract
organism description > plant
Communities: CSHL Post Doctoral Fellows
CSHL labs > Ware lab
Depositing User: CSHL Librarian
Date: 2010
Date Deposited: 30 Sep 2011 20:38
Last Modified: 07 Mar 2018 18:28
URI: https://repository.cshl.edu/id/eprint/15442