Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences

Deng, Cecilia H, Naithani, Sushma, Kumari, Sunita, Cobo-Simón, Irene, Quezada-Rodríguez, Elsa H, Skrabisova, Maria, Gladman, Nick, Correll, Melanie J, Sikiru, Akeem Babatunde, Afuwape, Olusola O, Marrano, Annarita, Rebollo, Ines, Zhang, Wentao, Jung, Sook (December 2023) Genotype and phenotype data standardization, utilization and integration in the big data era for agricultural sciences. Database : the journal of biological databases and curation, 2023. ISSN 1758-0463

2023_Deng_Genotype_and_phenotype_data_standardization.pdf - Published Version
Available under License Creative Commons Attribution.

DOI: 10.1093/database/baad088


Large-scale genotype and phenotype data have been increasingly generated to identify genetic markers, understand gene function and evolution and facilitate genomic selection. These datasets hold immense value for both current and future studies, as they are vital for crop breeding, yield improvement and overall agricultural sustainability. However, integrating these datasets from heterogeneous sources presents significant challenges and hinders their effective utilization. We established the Genotype-Phenotype Working Group in November 2021 as a part of the AgBioData Consortium to review current data types and resources that support archiving, analysis and visualization of genotype and phenotype data to understand the needs and challenges of the plant genomic research community. For 2021-22, we identified different types of datasets and examined metadata annotations related to experimental design/methods/sample collection, etc. Furthermore, we thoroughly reviewed publicly funded repositories for raw and processed data as well as secondary databases and knowledgebases that enable the integration of heterogeneous data in the context of the genome browser, pathway networks and tissue-specific gene expression. Based on our survey, we recommend (i) additional infrastructural support for archiving many new data types, (ii) development of community standards for data annotation and formatting, (iii) resources for biocuration and (iv) analysis and visualization tools to connect genotype data with phenotype data to enhance knowledge synthesis and to foster translational research. Although this paper only covers the data and resources relevant to the plant research community, we expect that similar issues and needs are shared by researchers working on animals.

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics
organism description > plant
Communities: CSHL labs > Ware lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 11 December 2023
Date Deposited: 20 Dec 2023 19:01
Last Modified: 08 Jan 2024 19:02
PMCID: PMC10712715
