Building a FAIR data ecosystem for incorporating single-cell transcriptomics data into agricultural genome to phenome research.

Kapoor, Muskan, Ventura, Enrique Sapena, Walsh, Amy, Sokolov, Alexey, George, Nancy, Kumari, Sunita, Provart, Nicholas J, Cole, Benjamin, Libault, Marc, Tickle, Timothy, Warren, Wesley C, Koltes, James E, Papatheodorou, Irene, Ware, Doreen, Harrison, Peter W, Elsik, Christine, Yordanova, Galabina, Burdett, Tony, Tuggle, Christopher K (November 2024) Building a FAIR data ecosystem for incorporating single-cell transcriptomics data into agricultural genome to phenome research. Frontiers in Genetics, 15. p. 1460351. ISSN 1664-8021

10.3389.fgene.2024.1460351.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

INTRODUCTION: The agriculture genomics community has numerous data submission standards available, but the standards for describing and storing single-cell (SC, e.g., scRNA-seq) data are comparatively underdeveloped. METHODS: To bridge this gap, we leveraged recent advancements in human genomics infrastructure, such as the integration of the Human Cell Atlas Data Portal with Terra, a secure, scalable, open-source platform for biomedical researchers to access data, run analysis tools, and collaborate. In parallel, the Single Cell Expression Atlas at EMBL-EBI offers a comprehensive data ingestion portal for high-throughput sequencing datasets, including plants, protists, and animals (including humans). Developing data tools connecting these resources would offer significant advantages to the agricultural genomics community. The FAANG data portal at EMBL-EBI emphasizes delivering rich metadata and highly accurate and reliable annotation of farmed animals but is not computationally linked to either of these resources. RESULTS: Herein, we describe a pilot-scale project that determines whether the current FAANG metadata standards for livestock can be used to ingest scRNA-seq datasets into Terra in a manner consistent with HCA Data Portal standards. Importantly, rich scRNA-seq metadata can now be brokered through the FAANG data portal using a semi-automated process, thereby avoiding the need for substantial expert curation. We have further extended the functionality of this tool so that validated and ingested SC files within the HCA Data Portal are transferred to Terra for further analysis. In addition, we verified data ingestion into Terra, hosted on Azure, and demonstrated the use of a workflow to analyze the first ingested porcine scRNA-seq dataset. We have also developed prototype tools to visualize the output of scRNA-seq analyses on genome browsers to compare gene expression patterns across tissues and cell populations. This JBrowse tool now features distinct tracks, showcasing PBMC scRNA-seq alongside two bulk RNA-seq experiments. DISCUSSION: We intend to further build upon these existing tools to construct a scientist-friendly data resource and analytical ecosystem based on Findable, Accessible, Interoperable, and Reusable (FAIR) SC principles to facilitate SC-level genomic analysis through data ingestion, storage, retrieval, re-use, visualization, and comparative annotation across agricultural species.

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
Communities: CSHL labs > Ware lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 28 November 2024
Date Deposited: 23 Dec 2024 13:35
Last Modified: 23 Dec 2024 13:35
PMCID: PMC11638175
URI: https://repository.cshl.edu/id/eprint/41764
