Guidelines for Gene and Genome Assembly Nomenclature

Cannon, Ethalinda KS, Molik, David C, Wright, Adam J, Zhang, Huiting, Honaas, Loren, Chougule, Kapeel, Dyer, Sarah (January 2025) Guidelines for Gene and Genome Assembly Nomenclature. Genetics. iyaf006. ISSN 0016-6731 (Public Dataset)

10.1093.genetics.iyaf006.pdf - Published Version
Available under License Creative Commons Attribution.

Abstract

The rapid increase in the number of reference-quality genome assemblies presents significant new opportunities for genomic research. However, the absence of standardized naming conventions for genome assemblies and annotations across datasets creates substantial challenges. Inconsistent naming hinders the identification of correct assemblies, complicates the integration of bioinformatics pipelines, and makes it difficult to link assemblies across multiple resources. To address this, we developed a specification for standardizing the naming of reference genome assemblies, to improve consistency across datasets and facilitate interoperability. This specification was created with FAIR (Findable, Accessible, Interoperable, and Reusable) practices in mind, ensuring that reference assemblies are easier to locate, access, and reuse across research communities. Additionally, it has been designed to comply with primary genomic data repositories, including members of the International Nucleotide Sequence Database Collaboration (INSDC) consortium, ensuring compatibility with widely used databases. While initially tailored to the agricultural genomics community, the specification is adaptable for use across different taxa. Widespread adoption of this standardized nomenclature would streamline assembly management, better enable cross-species analyses, and improve the reproducibility of research. It would also enhance natural language processing applications that depend on consistent reference assembly names in genomic literature, promoting greater integration and automated analysis of genomic data. This is a good time to consider more consistent genomic data nomenclature as many research communities and data resources are now finding themselves juggling multiple datasets from multiple data providers.

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
Communities: CSHL labs > Ware lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 15 January 2025
Date Deposited: 17 Mar 2025 15:27
Last Modified: 17 Mar 2025 15:27
Dataset ID:
  • https://github.com/AgBioData/Genome-Assembly-and-Annotation-Nomenclature_WG
URI: https://repository.cshl.edu/id/eprint/41822
