Cannon, Ethalinda KS, Molik, David C, Wright, Adam J, Zhang, Huiting, Honaas, Loren, Chougule, Kapeel, Dyer, Sarah (January 2025) Guidelines for Gene and Genome Assembly Nomenclature. Genetics. iyaf006. ISSN 0016-6731 (Public Dataset)
PDF
10.1093.genetics.iyaf006.pdf - Published Version. Available under License Creative Commons Attribution.
Abstract
The rapid increase in the number of reference-quality genome assemblies presents significant new opportunities for genomic research. However, the absence of standardized naming conventions for genome assemblies and annotations across datasets creates substantial challenges. Inconsistent naming hinders the identification of correct assemblies, complicates the integration of bioinformatics pipelines, and makes it difficult to link assemblies across multiple resources. To address this, we developed a specification for standardizing the naming of reference genome assemblies, to improve consistency across datasets and facilitate interoperability. This specification was created with FAIR (Findable, Accessible, Interoperable, and Reusable) practices in mind, ensuring that reference assemblies are easier to locate, access, and reuse across research communities. Additionally, it has been designed to comply with primary genomic data repositories, including members of the International Nucleotide Sequence Database Collaboration (INSDC) consortium, ensuring compatibility with widely used databases. While initially tailored to the agricultural genomics community, the specification is adaptable for use across different taxa. Widespread adoption of this standardized nomenclature would streamline assembly management, better enable cross-species analyses, and improve the reproducibility of research. It would also enhance natural language processing applications that depend on consistent reference assembly names in genomic literature, promoting greater integration and automated analysis of genomic data. This is a good time to consider more consistent genomic data nomenclature as many research communities and data resources are now finding themselves juggling multiple datasets from multiple data providers.
Item Type: | Paper |
---|---|
Subjects: | bioinformatics; bioinformatics > genomics and proteomics > genetics & nucleic acid processing; bioinformatics > genomics and proteomics; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes |
CSHL Authors: | |
Communities: | CSHL labs > Ware lab |
SWORD Depositor: | CSHL Elements |
Depositing User: | CSHL Elements |
Date: | 15 January 2025 |
Date Deposited: | 17 Mar 2025 15:27 |
Last Modified: | 17 Mar 2025 15:27 |
Related URLs: | |
Dataset ID: | |
URI: | https://repository.cshl.edu/id/eprint/41822 |