GAME: Genomic API for Model Evaluation

Ishika Luthra, Satyam Priyadarshi, Rui Guo, Lukas Mahieu, Niklas Kempynck, Damion Dooley, Dmitry Penzar, Ilya Vorontsov, Yilun Sheng, Xinming Tu, Adam Klie, Shiron Drusinsky, Alexander Floren, Ethan Armand, Kaur Alasoo, Georg Seelig, Ryan Tewhey, Peter Koo, Vikram Agarwal, Sager Gosai, Luca Pinello, Michael A. White, Avantika Lal, Julia Zeitlinger, Katherine S. Pollard, Maxwell Libbrecht, Hannah Carter, Sara Mostafavi, Ivan Kulakovskiy, Will Hsiao, Stein Aerts, Jian Zhou, Carl G. de Boer (July 2025) GAME: Genomic API for Model Evaluation. bioRxiv. ISSN 2692-8205 (Submitted)

PDF: 10.1101.2025.07.04.663250.pdf (Submitted Version, 544 kB)
Available under a Creative Commons Attribution license.

Abstract

The rapid expansion of genomics datasets and the application of machine learning have produced sequence-to-activity genomics models with ever-expanding capabilities. However, benchmarking these models on practical applications has been challenging because individual projects evaluate their models in ad hoc ways, and there is substantial heterogeneity in both model architectures and benchmarking tasks. To address this challenge, we have created GAME, a system for large-scale, community-led, standardized model benchmarking on user-defined evaluation tasks. We borrow concepts from the Application Programming Interface (API) paradigm to allow seamless communication between pre-trained models and benchmarking tasks, ensuring consistent evaluation protocols. Because all models and benchmarks are inherently compatible in this framework, the continual addition of new models and new benchmarks is easy. We also developed a Matcher module powered by a large language model (LLM) to automate ambiguous task alignment between benchmarks and models. Containerization of these modules enhances reproducibility and facilitates the deployment of models and benchmarks across computing platforms. By focusing on predicting underlying biochemical phenomena (e.g., gene expression, open chromatin, DNA binding), we ensure that tasks remain technology-independent. We provide examples of benchmarks and models implementing this framework, and anticipate that the community will contribute their own, leading to an ever-expanding and evolving set of models and evaluation tasks. This resource will accelerate genomics research by illuminating the best models for a given task, motivating novel functional genomic benchmarks, and providing a more nuanced understanding of model abilities.
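To make the API paradigm described above concrete, here is a minimal sketch of how a shared interface could decouple models from benchmarks: a benchmark supplies sequences and measured activities, any model implementing a common `predict` method can be plugged in, and the benchmark scores predictions with a fixed metric. All names here (`SequenceModel`, `GCContentModel`, `run_benchmark`, the `track` parameter) are illustrative assumptions, not the actual GAME specification.

```python
from typing import Protocol


class SequenceModel(Protocol):
    """Hypothetical model-side contract: map DNA sequences to predicted activity."""

    def predict(self, sequences: list[str], track: str) -> list[float]: ...


class GCContentModel:
    """Toy stand-in for a pre-trained model: predicts activity as GC fraction."""

    def predict(self, sequences: list[str], track: str) -> list[float]:
        return [(s.count("G") + s.count("C")) / len(s) for s in sequences]


def run_benchmark(model: SequenceModel,
                  sequences: list[str],
                  measured: list[float]) -> float:
    """Benchmark-side protocol: query the model, score with Pearson correlation."""
    preds = model.predict(sequences, track="open_chromatin")
    n = len(preds)
    mean_p = sum(preds) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(preds, measured))
    sd_p = sum((p - mean_p) ** 2 for p in preds) ** 0.5
    sd_m = sum((m - mean_m) ** 2 for m in measured) ** 0.5
    return cov / (sd_p * sd_m)


# Any model honoring the interface is scorable by any benchmark honoring it.
score = run_benchmark(GCContentModel(), ["GGCC", "ATAT", "GCAT"], [1.0, 0.0, 0.5])
```

Because both sides code only against the shared interface, new models and new benchmarks remain mutually compatible without per-pair glue code; containerizing each side (as the paper describes) would fix the runtime environment as well.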

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics
Communities: CSHL labs > Koo Lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 8 July 2025
Date Deposited: 04 Aug 2025 11:57
Last Modified: 04 Aug 2025 11:57
PMCID: PMC12265512
URI: https://repository.cshl.edu/id/eprint/41923
