Tang, Ziqi, Koo, Peter K (March 2024) Evaluating the representational power of pre-trained DNA language models for regulatory genomics. bioRxiv. (Public Dataset) (Submitted)
PDF: 2024.02.29.582810v1.full.pdf (7MB), Submitted Version. Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
The emergence of genomic language models (gLMs) offers an unsupervised approach to learn a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown pre-trained gLMs can be leveraged to improve prediction performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that current gLMs do not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major limitation with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.
Item Type: | Paper |
---|---|
Subjects: | bioinformatics; bioinformatics > quantitative biology; bioinformatics > quantitative biology > quantitative genetics |
Communities: | CSHL labs > Koo Lab; School of Biological Sciences > Publications |
SWORD Depositor: | CSHL Elements |
Depositing User: | CSHL Elements |
Date: | 4 March 2024 |
Date Deposited: | 07 Mar 2024 15:12 |
Last Modified: | 07 Mar 2024 15:12 |
URI: | https://repository.cshl.edu/id/eprint/41454 |