Tang, Ziqi, Koo, Peter K (March 2024) Evaluating the representational power of pre-trained DNA language models for regulatory genomics. bioRxiv. (Public Dataset) (Submitted)
PDF: 2024.02.29.582810v1.full.pdf (7MB) - Submitted Version. Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
The emergence of genomic language models (gLMs) offers an unsupervised approach to learn a wide diversity of cis-regulatory patterns in the non-coding genome without requiring labels of functional activity generated by wet-lab experiments. Previous evaluations have shown pre-trained gLMs can be leveraged to improve prediction performance across a broad range of regulatory genomics tasks, albeit using relatively simple benchmark datasets and baseline models. Since the gLMs in these studies were tested upon fine-tuning their weights for each downstream task, determining whether gLM representations embody a foundational understanding of cis-regulatory biology remains an open question. Here we evaluate the representational power of pre-trained gLMs to predict and interpret cell-type-specific functional genomics data that span DNA and RNA regulation. Our findings suggest that current gLMs do not offer substantial advantages over conventional machine learning approaches that use one-hot encoded sequences. This work highlights a major limitation with current gLMs, raising potential issues in conventional pre-training strategies for the non-coding genome.
| Item Type: | Paper |
|---|---|
| Subjects: | bioinformatics; bioinformatics > quantitative biology; bioinformatics > quantitative biology > quantitative genetics |
| CSHL Authors: | |
| Communities: | CSHL labs > Koo Lab; School of Biological Sciences > Publications |
| SWORD Depositor: | CSHL Elements |
| Depositing User: | CSHL Elements |
| Date: | 4 March 2024 |
| Date Deposited: | 07 Mar 2024 15:12 |
| Last Modified: | 07 Mar 2024 15:12 |
| Related URLs: | |
| Dataset ID: | |
| URI: | https://repository.cshl.edu/id/eprint/41454 |