Representation learning of genomic sequence motifs with convolutional neural networks.

Koo, Peter K, Eddy, Sean R (December 2019) Representation learning of genomic sequence motifs with convolutional neural networks. PLoS Computational Biology, 15 (12). e1007560. ISSN 1553-734X

URL: https://www.ncbi.nlm.nih.gov/pubmed/31856220
DOI: 10.1371/journal.pcbi.1007560

Abstract

Although convolutional neural networks (CNNs) have been applied to a variety of computational genomics problems, there remains a large gap in our understanding of how they build representations of regulatory genomic sequences. Here we perform systematic experiments on synthetic sequences to reveal how CNN architecture, specifically convolutional filter size and max-pooling, influences the extent to which sequence motif representations are learned by first-layer filters. We find that CNNs designed to foster hierarchical representation learning of sequence motifs (assembling partial features into whole features in deeper layers) tend to learn distributed representations, i.e. partial motifs. On the other hand, CNNs that are designed to limit the ability to hierarchically build sequence motif representations in deeper layers tend to learn more interpretable localist representations, i.e. whole motifs. We then validate that this representation learning principle established from synthetic sequences generalizes to in vivo sequences.
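
The role of max-pooling described in the abstract can be illustrated with a toy example. The following is a minimal NumPy sketch, not the authors' code: the motif, the synthetic sequence, the fixed one-hot filters, and the helper names (one_hot, conv1d_valid, max_pool) are all assumptions made for illustration, and nothing here is trained. It scans a whole-motif filter and a partial-motif filter along a sequence, then shows how the pool size controls how many positions remain for deeper layers to work with.

```python
import numpy as np

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (length, 4) one-hot array."""
    idx = [ALPHABET.index(base) for base in seq]
    return np.eye(4)[idx]

def conv1d_valid(x, filt):
    """Scan one filter of shape (width, 4) along x; one score per valid position."""
    width = filt.shape[0]
    n = x.shape[0] - width + 1
    return np.array([np.sum(x[i:i + width] * filt) for i in range(n)])

def max_pool(scores, pool):
    """Non-overlapping max-pooling; trailing positions that do not fill a window are dropped."""
    n = len(scores) // pool
    return scores[:n * pool].reshape(n, pool).max(axis=1)

# Hypothetical toy data (assumption): one motif embedded at position 12.
motif = "TGACTCA"
seq = "ACGT" * 3 + motif + "GCTA" * 3
x = one_hot(seq)

# Hand-built filters (assumption): a whole-motif filter and a partial-motif filter.
whole_filter = one_hot(motif)      # width 7, matches the full motif
part_filter = one_hot(motif[:3])   # width 3, matches only "TGA"

whole_scores = conv1d_valid(x, whole_filter)
part_scores = conv1d_valid(x, part_filter)

# Small pooling keeps many positions; pooling over the full length keeps one.
pooled_small = max_pool(part_scores, 2)
pooled_large = max_pool(part_scores, len(part_scores))

print("whole-motif filter peaks at position", int(whole_scores.argmax()))
print("positions left after pool=2:", len(pooled_small))
print("positions left after full-length pooling:", len(pooled_large))
```

With a small pool size, many positions survive, so a deeper layer could in principle combine neighbouring partial-motif activations. With a pool that spans the whole sequence, each filter contributes a single value and no spatial arrangement is left to assemble.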

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > protein structure, function, modification
bioinformatics > computational biology
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > protein structure, function, modification > protein types
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > protein structure, function, modification > protein types > transcription factor
Communities: CSHL labs > Koo Lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: December 2019
Date Deposited: 21 May 2021 20:53
Last Modified: 02 Feb 2024 16:24
PMCID: PMC6941814
URI: https://repository.cshl.edu/id/eprint/40129
