Selecting deep neural networks that yield consistent attribution-based interpretations for genomics

Majdandzic, Antonio, Rajesh, Chandana, Tang, Amber, Toneyan, Shushan, Labelson, Ethan, Tripathy, Rohit, Koo, Peter K (November 2022) Selecting deep neural networks that yield consistent attribution-based interpretations for genomics. Proc Mach Learn Res, 200. pp. 131-149. ISSN 2640-3498

Selecting deep neural networks that yield consistent attribution-based interpretations for genomics.pdf - Published Version
Available under License Creative Commons Attribution.

URL: https://www.ncbi.nlm.nih.gov/pubmed/37205975

Abstract

Deep neural networks (DNNs) have advanced our ability to take DNA primary sequence as input and predict a myriad of molecular activities measured via high-throughput functional genomic assays. Post hoc attribution analysis has been employed to provide insights into the importance of features learned by DNNs, often revealing patterns such as sequence motifs. However, attribution maps typically harbor spurious importance scores to an extent that varies from model to model, even for DNNs whose predictions generalize well. Thus, the standard approach for model selection, which relies on performance on a held-out validation set, does not guarantee that a high-performing DNN will provide reliable explanations. Here we introduce two approaches that quantify the consistency of important features across a population of attribution maps; consistency reflects a qualitative property of human-interpretable attribution maps. We employ the consistency metrics as part of a multivariate model selection framework to identify models that yield high generalization performance and interpretable attribution analysis. We demonstrate the efficacy of this approach across various DNNs quantitatively with synthetic data and qualitatively with chromatin accessibility data.
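
The abstract does not define the two consistency metrics or the selection rule, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a stand-in consistency score (how concentrated the top-scoring k-mers are across a population of attribution maps, measured by normalized entropy) and a stand-in selection rule (keeping models that are not dominated on both validation performance and consistency). The function names `top_kmer`, `consistency`, and `pareto_front`, the window size `k`, and the example numbers are all hypothetical.

```python
from collections import Counter
import math

import numpy as np


def top_kmer(seq, scores, k=4):
    """Return the k-mer whose positions carry the largest summed attribution."""
    # Sliding-window sum of per-position importance scores.
    window = np.convolve(np.asarray(scores, dtype=float), np.ones(k), mode="valid")
    start = int(np.argmax(window))
    return seq[start:start + k]


def consistency(seqs, score_maps, k=4):
    """Stand-in consistency score in [0, 1] (an assumption, not the paper's metric).

    1.0 means every attribution map highlights the same k-mer;
    0.0 means every map highlights a different one.
    """
    counts = Counter(top_kmer(s, m, k) for s, m in zip(seqs, score_maps))
    n = sum(counts.values())
    if n < 2:
        return 1.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(n)


def pareto_front(models):
    """Keep models not dominated on (performance, consistency); higher is better."""
    front = []
    for name, (perf, cons) in models.items():
        dominated = any(
            p2 >= perf and c2 >= cons and (p2 > perf or c2 > cons)
            for other, (p2, c2) in models.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


# Toy data: two sequences with per-position attribution scores.
seqs = ["ACGTGACA", "TTGACATT"]
agreeing = [[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1, 0, 0]]     # both highlight GACA
disagreeing = [[0, 0, 0, 0, 1, 1, 1, 1], [1, 1, 1, 1, 0, 0, 0, 0]]  # GACA vs TTGA

# Hypothetical (validation performance, consistency) pairs for three models.
candidates = {"model_a": (0.90, 0.2), "model_b": (0.89, 0.8), "model_c": (0.85, 0.5)}
selected = pareto_front(candidates)
```

In this toy setup, `model_c` is dropped because `model_b` is better on both criteria, while `model_a` and `model_b` both remain as candidates that trade performance against consistency. A real pipeline would replace the stand-in score with the metrics defined in the paper.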

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes > comparative genomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > genomes
organs, tissues, organelles, cell types and functions > tissues types and functions > neural networks
CSHL Authors:
Communities: CSHL Cancer Center Program
CSHL Cancer Center Program > Gene Regulation and Inheritance Program
CSHL Cancer Center Shared Resources
CSHL labs > Koo Lab
School of Biological Sciences > Publications
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: November 2022
Date Deposited: 21 Sep 2023 19:23
Last Modified: 29 Feb 2024 18:17
PMCID: PMC10194041
URI: https://repository.cshl.edu/id/eprint/40956
