Global importance analysis: An interpretability method to quantify importance of genomic features in deep neural networks

Koo, Peter K, Majdandzic, Antonio, Ploenzke, Matthew, Anand, Praveen, Paul, Steffan B (May 2021) Global importance analysis: An interpretability method to quantify importance of genomic features in deep neural networks. PLoS Computational Biology, 17 (5). e1008925. ISSN 1553-7358

Abstract

Deep neural networks have demonstrated improved performance at predicting the sequence specificities of DNA- and RNA-binding proteins compared to previous methods that rely on k-mers and position weight matrices. To gain insights into why a DNN makes a given prediction, model interpretability methods, such as attribution methods, can be employed to identify motif-like representations along a given sequence. Because explanations are given on an individual sequence basis and can vary substantially across sequences, deducing generalizable trends across the dataset and quantifying their effect size remains a challenge. Here we introduce global importance analysis (GIA), a model interpretability method that quantifies the population-level effect size that putative patterns have on model predictions. GIA provides an avenue to quantitatively test hypotheses of putative patterns and their interactions with other patterns, as well as map out specific functions the network has learned. As a case study, we demonstrate the utility of GIA on the computational task of predicting RNA-protein interactions from sequence. We first introduce a convolutional network, which we call ResidualBind, and benchmark its performance against previous methods on RNAcompete data. Using GIA, we then demonstrate that in addition to sequence motifs, ResidualBind learns a model that considers the number of motifs, their spacing, and sequence context, such as RNA secondary structure and GC-bias.
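
The abstract does not spell out how the population-level effect size is computed. One plausible reading, sketched below, is to embed a putative pattern into a population of background sequences and compare the model's average prediction with and without it. Everything in this sketch is an assumption made for illustration: the uniform background sampling, the fixed embedding position, the example pattern, and the names `random_sequences`, `embed`, `global_importance`, and `toy_model`. In particular, `toy_model` is a hypothetical stand-in that counts motif occurrences; it is not ResidualBind.

```python
import numpy as np

ALPHABET = "ACGU"  # RNA alphabet


def random_sequences(n, length, rng):
    """Sample n background sequences uniformly over the alphabet (an assumption)."""
    idx = rng.integers(0, len(ALPHABET), size=(n, length))
    return ["".join(ALPHABET[i] for i in row) for row in idx]


def embed(seq, pattern, position):
    """Overwrite seq with pattern starting at position, keeping the length fixed."""
    return seq[:position] + pattern + seq[position + len(pattern):]


def global_importance(model, backgrounds, pattern, position):
    """Mean prediction with the pattern embedded minus mean prediction without it."""
    with_pattern = [embed(s, pattern, position) for s in backgrounds]
    return float(np.mean(model(with_pattern)) - np.mean(model(backgrounds)))


def toy_model(seqs):
    """Hypothetical stand-in for a trained network: counts 'UGCAUG' occurrences."""
    return np.array([s.count("UGCAUG") for s in seqs], dtype=float)


rng = np.random.default_rng(0)
backgrounds = random_sequences(1000, 41, rng)

# Effect size of embedding the pattern at a fixed position in every background.
effect = global_importance(toy_model, backgrounds, "UGCAUG", 17)
print(f"estimated global importance: {effect:.3f}")
```

With a real model, `toy_model` would be replaced by a function that encodes the sequences and returns the network's predictions. The same comparison could be repeated with two patterns embedded at varying distances to probe interactions such as motif number and spacing, which the abstract names as properties learned by ResidualBind.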

Item Type: Paper
Subjects: bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > RNA expression
bioinformatics > computational biology > algorithms
bioinformatics > computational biology
bioinformatics > computational biology > algorithms > machine learning
CSHL Authors:
Communities: CSHL labs > Koo Lab
CSHL Cancer Center Program
CSHL Cancer Center Program > Cancer Genetics and Genomics Program
CSHL Cancer Center Program > Gene Regulation and Inheritance Program
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: May 2021
Date Deposited: 19 May 2021 18:47
Last Modified: 13 Feb 2024 18:37
PMCID: PMC8118286
URI: https://repository.cshl.edu/id/eprint/40092
