From superposition to sparse codes: interpretable representations in neural networks

Klindt, David, O'Neill, Charles, Reizinger, Patrik, Maurer, Harald, Miolane, Nina (March 2025) From superposition to sparse codes: interpretable representations in neural networks. arXiv. ISSN 2331-8422 (Submitted)

10.48550.arXiv.2503.01824.pdf - Submitted Version
Available under License Creative Commons Attribution.

Abstract

Understanding how information is represented in neural networks is a fundamental challenge in both neuroscience and artificial intelligence. Despite their nonlinear architectures, recent evidence suggests that neural networks encode features in superposition, meaning that input concepts are linearly overlaid within the network's representations. We present a perspective that explains this phenomenon and provides a foundation for extracting interpretable representations from neural activations. Our theoretical framework consists of three steps: (1) Identifiability theory shows that neural networks trained for classification recover latent features up to a linear transformation. (2) Sparse coding methods can extract disentangled features from these representations by leveraging principles from compressed sensing. (3) Quantitative interpretability metrics provide a means to assess the success of these methods, ensuring that extracted features align with human-interpretable concepts. By bridging insights from theoretical neuroscience, representation learning, and interpretability research, we propose an emerging perspective on understanding neural representations in both artificial and biological systems. Our arguments have implications for neural coding theories, AI transparency, and the broader goal of making deep learning models more interpretable.
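Step (2) of the framework can be illustrated with a small numerical sketch. The example below is not taken from the paper: it assumes the mixing matrix (dictionary) `D` is already known, uses ISTA (iterative soft-thresholding) as one standard sparse coding solver, and picks all sizes, names, and the penalty `lam` purely for illustration. In practice the dictionary would usually have to be learned as well. The sketch overlays 60 sparse latent features linearly in a 40-dimensional representation and then recovers them from that representation.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(x, D, lam=0.01, n_iter=3000):
    """Approximately minimise 0.5 * ||x - D z||^2 + lam * ||z||_1 over z with ISTA."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)           # gradient of the quadratic term
        z = soft_threshold(z - grad / L, lam / L)
    return z

# Hypothetical sizes: more latent features than representation dimensions.
rng = np.random.default_rng(0)
n_latents, n_dims, k = 60, 40, 3

# Assumed-known dictionary with unit-norm columns (one column per feature).
D = rng.standard_normal((n_dims, n_latents))
D /= np.linalg.norm(D, axis=0)

# Sparse ground-truth code: only k of the n_latents features are active.
support = rng.choice(n_latents, size=k, replace=False)
z_true = np.zeros(n_latents)
z_true[support] = rng.uniform(1.0, 2.0, size=k) * rng.choice([-1.0, 1.0], size=k)

# Superposition: the active features are linearly overlaid in fewer dimensions.
x = D @ z_true

# Sparse coding step: estimate the sparse code from the mixed representation.
z_hat = ista(x, D)
rel_err = np.linalg.norm(z_hat - z_true) / np.linalg.norm(z_true)
recovered = set(np.argsort(-np.abs(z_hat))[:k].tolist())

print("relative error:", rel_err)
print("support recovered:", recovered == set(support.tolist()))
```

The L1 penalty is what makes recovery possible even though `D` has more columns than rows: among the many codes that reproduce `x`, it favours one with few active features, which is the compressed sensing principle the abstract refers to.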

Item Type: Paper
Subjects: bioinformatics
bioinformatics > computational biology > algorithms
bioinformatics > computational biology
CSHL Authors:
Communities: CSHL labs > Klindt lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 3 March 2025
Date Deposited: 04 Mar 2025 20:43
Last Modified: 04 Mar 2025 20:43
URI: https://repository.cshl.edu/id/eprint/41806
