Sparse, random sampling is sufficient for central tolerance

Meyer, Hannah V, Dasgupta, Sanjoy, Banerjee, Amitava, Lin, Yong, Prabakar, Rishvanth K, Chapin, Sarah R, Kingsford, Carl, Navlakha, Saket (December 2025) Sparse, random sampling is sufficient for central tolerance. bioRxiv. ISSN 2692-8205 (Submitted)

10.64898.2025.12.09.693230.pdf - Submitted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.

Abstract

Negative selection in the thymus limits autoimmunity by eliminating T cells that react strongly to self. Individual T cells, however, are only exposed to a small fraction of all self peptides during their “training” in the thymus, and it is puzzling how tolerance can be generalized to the remaining “test” self peptides across peripheral tissues in the body. Using a machine learning perspective, we show that such generalization is possible because the immune system satisfies two conditions: first, that peptide abundance levels in the human thymus and periphery are highly correlated (i.e., training distribution ≈ test distribution), and second, that cross-reactivity allows T cells to effectively learn binding information of similar peptides without explicitly interacting with all of them. Taken together, we show that sparse, random sampling of only 10% of self peptides in the thymus is sufficient to avoid reactivity to 90% of peripheral self, and we support this result with diverse experimental data. We then validate two predictions of our model: the first is that only 200–250 antigen-presenting cells need to be seen by a T cell to ensure its robust selection, and the second relates how peptides missing from the thymus can drive autoimmunity of peripheral tissues. Overall, we provide a plausible answer to a long-standing question underlying adaptive immunity, and we highlight how generalization, a fundamental challenge faced by nearly every learning algorithm, is uniquely tackled by the immune system.
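
The interplay of sparse sampling and cross-reactivity described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes, purely for illustration, that each self-reactive T cell recognizes a fixed number of self peptides chosen uniformly at random, and that the thymus presents a uniformly random fraction of all self peptides. The function names, the parameter `cross_reactivity`, and the value 22 are hypothetical and were picked by hand, not taken from the paper.

```python
import random


def escape_probability(sample_fraction: float, cross_reactivity: int) -> float:
    """Closed-form toy estimate: chance that none of a T cell's
    `cross_reactivity` self targets is presented in the thymus,
    treating each target as independently sampled."""
    return (1.0 - sample_fraction) ** cross_reactivity


def simulate_deleted_fraction(n_peptides: int, sample_fraction: float,
                              cross_reactivity: int, n_tcells: int,
                              seed: int = 0) -> float:
    """Monte Carlo version of the same toy model.

    Returns the fraction of simulated self-reactive T cells that meet
    at least one of their targets in the thymus (and are thus deleted).
    """
    rng = random.Random(seed)
    # Sparse, random subset of self peptides presented in the thymus.
    n_sampled = int(n_peptides * sample_fraction)
    thymus = set(rng.sample(range(n_peptides), n_sampled))

    deleted = 0
    for _ in range(n_tcells):
        # Hypothetical assumption: targets are uniformly random peptides.
        targets = rng.sample(range(n_peptides), cross_reactivity)
        if any(t in thymus for t in targets):
            deleted += 1
    return deleted / n_tcells


if __name__ == "__main__":
    f, k = 0.10, 22  # illustrative values only, not from the paper
    print("closed form deleted fraction:", 1.0 - escape_probability(f, k))
    print("simulated deleted fraction:  ",
          simulate_deleted_fraction(10_000, f, k, 5_000))
```

Under these assumptions the deleted fraction grows quickly with `cross_reactivity`, which is the qualitative point: a T cell need not encounter every self peptide in the thymus as long as it recognizes enough similar ones. The toy ignores peptide abundance and binding strength, both of which the abstract identifies as part of the actual argument.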

Item Type: Paper
Subjects: bioinformatics; bioinformatics > quantitative biology; bioinformatics > computational biology
CSHL Authors:
Communities: CSHL labs > Meyer Lab; CSHL labs > Navlakha lab; CSHL Post Doctoral Fellows
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 12 December 2025
Date Deposited: 24 Apr 2026 15:12
Last Modified: 24 Apr 2026 15:12
Related URLs:
URI: https://repository.cshl.edu/id/eprint/42176
