Defining the extent of gene function using ROC curvature

Fischer, Stephan, Gillis, Jesse (September 2021) Defining the extent of gene function using ROC curvature. BioRxiv. (Unpublished)

PDF: 2021.Fischer.gene_function.pdf (3MB)
Available under License Creative Commons Attribution Non-commercial No Derivatives.
URL: https://www.biorxiv.org/content/10.1101/2021.09.03...
DOI: 10.1101/2021.09.03.458825

Abstract

Machine learning in genomics plays a key role in leveraging high-throughput data, but assessing the generalizability of performance has been a persistent challenge. Here, we propose to evaluate the generalizability of gene characterizations through the shape of performance curves. We identify Functional Equivalence Classes (FECs), uniform subsets of annotated and unannotated genes that jointly drive performance, by assessing the presence of straight lines in ROC curves. FECs are widespread across modalities and methods, and can be used to evaluate the extent and context-specificity of functional annotations in a data-driven manner. For example, FECs suggest that B cell markers can be decomposed into shared primary markers (10 to 50 genes) and tissue-specific secondary markers (100 to 500 genes). In addition, FECs are compatible with a wide range of functional encodings, with marker sets spanning at most 5% of the genome and data-driven extensions of Gene Ontology sets spanning up to 40% of the genome. Simple to assess visually and statistically, the identification of FECs in performance curves paves the way for novel functional characterization and increased robustness in analysis.
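The abstract describes detecting straight lines in ROC curves but does not spell out the procedure. The following is a minimal Python sketch under stated assumptions, not the authors' method: it builds ROC points from hypothetical per-gene scores and binary annotation labels (1 = annotated, 0 = unannotated), then measures how far a stretch of the curve departs from the straight chord joining its endpoints. The function names `roc_points`, `auroc`, and `max_chord_deviation` are hypothetical, and the chord-deviation measure is a simple geometric stand-in for the statistical assessment used in the paper.

```python
# Hypothetical sketch: NOT the method from Fischer & Gillis (2021).
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Point]:
    """Return ROC points from (0, 0) to (1, 1).

    Genes are ranked by decreasing score; genes with tied scores are added
    in a single step, so a tie block becomes one straight diagonal segment.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    points: List[Point] = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        current = scores[order[i]]
        # Consume every gene sharing this score before emitting a point.
        while i < len(order) and scores[order[i]] == current:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auroc(points: Sequence[Point]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def max_chord_deviation(points: Sequence[Point], start: int, end: int) -> float:
    """Largest perpendicular distance from points[start..end] to the chord
    joining points[start] and points[end]. Values near 0 mean the curve is
    close to a straight line over that rank range."""
    (x0, y0), (x1, y1) = points[start], points[end]
    length = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
    if length == 0:
        return 0.0
    worst = 0.0
    for x, y in points[start:end + 1]:
        # |2D cross product| / chord length = distance to the chord's line.
        dist = abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) / length
        worst = max(worst, dist)
    return worst


if __name__ == "__main__":
    pts = roc_points([0.9, 0.8, 0.7, 0.6, 0.5, 0.4], [1, 1, 0, 1, 0, 0])
    print(auroc(pts))  # 8/9
    print(max_chord_deviation(pts, 0, len(pts) - 1))
```

Because tied genes are added in one step, a block of genes that the scoring method cannot tell apart shows up as a straight segment: `roc_points([0.5] * 4, [1, 0, 1, 0])` returns `[(0.0, 0.0), (1.0, 1.0)]`, whose chord deviation is 0. Any threshold on the deviation for calling a segment "straight" is an additional assumption that would need to be chosen or calibrated separately.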

Item Type: Paper
Subjects: organs, tissues, organelles, cell types and functions > cell types and functions > cell types > B cells
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function
bioinformatics > computational biology > algorithms > machine learning
Communities: CSHL labs > Gillis Lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 5 September 2021
Date Deposited: 23 Sep 2021 15:17
Last Modified: 23 Sep 2021 15:17
URI: https://repository.cshl.edu/id/eprint/40361
