Fischer, Stephan, Gillis, Jesse (October 2022) Defining the extent of gene function using ROC curvature. Bioinformatics. btac692. ISSN 1367-4803
PDF: 2022-Gillis-Defining-the-extent-of-gene-function-using-ROC-curvature.pdf (1MB). Available under License Creative Commons Attribution.
Abstract
MOTIVATION: Interactions between proteins help us understand how genes are functionally related and how they contribute to phenotypes. Experiments provide imperfect "ground truth" information about a small subset of potential interactions in a specific biological context, which can then be extended to the whole genome across different contexts, such as conditions, tissues, or species, through machine learning methods. However, evaluating the performance of these methods remains a critical challenge. Here, we propose to evaluate the generalizability of gene characterizations through the shape of performance curves.

RESULTS: We identify Functional Equivalence Classes (FECs), subsets of annotated and unannotated genes that jointly drive performance, by assessing the presence of straight lines in ROC curves built from gene-centric prediction tasks, such as function or interaction predictions. FECs are widespread across data types and methods, and they can be used to evaluate the extent and context-specificity of functional annotations in a data-driven manner. For example, FECs suggest that B cell markers can be decomposed into shared primary markers (10 to 50 genes) and tissue-specific secondary markers (100 to 500 genes). In addition, FECs suggest the existence of functional modules that span a wide range of the genome, with marker sets spanning at most 5% of the genome and data-driven extensions of Gene Ontology sets spanning up to 40% of the genome. Simple to assess visually and statistically, the identification of FECs in performance curves paves the way for novel functional characterization and increased robustness in the definition of functional gene sets.

AVAILABILITY: Code for analyses and figures is available at https://github.com/yexilein/pyroc.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
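To make the idea of "straight lines in ROC curves" concrete, the following is a minimal sketch and not the authors' pyroc implementation. It builds a ROC curve from ranked gene scores and measures how far a segment of the curve strays from the straight chord joining its endpoints. The function names (`roc_points`, `max_deviation_from_chord`), the toy scores and labels, and the chord-deviation criterion are all assumptions made for illustration; the paper's statistical assessment may differ.

```python
def roc_points(scores, labels):
    """Return (fpr, tpr) lists obtained by ranking genes by decreasing score.

    Assumes no tied scores and at least one positive and one negative label.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for _, is_positive in ranked:
        if is_positive:
            tp += 1
        else:
            fp += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def max_deviation_from_chord(fpr, tpr, start, end):
    """Largest vertical gap between ROC points start..end and their chord."""
    x0, y0, x1, y1 = fpr[start], tpr[start], fpr[end], tpr[end]
    if x1 == x0:
        return 0.0  # a purely vertical segment is trivially straight
    slope = (y1 - y0) / (x1 - x0)
    return max(
        abs(tpr[i] - (y0 + slope * (fpr[i] - x0)))
        for i in range(start, end + 1)
    )


# Hypothetical toy ranking of 20 genes (1 = annotated, 0 = unannotated):
# 4 top-ranked positives, then a block where positives and negatives
# alternate (a candidate straight segment), then only negatives.
labels = [1, 1, 1, 1] + [1, 0] * 4 + [0] * 8
scores = list(range(20, 0, -1))

fpr, tpr = roc_points(scores, labels)
segment_gap = max_deviation_from_chord(fpr, tpr, 4, 12)
whole_gap = max_deviation_from_chord(fpr, tpr, 0, 20)
print(f"segment deviation: {segment_gap:.3f}, whole-curve deviation: {whole_gap:.3f}")
```

In this toy example the alternating block stays much closer to its chord than the curve as a whole does to the diagonal, which is the kind of locally linear stretch the abstract describes: within such a stretch, annotated and unannotated genes are ranked interchangeably.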
Item Type: | Paper |
---|---|
Subjects: | bioinformatics; bioinformatics > genomics and proteomics; bioinformatics > genomics and proteomics > Mapping and Rendering; bioinformatics > computational biology > algorithms; bioinformatics > computational biology; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function > genes: types; bioinformatics > computational biology > algorithms > machine learning; bioinformatics > genomics and proteomics > Mapping and Rendering > ontology |
CSHL Authors: | |
Communities: | CSHL labs > Gillis Lab |
SWORD Depositor: | CSHL Elements |
Depositing User: | CSHL Elements |
Date: | 22 October 2022 |
Date Deposited: | 13 Dec 2022 16:27 |
Last Modified: | 11 Jan 2024 20:03 |
PMCID: | PMC9750128 |
URI: | https://repository.cshl.edu/id/eprint/40771 |