Data from "Assessing identity, redundancy and confounds in Gene Ontology annotations over time"

Gillis, J., Pavlidis, P. (2013) Data from "Assessing identity, redundancy and confounds in Gene Ontology annotations over time". [Dataset]

HIPPIE_used_current_Jan30_2012.txt - Published Version (Plain Text, 1MB)
HIPPIE PPIN: the protein interaction data used in sections 3.3 and 3.4.
Available under License Creative Commons Attribution.

frac_confound_GO_103.txt - Published Version (Plain Text, 59kB)
Each GO group's confoundedness for our final data point for GO (edition 103). These data are plotted in Figure 3A. "NaN" occurs where there was division by zero.
Available under License Creative Commons Attribution.

frac_confound_con_103.txt - Published Version (Plain Text, 1MB)
The number of functions shared by gene pairs from the PPIN, and the number of those functions that are confounded, for our final data point for GO (edition 103). These data are plotted in Figure 3B.
Available under License Creative Commons Attribution.

frac_confound_con_aved.txt - Published Version (Plain Text, 501B)
The connection-level data plotted in Figure 5A.
Available under License Creative Commons Attribution.

frac_confound_GO_aved.txt - Published Version (Plain Text, 500B)
The GO-term-level data plotted in Figure 5A.
Available under License Creative Commons Attribution.

confound_table.txt - Published Version (Plain Text, 37kB)
Confound table: a list of GO IDs and the PubMed IDs of the papers contributing the most confound edges for those functions.
Available under License Creative Commons Attribution.

semantic_stability.txt - Published Version (Plain Text, 70kB)
Semantic stability table: a list of genes and the number of GO editions since each gene changed its functional identity (measured as the highest semantic similarity with itself).
Available under License Creative Commons Attribution.

semantic_similarity.txt - Published Version (Plain Text, 2MB)
Semantic similarity table: the similarity ranking for each gene back through each edition of GO. A value of "1" means the gene was most similar to itself or tied for first.
Available under License Creative Commons Attribution.

multifunctionality_rankings.txt - Published Version (Plain Text, 2MB)
Multifunctionality rankings table: a list of gene multifunctionality rankings over time, useful for researchers who want to reduce annotation bias in GO.
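The rankings in semantic_similarity.txt rest on asking whether a gene's annotations in one GO edition are most similar to its own annotations in an earlier edition. The similarity measure used in the paper is not given on this page, so the sketch below substitutes Jaccard similarity between GO term sets. That choice, the function names, and the toy gene symbols and GO IDs are all assumptions. Ties follow the convention stated for the table: a gene that is most similar to itself, or tied for first, gets rank 1.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two GO term sets (a stand-in, not the paper's measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def self_rank(gene: str, old: Dict[str, Set[str]], new: Dict[str, Set[str]]) -> int:
    """Rank of `gene` when its new annotations are compared with every
    gene's annotations in an older edition.

    Rank = 1 + number of other genes strictly more similar than the gene
    itself, so a tie for first place still counts as rank 1.
    """
    target = new[gene]
    own = jaccard(target, old.get(gene, set()))
    better = sum(1 for g, terms in old.items() if g != gene and jaccard(target, terms) > own)
    return 1 + better


# Toy editions with made-up gene symbols and GO IDs (not from the dataset).
old = {"GENE_A": {"GO:T1", "GO:T2"}, "GENE_B": {"GO:T2", "GO:T3"}, "GENE_C": {"GO:T4"}}
new = {"GENE_A": {"GO:T1", "GO:T2"}, "GENE_B": {"GO:T4"}, "GENE_C": {"GO:T4"}}

for g in new:
    print(g, self_rank(g, old, new))  # prints GENE_A 1, then GENE_B 2, then GENE_C 1

# Fraction of genes that are no longer most similar to themselves.
not_self = sum(self_rank(g, old, new) > 1 for g in new) / len(new)
print(f"{not_self:.2f}")  # 0.33
```

In this toy case GENE_B's new annotations match GENE_C's old annotations better than its own, so it gets rank 2, and one of the three genes fails to match itself. A different similarity measure could change these ranks.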

Abstract

The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources, and problems with stability over time have been little explored. We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their "functional identity" over time, with 20% of genes not matching to themselves (by semantic similarity) after two years. We further find that annotation bias in GO, in which some genes are more extensively characterized than others, has declined in yeast but generally risen in humans. Finally, we discovered that many entries in protein interaction databases are due to the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks. The files in this dataset, all for human genes, are intended to assist researchers who wish to check their own data for the types of effects we report in the paper. The files are tab-delimited. Genes are referenced by NCBI IDs or official symbols, and publications by PubMed IDs.
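The confound described above, where protein interaction entries and GO annotations derive from the same publications, can be checked with a short script. The precise definition behind frac_confound_GO_103.txt is not restated on this page, so this is a minimal sketch under an assumed definition: for one GO group, take the PPIN edges whose two genes both belong to the group, and count an edge as confounded if any PubMed ID supporting it also supports either gene's annotation to that GO term. The function name, the input layout, and the toy IDs are hypothetical. As in the released file, the result is NaN when there is nothing to divide by.

```python
import math
from typing import Dict, Set, Tuple

# Assumed inputs (hypothetical layout, not the released file format):
#   edge_pmids:  (gene_a, gene_b) -> PubMed IDs supporting that PPIN edge
#   annot_pmids: (gene, GO ID)    -> PubMed IDs supporting that annotation
Edge = Tuple[int, int]


def confound_fraction(
    go_id: str,
    members: Set[int],
    edge_pmids: Dict[Edge, Set[int]],
    annot_pmids: Dict[Tuple[int, str], Set[int]],
) -> float:
    """Fraction of within-group edges that share a PubMed ID with the
    GO annotations of their endpoints (assumed definition).

    Returns NaN when the group has no within-group edges (division by zero).
    """
    within = [(a, b) for (a, b) in edge_pmids if a in members and b in members]
    if not within:
        return math.nan
    confounded = 0
    for a, b in within:
        # Publications behind either endpoint's annotation to this GO term.
        go_refs = annot_pmids.get((a, go_id), set()) | annot_pmids.get((b, go_id), set())
        if edge_pmids[(a, b)] & go_refs:
            confounded += 1
    return confounded / len(within)


# Toy data with made-up gene IDs, GO IDs and PubMed IDs.
edges = {(1, 2): {100}, (2, 3): {200}, (3, 4): {300}}
annots = {(1, "GO:TEST_A"): {100}, (2, "GO:TEST_A"): {999}, (3, "GO:TEST_A"): {999}}

print(confound_fraction("GO:TEST_A", {1, 2, 3}, edges, annots))  # 0.5
print(confound_fraction("GO:TEST_B", {5, 6}, edges, annots))     # nan
```

Edge (1, 2) is supported by the same toy paper (PubMed ID 100) as gene 1's annotation, so it counts as confounded; edge (2, 3) does not, and edge (3, 4) falls outside the group, giving 0.5.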

Item Type: Dataset
Subjects: bioinformatics > genomics and proteomics > annotation
bioinformatics
bioinformatics > genomics and proteomics > annotation > gene expression profiling annotation
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > Mapping and Rendering > ontology
Communities: CSHL labs > Gillis Lab
Depositing User: Matt Covey
Date: 2013
Date Deposited: 29 Apr 2013 14:06
Last Modified: 29 Apr 2013 14:06
URI: https://repository.cshl.edu/id/eprint/28272
