Equitability, mutual information, and the maximal information coefficient

Kinney, J. B., Atwal, G. S. (2014) Equitability, mutual information, and the maximal information coefficient. Proceedings of the National Academy of Sciences of the United States of America, 111 (9). pp. 3354-9. ISSN 0027-8424

PDF (Paper)
Atwal and Kinney PNAS 2014c.pdf - Published Version

URL: http://www.ncbi.nlm.nih.gov/pubmed/24550517
DOI: 10.1073/pnas.1309933111

Abstract

How should one quantify the strength of association between two random variables without bias for relationships of a specific form? Despite its conceptual simplicity, this notion of statistical "equitability" has yet to receive a definitive mathematical formalization. Here we argue that equitability is properly formalized by a self-consistency condition closely related to Data Processing Inequality. Mutual information, a fundamental quantity in information theory, is shown to satisfy this equitability criterion. These findings are at odds with the recent work of Reshef et al. [Reshef DN, et al. (2011) Science 334(6062):1518-1524], which proposed an alternative definition of equitability and introduced a new statistic, the "maximal information coefficient" (MIC), said to satisfy equitability in contradistinction to mutual information. These conclusions, however, were supported only with limited simulation evidence, not with mathematical arguments. Upon revisiting these claims, we prove that the mathematical definition of equitability proposed by Reshef et al. cannot be satisfied by any (nontrivial) dependence measure. We also identify artifacts in the reported simulation evidence. When these artifacts are removed, estimates of mutual information are found to be more equitable than estimates of MIC. Mutual information is also observed to have consistently higher statistical power than MIC. We conclude that estimating mutual information provides a natural (and often practical) way to equitably quantify statistical associations in large datasets.
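The link the abstract draws between equitability and the Data Processing Inequality can be illustrated numerically. The sketch below is not taken from the paper: the toy joint distribution and the helper names `mutual_information` and `transform_x` are hypothetical, chosen only for illustration. It computes the exact mutual information of a small discrete distribution and checks two properties: an invertible relabelling of X leaves mutual information unchanged, and a non-invertible transformation of X cannot increase it.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """Mutual information (in bits) of a discrete joint distribution.

    `joint` maps (x, y) pairs to probabilities that sum to 1.
    """
    px = defaultdict(float)
    py = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


def transform_x(joint, f):
    """Joint distribution of (f(X), Y), given the joint distribution of (X, Y)."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(f(x), y)] += p
    return dict(out)


# Hypothetical toy distribution (an assumption, not data from the paper):
# X takes values 0..3 and Y takes values 0..1.
joint = {
    (0, 0): 0.20, (0, 1): 0.05,
    (1, 0): 0.05, (1, 1): 0.20,
    (2, 0): 0.15, (2, 1): 0.10,
    (3, 0): 0.10, (3, 1): 0.15,
}

mi_original = mutual_information(joint)

# Invertible relabelling of X: no information about Y is lost.
mi_relabelled = mutual_information(transform_x(joint, lambda x: (x + 1) % 4))

# Non-invertible merging of X values: information about Y can only be lost.
mi_merged = mutual_information(transform_x(joint, lambda x: x // 2))

print(f"I[X;Y]            = {mi_original:.4f} bits")
print(f"I[relabel(X);Y]   = {mi_relabelled:.4f} bits")
print(f"I[merge(X);Y]     = {mi_merged:.4f} bits")
```

This works on a known distribution, so it sidesteps the estimation questions the abstract raises; it does not reproduce the paper's estimators, simulations, or power comparisons against MIC.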

Item Type: Paper
Subjects: bioinformatics
bioinformatics > computational biology
bioinformatics > genomics and proteomics > databases > databases
bioinformatics > genomics and proteomics > datasets
Communities: CSHL Cancer Center Program > Gene Regulation and Cell Proliferation
CSHL labs > Atwal lab
CSHL labs > Kinney lab
CSHL Cancer Center Program > Cancer Genetics
Depositing User: Matt Covey
Date Deposited: 07 Mar 2014 16:43
Last Modified: 14 Oct 2015 18:40
PMCID: PMC3948249
URI: http://repository.cshl.edu/id/eprint/29568
