Significant distinct branches of hierarchical trees: a framework for statistical analysis and applications to biological data

Sun, G., Krasnitz, A. (November 2014) Significant distinct branches of hierarchical trees: a framework for statistical analysis and applications to biological data. BMC Genomics, 15. p. 1000. ISSN 1471-2164

PDF (Paper): Krasnitz BMC Genomics 2014.pdf - Published Version (979kB)
URL: http://www.ncbi.nlm.nih.gov/pubmed/25409689
DOI: 10.1186/1471-2164-15-1000

Abstract

BACKGROUND: One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity.

RESULTS: We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. For each dataset there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques.

CONCLUSIONS: Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: http://www.cran.r-project.org/web/packages/TBEST/index.html.
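
The abstract characterizes tightness only as a rational function of the heights of a branch and of its parent; the exact form is given in the paper itself. As a rough illustration, the sketch below assumes one simple candidate, 1 - h(branch)/h(parent), which approaches 1 when a branch merges far below its parent and 0 when the two heights are nearly equal. The function name `tightness`, the dictionary layout, and the toy tree are hypothetical and are not taken from the TBEST R package. The significance test is not sketched here.

```python
from typing import Dict, Optional


def tightness(heights: Dict[str, float],
              parent: Dict[str, Optional[str]]) -> Dict[str, float]:
    """Score every non-root node with 1 - h(node) / h(parent).

    ASSUMPTION: this ratio is only one plausible "rational function of
    heights"; the formula actually used by TBEST is defined in the paper.
    """
    scores: Dict[str, float] = {}
    for node, par in parent.items():
        if par is None:
            continue  # the root has no parent, so no score is defined
        h_parent = heights[par]
        if h_parent <= 0:
            raise ValueError(f"parent {par!r} must have a positive height")
        scores[node] = 1.0 - heights[node] / h_parent
    return scores


# Hypothetical toy tree: branches A and B both merge into the root.
heights = {"root": 10.0, "A": 2.0, "B": 8.0}
parent = {"root": None, "A": "root", "B": "root"}

scores = tightness(heights, parent)
for name in sorted(scores):
    print(name, round(scores[name], 3))
```

Under this assumed measure, branch A (height 2 under a parent of height 10) scores much higher than branch B (height 8 under the same parent), which matches the intuition that A is the more distinct of the two.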

Item Type: Paper
Subjects: bioinformatics > computational biology; bioinformatics > computational biology > statistical analysis
Communities: CSHL labs > Krasnitz lab; CSHL Cancer Center Program > Cancer Genetics
Depositing User: Matt Covey
Date: 19 November 2014
Date Deposited: 06 Jan 2015 15:30
Last Modified: 14 Oct 2015 20:46
PMCID: PMC4253613
URI: https://repository.cshl.edu/id/eprint/31012
