Sun, G., Krasnitz, A. (November 2014) Significant distinct branches of hierarchical trees: a framework for statistical analysis and applications to biological data. BMC Genomics, 15. p. 1000. ISSN 1471-2164
Abstract
BACKGROUND: One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity.

RESULTS: We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. For each dataset there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques.

CONCLUSIONS: Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: http://www.cran.r-project.org/web/packages/TBEST/index.html.
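The abstract describes tightness only as a rational function of the heights of a branch and of its parent, without giving the formula. As a rough illustration, the Python sketch below assumes one simple function of that kind, (h_parent - h_branch) / h_parent, and evaluates it on a small hand-built tree. The `Node` class, the function names, and the example heights are hypothetical and are not taken from the paper or from the TBEST R package. The significance test mentioned in the abstract is not sketched here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """A node of a hierarchical tree; leaves have height 0 and no children."""
    name: str
    height: float
    children: List["Node"] = field(default_factory=list)


def tightness(branch_height: float, parent_height: float) -> float:
    """ASSUMED form of tightness: (h_parent - h_branch) / h_parent.

    This is one rational function of the two heights, chosen for
    illustration only. It approaches 1 when the branch merges far below
    its parent and 0 when the two heights are nearly equal.
    """
    if parent_height <= 0:
        raise ValueError("parent height must be positive")
    return (parent_height - branch_height) / parent_height


def branch_tightness(root: Node) -> Dict[str, float]:
    """Compute the assumed tightness for every internal non-root branch."""
    scores: Dict[str, float] = {}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            if child.children:  # score internal branches only, skip leaves
                scores[child.name] = tightness(child.height, parent.height)
            stack.append(child)
    return scores


# Hypothetical example: branch A merges low (height 2), branch B merges
# high (height 8), and both join at a root of height 10.
leaf = lambda name: Node(name, 0.0)
root = Node("root", 10.0, [
    Node("A", 2.0, [leaf("a1"), leaf("a2")]),
    Node("B", 8.0, [leaf("b1"), leaf("b2")]),
])

scores = branch_tightness(root)
for name in sorted(scores):
    print(name, round(scores[name], 3))
```

Under this assumed formula, branch A scores 0.8 and branch B scores 0.2, so A would be the candidate for a distinct subtype. Whether such a score is significant is a separate question that the paper's statistical procedure addresses.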
Item Type: | Paper |
---|---|
Subjects: | bioinformatics > computational biology; bioinformatics > computational biology > statistical analysis |
Communities: | CSHL labs > Krasnitz lab; CSHL Cancer Center Program > Cancer Genetics |
Depositing User: | Matt Covey |
Date: | 19 November 2014 |
Date Deposited: | 06 Jan 2015 15:30 |
Last Modified: | 14 Oct 2015 20:46 |
PMCID: | PMC4253613 |
URI: | https://repository.cshl.edu/id/eprint/31012 |