Shrinkage-based similarity metric for cluster analysis of microarray data

Cherepinsky, V., Feng, J. W., Rejali, M., Mishra, B. (August 2003) Shrinkage-based similarity metric for cluster analysis of microarray data. Proceedings of the National Academy of Sciences of the United States of America, 100 (17). pp. 9668-9673. ISSN 0027-8424

Mishra_PNAS_2013.pdf - Published Version

URL: http://www.ncbi.nlm.nih.gov/pubmed/12902543
DOI: 10.1073/pnas.1633770100

Abstract

The current standard correlation coefficient used in the analysis of microarray data was introduced by M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein [(1998) Proc. Natl. Acad. Sci. USA 95, 14863-14868]. Its formulation is rather arbitrary. We give a mathematically rigorous correlation coefficient of two data vectors based on James-Stein shrinkage estimators. We use the assumptions described by Eisen et al., also using the fact that the data can be treated as transformed into normal distributions. While Eisen et al. use zero as an estimator for the expression vector mean μ, we start with the assumption that for each gene, μ is itself a zero-mean normal random variable [with a priori distribution N(0, τ²)], and use Bayesian analysis to obtain the a posteriori distribution of μ in terms of the data. The shrunk estimator for μ differs from the mean of the data vectors and ultimately leads to a statistically robust estimator for correlation coefficients. To evaluate the effectiveness of shrinkage, we conducted in silico experiments and also compared similarity metrics on a biological example by using the data set from Eisen et al. For the latter, we classified genes involved in the regulation of yeast cell-cycle functions by computing clusters based on various definitions of correlation coefficients and contrasting them against clusters based on the activators known in the literature. The estimated false positives and false negatives from this study indicate that using the shrinkage metric improves the accuracy of the analysis.
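
The idea in the abstract can be illustrated with a minimal sketch. It assumes a correlation-like similarity in which each expression vector is offset by a shrunk mean, written here as gamma times the sample mean. Under that assumption, gamma = 0 corresponds to the zero offset attributed to Eisen et al., gamma = 1 gives the ordinary Pearson correlation, and intermediate values shrink the mean toward zero. The function name `shrinkage_similarity` and the parameter `gamma` are illustrative; the abstract does not state how the shrinkage factor is estimated from the data, so it is left as an input rather than computed.

```python
import math


def shrinkage_similarity(x, y, gamma):
    """Correlation-like similarity with each vector offset by gamma * its mean.

    Illustrative assumption, not the paper's exact estimator:
    gamma = 0 -> zero offset (uncentered correlation),
    gamma = 1 -> ordinary Pearson correlation.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)

    # Shrunk estimate of each vector's mean.
    offset_x = gamma * sum(x) / n
    offset_y = gamma * sum(y) / n

    # Deviations from the shrunk offsets.
    dx = [v - offset_x for v in x]
    dy = [v - offset_y for v in y]

    # Root-mean-square deviation about the same offsets.
    norm_x = math.sqrt(sum(d * d for d in dx) / n)
    norm_y = math.sqrt(sum(d * d for d in dy) / n)
    if norm_x == 0 or norm_y == 0:
        raise ValueError("similarity is undefined for a vector with zero spread")

    return sum(a * b for a, b in zip(dx, dy)) / (n * norm_x * norm_y)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    y = [3.0, 2.0, 1.0]
    for gamma in (0.0, 0.5, 1.0):
        print(gamma, shrinkage_similarity(x, y, gamma))
```

For the two toy vectors in the usage example, the similarity moves from positive at gamma = 0 to exactly anticorrelated at gamma = 1, which shows how strongly the choice of offset can change which genes appear similar.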

Item Type: Paper
Uncontrolled Keywords: EXPRESSION YEAST GENES
Subjects: bioinformatics > genomics and proteomics > analysis and processing
bioinformatics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification
bioinformatics > genomics and proteomics > genetics & nucleic acid processing
bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > analysis and processing > microarray gene expression processing
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function > gene expression
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > genes, structure and function
organism description > yeast
CSHL Authors:
Communities: CSHL labs > Wigler lab
Depositing User: Matt Covey
Date: August 2003
Date Deposited: 02 Apr 2013 16:36
Last Modified: 10 Sep 2019 18:38
PMCID: PMC187810
Related URLs:
URI: https://repository.cshl.edu/id/eprint/27969
