Removing System Noise from Comparative Genomic Hybridization Data by Self-Self Analysis

Lee, Yoon-ha, Ronemus, Michael, Kendall, Jude, Lakshmi, B, Leotta, Anthony, Levy, Dan, Esposito, Diane, Grubor, Vladimir, Ye, Kenny, Wigler, Michael, Yamrom, Boris (May 2011) Removing System Noise from Comparative Genomic Hybridization Data by Self-Self Analysis. (Submitted)

1105.0900v1.pdf - Submitted Version
Available under License Creative Commons Attribution Non-commercial No Derivatives.



Genomic copy number variation (CNV) is a large source of variation between organisms, and its consequences include phenotypic differences and genetic disorders. CNVs are commonly detected by hybridizing genomic DNA to microarrays of nucleic acid probes. System noise, caused by variability in operations and in probe performance, complicates the interpretation of these data. To minimize the distortion of genetic signal by system noise, we have studied that noise in an archive of hybridizations in which no genetic signal is expected. The archive consists of comparative genomic hybridizations (CGH) of a sample in one channel against the same sample in the other channel, or 'self-self' data. These self-self hybridizations capture a variety of the system noise inherent in sample-reference (test) data. Through singular value decomposition (SVD) of self-self data, we have determined the principal components of system noise. Under the assumption of simple linear models of noise generation, linear correction of test data with self-self data, or 'system normalization', reduces local and long-range correlations and improves signal-to-noise metrics, yet does not introduce detectable spurious signal. Using this method, 90% of hybridizations showed improved signal-to-noise ratios, with an average increase of 7.0%, due mainly to a reduced median absolute deviation (MAD). In addition, we have found that principal component loadings correlate with specific probe variables, including array coordinates, base composition, and proximity to the 5' ends of genes. The correlation of the principal component loadings with the test data depends on operational variables, such as the temporal order of processing and the position of individual samples within 96-well plates.
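The abstract describes system normalization only in outline. What follows is a minimal NumPy sketch of one way the linear correction could work. It rests on several assumptions: self-self log ratios are stored as a probes-by-hybridizations matrix, a fixed number k of principal components is retained, and "correction" means subtracting the least-squares projection of each test profile onto those components. MAD is computed here as the median absolute deviation. The function names (`noise_components`, `system_normalize`, `mad`) and the synthetic data are hypothetical illustrations, not the authors' code.

```python
import numpy as np


def noise_components(self_self: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal components of system noise from self-self data.

    self_self: (n_probes, n_hybs) matrix of log ratios, one column per
    self-self hybridization (layout assumed for this sketch).
    Returns an (n_probes, k) matrix with orthonormal columns.
    """
    # Center each hybridization so components capture probe-to-probe
    # variation rather than each array's overall offset.
    centered = self_self - self_self.mean(axis=0, keepdims=True)
    # Left singular vectors are probe-length patterns ordered by variance.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k]


def system_normalize(test: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Remove the least-squares fit of test profile(s) onto the noise components.

    test: (n_probes,) or (n_probes, n_tests) log ratios.
    """
    # With orthonormal columns, the least-squares coefficients are plain
    # inner products, so the fit is an orthogonal projection.
    coeffs = components.T @ test
    return test - components @ coeffs


def mad(x: np.ndarray) -> float:
    """Median absolute deviation of a 1-D profile (unscaled)."""
    return float(np.median(np.abs(x - np.median(x))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_probes, n_hybs = 2000, 30
    # Synthetic stand-in for a probe-level artefact shared by all arrays.
    pattern = rng.normal(size=n_probes)
    self_self = np.outer(pattern, rng.normal(1.0, 0.3, size=n_hybs))
    self_self += rng.normal(scale=0.2, size=(n_probes, n_hybs))
    # Test profile: same artefact, random noise, and a simulated gain.
    test = 0.8 * pattern + rng.normal(scale=0.2, size=n_probes)
    test[500:700] += 1.0
    comps = noise_components(self_self, k=1)
    cleaned = system_normalize(test, comps)
    print(f"MAD before: {mad(test):.3f}, after: {mad(cleaned):.3f}")
```

On this synthetic example, subtracting the projection removes the shared artefact, and the MAD of the test profile drops. The simulated gain survives because it is nearly orthogonal to the noise component. In real data, the number of retained components k has to be chosen with care, since too many components could begin to absorb genuine copy-number signal. This sketch simply fixes k by hand.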

Item Type: Paper
Additional Information: 10 figures; 3 tables
Subjects: bioinformatics > genomics and proteomics
bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > copy number variants
Communities: CSHL labs > Iossifov lab
CSHL labs > Levy lab
CSHL labs > Wigler lab
SWORD Depositor: CSHL Elements
Depositing User: CSHL Elements
Date: 4 May 2011
Date Deposited: 13 Oct 2023 13:59
Last Modified: 13 Oct 2023 13:59
