Lee, Yoon-ha, Ronemus, Michael, Kendall, Jude, Lakshmi, B, Leotta, Anthony, Levy, Dan, Esposito, Diane, Grubor, Vladimir, Ye, Kenny, Wigler, Michael, Yamrom, Boris (May 2011) Removing System Noise from Comparative Genomic Hybridization Data by Self-Self Analysis. (Submitted)
PDF: 1105.0900v1.pdf - Submitted Version (1MB). Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Genomic copy number variation (CNV) is a large source of variation between organisms, and its consequences include phenotypic differences and genetic disorders. CNVs are commonly detected by hybridizing genomic DNA to microarrays of nucleic acid probes. System noise caused by operational and probe performance variability complicates the interpretation of these data. To minimize the distortion of genetic signal by system noise, we have explored the latter in an archive of hybridizations in which no genetic signal is expected. This archive is obtained by comparative genomic hybridization (CGH) of a sample in one channel to the same sample in the other channel, or 'self-self' data. These self-self hybridizations trap a variety of system noise inherent in sample-reference (test) data. Through singular value decomposition (SVD) of self-self data, we have determined the principal components of system noise. Assuming simple linear models of noise generation, the linear correction of test data with self-self data, or 'system normalization', reduces local and long-range correlations and improves signal-to-noise metrics, yet does not introduce detectable spurious signal. Using this method, 90% of hybridizations displayed improved signal-to-noise ratios with an average increase of 7.0%, due mainly to a reduced median absolute deviation (MAD). In addition, we have found that principal component loadings correlate with specific probe variables including array coordinates, base composition, and proximity to the 5' ends of genes. The correlation of the principal component loadings with the test data depends on operational variables, such as the temporal order of processing and the localization of individual samples within 96-well plates.
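The correction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`noise_components`, `system_normalize`, `mad`), the number of retained components `k`, the probes-by-hybridizations matrix layout, and the synthetic data are all assumptions made for illustration. The sketch takes the leading left singular vectors of a self-self matrix as noise components and subtracts the projection of a test profile onto them.

```python
import numpy as np


def noise_components(self_self, k):
    """Top-k left singular vectors of a (n_probes, n_hybridizations) matrix.

    Assumption: rows are probes, columns are self-self hybridizations.
    """
    u, _, _ = np.linalg.svd(self_self, full_matrices=False)
    return u[:, :k]


def system_normalize(test, components):
    """Subtract the least-squares fit of `test` onto orthonormal components."""
    loadings = components.T @ test  # one coefficient per noise component
    return test - components @ loadings


def mad(x):
    """Median absolute deviation around the median."""
    return np.median(np.abs(x - np.median(x)))


# Synthetic illustration only: two hidden probe-level noise patterns.
rng = np.random.default_rng(0)
n_probes, n_self = 500, 20
patterns = rng.normal(size=(n_probes, 2))

# Self-self archive: noise patterns with random weights, no genetic signal.
self_self = patterns @ rng.normal(size=(2, n_self))
self_self += 0.05 * rng.normal(size=(n_probes, n_self))

# Test hybridization: a copy number gain plus the same kinds of noise.
signal = np.zeros(n_probes)
signal[200:260] = 0.5
test = signal + patterns @ np.array([1.0, -0.8])
test += 0.05 * rng.normal(size=n_probes)

components = noise_components(self_self, k=2)
corrected = system_normalize(test, components)

print("MAD of noise before:", mad(test - signal))
print("MAD of noise after: ", mad(corrected - signal))
```

Because the projection is linear, any part of a true signal that happens to lie in the span of the retained components is removed as well, so the choice of `k` trades noise removal against signal loss in this simplified setting.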
Item Type: | Paper |
---|---|
Additional Information: | 10 figures; 3 tables |
Subjects: | bioinformatics > genomics and proteomics; bioinformatics > genomics and proteomics > genetics & nucleic acid processing > DNA, RNA structure, function, modification > copy number variants |
Communities: | CSHL labs > Iossifov lab; CSHL labs > Levy lab; CSHL labs > Wigler lab |
SWORD Depositor: | CSHL Elements |
Depositing User: | CSHL Elements |
Date: | 4 May 2011 |
Date Deposited: | 13 Oct 2023 13:59 |
Last Modified: | 13 Oct 2023 13:59 |
URI: | https://repository.cshl.edu/id/eprint/41223 |