Similarity of position frequency matrices for transcription factor binding sites

Schones, D. E., Sumazin, P., Zhang, M. Q. (February 2005) Similarity of position frequency matrices for transcription factor binding sites. Bioinformatics, 21 (3). pp. 307-13. ISSN 1367-4803 (Print)

URL: http://www.scopus.com/record/display.url?eid=2-s2....
DOI: 10.1093/bioinformatics/bth480

Abstract

MOTIVATION: Transcription-factor binding sites (TFBS) in promoter sequences of higher eukaryotes are commonly modeled using position frequency matrices (PFM). The ability to compare PFMs representing binding sites is especially important for de novo sequence motif discovery, where it is desirable to compare putative matrices to one another and to known matrices.

RESULTS: We describe a PFM similarity quantification method based on product multinomial distributions, demonstrate its ability to identify PFM similarity and show that it has a better false positive to false negative ratio compared to existing methods. We grouped TFBS frequency matrices from two libraries into matrix families and identified the matrices that are common and unique to these libraries. We identified similarities and differences between the skeletal-muscle-specific and non-muscle-specific frequency matrices for the binding sites of Mef-2, Myf, Sp-1, SRF and TEF of Wasserman and Fickett. We further identified known frequency matrices and matrix families that were strongly similar to the matrices given by Wasserman and Fickett. We provide methodology and tools to compare and query libraries of frequency matrices for TFBSs.

AVAILABILITY: Software is available to use over the Web at http://rulai.cshl.edu/MatCompare

SUPPLEMENTARY INFORMATION: Database and clustering statistics, matrix families and representatives are available at http://rulai.cshl.edu/MatCompare/Supplementary.
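
Illustrative sketch (not taken from the paper): the abstract does not give the details of the similarity method, so the code below only shows one generic way to treat PFM columns as samples from multinomial distributions and compare them. It applies a Pearson chi-square homogeneity test to each pair of aligned columns and calls two equal-width matrices similar when no column pair is rejected. The function names, the 5% critical value and the restriction to ungapped, unshifted alignments are all assumptions made for this example.

```python
# Hypothetical helper names; this is a simplified illustration, not MatCompare.

# Upper 5% point of the chi-square distribution with 3 degrees of freedom
# (4 bases - 1). Using it for every column is a simplification: a base absent
# from both columns would strictly reduce the degrees of freedom.
CHI2_CRIT_3DF_05 = 7.815


def column_chi2(col_a, col_b):
    """Pearson chi-square statistic for homogeneity of two count columns.

    Each column is a sequence of four non-negative counts (A, C, G, T).
    """
    n_a, n_b = sum(col_a), sum(col_b)
    if n_a == 0 or n_b == 0:
        raise ValueError("each column must contain at least one observation")
    total = n_a + n_b
    stat = 0.0
    for a, b in zip(col_a, col_b):
        pooled = a + b
        if pooled == 0:
            continue  # base unseen in both columns contributes nothing
        exp_a = n_a * pooled / total  # expected count under a shared distribution
        exp_b = n_b * pooled / total
        stat += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return stat


def matrices_similar(pfm_a, pfm_b, crit=CHI2_CRIT_3DF_05):
    """True if no aligned column pair is rejected by the chi-square test.

    Assumes equal-width matrices given as lists of columns; shifted or
    reverse-complement alignments are not considered here.
    """
    if len(pfm_a) != len(pfm_b):
        raise ValueError("this sketch only compares matrices of equal width")
    return all(column_chi2(a, b) < crit for a, b in zip(pfm_a, pfm_b))


if __name__ == "__main__":
    known = [[8, 1, 1, 0], [0, 9, 1, 0]]
    putative = [[7, 2, 1, 0], [1, 8, 1, 0]]
    print(matrices_similar(known, putative))
```

A practical tool would also need to handle matrices of different widths, alignment offsets and small-count columns, where a chi-square approximation is unreliable; none of that is attempted here.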

Item Type: Paper
Uncontrolled Keywords: Algorithms; Binding Sites; Protein Binding; Sequence Alignment/methods; Sequence Analysis, Protein/methods; Transcription Factors/analysis/chemistry
Subjects: bioinformatics > genomics and proteomics > databases > database search and retrieval
bioinformatics > computational biology
CSHL Authors:
Communities: CSHL labs > Zhang lab
Depositing User: CSHL Librarian
Date: 1 February 2005
Date Deposited: 05 Jan 2012 16:47
Last Modified: 05 Jan 2012 16:47
URI: https://repository.cshl.edu/id/eprint/22696
