ID: 404466.0, MPI für molekulare Genetik / Department of Computational Molecular Biology
Natural similarity measures between position frequency matrices with an application to clustering
Authors:Pape, Utz J.; Rahmann, Sven; Vingron, Martin
Date of Publication (YYYY-MM-DD):2008-01-02
Title of Journal:Bioinformatics
Issue / Number:3
Start Page:350
End Page:357
Full name of Issue-Editor(s):Associate Editor: Alfonso Valencia
Copyright:© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Review Status:not specified
Audience:Experts Only
Abstract / Description:Motivation: Transcription factors (TFs) play a key role in gene regulation by binding to target sequences. In silico prediction of potential binding of a TF to a binding site is a well-studied problem in computational biology. The binding sites for one TF are represented by a position frequency matrix (PFM). The discovery of new PFMs requires the comparison to known PFMs to avoid redundancies. In general, two PFMs are similar if they occur at overlapping positions under a null model. Still, most existing methods compute similarity according to probabilistic distances of the PFMs. Here we propose a natural similarity measure based on the asymptotic covariance between the number of PFM hits incorporating both strands. Furthermore, we introduce a second measure based on the same idea to cluster a set of the Jaspar PFMs.

Results: We show that the asymptotic covariance can be efficiently computed by a two dimensional convolution of the score distributions. The asymptotic covariance approach shows strong correlation with simulated data. It outperforms three alternative methods. The Jaspar clustering yields distinct groups of TFs of the same class. Furthermore, a representative PFM is given for each class. In contrast to most other clustering methods, PFMs with low similarity automatically remain singletons.
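
As an illustration of the kind of computation the Results refer to, the following is a minimal sketch of the simpler one-dimensional building block: the exact score distribution of a single scoring matrix under an i.i.d. background model, obtained by convolving the per-position score distributions. It is not the authors' two-dimensional method and does not compute the asymptotic covariance. The function names `score_distribution` and `hit_probability`, the integer-valued toy scores, and the uniform background are assumptions made only for this example.

```python
from collections import defaultdict


def score_distribution(score_matrix, background):
    """Exact distribution of the total score of a random word.

    score_matrix: list of dicts (one per motif position) mapping a base
                  to an integer score (hypothetical toy values below).
    background:   dict mapping each base to its probability under an
                  i.i.d. null model (assumed here, not from the paper).
    Returns a dict mapping total score -> probability.
    """
    dist = {0: 1.0}  # before any position is read, the score is 0
    for column in score_matrix:
        nxt = defaultdict(float)
        # Convolve the running distribution with this position's scores.
        for total, p in dist.items():
            for base, q in background.items():
                nxt[total + column[base]] += p * q
        dist = dict(nxt)
    return dist


def hit_probability(dist, threshold):
    """Probability that a random word scores at or above the threshold."""
    return sum(p for score, p in dist.items() if score >= threshold)


if __name__ == "__main__":
    background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    toy_matrix = [
        {"A": 2, "C": 0, "G": 0, "T": -1},
        {"A": 0, "C": 1, "G": 0, "T": 0},
    ]
    dist = score_distribution(toy_matrix, background)
    print(sorted(dist.items()))
    print(hit_probability(dist, 3))
```

In this toy case only the word "AC" reaches a score of 3, so the hit probability at that threshold is 1/16. A similarity measure of the kind described above would need the joint distribution of two matrices' scores at overlapping positions, which this sketch does not attempt.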

Availability: A website to compute the similarity and to perform clustering, the source code and Supplementary Material are available at http://mosta.molgen.mpg.de

Contact: utz.pape@molgen.mpg.de

Supplementary information: Supplementary data are available at Bioinformatics online.
External Publication Status:published
Document Type:Article
Communicated by:Martin Vingron
Affiliations:MPI für molekulare Genetik
External Affiliations:1.Mathematics and Computer Science, Free University of Berlin, Takustr. 9, 14195 Berlin, Germany;
2.COMET group, Genome Informatics, Universität Bielefeld, 33594 Bielefeld, Germany;
3.Bioinformatics for High-Throughput Technologies, Computer Science 11, Dortmund University, 44221 Dortmund, Germany.
Full Text:
350.pdf  [215,00 Kb] [Comment:open access]  