ID: 265192.0, MPI für molekulare Genetik / Department of Computational Molecular Biology
Large Scale Hierarchical Clustering of Protein Sequences
Authors: Krause, Antje; Stoye, Jens; Vingron, Martin
Language: English
Date of Publication (YYYY-MM-DD): 2005-01-22
Title of Journal: BMC Bioinformatics
Volume: 6
Start Page: 15
End Page: 15
Copyright: © 1999-2006 BioMed Central Ltd unless otherwise stated
Review Status: not specified
Audience: Experts Only
Abstract / Description:
Background
Searching a biological sequence database with a query sequence to find homologues has become a routine operation in computational biology. In spite of the high degree of sophistication of currently available search routines, it is still virtually impossible to identify quickly and clearly the group of sequences to which a given query sequence belongs.
Results
We report on our developments in grouping all known protein sequences hierarchically into superfamily and family clusters. Our graph-based algorithms take into account the topology of the sequence space induced by the data itself to construct a biologically meaningful partitioning. We have applied our clustering procedures to a non-redundant set of about 1,000,000 sequences, resulting in a hierarchical clustering that is being made available for querying and browsing at http://systers.molgen.mpg.de/.
Conclusions
Comparisons with other widely used clustering methods on various data sets show the abilities and strengths of our clustering methods in producing a biologically meaningful grouping of protein sequences.
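The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea of a two-level, graph-based grouping, not the authors' method. It assumes a hypothetical list of pairwise similarity scores and uses thresholded connected components (single linkage) as a stand-in: a loose threshold gives coarse groups, and a stricter threshold splits each coarse group into nested finer groups. The function names, thresholds, and toy scores are all illustrative assumptions.

```python
from collections import defaultdict


def connected_components(nodes, edges, min_score):
    """Group nodes joined by edges scoring at least min_score (single linkage)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in edges:
        if score >= min_score:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    # Sort for a deterministic order of clusters.
    return sorted((frozenset(g) for g in groups.values()), key=lambda g: sorted(g))


def two_level_clustering(nodes, edges, coarse, fine):
    """Map each coarse cluster to the finer clusters nested inside it."""
    if fine < coarse:
        raise ValueError("fine threshold must be at least the coarse threshold")
    hierarchy = {}
    for cluster in connected_components(nodes, edges, coarse):
        # Only edges inside the coarse cluster can link its members.
        inner = [e for e in edges if e[0] in cluster and e[1] in cluster]
        hierarchy[cluster] = connected_components(sorted(cluster), inner, fine)
    return hierarchy


# Toy data (invented for illustration): (sequence_a, sequence_b, similarity score).
nodes = ["s1", "s2", "s3", "s4", "s5"]
edges = [("s1", "s2", 90), ("s2", "s3", 40), ("s4", "s5", 85)]

hierarchy = two_level_clustering(nodes, edges, coarse=30, fine=80)
for coarse_cluster, fine_clusters in hierarchy.items():
    print(sorted(coarse_cluster), "->", [sorted(c) for c in fine_clusters])
```

Fixed thresholds are the simplest possible choice here; the abstract's remark about using the topology of the sequence space suggests the published procedure is data-driven rather than threshold-based.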
Comment of the Author/Creator: Methodology article
External Publication Status: published
Document Type: Article
Communicated by: Martin Vingron
Affiliations: MPI für molekulare Genetik
External Affiliations: Universität Bielefeld, Technische Fakultät, AG Genominformatik, Postfach 100131, 33501 Bielefeld, Germany;
TFH Wildau, Bahnhofstrasse 1, 15745 Wildau, Germany.
Identifiers: DOI: 10.1186/1471-2105-6-15
ISSN: 1471-2105