ID: 175889.0, MPI für molekulare Genetik / Department of Computational Molecular Biology
The SYSTERS protein family database: taxon-related protein family size distributions and singleton frequencies
Authors:Meinel, Thomas; Vingron, Martin; Krause, Antje
Date of Publication (YYYY-MM-DD):2003
Title of Proceedings:Proceedings of the German Conference on Bioinformatics (GCB '03)
Start Page:103
End Page:108
Name of Conference/Meeting:German Conference on Bioinformatics
Place of Conference/Meeting:Neuherberg/Garching near Munich
Review Status:not specified
Audience:Experts Only
Abstract / Description:Based on the SYSTERS protein family database, we present taxon-related protein family frequencies and distributions. A set of taxon-related protein families is the subset of the whole family set that belongs to one taxon, where a taxon is not restricted to the species level but may be of any rank in the taxonomy. We examine eight ranks in the lineages of seven organisms.
A strong linear correlation is observed between the total number of different families and the number of sequences in the data set under consideration. We fitted a generalised power-law function to the protein family distributions by least squares, excluding singleton frequencies.
Taxon-related family distributions tend to have the same shape, with a negative slope no larger than -2.1 for large data sets. For smaller data sets, the slope decreases to as low as -3.7. Slopes of family distributions increase slowly towards higher taxonomic ranks. Our observations lead to a new estimate of single-sequence cluster frequencies. Data sets of various species are examined for completeness.
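The fitting step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the functional form, so the code assumes a generalised power law f(n) = c * (n + a)^(-b), where n is the family size and f(n) the number of families of that size. It also assumes that the least-squares fit is done on log-transformed frequencies and that the shift a is chosen from a coarse grid. The function name, parameter names, and data are hypothetical and are not taken from the paper.

```python
import math


def fit_generalised_power_law(sizes, counts, shifts=None):
    """Fit log f(n) = log c - b * log(n + a) by least squares.

    Assumed model (not from the abstract): f(n) = c * (n + a) ** (-b).
    Singletons (n == 1) are excluded from the fit.
    Returns the tuple (a, b, c); the log-log slope is -b.
    """
    if shifts is None:
        # Hypothetical grid of candidate shifts a: 0.0, 0.1, ..., 5.0
        shifts = [i * 0.1 for i in range(51)]

    # Drop singleton frequencies and empty size classes.
    points = [(n, f) for n, f in zip(sizes, counts) if n > 1 and f > 0]
    ys = [math.log(f) for _, f in points]
    mean_y = sum(ys) / len(ys)

    best = None
    for a in shifts:
        xs = [math.log(n + a) for n, _ in points]
        mean_x = sum(xs) / len(xs)
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        # Sum of squared residuals in log space for this shift.
        sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, a, -slope, math.exp(intercept))

    _, a, b, c = best
    return a, b, c


# Synthetic data for illustration only (not results from the paper).
demo_sizes = list(range(1, 41))
demo_counts = [1000.0 * (n + 1.0) ** -2.5 for n in demo_sizes]
demo_counts[0] = 9999.0  # singleton frequency, ignored by the fit
print(fit_generalised_power_law(demo_sizes, demo_counts))
```

Because singletons are left out of the fit, the fitted curve could in principle be extrapolated back to n = 1 to obtain an expected singleton frequency. Whether the paper derives its estimate this way is not stated in the abstract.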
Free Keywords:protein family; large scale clustering; taxonomy; taxon-related; cluster size distribution
External Publication Status:published
Document Type:Conference-Paper
Communicated by:Martin Vingron
Affiliations:MPI für molekulare Genetik
Full Text:
gcb2003_meinel.pdf  [161,00 Kb]   