MPI für molekulare Genetik / Department of Computational Molecular Biology |
|The SYSTERS protein family database: taxon-related protein family size distributions and singleton frequencies|
|Authors:||Meinel, Thomas; Vingron, Martin; Krause, Antje|
|Date of Publication (YYYY-MM-DD):||2003|
|Title of Proceedings:||Proceedings of the German Conference on Bioinformatics (GCB '03)|
|Name of Conference/Meeting:||German Conference on Bioinformatics|
|Place of Conference/Meeting:||Neuherberg/Garching near Munich|
|(Start) Date of Conference/Meeting|
|End Date of Conference/Meeting |
|Review Status:||not specified|
|Abstract / Description:||Based on the SYSTERS protein family database, we present taxon-related protein family frequencies and distributions. A set of taxon-related protein families is a subset of the whole family set with respect to one taxon, where a taxon is not restricted to the species level but may be of any rank in the taxonomy. We examine eight ranks in the lineages of seven organisms.
A strong linear correlation is observed between the total number of different families and the number of sequences in the data set under consideration. We fitted the generalised power-law function to the protein family size distributions in a least-squares sense, excluding singleton frequencies.
Taxon-related family distributions tend to have the same shape, with a negative slope no larger than -2.1 for large data sets. For smaller data sets, the slope decreases to as low as -3.7. The slopes of the family distributions are found to increase slowly towards higher taxonomic ranks. Our observations lead to a new estimate of single-sequence cluster frequencies. Data sets of various species are examined with respect to their completeness.
|Free Keywords:||protein family; large scale clustering; taxonomy; taxon-related; cluster size distribution|
|External Publication Status:||published|
|Communicated by:||Martin Vingron|
|Affiliations:||MPI für molekulare Genetik|
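The abstract describes fitting a power-law function to family size distributions by least squares while leaving out singletons. As a rough illustration of that idea only, the sketch below fits a plain power law f(s) = a * s^slope by ordinary least squares on log-transformed values. It is not the generalised power-law function used by the authors, whose exact form is not given in this record. The function name `fit_power_law`, its parameters, and the synthetic input data are all assumptions made for this example.

```python
import math


def fit_power_law(size_freq, exclude_singletons=True):
    """Fit log(freq) = log(a) + slope * log(size) by ordinary least squares.

    size_freq maps a family size (number of sequences) to the number of
    families of that size. This is a simplified stand-in for illustration,
    not the generalised power-law function referred to in the abstract.
    """
    points = [
        (math.log(size), math.log(freq))
        for size, freq in size_freq.items()
        if freq > 0 and not (exclude_singletons and size == 1)
    ]
    if len(points) < 2:
        raise ValueError("need at least two usable family sizes")

    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)

    slope = sxy / sxx
    a = math.exp(mean_y - slope * mean_x)
    return slope, a


# Synthetic, made-up distribution (not SYSTERS data): an exact power law
# with slope -2.5, plus an inflated singleton count that the fit ignores.
synthetic = {s: 5000.0 * s ** -2.5 for s in range(1, 41)}
synthetic[1] = 20000.0

slope, a = fit_power_law(synthetic)
print(f"slope = {slope:.2f}, a = {a:.0f}")
```

Fitting in log-log space keeps the example dependency-free, but it weights small and large family sizes differently from a least-squares fit on the raw frequencies, so the slopes it produces are not directly comparable with those reported in the abstract.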