ID: 314620.0, MPI für Informatik / Computational Biology and Applied Algorithmics
Model Selection for Mixtures of Mutagenetic Trees
Authors:Yin, Junming; Beerenwinkel, Niko; Rahnenführer, Jörg; Lengauer, Thomas
Date of Publication (YYYY-MM-DD):2006
Title of Journal:Statistical Applications in Genetics and Molecular Biology
Sequence Number of Article:17
Copyright:... For 5 years from the date of initial publication in Statistical
Applications in Genetics and Molecular Biology, the author (or copyright holder
if different) grants bepress an exclusive right to distribution of the article
in digital form...
Personal-use exceptions: The following uses are exempt from the exclusive
5-year period: ... a non-commercial open access repository, or publication site
affiliated with the author's place of employment ...
Review Status:Peer-review
Audience:Experts Only
Intended Educational Use:No
Abstract / Description:The evolution of drug resistance in HIV is characterized by the accumulation of
resistance-associated mutations in the HIV genome. Mutagenetic trees, a family
of restricted Bayesian tree models, have been applied to infer the order and
rate of occurrence of these mutations. Understanding and predicting this
evolutionary process is an important prerequisite for the rational design of
antiretroviral therapies. In practice, mixture models of K mutagenetic trees
provide more flexibility and are often more appropriate for modelling observed
mutational patterns.
Here, we investigate the model selection problem for K-mutagenetic trees
mixture models. We evaluate several classical model selection criteria
including cross-validation, the Bayesian Information Criterion (BIC), and the
Akaike Information Criterion. We also use the empirical Bayes method by
constructing a prior probability distribution for the parameters of a
mutagenetic trees mixture model and deriving the posterior probability of the
model. In addition to the model dimension, we consider the redundancy of a
mixture model, which is measured by comparing the topologies of trees within a
mixture model. Based on the redundancy, we propose a new model selection
criterion, which is a modification of the BIC.
Experimental results on simulated and on real HIV data show that the classical
criteria tend to select models with far too many tree components. Only
cross-validation and the modified BIC recover the correct number of trees and
the tree topologies most of the time. At the same optimal performance, the
runtime of the new BIC modification is about one order of magnitude lower.
Thus, this model selection criterion can also be used for large data sets for
which cross-validation becomes computationally infeasible.
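For orientation, the two classical information criteria named in the abstract can be sketched as follows. This is a minimal illustration of the textbook AIC and BIC definitions only; it does not reproduce the modified BIC or the redundancy measure proposed in the article. The function names, the parameter counts, and the log-likelihood values are hypothetical placeholders, not results from the paper.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_samples):
    """Bayesian Information Criterion: k ln n - 2 ln L (lower is better)."""
    return num_params * math.log(num_samples) - 2 * log_likelihood


def select_k(candidates, num_samples, criterion="bic"):
    """Pick the number of mixture components K with the lowest score.

    candidates maps K -> (maximized log-likelihood, number of free parameters).
    Returns the selected K and the score of every candidate.
    """
    scores = {}
    for k, (log_lik, num_params) in candidates.items():
        if criterion == "bic":
            scores[k] = bic(log_lik, num_params, num_samples)
        else:
            scores[k] = aic(log_lik, num_params)
    best_k = min(scores, key=scores.get)
    return best_k, scores


# Hypothetical fitted models: K -> (log-likelihood, parameter count).
# These numbers are made up for illustration and are not from the article.
candidates = {
    1: (-1250.0, 6),
    2: (-1180.0, 13),
    3: (-1172.0, 20),
    4: (-1170.0, 27),
}

best_bic, _ = select_k(candidates, num_samples=500, criterion="bic")
best_aic, _ = select_k(candidates, num_samples=500, criterion="aic")
print("BIC selects K =", best_bic)
print("AIC selects K =", best_aic)
```

With these placeholder values the two criteria disagree, because the BIC penalty grows with the sample size while the AIC penalty does not. How many free parameters a K-trees mixture actually has depends on the model definition in the article and is treated here as a given input.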
Last Change of the Resource (YYYY-MM-DD):2007-04-02
External Publication Status:published
Document Type:Article
Communicated by:Thomas Lengauer
Affiliations:MPI für Informatik/Computational Biology and Applied Algorithmics
Full Text:
AG3_001.pdf  [211,00 Kb] [Comment:file from upload service]  