ID: 350362.0, MPI für biologische Kybernetik / Biologische Kybernetik
Gene selection via the BAHSIC family of algorithms
Authors: Song, L.; Bedo, J.; Borgwardt, K.M.; Gretton, A.; Smola, A.
Date of Publication (YYYY-MM-DD): 2007-07
Title of Journal: Bioinformatics
Volume: 23
Issue / Number: 13
Start Page: i490
End Page: i498
Audience: Not Specified
Intended Educational Use: No
Abstract / Description: Motivation: Identifying significant genes among thousands of sequences on a microarray is a central challenge for cancer research in bioinformatics. The ultimate goal is to detect the genes that are involved in disease outbreak and progression. A multitude of methods have been proposed for this task of feature selection, yet the selected gene lists differ greatly between methods. To accomplish biologically meaningful gene selection from microarray data, we have to understand the theoretical connections and the differences between these methods. In this article, we define a kernel-based framework for feature selection based on the Hilbert–Schmidt independence criterion and backward elimination, called BAHSIC. We show that several well-known feature selectors are instances of BAHSIC, thereby clarifying their relationship. Furthermore, by choosing a different kernel, BAHSIC allows us to easily define novel feature selection algorithms. As a further advantage, feature selection via BAHSIC works directly on multiclass problems.

Results: In a broad experimental evaluation, the members of the BAHSIC family reach high levels of accuracy and robustness when compared to other feature selection techniques. Experiments show that features selected with a linear kernel provide the best classification performance in general, but if strong non-linearities are present in the data then non-linear kernels can be more suitable.
External Publication Status: published
Document Type: Article
Communicated by: Holger Fischer
Affiliations: MPI für biologische Kybernetik/Empirical Inference (Dept. Schölkopf)
Identifiers: LOCALID:4764
URL: http://bioinformatics.oxfordjournals.org/cgi/conte...
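
The idea summarized in the abstract, scoring a feature subset by the Hilbert–Schmidt independence criterion (HSIC) between the data and the labels and discarding features by backward elimination, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the biased empirical HSIC estimate trace(KHLH)/(m-1)^2, a linear kernel on both features and one-hot labels, and removal of one feature per iteration; the function names `hsic_biased`, `linear_kernel` and `bahsic_sketch` are hypothetical and chosen only for this illustration.

```python
import numpy as np


def hsic_biased(K, L):
    """Biased empirical HSIC estimate: trace(K H L H) / (m - 1)^2."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return np.trace(K @ H @ L @ H) / (m - 1) ** 2


def linear_kernel(A):
    """Gram matrix of a linear kernel (assumed here; other kernels could be swapped in)."""
    return A @ A.T


def bahsic_sketch(X, y, n_keep):
    """Backward elimination driven by HSIC (illustrative sketch only).

    At each step, drop the feature whose removal leaves the largest HSIC
    between the remaining features and the labels.
    Returns (kept feature indices, eliminated indices in removal order).
    """
    classes = np.unique(y)
    # One-hot label encoding, so multiclass labels are handled directly.
    Y = (y[:, None] == classes[None, :]).astype(float)
    L = linear_kernel(Y)

    remaining = list(range(X.shape[1]))
    eliminated = []
    while len(remaining) > n_keep:
        scores = []
        for j in remaining:
            subset = [f for f in remaining if f != j]
            scores.append(hsic_biased(linear_kernel(X[:, subset]), L))
        drop = remaining[int(np.argmax(scores))]
        remaining.remove(drop)
        eliminated.append(drop)
    return remaining, eliminated
```

Calling `bahsic_sketch(X, y, n_keep=k)` on an expression matrix `X` (samples by genes) and a label vector `y` returns the `k` surviving column indices. Replacing `linear_kernel` with a non-linear kernel on `X` would give a different selector under the same loop, which is the flexibility the abstract refers to. The exhaustive inner loop is kept for readability and would be slow for thousands of genes.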