Please note that eDoc will be permanently shut down in the first quarter of 2021!      Home News About Us Contact Contributors Disclaimer Privacy Policy Help FAQ

Quick Search
My eDoc
Session History
Support Wiki
Direct access to
document ID:

          Institute: MPI für biologische Kybernetik     Collection: Biologische Kybernetik     Display Documents

ID: 461830.0, MPI für biologische Kybernetik / Biologische Kybernetik
Near-optimal supervised feature selection among frequent subgraphs
Authors:Thoma, M.; Cheng, H.; Gretton, A.; Han, J.; Kriegel, H.-P.; Smola, A.J.; Song, L.; Yu, P.S.; Yan, X.; Borgwardt, K.M.
Editors:Park, H.; Parthasarathy, S.; Liu, H.
Date of Publication (YYYY-MM-DD):2009-05
Title of Proceedings:Proccedings of the 2009 SIAM Conference on Data Mining (SDM 2009)
Start Page:1076
End Page:1087
Physical Description:12
Audience:Not Specified
Intended Educational Use:No
Abstract / Description:Graph classification is an increasingly important step in
numerous application domains, such as function prediction
of molecules and proteins, computerised scene analysis, and
anomaly detection in program flows.
Among the various approaches proposed in the literature,
graph classification based on frequent subgraphs is
a popular branch: Graphs are represented as (usually binary)
vectors, with components indicating whether a graph
contains a particular subgraph that is frequent across the
On large graphs, however, one faces the enormous
problem that the number of these frequent subgraphs may
grow exponentially with the size of the graphs, but only few
of them possess enough discriminative power to make them
useful for graph classification. Efficient and discriminative
feature selection among frequent subgraphs is hence a key
challenge for graph mining.
In this article, we propose an approach to feature selection
on frequent subgraphs, called CORK, that combines two
central advantages. First, it optimizes a submodular quality
criterion, which means that we can yield a near-optimal
solution using greedy feature selection. Second, our submodular
quality function criterion can be integrated into gSpan,
the state-of-the-art tool for frequent subgraph mining, and
help to prune the search space for discriminative frequent
subgraphs even during frequent subgraph mining.
External Publication Status:published
Document Type:Conference-Paper
Communicated by:Holger Fischer
Affiliations:MPI für biologische Kybernetik/Empirical Inference (Dept. Schölkopf)
The scope and number of records on eDoc is subject to the collection policies defined by each institute - see "info" button in the collection browse view.