
ID: 518175.0, MPI für Informatik / Algorithms and Complexity Group
Better Filtering with Gapped q-Grams
Authors:Burkhardt, Stefan; Kärkkäinen, Juha
Language:English
Publisher:Springer
Place of Publication:Berlin, Germany
Date of Publication (YYYY-MM-DD):2001
Title of Proceedings:Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Start Page:73
End Page:85
Title of Series:Lecture Notes in Computer Science
Place of Conference/Meeting:Jerusalem, Israel
(Start) Date of Conference/Meeting (YYYY-MM-DD):2001
Audience:Experts Only
Intended Educational Use:No
Abstract / Description:The q-gram filter is a popular filtering method for approximate
string matching. It compares substrings of length q (the q-grams)
in the pattern and the text to identify the text areas that might
contain a match. A generalization of the method is to use gapped
q-grams, subsets of q characters in some fixed non-contiguous
shape, instead of contiguous substrings. Although mentioned a few
times in the literature, this generalization has never been studied
in any depth. In this paper, we report the first results from a
study on gapped q-grams. We show that gapped q-grams can provide
orders of magnitude faster and/or more efficient filtering than
contiguous q-grams. The performance, however, depends on the shape
of the q-grams. The best shapes are rare and often possess no
apparent regularity. We show how to recognize good shapes and
demonstrate with experiments their advantage over both contiguous
and average shapes. We concentrate here on the k mismatches
problem, but also outline an approach for extending the results
to the more common k differences problem.
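The filtering idea described in the abstract can be illustrated with a minimal Python sketch for the k mismatches problem. This is not the authors' implementation. The shape notation ('#' for a sampled position, '-' for a gap), the function names, and the threshold are assumptions made for illustration. The threshold used here, (m - s + 1) - q*k for pattern length m, shape span s, and q sampled positions, is a simple bound: a single mismatch can spoil at most q of the m - s + 1 shape placements in a window. It is valid (no true match is filtered out) but may be looser than a threshold computed specifically for a given shape.

```python
from collections import defaultdict


def shape_offsets(shape: str) -> list[int]:
    """Offsets of the sampled ('#') positions in a hypothetical shape string such as '##-#'."""
    offsets = [i for i, c in enumerate(shape) if c == "#"]
    if not offsets or offsets[0] != 0:
        raise ValueError("shape must start with '#'")
    return offsets


def gapped_qgram(s: str, pos: int, offsets: list[int]) -> str:
    """Characters of s sampled at pos + each offset of the shape."""
    return "".join(s[pos + o] for o in offsets)


def candidate_alignments(pattern: str, text: str, k: int, shape: str) -> list[int]:
    """Alignments (text start positions) that pass the gapped q-gram filter."""
    offsets = shape_offsets(shape)
    q, span = len(offsets), offsets[-1] + 1
    m, n = len(pattern), len(text)
    if m > n:
        return []
    placements = m - span + 1       # shape placements inside one pattern-length window
    threshold = placements - q * k  # assumed simple bound: each mismatch spoils <= q placements
    if threshold <= 0:
        return list(range(n - m + 1))  # the filter cannot rule anything out
    # Index the pattern's gapped q-grams by their start position in the pattern.
    index = defaultdict(list)
    for j in range(placements):
        index[gapped_qgram(pattern, j, offsets)].append(j)
    # For every text position, credit each alignment i = p - j whose q-grams coincide.
    counts = defaultdict(int)
    for p in range(n - span + 1):
        for j in index.get(gapped_qgram(text, p, offsets), ()):
            i = p - j
            if 0 <= i <= n - m:
                counts[i] += 1
    return sorted(i for i, c in counts.items() if c >= threshold)


def k_mismatch_occurrences(pattern: str, text: str, k: int, shape: str) -> list[int]:
    """Filter with gapped q-grams, then verify each candidate by counting mismatches."""
    m = len(pattern)
    return [
        i
        for i in candidate_alignments(pattern, text, k, shape)
        if sum(a != b for a, b in zip(pattern, text[i:i + m])) <= k
    ]
```

For example, `k_mismatch_occurrences("ACGTACGT", "GGACGTTCGTGG", 1, "##-#")` returns `[2]`: at alignment 2 the single mismatch spoils three of the five "##-#" placements, leaving exactly the two required by the threshold. Only alignments that pass the filter are verified character by character, which is where the savings come from when the threshold is high enough to discard most alignments.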
Last Change of the Resource (YYYY-MM-DD):2010-03-02
External Publication Status:published
Document Type:Conference-Paper
Communicated by:Kurt Mehlhorn
Affiliations:MPI für Informatik/Algorithms and Complexity Group
Identifiers:LOCALID:C1256428004B93B8-5165C4C93C02B85DC1256A8F00420FEE-...
URL:http://www.mpi-sb.mpg.de/~stburk/gapped-q.ps
ISBN:3-540-42271-4