ID: 520413.0, MPI für Informatik / Databases and Information Systems Group
EverLast: A Distributed Architecture for Preserving the Web
Authors:Anand, Avishek; Bedathur, Srikanta; Berberich, Klaus; Schenkel, Ralf; Tryfonopoulos, Christos
Language:English
Publisher:ACM
Place of Publication:New York, USA
Date of Publication (YYYY-MM-DD):2009
Title of Proceedings:Proceedings of the Joint Conference on Digital Libraries (JCDL 2009)
Start Page:331
End Page:340
Place of Conference/Meeting:Austin, Texas
Start Date of Conference/Meeting (YYYY-MM-DD):2009-06-15
End Date of Conference/Meeting (YYYY-MM-DD):2009-06-19
Audience:Experts Only
Intended Educational Use:No
Abstract / Description:The World Wide Web has become a key source of knowledge
pertaining to almost every walk of life. Unfortunately,
much of the data on the Web is highly ephemeral in nature,
with an estimated 50-80% of content changing
within a short time. Continuing the pioneering efforts of
many national (digital) libraries, organizations such as the
International Internet Preservation Consortium (IIPC), the
Internet Archive (IA) and the European Archive (EA) have
been tirelessly working towards preserving the ever changing
Web.
However, while these web archiving efforts have paid significant
attention to the long-term preservation of Web
data, they have paid little attention to developing a global-scale
infrastructure for collecting, archiving, and performing
historical analyses on the collected data. Based on insights
from our recent work on building text analytics for Web
Archives, we propose EverLast, a scalable distributed framework
for next-generation Web archival and temporal text
analytics over the archive. Our system is built on a loosely coupled
distributed architecture that can be deployed over
large-scale peer-to-peer networks. In this way, we allow the
integration of many archival efforts taken mainly at a national
level by national digital libraries. Key features of
EverLast include support of time-based text search & analysis
and the use of human-assisted archive gathering. In this
paper, we outline the overall architecture of EverLast, and
present some promising preliminary results.
Last Change of the Resource (YYYY-MM-DD):2009-07-09
External Publication Status:published
Document Type:Conference-Paper
Communicated by:Gerhard Weikum
Affiliations:MPI für Informatik/Databases and Information Systems Group
Identifiers:LOCALID:C1256DBF005F876D-8109AA88947D8367C1257574003AD4DC-...