Detecting changes in XML documents

We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of the context, our algorithm has to be very efficient in terms of...

Detailed description

Bibliographic details
Published in: 18th International Conference on Data Engineering: Proceedings 2002: San Jose, California, pp. 41 - 52
Main authors: Cobena, G., Abiteboul, S., Marian, A.
Format: Conference proceedings
Language: English
Published: Los Alamitos CA IEEE 2002
Subject headings:
ISBN:9780769515311, 0769515312
ISSN:1063-6382
Online access: Full text
Abstract We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of the context, our algorithm has to be very efficient in terms of speed and memory space even at the cost of some loss of quality. Also, it considers, besides insertions, deletions and updates (standard in diffs), a move operation on subtrees that is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML-specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs on average in linear time, versus quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NP-hard, the linear time is obtained by trading some quality. We present experiments (again on synthetic data) that show that the output of our algorithm is reasonably close to the optimal in terms of quality. Finally, we present experiments on a small sample of XML pages found on the Web.
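The signature idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `signature` and `match_unchanged_subtrees`, the SHA-1 hash, the weight formula, and the heaviest-first order are all assumptions made for illustration. The sketch only covers exact matching of unchanged subtrees; propagation to ancestors and descendants, ID attributes, and the generation of insert, delete, update and move operations are left out.

```python
import hashlib
import xml.etree.ElementTree as ET


def signature(node, table):
    """Hash a subtree bottom-up and record (signature, weight) per node.

    The weight (assumed here to be node count plus text length) is only
    used to try large subtrees first. Tail text is ignored for brevity.
    """
    child_info = [signature(child, table) for child in node]
    text = (node.text or "").strip()
    attrs = sorted(node.attrib.items())
    payload = repr((node.tag, attrs, text, [sig for sig, _ in child_info]))
    sig = hashlib.sha1(payload.encode("utf-8")).hexdigest()
    weight = 1 + len(text) + sum(w for _, w in child_info)
    table[node] = (sig, weight)
    return sig, weight


def match_unchanged_subtrees(old_root, new_root):
    """Return (old_node, new_node) pairs whose subtrees are identical."""
    old_table, new_table = {}, {}
    signature(old_root, old_table)
    signature(new_root, new_table)

    # Index the old version's subtrees by signature.
    by_sig = {}
    for node, (sig, _) in old_table.items():
        by_sig.setdefault(sig, []).append(node)

    matches = []
    matched_old, matched_new = set(), set()
    # Visit the new version's subtrees from heaviest to lightest.
    heaviest_first = sorted(new_table, key=lambda n: new_table[n][1], reverse=True)
    for node in heaviest_first:
        if node in matched_new:
            continue
        sig = new_table[node][0]
        candidate = next(
            (c for c in by_sig.get(sig, []) if c not in matched_old), None
        )
        if candidate is None:
            continue
        matches.append((candidate, node))
        # Mark both whole subtrees so descendants are not matched again.
        matched_old.update(candidate.iter())
        matched_new.update(node.iter())
    return matches
```

Because matching relies on signatures rather than on position, a subtree that was moved under a different parent is still paired with its old copy, which is what makes a move operation detectable in a later step.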
Author Cobena, G.
Abiteboul, S.
Marian, A.
Author_xml – sequence: 1
  givenname: G.
  surname: Cobena
  fullname: Cobena, G.
  organization: INRIA, Rocquencourt, France
– sequence: 2
  givenname: S.
  surname: Abiteboul
  fullname: Abiteboul, S.
– sequence: 3
  givenname: A.
  surname: Marian
  fullname: Marian, A.
BackLink http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=15812220$$DView record in Pascal Francis
ContentType Conference Proceeding
Copyright 2004 INIST-CNRS
Copyright_xml – notice: 2004 INIST-CNRS
DOI 10.1109/ICDE.2002.994696
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library
IEEE Proceedings Order Plans (POP) 1998-present
Pascal-Francis
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE/IET Electronic Library
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
Applied Sciences
EndPage 52
ExternalDocumentID 15812220
994696
ISBN 9780769515311
0769515312
ISICitedReferencesCount 138
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000175295900005&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 1063-6382
IsPeerReviewed false
IsScholarly true
Keywords Information use
HTML language
World wide web
Data storage
XML language
Performance analysis
Algorithm analysis
Pattern matching
Language English
License CC BY 4.0
LinkModel DirectLink
MeetingName Data engineering (San Jose CA, 26 February - 1 March 2002)
PageCount 12
PublicationCentury 2000
PublicationDate 20020000
2002
PublicationDateYYYYMMDD 2002-01-01
PublicationDate_xml – year: 2002
  text: 20020000
PublicationDecade 2000
PublicationPlace Los Alamitos CA
PublicationPlace_xml – name: Los Alamitos CA
PublicationTitle 18th International Conference on Data Engineering: Proceedings 2002: San Jose, California
PublicationTitleAbbrev ICDE
PublicationYear 2002
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 41
SubjectTerms Applied sciences
Change detection algorithms
Computer science; control theory; systems
Costs
Crawlers
Data warehouses
Database languages
Exact sciences and technology
HTML
Information systems. Data bases
Memory organisation. Data processing
Performance analysis
Software
Subscriptions
Web and internet services
XML
Title Detecting changes in XML documents
URI https://ieeexplore.ieee.org/document/994696