Detecting changes in XML documents
We present a diff algorithm for XML data. This work is motivated by the support for change control in the context of the Xyleme project that is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of the context, our algorithm has to be very efficient in terms of...
| Published in: | 18th International Conference on Data Engineering: Proceedings 2002: San Jose, California, pp. 41-52 |
|---|---|
| Main authors: | Cobena, G.; Abiteboul, S.; Marian, A. |
| Format: | Conference proceeding |
| Language: | English |
| Published: | Los Alamitos, CA: IEEE, 2002 |
| Subjects: | |
| ISBN: | 9780769515311, 0769515312 |
| ISSN: | 1063-6382 |
| Online access: | Full text |
| Abstract | We present a diff algorithm for XML data. This work is motivated by the need to support change control in the context of the Xyleme project, which is investigating dynamic warehouses capable of storing massive volumes of XML data. Because of this context, our algorithm has to be very efficient in terms of speed and memory space, even at the cost of some loss of quality. Besides insertions, deletions and updates (standard in diffs), it also considers a move operation on subtrees that is essential in the context of XML. Intuitively, our diff algorithm uses signatures to match (large) subtrees that were left unchanged between the old and new versions. Such exact matchings are then possibly propagated to ancestors and descendants to obtain more matchings. It also uses XML-specific information such as ID attributes. We provide a performance analysis of the algorithm. We show that it runs on average in linear time, versus quadratic time for previous algorithms. We present experiments on synthetic data that confirm the analysis. Since this problem is NP-hard, the linear time is obtained by trading some quality. We present experiments (again on synthetic data) showing that the output of our algorithm is reasonably close to the optimal in terms of quality. Finally, we present experiments on a small sample of XML pages found on the Web. |
|---|---|
| Author | Cobena, G.; Abiteboul, S.; Marian, A. |
| Author_xml | 1. Cobena, G. (INRIA, Rocquencourt, France); 2. Abiteboul, S.; 3. Marian, A. |
| ContentType | Conference Proceeding |
| Copyright | 2004 INIST-CNRS |
| DOI | 10.1109/ICDE.2002.994696 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library; IEEE Proceedings Order Plans (POP) 1998-present; Pascal-Francis |
| Discipline | Computer Science; Applied Sciences |
| EndPage | 52 |
| ISBN | 9780769515311 0769515312 |
| ISICitedReferencesCount | 138 |
| ISSN | 1063-6382 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Keywords | Information use; HTML language; World wide web; Data storage; XML language; Performance analysis; Algorithm analysis; Pattern matching |
| Language | English |
| License | CC BY 4.0 |
| MeetingName | Data engineering (San Jose CA, 26 February - 1 March 2002) |
| PageCount | 12 |
| PublicationCentury | 2000 |
| PublicationDate | 20020000 2002 |
| PublicationDateYYYYMMDD | 2002-01-01 |
| PublicationDate_xml | – year: 2002 text: 20020000 |
| PublicationDecade | 2000 |
| PublicationPlace | Los Alamitos CA |
| PublicationPlace_xml | – name: Los Alamitos CA |
| PublicationTitle | 18th International Conference on Data Engineering: Proceedings 2002: San Jose, California |
| PublicationTitleAbbrev | ICDE |
| PublicationYear | 2002 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| StartPage | 41 |
| SubjectTerms | Applied sciences; Change detection algorithms; Computer science, control theory, systems; Costs; Crawlers; Data warehouses; Database languages; Exact sciences and technology; HTML; Information systems. Data bases; Memory organisation. Data processing; Performance analysis; Software; Subscriptions; Web and internet services; XML |
| Title | Detecting changes in XML documents |
| URI | https://ieeexplore.ieee.org/document/994696 |
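
The abstract's central idea, using signatures to match subtrees that were left unchanged between the old and new versions, can be illustrated with a small sketch. This is not the authors' implementation: it covers only exact signature matching, leaves out the propagation to ancestors and descendants and the use of ID attributes, and makes its own assumptions (SHA-1 as the signature function, node count as the subtree weight, tail text ignored). The function names `signatures` and `match_unchanged_subtrees` are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def signatures(root):
    """Map each element to (signature, weight), computed bottom-up.

    The signature hashes the tag, the sorted attributes, the stripped text,
    and the child signatures in document order. The weight is the number of
    elements in the subtree (an assumption made for this sketch).
    """
    info = {}

    def visit(node):
        h = hashlib.sha1()
        h.update(node.tag.encode())
        for name, value in sorted(node.attrib.items()):
            h.update(f"|{name}={value}".encode())
        h.update(b"|" + (node.text or "").strip().encode())
        weight = 1
        for child in node:
            child_sig, child_weight = visit(child)
            h.update(b"|" + child_sig.encode())
            weight += child_weight
        info[node] = (h.hexdigest(), weight)
        return info[node]

    visit(root)
    return info


def match_unchanged_subtrees(old_root, new_root):
    """Pair identical subtrees of two versions, heaviest subtrees first."""
    old_info = signatures(old_root)
    new_info = signatures(new_root)

    # Index the old version's subtrees by signature.
    by_signature = {}
    for node, (sig, _) in old_info.items():
        by_signature.setdefault(sig, []).append(node)

    matched_old, matched_new, pairs = set(), set(), []
    # Visiting heavy subtrees first means a large unchanged subtree is
    # matched as a whole before any of its descendants is considered.
    for node in sorted(new_info, key=lambda n: -new_info[n][1]):
        if node in matched_new:
            continue
        for candidate in by_signature.get(new_info[node][0], []):
            if candidate not in matched_old:
                pairs.append((candidate, node))
                matched_old.update(candidate.iter())
                matched_new.update(node.iter())
                break
    return pairs


old = ET.fromstring(
    "<catalog><product><name>A</name><price>10</price></product>"
    "<product><name>B</name><price>20</price></product></catalog>"
)
new = ET.fromstring(
    "<catalog><product><name>B</name><price>20</price></product>"
    "<product><name>A</name><price>12</price></product></catalog>"
)
for old_node, new_node in match_unchanged_subtrees(old, new):
    print(old_node.tag, "->", new_node.tag)
```

In this example the second `product` of the old version is matched to the first `product` of the new one even though its position changed, which is the kind of match a move operation would build on. The first `product` is not matched as a whole because its price changed, but its unchanged `name` child still is.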

