Recognize, Annotate, and Visualize Parallel Content Structures in XML Documents

Bibliographic details
Published in: 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 258-261
Main authors: Beck, Marco; Schubotz, Moritz; Stange, Vincent; Meuschke, Norman; Gipp, Bela
Format: Conference paper
Language: English
Published: IEEE, 1 September 2021
Description
Summary: We present a four-phase parallel approach for capturing, annotating, and visualizing parallel structures in XML documents. We designed a highlighting strategy that first decomposes XML documents into various data streams, including plain text, formulae, and images. Second, those streams are processed with external algorithms and tools optimized for specific tasks, such as analyzing similarities or differences in the respective formats. Third, we compute comparison metadata such as annotations and highlighting marks. Fourth, the position information is concatenated based on the computed positions in the original XML document. The resulting comparison can then be visualized or processed further while keeping the reference to the source documents intact. Although our algorithm was developed for visualizing similarities as part of plagiarism detection tasks, we expect that many applications will benefit from a well-designed, integrative method that separates addressing the match locations from inserting highlight marks. For example, our algorithm can also add comments in XML-unaware plaintext editors. Our approach also handles edge cases such as overlapping matches and multi-matches.
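
The four phases in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: it handles only the plain-text stream, assumes the XML contains no entities, comments, or CDATA sections, and uses a literal phrase search as a stand-in for the external comparison tools. All function names (`decompose`, `find_matches`, `to_source_spans`, `insert_marks`) and the `[[`/`]]` markers are hypothetical.

```python
import re

# Assumption: every "<...>" run is markup; no entities, comments, or CDATA.
TAG = re.compile(r"<[^>]*>")


def decompose(xml):
    """Phase 1: return the plain-text stream and, for each of its
    characters, that character's offset in the original XML string."""
    chars, offsets = [], []
    pos = 0
    for tag in TAG.finditer(xml):
        for i in range(pos, tag.start()):
            chars.append(xml[i])
            offsets.append(i)
        pos = tag.end()
    for i in range(pos, len(xml)):
        chars.append(xml[i])
        offsets.append(i)
    return "".join(chars), offsets


def find_matches(text, phrase):
    """Phase 2 stand-in for an external tool: literal phrase search
    returning [start, end) spans in the plain-text stream."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(phrase), text)]


def to_source_spans(spans, offsets):
    """Phases 3 and 4: translate plain-text spans into [start, end)
    spans addressed in the original XML string."""
    return [(offsets[s], offsets[e - 1] + 1) for s, e in spans if e > s]


def insert_marks(xml, source_spans, open_mark="[[", close_mark="]]"):
    """Insert markers as a separate step, working from the end of the
    string so that earlier offsets stay valid. Overlapping spans simply
    produce nested or interleaved markers."""
    events = []
    for start, end in source_spans:
        events.append((start, 1, open_mark))
        events.append((end, 0, close_mark))
    out = xml
    for pos, _, mark in sorted(events, reverse=True):
        out = out[:pos] + mark + out[pos:]
    return out


xml = "<p>Hello <b>wor</b>ld</p>"
text, offsets = decompose(xml)
spans = to_source_spans(find_matches(text, "lo wor"), offsets)
print(insert_marks(xml, spans))
```

For the sample input, the match "lo wor" crosses the opening `<b>` tag, and the script prints `<p>Hel[[lo <b>wor]]</b>ld</p>`. Because locating matches (`to_source_spans`) is kept apart from marking them (`insert_marks`), the same spans could feed a different marker style. Emitting real XML highlight elements would additionally require splitting spans at tag boundaries to keep the output well-formed, which this sketch does not attempt.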
DOI:10.1109/JCDL52503.2021.00078