Multi-feature and DAG-Based Multi-tree Matching Algorithm for Automatic Web Data Mining
| Published in: | 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), Volume 1, pp. 118-125 |
|---|---|
| Main authors: | , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 1 August 2014 |
| Summary: | Web data extraction has received considerable attention and study in recent decades. To improve efficiency, many automatic Web data record mining approaches have been proposed. Each complete approach involves both data record identification and data item alignment. In this paper, we propose a new multi-feature and DAG (Directed Acyclic Graph) based multi-tree matching algorithm for automatic data item alignment. Our algorithm improves alignment accuracy in two ways. First, it combines multiple features to cope with the limitations of existing algorithms; second, it employs a DAG-based method to deduce the global alignment of data items with high accuracy. Experimental results show that our algorithm outperforms state-of-the-art data item alignment algorithms. |
| DOI: | 10.1109/WI-IAT.2014.24 |