An ontology-driven multimedia focused crawler based on linked open data and deep learning techniques

Detailed description

Bibliographic details
Published in:	Multimedia Tools and Applications, Vol. 79, No. 11-12, pp. 7577-7598
Main authors:	Capuano, Andrea; Rinaldi, Antonio M.; Russo, Cristiano
Format:	Journal Article
Language:	English
Published:	New York, Springer US, 1 March 2020, Springer Nature B.V.
Subjects:	Artificial neural networks; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Deep learning; Domains; Feature extraction; Ground truth; Image classification; Linked Data; Multimedia; Multimedia Information Systems; Neural networks; Ontology; Open data; Special Purpose and Application-Based Systems; Websites; Knowledge engineering; Multimedia processing; Focused crawling; Document classification; Ontologies; Linked open data; Convolutional neural networks; Document analysis
ISSN:	1380-7501, 1573-7721
Online access:	Full text

Description
Summary:	Web page indexing and classification have been studied extensively since the early years of the WWW. A focused crawler is an intelligent web agent that seeks web pages relevant to a particular topic domain. In this article we propose a novel approach to focused crawling based on both the textual and the multimedia content of web pages. In our approach we define a novel strategy for deciding whether a web page should be explored further. We implement our framework in a system that aims to improve the crawling task using semantic-based techniques, combining their results with recent technologies such as convolutional neural networks and linked open data. Our framework uses ontologies to correlate different topics and understand their relationships. The correlation among topics is used to improve a textual topic detection step. These results are combined with multimedia analysis and classification based on convolutional neural networks, which are used to extract image features. Experimental results are also presented and discussed in order to measure the effectiveness of our framework compared with other approaches, using a ground truth composed of web pages about a specific domain.
DOI:	10.1007/s11042-019-08252-2
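
The summary describes a crawler that combines a textual topic score with an image-based score to decide whether a page should be explored further. The following is only a minimal sketch of that general idea, not the authors' system. Everything in it is an assumption made for illustration: the in-memory `TOY_WEB`, the `TOPIC_TERMS` set standing in for ontology-derived concepts, the precomputed image labels standing in for the output of a convolutional neural network, and the `alpha` and `threshold` parameters.

```python
import heapq

# Hypothetical toy "web": url -> (page text, image labels, outgoing links).
# The image labels stand in for the output of a CNN classifier.
TOY_WEB = {
    "a": ("lion and tiger in the savanna", ["lion"], ["b", "c"]),
    "b": ("stock market report", ["chart"], ["d"]),
    "c": ("tiger conservation programme", ["tiger", "forest"], ["d"]),
    "d": ("zebra and lion habitats", ["zebra"], []),
}

# Hypothetical stand-in for concepts that an ontology would supply.
TOPIC_TERMS = {"lion", "tiger", "zebra"}


def text_score(text):
    """Fraction of words in the page text that are topic terms."""
    words = text.lower().split()
    return sum(w in TOPIC_TERMS for w in words) / len(words) if words else 0.0


def image_score(labels):
    """Fraction of image labels that are topic terms."""
    return sum(l in TOPIC_TERMS for l in labels) / len(labels) if labels else 0.0


def combined_score(text, labels, alpha=0.5):
    """Weighted mix of the textual and image scores (alpha is an assumption)."""
    return alpha * text_score(text) + (1 - alpha) * image_score(labels)


def focused_crawl(seed, threshold=0.2, alpha=0.5):
    """Visit pages best-first; follow links only from pages judged relevant."""
    frontier = [(-1.0, seed)]  # max-heap via negated priority
    seen = {seed}
    relevant = []
    while frontier:
        _, url = heapq.heappop(frontier)
        text, labels, links = TOY_WEB[url]
        score = combined_score(text, labels, alpha)
        if score < threshold:
            continue  # irrelevant page: do not explore its links
        relevant.append(url)
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page it was found on.
                heapq.heappush(frontier, (-score, link))
    return relevant


if __name__ == "__main__":
    print(focused_crawl("a"))
```

In this sketch a page that scores below the threshold is dropped and its outgoing links are never queued, which is the pruning behaviour that distinguishes a focused crawler from an exhaustive one.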