Query Rewriting for Extracting Data behind HTML Forms

Bibliographic Details
Title: Query Rewriting for Extracting Data behind HTML Forms
Authors: Chen, Xueqi
Source: Theses and Dissertations
Publisher Information: BYU ScholarsArchive
Publication Year: 2004
Collection: Brigham Young University (BYU): ScholarsArchive
Subject Terms: computer science, data extraction, HTML forms, Computer Sciences
Description: Much of the information on the Web is stored in specialized searchable databases and can only be accessed by interacting with a form or a series of forms. As a result, enabling automated agents and Web crawlers to interact with form-based interfaces designed primarily for humans is of great value. This thesis describes a system that can fill out Web forms automatically according to a given user query against a global schema for an application domain and, to the extent possible, extract just the relevant data behind these Web forms. Experimental results on two application domains show that the approach is reasonable for HTML forms.
Document Type: text
File Description: application/pdf
Language: unknown
Relation: https://scholarsarchive.byu.edu/etd/25; https://scholarsarchive.byu.edu/context/etd/article/1024/viewcontent/ETD_CISOPTR_124.pdf
Availability: https://scholarsarchive.byu.edu/etd/25; https://scholarsarchive.byu.edu/context/etd/article/1024/viewcontent/ETD_CISOPTR_124.pdf
Rights: http://lib.byu.edu/about/copyright/
Accession Number: edsbas.85153E29
Database: BASE