Applying genetic algorithms to query optimization in document retrieval
This paper proposes a novel approach to automatically retrieve keywords and then uses genetic algorithms to adapt the keyword weights. One of the contributions of the paper is to combine the Bigram (Chen, A., He, J., Xu, L., Gey, F. C., & Meggs, J. 1997. Chinese text retrieval without using a di...
Saved in:
| Published in: | Information processing & management Vol. 36; no. 5; pp. 737 - 759 |
|---|---|
| Main Authors: | , |
| Format: | Journal Article |
| Language: | English |
| Published: |
Oxford
Elsevier Ltd
01.09.2000
Elsevier Science Elsevier Science Ltd |
| Subjects: | |
| ISSN: | 0306-4573, 1873-5371 |
| Online Access: | Get full text |
| Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
| Summary: | This paper proposes a novel approach to automatically retrieve keywords and then uses genetic algorithms to adapt the keyword weights. One of the contributions of the paper is to combine the Bigram (Chen, A., He, J., Xu, L., Gey, F. C., & Meggs, J. 1997.
Chinese text retrieval without using a dictionary, ACM SIGIR’97, Philadelphia, PA, USA, pp. 42–49; Yang, Y.-Y., Chang, J.-S., & Chen, K.-J. 1993),
Document automatic classification and ranking, Master thesis, Department of Computer Science, National Tsing Hua University) model and PAT-tree structure (Chien, L.-F., Huang, T.-I., & Chien, M.-C. 1997
Pat-tree-based keyword extraction for Chinese information retrieval, ACM SIGIR’97, Philadelphia, PA, US, pp. 50–59) to retrieve keywords. The approach extracts bigrams from documents and uses the bigrams to construct a PAT-tree to retrieve keywords. The proposed approach can retrieve any type of keywords such as technical keywords and a person’s name. Effectiveness of the proposed approach is demonstrated by comparing how effective are the keywords found by both this approach and the PAT-tree based approach. This comparison reveals that our keyword retrieval approach is as accurate as the PAT-tree based approach, yet our approach is faster and uses less memory. The study then applies genetic algorithms to tune the weight of retrieved keywords. Moreover, several documents obtained from web sites are tested and experimental results are compared with those of other approaches, indicating that the proposed approach is highly promising for applications. |
|---|---|
| Bibliography: | SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 14 ObjectType-Article-1 ObjectType-Feature-2 content type line 23 |
| ISSN: | 0306-4573 1873-5371 |
| DOI: | 10.1016/S0306-4573(00)00008-X |