Reinforced Multilingual Geospatial Query Extraction Using Naive Bayes and Fuzzy Matching
| Published in: | 2024 IEEE 1st International Conference on Green Industrial Electronics and Sustainable Technologies (GIEST), pp. 1-5 |
|---|---|
| Main authors: | , , , , |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 25 October 2024 |
| Summary: | The paper introduces a method for automatically canonicalizing geospatial entities from unstructured natural-language input supplied by the user. Traditional geospatial canonical search handles multilingual input poorly and, most importantly, lacks accuracy. The system is designed to handle both textual and voice input. To detect and correct non-standardized geospatial entities, including those with typographical errors or unconventional spellings, it applies natural language processing (NLP) techniques to standardize them. A custom Naive Bayes probability algorithm identifies potential geospatial entities by recognizing incorrectly spelled names and matching them with their correct forms. To further improve identification accuracy, the system uses a predefined list of abbreviations for geographical features (mountains, lakes, oceans, etc.). When the same place name exists in multiple locations, in the same or different countries (for example, Punjab in both India and Pakistan), the model returns geographical data for all relevant locations with their exact latitude and longitude coordinates. The model then triggers a fuzzy matching algorithm that uses the token ratio technique to ensure precise mapping of corrected names to their canonical forms in the dataset. |
|---|---|
| DOI: | 10.1109/GIEST62955.2024.10959803 |
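
The matching stage described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the abbreviation list, the toy gazetteer, the threshold, and the function names `normalize`, `token_sort_ratio`, and `canonicalize` are all hypothetical, the coordinates are rough illustrative values, and the token ratio is approximated with a token-sort comparison built on Python's standard `difflib`. The Naive Bayes spelling step is not shown.

```python
from difflib import SequenceMatcher

# Hypothetical abbreviation list for geographical features (illustrative only).
ABBREVIATIONS = {"mt": "mount", "mtn": "mountain", "lk": "lake", "oc": "ocean"}

# Toy gazetteer: canonical name -> list of (country, latitude, longitude).
# Coordinates are rough, illustrative values, not data from the paper.
GAZETTEER = {
    "punjab": [("India", 31.15, 75.34), ("Pakistan", 31.17, 72.71)],
    "mount everest": [("Nepal/China", 27.99, 86.93)],
    "lake victoria": [("Tanzania/Uganda/Kenya", -1.0, 33.0)],
}


def normalize(text):
    """Lowercase, drop periods, and expand known feature abbreviations."""
    tokens = text.lower().replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def token_sort_ratio(a, b):
    """Similarity in [0, 100] after sorting tokens, so word order is ignored."""
    sorted_a = " ".join(sorted(a.split()))
    sorted_b = " ".join(sorted(b.split()))
    return 100.0 * SequenceMatcher(None, sorted_a, sorted_b).ratio()


def canonicalize(query, threshold=80.0):
    """Return (canonical_name, locations), or (None, []) if nothing is close."""
    cleaned = normalize(query)
    best = max(GAZETTEER, key=lambda name: token_sort_ratio(cleaned, name))
    if token_sort_ratio(cleaned, best) < threshold:
        return None, []
    # An ambiguous name such as "punjab" yields every matching location.
    return best, GAZETTEER[best]
```

Under these assumptions, `canonicalize("Mt. Everst")` resolves to `"mount everest"`, and `canonicalize("Punjab")` returns both the Indian and the Pakistani entries, mirroring the ambiguity handling the summary describes.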