Semantic Encoding Algorithm for Classification and Retrieval of Aviation Safety Reports


Bibliographic Details
Published in: IEEE Transactions on Automation Science and Engineering, Vol. 22, pp. 4643-4650
Main Authors: Gao, Yubing; Zhu, GuangYu; Duan, Ya; Mao, Jianfeng
Format: Journal Article
Language: English
Published: IEEE, 2025
ISSN: 1545-5955, 1558-3783
Description
Summary: Automated analysis of aviation safety reports helps prevent future accidents and improve emergency response capabilities. To date, there are no publicly available large-scale aviation text similarity datasets, which hinders the successful application of NLP techniques in the aviation domain. We present an automatically created aviation text similarity dataset consisting of more than 500,000 pairs for fine-tuning pretrained language models. Since technical terms have specialized meanings that differ from everyday language, we propose an efficient semantic encoding algorithm to improve the ability of embeddings to adequately represent aviation terms. We provide new solutions and revised evaluation metrics for the classification and retrieval of safety reports, confirming the reliability of our dataset and the superiority of our algorithm. Note to Practitioners: Text representation is an essential task in natural language processing (NLP). A crucial step towards the successful application of NLP in safety report analysis is to ensure that aviation texts are adequately encoded. To address the poor ability of current embeddings to represent technical terms, we automatically create an aviation text similarity dataset and propose a semantic encoding algorithm for aviation terms. The proposed method has great potential for representing technical terms, thus assisting downstream tasks such as text classification, information retrieval, and question answering.
DOI: 10.1109/TASE.2024.3359356