Named Entity Recognition Algorithms Comparison For Judicial Text Data

Bibliographic details
Published in: International Conference on Application of Information and Communication Technologies, pp. 1-5
Main authors: Aibek, Kuralbayev; Bobur, Mukhsimbayev; Abay, Bekbaganbetov; Hajiyev, Fuad
Format: Conference proceeding
Language: English
Published: IEEE, 7 October 2020
ISSN:2472-8586
Description
Abstract: The more developed a society, the greater the role of legal relations. Accordingly, the number of court appeals from both individuals and legal entities is growing rapidly. Two tasks therefore become extremely important in any society: 1) reducing the time spent on the legal process, including reducing "errors" at various levels; 2) reducing the number of appeals to the courts and increasing the role of mediation. The authors have developed a prototype of the "Smart Judge Assistant" (SJA) recommender system, which largely addresses both tasks. The prototype has already passed the first stage of testing by the Supreme Court of the Republic of Kazakhstan. While developing the prototype, the authors faced various problems related to text recognition, one of which is the problem of data publicity. Objective: One of the main tasks in making documents public is to hide the personal data of the parties. This article compares several Named Entity Recognition (NER) models for extracting personal information, such as person names, organization names and location names, from judicial acts in Russian and Kazakh. Methodology: Four types of algorithms were chosen for training NER models: CRF (Conditional Random Fields), LSTM (Long Short-Term Memory) with character embeddings, LSTM-CRF, and BERT (Bidirectional Encoder Representations from Transformers). Findings: Models trained with all four algorithms achieve reasonably high accuracy because the source documents in the judicial dataset share a nearly uniform structure. BERT performs best of the four algorithms (F1 score: 0.87).
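The abstract's two technical ideas, hiding recognized entities and scoring a model by F1, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the record does not state the tagging scheme, the label names, or whether the reported F1 is computed per token or per entity. The sketch assumes BIO tags with the hypothetical labels `PER`, `ORG` and `LOC`, and exact-match entity-level F1; the helper names `bio_to_spans`, `mask_entities` and `entity_f1` are not from the paper.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start index, end index exclusive, label)


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start, label = None, None
    # A trailing "O" sentinel closes an entity that ends the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((start, i, label))
        start, label = None, None
        # A stray "I-" without a matching "B-" is treated as an entity start.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    return spans


def mask_entities(tokens: List[str], tags: List[str]) -> List[str]:
    """Replace every tagged entity with a placeholder such as [PER]."""
    masked = list(tokens)
    # Replace from right to left so earlier indices stay valid.
    for start, end, label in sorted(bio_to_spans(tags), reverse=True):
        masked[start:end] = [f"[{label}]"]
    return masked


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Exact-match entity-level F1 between gold and predicted tags."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Invented example sentence; not taken from the judicial dataset.
tokens = ["Ivanov", "Ivan", "lives", "in", "Almaty"]
gold = ["B-PER", "I-PER", "O", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O", "O"]

masked = mask_entities(tokens, gold)
score = entity_f1(gold, pred)
```

In this invented example the prediction finds the person name but misses the location, so precision is 1 and recall is 0.5. Masking uses the gold tags here; in practice it would use the output of whichever model is deployed.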
DOI:10.1109/AICT50176.2020.9368843