Investigating the Capabilities of Recurrent Neural Networks for Solving the Problem of Classifying Poorly Structured Information on the Example of Bibliographic Data


Bibliographic details
Published in: Russian microelectronics, Vol. 52, No. 7, pp. 711-715
Main authors: Petrov, E. N., Portnov, E. M.
Format: Journal Article
Language: English
Publication details: Moscow, Pleiades Publishing, 01.12.2023
Springer Nature B.V
ISSN: 1063-7397, 1608-3415
Description
Summary: With the development of information technology, new fields of automatic data processing are becoming available, including bibliographic data processing. When information is collected from different sources and contains nonuniformly structured bibliographic records with formatting mistakes, transferring the data to a summary table takes considerable time and effort, and the result is prone to human error. Consequently, automatic bibliographic data processing is relevant and in demand. This paper investigates the capabilities of recurrent neural networks (RNNs) for classifying poorly structured bibliographic information. It is shown that in order to use an RNN, it is necessary to change from the natural presentation of the collected bibliographic data to a feature-based one, i.e., to present the data as a set of features. Selecting such a set of features is a separate, complex problem. The developed RNN structure is implemented in the Python programming language. To evaluate the developed software module’s performance, a test set was formed from the publications list of the National Research University of Electronic Technology’s (MIET) Institute of Systems and Software Engineers and Information Technology, covering the past five years. An accuracy of 86% is attained, which is 11% higher than the result obtained using a feed-forward neural network. The developed feature set and RNN structure allow automated bibliographic data processing, followed by mandatory correction of the results by an operator.
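As a rough illustration of the feature-based representation the summary describes, the following sketch turns a bibliographic record into a sequence of per-token feature vectors and passes it through a small, untrained recurrent network. Everything specific here is an assumption made for illustration: the six features, the `LABELS` list, the `TinyRNN` class, and the idea of labeling individual tokens are hypothetical and are not taken from the paper, whose actual feature set and network structure are not given in this record.

```python
import math
import random
import re

# Hypothetical field labels; the paper's real class set is not stated in the summary.
LABELS = ["AUTHOR", "TITLE", "SOURCE", "YEAR", "OTHER"]


def token_features(token: str) -> list[float]:
    """Map one token to a small numeric feature vector (illustrative features only)."""
    core = token.strip(".,;:()")
    return [
        1.0 if token[:1].isupper() else 0.0,                    # starts with a capital
        1.0 if any(c.isdigit() for c in token) else 0.0,        # contains a digit
        1.0 if re.fullmatch(r"(19|20)\d{2}", core) else 0.0,    # looks like a year
        1.0 if re.fullmatch(r"[A-Z]\.", token) else 0.0,        # looks like an initial
        1.0 if token.endswith((".", ",", ";", ":")) else 0.0,   # ends with punctuation
        min(len(token), 20) / 20.0,                             # normalized length
    ]


def record_to_features(record: str) -> list[list[float]]:
    """Split a record on whitespace and featurize each token."""
    return [token_features(t) for t in record.split()]


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class TinyRNN:
    """Minimal Elman-style RNN forward pass with random (untrained) weights."""

    def __init__(self, n_in: int, n_hidden: int, n_classes: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> list[list[float]]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_xh = mat(n_hidden, n_in)      # input -> hidden
        self.w_hh = mat(n_hidden, n_hidden)  # hidden -> hidden (the recurrence)
        self.w_hy = mat(n_classes, n_hidden) # hidden -> class scores

    def forward(self, sequence: list[list[float]]) -> list[list[float]]:
        h = [0.0] * len(self.w_hh)
        outputs = []
        for x in sequence:
            # The new hidden state depends on the current token and the previous state.
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(self.w_xh[j], x))
                    + sum(w * hi for w, hi in zip(self.w_hh[j], h))
                )
                for j in range(len(h))
            ]
            logits = [sum(w * hi for w, hi in zip(row, h)) for row in self.w_hy]
            outputs.append(softmax(logits))
        return outputs


record = "Petrov, E. N., Portnov, E. M. Russian microelectronics 2023"
feats = record_to_features(record)
model = TinyRNN(n_in=len(feats[0]), n_hidden=8, n_classes=len(LABELS))
probs = model.forward(feats)  # one probability vector over LABELS per token
```

Because the weights are random, the per-token probabilities are meaningless until the network is trained on labeled records; the sketch only shows how a feature sequence, rather than raw text, would be fed to a recurrent model.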
DOI: 10.1134/S1063739723070120