Kratt: Developing an Automatic Subject Indexing Tool for The National Library of Estonia

Bibliographic Details
Published in: arXiv.org
Main Authors: Asula, Marit; Makke, Jane; Freienthal, Linda; Kuulmets, Hele-Andra; Sirel, Raul
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 24 March 2022
ISSN: 2331-8422
Abstract: Manual subject indexing in libraries is a time-consuming and costly process, and the quality of the assigned subjects depends on the cataloguer's knowledge of the specific topics covered in the book. To address these issues, we used artificial intelligence to develop Kratt, a prototype automatic subject indexing tool. Kratt can subject index a book regardless of its length and genre, assigning a set of keywords drawn from the Estonian Subject Thesaurus. It takes Kratt approximately 1 minute to subject index a book, 10 to 15 times faster than a human cataloguer. Although the cataloguers did not consider the resulting keywords satisfactory, ratings from a small sample of regular library users were more promising. We also argue that the results could be improved by training the model on a larger corpus and applying more careful preprocessing.
Copyright 2022. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI: 10.48550/arxiv.2203.12998
Discipline: Physics
Genre: Working Paper/Pre-Print
Open access: Yes
Peer reviewed: No
Scholarly: No
Subject terms: Artificial intelligence; Indexing; Libraries; Subject indexing
URI: https://www.proquest.com/docview/2643124867