Kratt: Developing an Automatic Subject Indexing Tool for The National Library of Estonia
| Published in: | arXiv.org |
|---|---|
| Main Authors: | Asula, Marit; Makke, Jane; Freienthal, Linda; Hele-Andra Kuulmets; Sirel, Raul |
| Format: | Paper |
| Language: | English |
| Published: | Ithaca: Cornell University Library, arXiv.org, 24.03.2022 |
| Subjects: | Artificial intelligence; Indexing; Libraries; Subject indexing |
| ISSN: | 2331-8422 |
| Abstract | Manual subject indexing in libraries is time-consuming and costly, and the quality of the assigned subjects depends on the cataloguer's knowledge of the specific topics covered in the book. To address these issues, we used artificial intelligence to develop Kratt, a prototype automatic subject indexing tool. Kratt can subject index a book, regardless of its length and genre, with a set of keywords from the Estonian Subject Thesaurus. Kratt takes approximately 1 minute to subject index a book, which is 10-15 times faster than a human cataloguer. Although the cataloguers did not consider the resulting keywords satisfactory, ratings from a small sample of regular library users were more promising. We also argue that the results can be improved by training the model on a larger corpus and applying more careful preprocessing techniques. |
|---|---|
| Author | Asula, Marit; Makke, Jane; Freienthal, Linda; Hele-Andra Kuulmets; Sirel, Raul |
| Copyright | 2022. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
| DOI | 10.48550/arxiv.2203.12998 |
| Discipline | Physics |
| EISSN | 2331-8422 |
| Genre | Working Paper/Pre-Print |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | false |
| IsScholarly | false |
| SubjectTerms | Artificial intelligence; Indexing; Libraries; Subject indexing |
| URI | https://www.proquest.com/docview/2643124867 |