Víceslovné lexikální entity v korpusu i lexikonu

Bibliographic Details
Title: Víceslovné lexikální entity v korpusu i lexikonu [Multiword lexical entities in the corpus and the lexicon]
Authors: Kopřivová, Marie; Skoumalová, Hana; Goláňová, Hana; Hnátková, Milena; Christou, Anna; Jelínek, Tomáš; Křivan, Jan; Petkevič, Vladimír; Tvrdá, Karla; Vítovec, Přemysl; Vondřička, Pavel
Source: Časopis pro moderní filologii 2025(2), 177-188 (2025)
Publisher Information: Charles University in Prague, Karolinum Press, 2025.
Publication Year: 2025
Subject Terms: Czech corpora, multiword lexical units, LEMUR database, Academic Dictionary of Contemporary Czech, phrasemes
Description: The aim of the Multiword Units (MWUs) for Digital Education Project is to extend the LEMUR lexicographic database and its software application for corpus annotation, linking the database with language corpora and with the Academic Dictionary of Contemporary Czech. In addition to a significant expansion of the number of MWUs in the database, the focus is on creating a new annotation program that will be able to capture fragments of MWUs and their combinations in Czech texts. As new units are added to the database, modifications and corrections of existing entries are also being made, both in the classification of individual collocations (e.g., proverbs are being revised) and in the database structure (new categories are being added). The first version of the program that will search the corpus and tag MWUs has already been created, replacing the existing FRANTA annotation program. The new description, taking into account several orthogonal axes, will assign a tag to each MWU and its components, providing the user with richer information. In the test run, it will be possible to toggle from corpus to database and from dictionary to database. A didactic manual is also planned, to teach students how to retrieve information about MWUs and work with lexicographic descriptions.
Document Type: Article
File Description: application/pdf
Language: Czech
ISSN: 2336-6591
DOI: 10.14712/23366591.2025.2.4
Access URL: http://hdl.handle.net/20.500.11956/199901
Accession Number: edsair.doi.dedup.....9e6fc7d9364ea8e92f6e56dc06ec13c3
Database: OpenAIRE