Automated Taxonomy Construction Using Large Language Models: A Comparative Study of Fine-Tuning and Prompt Engineering.

Bibliographic Details
Title: Automated Taxonomy Construction Using Large Language Models: A Comparative Study of Fine-Tuning and Prompt Engineering.
Authors: Vu, Binh, Naik, Rashmi Govindraju, Nguyen, Bao Khanh, Mehraeen, Sina, Hemmje, Matthias
Source: Eng (2673-4117); Nov 2025, Vol. 6, Issue 11, p. 283, 27 p.
Subject Terms: LANGUAGE models, NATURAL language processing, CLUSTERING algorithms, AUTOMATIC classification, INFORMATION retrieval
Company/Entity: eBay Inc.
Abstract: Taxonomies provide essential hierarchical structures for classifying information, enabling effective retrieval and knowledge organization in diverse domains such as e-commerce, academic research, and web search. Traditional taxonomy construction, heavily reliant on manual curation by domain experts, faces significant challenges in scalability, cost, and consistency when dealing with the exponential growth of digital data. Recent advancements in Large Language Models (LLMs) and Natural Language Processing (NLP) present powerful opportunities for automating this complex process. This paper explores the potential of LLMs for automated taxonomy generation, focusing on methodologies that incorporate semantic embedding generation, keyword extraction, and machine learning clustering algorithms. We conduct a comparative analysis of two primary LLM-based approaches using a dataset of eBay product descriptions. The first approach fine-tunes a pre-trained LLM on structured hierarchical data derived from chain-of-layer clustering outputs. The second employs prompt-engineering techniques to guide LLMs in generating context-aware hierarchical taxonomies from clustered keywords, without explicit model retraining. Both methodologies are evaluated on their efficacy in constructing organized, multi-level hierarchical taxonomies. Evaluation using semantic similarity metrics (BERTScore and Cosine Similarity) against a ground truth shows that the fine-tuning approach yields higher overall accuracy and consistency (BERTScore F1: 70.91%; Cosine Similarity: 66.40%) than the prompt-engineering approach (BERTScore F1: 61.66%; Cosine Similarity: 60.34%). We examine the inherent trade-offs between these methods with respect to semantic fidelity, computational resource requirements, result stability, and scalability. Finally, we outline directions for future research aimed at refining LLM-based taxonomy construction systems to handle large, dynamic datasets with enhanced accuracy, robustness, and granularity. [ABSTRACT FROM AUTHOR]
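The abstract's evaluation relies on cosine similarity between embedding vectors of generated and ground-truth taxonomy labels. As a minimal illustrative sketch (not the paper's actual pipeline; the vectors below are hypothetical toy embeddings, and real sentence embeddings would have hundreds of dimensions), the metric can be computed as follows:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings of a generated taxonomy label
# and its ground-truth counterpart.
generated = [0.12, 0.85, 0.33, 0.41]
ground_truth = [0.10, 0.80, 0.40, 0.45]

print(round(cosine_similarity(generated, ground_truth), 4))
```

A score of 1.0 indicates identical direction (perfect semantic agreement under the embedding model), while values near 0 indicate unrelated labels; the paper's reported figures (e.g. 66.40%) would be averages of such per-label scores.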
Copyright of Eng (2673-4117) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
ISSN: 2673-4117
DOI: 10.3390/eng6110283