NLP for computational insights into nutritional impacts on colorectal cancer care

Detailed bibliography
Title: NLP for computational insights into nutritional impacts on colorectal cancer care
Authors: Shengnan Gong, Xiaohong Jin, Yujie Guo, Jie Yu
Source: SLAS Technology, Vol 32, Iss , Pp 100295- (2025)
Publisher information: Elsevier, 2025.
Publication year: 2025
Collection: LCC:Biotechnology
LCC:Medical technology
Subjects: Natural language processing (NLP), Colorectal cancer (CRC), Nutritional impact CRC prediction framework (NICRP-framework), Dietary Patterns, Adaptive Tunicate Swarm Optimized Large Language Models (ATSO-LLM), Biotechnology, TP248.13-248.65, Medical technology, R855-855.5
Description: Colorectal cancer (CRC) is one of the most prominent cancers globally, with its incidence rising among younger adults due to improved screening practices. However, existing algorithms for CRC prediction are frequently trained on datasets that primarily reflect older persons, which limits their usefulness in more diverse populations. Additionally, the role of nutrition in CRC prevention and management is gaining significant attention, although computational approaches to analyzing the impact of diet on CRC remain underdeveloped. This research introduces the Nutritional Impact on CRC Prediction Framework (NICRP-Framework), which combines Natural Language Processing (NLP) techniques with Adaptive Tunicate Swarm Optimized Large Language Models (ATSO-LLMs) to provide insights into the role of diet in CRC care across diverse populations. The colorectal cancer dietary and lifestyle dataset, encompassing more than 1,000 participants, is collected from multiple regions and sources. The dataset includes both structured and unstructured data, among them textual descriptions of food ingredients. These descriptions are processed using standardization techniques such as stop word removal, lowercasing, and punctuation elimination. Relevant terms are then extracted and visualized in a word cloud. The dataset also contains an imbalanced binary CRC outcome, which is rebalanced using random oversampling. ATSO-LLMs are employed to analyze the processed dietary data, identifying key nutritional factors and forecasting CRC and non-CRC phenotypes based on dietary patterns. The results show that combining NLP-derived features with ATSO-LLMs significantly enhances prediction accuracy (98.4 %), sensitivity (97.6 %), specificity (96.9 %), and F1-score (96.2 %), with minimal misclassification rates.
This framework represents a transformative advancement in life science by offering a new, data-driven approach to understanding the nutritional determinants of CRC, empowering healthcare professionals to make more precise predictions and to tailor dietary interventions for diverse populations.
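The preprocessing steps named in the description (lowercasing, punctuation elimination, stop word removal, term extraction, and random oversampling of the imbalanced outcome) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the function names `normalize`, `term_frequencies`, and `random_oversample`, and the sample inputs are all assumptions made for illustration, and only the Python standard library is used.

```python
import random
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the record does not give the real one).
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "in", "or"}


def normalize(text):
    """Lowercase the text, strip ASCII punctuation, and drop stop words."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in no_punct.split() if tok not in STOP_WORDS]


def term_frequencies(descriptions):
    """Count terms across all descriptions, e.g. as input for a word cloud."""
    counts = Counter()
    for description in descriptions:
        counts.update(normalize(description))
    return counts


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical ingredient descriptions and binary outcomes (1 = CRC, 0 = non-CRC).
descriptions = ["Red Meat, Butter, and Salt.", "Oats with berries", "Red meat and rice"]
labels = [1, 0, 0]
balanced_rows, balanced_labels = random_oversample(descriptions, labels)
top_terms = term_frequencies(descriptions).most_common(2)
```

In a real pipeline, oversampling would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.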
Document type: article
File description: electronic resource
Language: English
ISSN: 2472-6303
Relation: http://www.sciencedirect.com/science/article/pii/S2472630325000536; https://doaj.org/toc/2472-6303
DOI: 10.1016/j.slast.2025.100295
Access URL: https://doaj.org/article/9e8fc7e67544469fb50467d3df86ea85
Accession number: edsdoj.9e8fc7e67544469fb50467d3df86ea85
Database: Directory of Open Access Journals