PlantLncBoost: key features for plant lncRNA identification and significant improvement in accuracy and generalization

Bibliographic Details
Published in: The New phytologist Vol. 247; no. 3; pp. 1538-1549
Main Authors: Tian, Xue‐Chan, Nie, Shuai, Domingues, Douglas, Rossi Paschoal, Alexandre, Jiang, Li‐Bo, Mao, Jian‐Feng
Format: Journal Article
Language: English
Published: England Wiley Subscription Services, Inc 01.08.2025
John Wiley and Sons Inc
ISSN: 0028-646X, 1469-8137
Description
Summary: Long noncoding RNAs (lncRNAs) are critical regulators of numerous biological processes in plants. Nevertheless, their identification is challenging due to the low sequence conservation across various species. Existing computational methods for lncRNA identification often face difficulties in generalizing across diverse plant species, highlighting the need for more robust and versatile identification models. Here, we present PlantLncBoost, a novel computational tool designed to improve the generalization in plant lncRNA identification. By integrating advanced gradient boosting algorithms with comprehensive feature selection, our approach achieves both high accuracy and generalizability. We conducted an extensive analysis of 1662 features and identified three key features (ORF coverage, complex Fourier average, and atomic Fourier amplitude) that effectively distinguish lncRNAs from mRNAs. We assessed the performance of PlantLncBoost using comprehensive datasets from 20 plant species. The model exhibited exceptional performance, with an accuracy of 96.63%, a sensitivity of 98.42%, and a specificity of 94.93%, significantly outperforming existing tools. Further analysis revealed that the features we selected effectively capture the differences between lncRNAs and mRNAs across a variety of plant species. PlantLncBoost represents a significant advancement in plant lncRNA identification. It is freely accessible on GitHub (https://github.com/xuechantian/PlantLncBoost) and has been integrated into a comprehensive analysis pipeline, Plant‐LncRNA‐pipeline v.2 (https://github.com/xuechantian/Plant-LncRNA-pipeline-v2).
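Illustrative note: the summary names ORF coverage as one of the three key features and reports accuracy, sensitivity, and specificity. The sketch below shows one plausible way to compute these quantities. It is not taken from PlantLncBoost. The ORF definition (longest ATG-to-stop ORF in the three forward frames, divided by transcript length), the choice of lncRNA as the positive class, and all function names are assumptions made for illustration.

```python
# Hypothetical sketch; definitions below are assumptions, not PlantLncBoost code.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length (nt, stop codon included) of the longest ATG..stop ORF
    found in the three forward reading frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
    best = 0
    for frame in range(3):
        start = None  # index of the currently open start codon, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def orf_coverage(seq: str) -> float:
    """Assumed definition: longest ORF length divided by transcript length."""
    return longest_orf_length(seq) / len(seq) if seq else 0.0


def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Standard confusion-matrix metrics; lncRNA is assumed to be the positive class."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true lncRNAs recovered
        "specificity": tn / (tn + fp),  # fraction of true mRNAs recovered
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


if __name__ == "__main__":
    print(orf_coverage("CCATGAAATAGCC"))
    print(classification_metrics(tp=90, fn=10, tn=80, fp=20))
```

Under this assumed definition, coding transcripts would tend to show high ORF coverage and lncRNAs low coverage, which is consistent with the feature being useful for separating the two classes.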
DOI: 10.1111/nph.70211