Application of Random Forest in Limited Size Human Long Non-coding RNAs Identification with Secondary Structure Features

Bibliographic details
Published in: 2019 23rd International Computer Science and Engineering Conference (ICSEC), pp. 65-69
Main authors: Anuntakarun, Songtham; Wattanapornprom, Warin; Lertampaiporn, Supatcha
Format: Conference proceeding
Language: English
Published: IEEE, 1 October 2019
Abstract In this work, preliminary experiments were performed using diverse machine learning algorithms and multiple relevant features to discriminate between human lncRNAs and coding/partially coding sequences. The study restricted human lncRNAs to those shorter than 1000 nucleotides. Various features describing RNA sequences were used, including sequence-based features, secondary structure features, base-pair features and structural robustness features. The top 20 significant features were then selected with the Wilcoxon rank-sum test, which showed that secondary structure features are distinctive characteristics for identifying these human lncRNAs and differ considerably from those of shorter and longer ncRNA types. Such features suit rule-based classifiers such as Random Forest. Under 10-fold cross-validation, the random forest model achieved the highest accuracy, sensitivity and specificity, as well as the lowest false positive rate, among all competing algorithms. The model was also compared with other state-of-the-art approaches, including CPC, CPAT and RNAcon, and achieved the highest accuracy (84.5%) among all compared methods.
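The record does not describe how the feature selection or evaluation was implemented. As a rough illustration only, the following hypothetical Python sketch (standard library only) shows two steps the abstract names: ranking features by a two-sided Wilcoxon rank-sum test (normal approximation, no tie correction) and computing accuracy, sensitivity, specificity and false positive rate from a confusion matrix. All function names, feature names and numbers are invented for the example; this is not the authors' code, and their feature set and exact test settings are not reproduced.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks of values, averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


def top_k_features(pos, neg, k):
    """Hypothetical helper: sort feature names by p-value (smallest first), keep k.

    pos, neg: dicts mapping feature name -> list of values for each class.
    """
    scored = sorted(pos, key=lambda f: rank_sum_p_value(pos[f], neg[f]))
    return scored[:k]


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and false positive rate."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "fpr": fp / (fp + tn),
    }


# Toy data (invented): feat_a separates the classes, feat_b does not.
pos = {"feat_a": [-30, -28, -35, -32], "feat_b": [0.40, 0.50, 0.45, 0.42]}
neg = {"feat_a": [-10, -12, -9, -11], "feat_b": [0.41, 0.48, 0.44, 0.46]}
print(top_k_features(pos, neg, 1))  # ['feat_a']
print(binary_metrics(tp=40, fp=10, tn=45, fn=5)["accuracy"])  # 0.85
```

With the abstract's setting, `k` would be 20. The classifier step itself is not sketched: a random forest evaluated with 10-fold cross-validation would typically come from a library such as scikit-learn (`RandomForestClassifier` with `cross_val_score(..., cv=10)`), but the record does not say which tools or hyperparameters the authors used.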
Author Wattanapornprom, Warin
Anuntakarun, Songtham
Lertampaiporn, Supatcha
Author_xml – sequence: 1
  givenname: Songtham
  surname: Anuntakarun
  fullname: Anuntakarun, Songtham
  organization: King Mongkut's University of Technology Thonburi,Bioinformatics and Systems Biology Program,Bangkok,Thailand
– sequence: 2
  givenname: Warin
  surname: Wattanapornprom
  fullname: Wattanapornprom, Warin
  organization: University of Technology Thonburi,Department of Mathematics,Bangkok,Thailand
– sequence: 3
  givenname: Supatcha
  surname: Lertampaiporn
  fullname: Lertampaiporn, Supatcha
  organization: King Mongkut's University of Technology Thonburi,Biochemical Engineering and Systems Biology Research Group National Center for Genetic Engineering and Biotechnology (BIOTEC),Bangkok,Thailand
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ICSEC47112.2019.8974749
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1728125448
9781728125442
EndPage 69
ExternalDocumentID 8974749
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-i118t-cac129b43a6dffa6bf0bbcb77d0d55f4d70d07094f5ccbd01d2e670ebaa793843
IEDL.DBID RIE
IngestDate Wed Aug 27 07:39:48 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i118t-cac129b43a6dffa6bf0bbcb77d0d55f4d70d07094f5ccbd01d2e670ebaa793843
PageCount 5
ParticipantIDs ieee_primary_8974749
PublicationCentury 2000
PublicationDate 2019-Oct.
PublicationDateYYYYMMDD 2019-10-01
PublicationDate_xml – month: 10
  year: 2019
  text: 2019-Oct.
PublicationDecade 2010
PublicationTitle 2019 23rd International Computer Science and Engineering Conference (ICSEC)
PublicationTitleAbbrev ICSEC
PublicationYear 2019
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.7009844
SourceID ieee
SourceType Publisher
StartPage 65
SubjectTerms Accuracy
Bioinformatics
Machine learning
Machine learning algorithms
Nearest neighbor methods
Prediction methods
Random forests
RNA
Robustness
Sensitivity and specificity
Shape
Testing
Title Application of Random Forest in Limited Size Human Long Non-coding RNAs Identification with Secondary Structure Features
URI https://ieeexplore.ieee.org/document/8974749
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3NS8MwFA_b8OBJZRO_eQePxqU2TZrjGBsKUsaqsNvIV6UHW1k3Uf96k7ZuCF48JYSQkJe8vOS993sPoWtJM6UZsVhozTDlRGIVxwEWPGKS0cg7vNfJJniSxIuFmHXQzRYLY62tnc_sra_WtnxT6o1XlQ1j__iloou6nLMGq9W6bAVEDB_G6WTs7trA46sCdwSa3r_SptRSY3rwv_kO0WAHv4PZVrAcoY4t-uhjtDM1Q5nBXBamfAWfWrNaQ15AC1WCNP-yUOvm4bEsXiApC6xLPxLMk1EFDTQ3-xnKK2Ih9d9iI1efkNbxZDcrC_5x6MpqgJ6nk6fxPW7TJuDc_RbWWEvthLiioWQmyyRTGVFKK84NMVGUUcOJcYwuaBZprQwJzJ1lnFglpWPWmIbHqFeUhT1BYCznXDMZMkocZY2IKKVSsVCKWHJOTlHfU2351kTGWLYEO_u7-Rzt-41pXOEuUM8tyF6iPf2-zqvVVb2d36iMpLk
linkProvider IEEE
linkToHtml http://cvtisr.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwlV3NS8MwFA9zCnpS2cRvc_BoXLqmSXMcY8PhLGOdsNvIV6UHW1k3Uf96k7ZuCF48JYSQkJe8vOS993sPgFtBEqkoNogrRRFhWCAZhh7iLKCCksA5vJfJJlgUhfM5nzTA3QYLY4wpnc_MvauWtnydq7VTlXVC9_glfAfsBoR0cYXWqp22PMw7o3486Nvb1nMIK88egqr_r8QppdwYHv5vxiPQ3gLw4GQjWo5Bw2Qt8NHbGpthnsCpyHT-Cl1yzWIF0wzWYCUYp18Gltp5OM6zFxjlGVK5GwlOo14BK3Bu8jOUU8XC2H2MtVh-wriMKLteGuieh7Ys2uB5OJj1H1CdOAGl9r-wQkooK8Yl8QXVSSKoTLCUSjKmsQ6ChGiGtWV1TpJAKamxp7uGMmykEJZdQ-KfgGaWZ-YUQG0YY4oKnxJsKau5pTwRkvqCh4IxfAZajmqLtyo2xqIm2PnfzTdg_2H2NF6MR9HjBThwm1Q5xl2Cpl2cuQJ76n2VFsvrcmu_AWOUqAA
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2019+23rd+International+Computer+Science+and+Engineering+Conference+%28ICSEC%29&rft.atitle=Application+of+Random+Forest+in+Limited+Size+Human+Long+Non-coding+RNAs+Identification+with+Secondary+Structure+Features&rft.au=Anuntakarun%2C+Songtham&rft.au=Wattanapornprom%2C+Warin&rft.au=Lertampaiporn%2C+Supatcha&rft.date=2019-10-01&rft.pub=IEEE&rft.spage=65&rft.epage=69&rft_id=info:doi/10.1109%2FICSEC47112.2019.8974749&rft.externalDocID=8974749