Application of Random Forest in Limited Size Human Long Non-coding RNAs Identification with Secondary Structure Features


Bibliographic Details
Published in: 2019 23rd International Computer Science and Engineering Conference (ICSEC), pp. 65-69
Main Authors: Anuntakarun, Songtham; Wattanapornprom, Warin; Lertampaiporn, Supatcha
Format: Conference Proceeding
Language: English
Published: IEEE 01.10.2019
Abstract This work reports preliminary experiments that apply diverse machine learning algorithms and test multiple relevant features to discriminate human lncRNAs from coding and partially coding sequences. The study is restricted to human lncRNAs shorter than 1000 nucleotides. Features describing the RNA sequence, including sequence-based features, secondary structure features, base-pair features, and structural robustness features, were used. The top 20 significant features were then selected using the Wilcoxon rank-sum test. This selection showed that secondary structure features are distinguishing characteristics of human lncRNAs and differ markedly from those of shorter and longer types of ncRNAs. Such features are well suited to rule-based classifiers such as Random Forest. Under 10-fold cross-validation, the random forest model showed the highest accuracy, sensitivity, and specificity, as well as the lowest false positive rate, among all competing models. The model was also compared with other state-of-the-art approaches such as CPC, CPAT, and RNAcon, and achieved the highest accuracy among them, 84.5%.
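The feature-selection and evaluation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy data, and the choice of the normal approximation to the Wilcoxon rank-sum statistic (without a tie correction to the variance) are all assumptions made for illustration, and the record does not say how the authors computed the test. The sketch ranks feature columns by the absolute rank-sum z statistic between the two classes, keeps the top k (the paper keeps 20; the toy example keeps 2), and computes the four metrics reported in the abstract from a set of predictions.

```python
import math
import random


def rank_sum_z(a, b):
    """Wilcoxon rank-sum z statistic for sample a versus sample b.

    Normal approximation with average ranks for ties; the variance is
    not tie-corrected (a simplification assumed for this sketch).
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    n = len(pooled)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tied run
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(a), len(b)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def top_k_features(X, y, k):
    """Indices of the k columns with the largest |z| between class 1 and class 0."""
    scores = []
    for col in range(len(X[0])):
        pos = [row[col] for row, label in zip(X, y) if label == 1]
        neg = [row[col] for row, label in zip(X, y) if label == 0]
        scores.append((abs(rank_sum_z(pos, neg)), col))
    return [col for _, col in sorted(scores, reverse=True)[:k]]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and false positive rate.

    Assumes both classes occur in y_true (otherwise a division by zero occurs).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
    }


def make_toy_data(n_samples=200, n_features=6, informative=(1, 4), seed=0):
    """Hypothetical data: Gaussian noise, with a class shift on the informative columns."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.gauss(0, 1) + (2.0 if c in informative and label else 0.0)
          for c in range(n_features)] for label in y]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("selected columns:", sorted(top_k_features(X, y, k=2)))
    print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

In a fuller pipeline, the selected columns would be passed to a classifier (for example a random forest) and the metrics averaged over the 10 cross-validation folds; that part is omitted here to keep the sketch dependency-free.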
Author_xml – sequence: 1
  givenname: Songtham
  surname: Anuntakarun
  fullname: Anuntakarun, Songtham
  organization: King Mongkut's University of Technology Thonburi, Bioinformatics and Systems Biology Program, Bangkok, Thailand
– sequence: 2
  givenname: Warin
  surname: Wattanapornprom
  fullname: Wattanapornprom, Warin
  organization: University of Technology Thonburi, Department of Mathematics, Bangkok, Thailand
– sequence: 3
  givenname: Supatcha
  surname: Lertampaiporn
  fullname: Lertampaiporn, Supatcha
  organization: King Mongkut's University of Technology Thonburi, Biochemical Engineering and Systems Biology Research Group, National Center for Genetic Engineering and Biotechnology (BIOTEC), Bangkok, Thailand
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ICSEC47112.2019.8974749
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE/IET Electronic Library (IEL) (UW System Shared)
IEEE Proceedings Order Plans (POP All) 1998-Present
EISBN 1728125448
9781728125442
EndPage 69
ExternalDocumentID 8974749
Genre orig-research
IngestDate Wed Aug 27 07:39:48 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
PageCount 5
PublicationCentury 2000
PublicationDate 2019-Oct.
PublicationDateYYYYMMDD 2019-10-01
PublicationDate_xml – month: 10
  year: 2019
  text: 2019-Oct.
PublicationDecade 2010
PublicationTitle 2019 23rd International Computer Science and Engineering Conference (ICSEC)
PublicationTitleAbbrev ICSEC
PublicationYear 2019
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 65
SubjectTerms Accuracy
Bioinformatics
Machine learning
Machine learning algorithms
Nearest neighbor methods
Prediction methods
Random forests
RNA
Robustness
Sensitivity and specificity
Shape
Testing
Title Application of Random Forest in Limited Size Human Long Non-coding RNAs Identification with Secondary Structure Features
URI https://ieeexplore.ieee.org/document/8974749