Application of Random Forest in Limited Size Human Long Non-coding RNAs Identification with Secondary Structure Features
| Published in: | 2019 23rd International Computer Science and Engineering Conference (ICSEC) pp. 65 - 69 |
|---|---|
| Main Authors: | Anuntakarun, Songtham; Wattanapornprom, Warin; Lertampaiporn, Supatcha |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.10.2019 |
| Subjects: | |
| Abstract | In this work, preliminary experiments were performed using diverse machine learning algorithms and multiple relevant features to discriminate between human lncRNAs and coding/partial coding sequences. This research limited the human lncRNAs to those shorter than 1000 nucleotides. Various significant features for describing RNA sequences, including sequence-based features, secondary structure features, base-pair features and structural robustness features, were used in this study. The top 20 significant features were then selected using the Wilcoxon rank-sum test, which showed that the secondary structure features are the distinguishing characteristics for identifying these human lncRNAs and differ considerably from those of shorter and longer types of ncRNAs. Such features are well suited to rule-based classifiers such as Random Forest. Under 10-fold cross-validation, the random forest model showed the highest accuracy, sensitivity and specificity as well as the lowest false positive rate among all competitors. Furthermore, the model was compared with other state-of-the-art approaches such as CPC, CPAT and RNAcon, and achieved the highest accuracy, 84.5%, among all compared methods. |
|---|---|
| Author | Wattanapornprom, Warin Anuntakarun, Songtham Lertampaiporn, Supatcha |
| Author_xml | – sequence: 1 givenname: Songtham surname: Anuntakarun fullname: Anuntakarun, Songtham organization: King Mongkut's University of Technology Thonburi,Bioinformatics and Systems Biology Program,Bangkok,Thailand – sequence: 2 givenname: Warin surname: Wattanapornprom fullname: Wattanapornprom, Warin organization: University of Technology Thonburi,Department of Mathematics,Bangkok,Thailand – sequence: 3 givenname: Supatcha surname: Lertampaiporn fullname: Lertampaiporn, Supatcha organization: King Mongkut's University of Technology Thonburi,Biochemical Engineering and Systems Biology Research Group National Center for Genetic Engineering and Biotechnology (BIOTEC),Bangkok,Thailand |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1109/ICSEC47112.2019.8974749 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE/IET Electronic Library (IEL) (UW System Shared); IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| EISBN | 1728125448 9781728125442 |
| EndPage | 69 |
| ExternalDocumentID | 8974749 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IL CBEJK RIE RIL |
| ID | FETCH-LOGICAL-i118t-cac129b43a6dffa6bf0bbcb77d0d55f4d70d07094f5ccbd01d2e670ebaa793843 |
| IEDL.DBID | RIE |
| IngestDate | Wed Aug 27 07:39:48 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-i118t-cac129b43a6dffa6bf0bbcb77d0d55f4d70d07094f5ccbd01d2e670ebaa793843 |
| PageCount | 5 |
| ParticipantIDs | ieee_primary_8974749 |
| PublicationCentury | 2000 |
| PublicationDate | 2019-Oct. |
| PublicationDateYYYYMMDD | 2019-10-01 |
| PublicationDate_xml | – month: 10 year: 2019 text: 2019-Oct. |
| PublicationDecade | 2010 |
| PublicationTitle | 2019 23rd International Computer Science and Engineering Conference (ICSEC) |
| PublicationTitleAbbrev | ICSEC |
| PublicationYear | 2019 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| Score | 1.7009844 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 65 |
| SubjectTerms | Accuracy; Bioinformatics; Machine learning; Machine learning algorithms; Nearest neighbor methods; Prediction methods; Random forests; RNA; Robustness; Sensitivity and specificity; Shape; Testing |
| Title | Application of Random Forest in Limited Size Human Long Non-coding RNAs Identification with Secondary Structure Features |
| URI | https://ieeexplore.ieee.org/document/8974749 |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |
| openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2019+23rd+International+Computer+Science+and+Engineering+Conference+%28ICSEC%29&rft.atitle=Application+of+Random+Forest+in+Limited+Size+Human+Long+Non-coding+RNAs+Identification+with+Secondary+Structure+Features&rft.au=Anuntakarun%2C+Songtham&rft.au=Wattanapornprom%2C+Warin&rft.au=Lertampaiporn%2C+Supatcha&rft.date=2019-10-01&rft.pub=IEEE&rft.spage=65&rft.epage=69&rft_id=info:doi/10.1109%2FICSEC47112.2019.8974749&rft.externalDocID=8974749 |
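
The abstract describes ranking features with the Wilcoxon rank-sum test, keeping the top 20, and reporting accuracy, sensitivity, specificity and false positive rate. The following is a minimal, standard-library-only sketch of those two steps, not the authors' implementation. It assumes a normal approximation for the rank-sum statistic without tie correction; the function names, the feature names `mfe_like` and `noise`, and the toy data are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_z(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Wilcoxon rank-sum z statistic (normal approximation, no tie correction)."""
    n1, n2 = len(pos), len(neg)
    ranks = average_ranks(list(pos) + list(neg))
    w = sum(ranks[:n1])  # rank sum of the positive class
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean) / sd


def top_features(X: Sequence[Sequence[float]], y: Sequence[int],
                 names: Sequence[str], k: int = 20) -> List[str]:
    """Rank features by |z| between class 1 and class 0 and keep the top k."""
    scored = []
    for col, name in enumerate(names):
        pos = [row[col] for row, label in zip(X, y) if label == 1]
        neg = [row[col] for row, label in zip(X, y) if label == 0]
        scored.append((abs(rank_sum_z(pos, neg)), name))
    scored.sort(key=lambda t: t[0], reverse=True)
    return [name for _, name in scored[:k]]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics; assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),
    }


# Toy data (hypothetical): label 1 = lncRNA, label 0 = coding sequence.
X = [[0.9, 5.0], [0.8, 1.0], [0.7, 4.0], [0.2, 2.0], [0.1, 6.0], [0.3, 3.0]]
y = [1, 1, 1, 0, 0, 0]
selected = top_features(X, y, ["mfe_like", "noise"], k=1)
metrics = binary_metrics(y, [1, 1, 0, 0, 0, 1])
```

In practice the selected columns would be passed to a classifier such as a random forest and evaluated fold by fold; that step is left out here to keep the sketch dependency-free.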