Support Vector Machine – Recursive Feature Elimination for Feature Selection on Multi-omics Lung Cancer Data
| Published in: | Progress in Microbes and Molecular Biology Vol. 6; no. 1 |
|---|---|
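| Main Authors: | Azman, Nuraina Syaza; A Samah, Azurah; Lin, Ji Tong; Abdul Majid, Hairudin; Ali Shah, Zuraini; Wen, Nies Hui; Howe, Chan Weng |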
| Format: | Journal Article |
| Language: | English |
| Published: | HH Publisher, 04.04.2023 |
| ISSN: | 2637-1049 |
| Abstract | Biological data obtained from sequencing technologies is growing exponentially. Multi-omics data is one type of biological data that exhibits high dimensionality, a problem commonly known as the curse of dimensionality. The curse of dimensionality occurs when a dataset contains many features or attributes but significantly fewer samples or observations. This study focuses on mitigating the curse of dimensionality by applying Support Vector Machine – Recursive Feature Elimination (SVM-RFE) as the feature selection method on a lung cancer (LUSC) multi-omics dataset integrated from three single-omics datasets (genomics, transcriptomics and epigenomics), and on assessing the quality of the selected feature subsets using stacked denoising autoencoder (SDAE) and variational autoencoder (VAE) deep learning classifiers. The LUSC datasets first undergo data pre-processing, which includes checking for missing values, normalization, and removal of zero-variance features. The cleaned LUSC datasets are then integrated to form a multi-omics dataset. Feature selection is performed on the LUSC multi-omics data using SVM-RFE to select several feature subsets. The five smallest feature subsets (FS) are then used for classification with SDAE and VAE neural networks to assess their quality. The results show that all five VAE models obtain an accuracy and AUC score of 1.000, while only two of the five SDAE models (FS 1000 and FS 4000) do so. The other three SDAE models have an AUC score of 0.500, indicating no ability to separate the binary class labels. The study concludes that, for this specific study, a fine-tuned supervised VAE model classifies better than the SDAE models. Additionally, the 1000- and 4000-feature subsets are the best of those selected by the SVM-RFE algorithm: the SDAE and VAE models built with them achieve the best classification results. |
| Author | Howe, Chan Weng Ali Shah, Zuraini Lin, Ji Tong A Samah, Azurah Abdul Majid, Hairudin Azman, Nuraina Syaza Wen, Nies Hui |
| DOI | 10.36877/pmmb.a0000327 |
| EISSN | 2637-1049 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| License | https://creativecommons.org/licenses/by-nc/4.0 |
| OpenAccessLink | https://doaj.org/article/aab9b943e6ad44acb058fb26f710dbf7 |
| PublicationDate | 2023-04-04 |
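
The workflow summarized in the abstract (pre-process each omics dataset, integrate them, then run SVM-RFE) can be sketched roughly as follows. This is a minimal illustration on synthetic data using scikit-learn, not the authors' implementation: the min-max scaling, the linear kernel, the elimination step, the subset size of 20, and all variable names are assumptions chosen for the example. The SDAE and VAE classifiers are not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, VarianceThreshold
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for a high-dimensional dataset: 60 samples, 300 features.
X, y = make_classification(n_samples=60, n_features=300,
                           n_informative=10, random_state=0)

# Split the columns into three hypothetical "omics" blocks and add one
# constant (zero-variance) column so the cleaning step has something to drop.
blocks = np.array_split(X, 3, axis=1)
blocks[0] = np.hstack([blocks[0], np.ones((X.shape[0], 1))])


def preprocess(block):
    """Check for missing values, drop zero-variance features, then scale."""
    if np.isnan(block).any():
        raise ValueError("missing values must be handled before selection")
    kept = VarianceThreshold(threshold=0.0).fit_transform(block)
    # Min-max scaling is an assumption; the abstract only says "normalization".
    return MinMaxScaler().fit_transform(kept)


# Integrate the cleaned blocks column-wise into one multi-omics matrix.
multi_omics = np.hstack([preprocess(b) for b in blocks])

# SVM-RFE: repeatedly fit a linear SVM and discard the lowest-ranked
# features (10% of the original count per iteration) until 20 remain.
selector = RFE(estimator=SVC(kernel="linear"),
               n_features_to_select=20, step=0.1)
selector.fit(multi_omics, y)
selected = selector.transform(multi_omics)
```

In a real evaluation, feature selection would normally be fitted on training folds only, so that the downstream classifier is not scored on samples that influenced the selection.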