Lung Cancer Classification Using the Extreme Gradient Boosting (XGBoost) Algorithm and Mutual Information for Feature Selection

Published in: Sistemasi : jurnal sistem informasi (Online), Volume 14, Issue 5, pp. 2198-2214
Main authors: Zizilia, Regitha; Chrisnanto, Yulison Herry; Abdillah, Gunawan
Format: Journal Article
Language: English, Indonesian
Published: Islamic University of Indragiri, 2025-09-01
ISSN: 2302-8149, 2540-9719
Abstract: Lung cancer is one of the deadliest cancers worldwide and is often detected too late because early symptoms are absent. This study evaluates how feature selection with Mutual Information affects the performance of lung cancer classification with the XGBoost algorithm. Mutual Information is used to select relevant features, capturing both linear and non-linear relationships with the target variable, while XGBoost is chosen for its ability to handle large datasets and reduce overfitting. The study was conducted on a dataset of 30,000 entries, with data split scenarios of 90:10, 80:20, 70:30, and 60:40. Before Mutual Information was applied, testing accuracy ranged from 93.42% to 93.83% and K-Fold Cross-Validation accuracy ranged from 94.59% to 94.76%. After feature selection, testing accuracy remained stable for the 70:30 and 60:40 splits, at 93.60% and 93.42% respectively, although K-Fold Cross-Validation accuracy decreased to 89.26% and 90.88%. For the 90:10 and 80:20 splits, accuracy declined: testing accuracy dropped to 88.63% and 88.85%, and K-Fold Cross-Validation accuracy fell to 88.87% and 90.24%. Feature selection with Mutual Information improves computational efficiency by reducing the number of features, and in certain data scenarios it can simplify the feature set without significantly compromising model performance, depending on the characteristics of the dataset.
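To make the feature-selection step concrete, the following is a minimal sketch of Mutual Information ranking, not the authors' implementation. This record does not state which estimator, feature set, or tooling the study used, so the sketch assumes discrete features and a plug-in estimate of I(X; Y) = sum over (x, y) of p(x, y) log(p(x, y) / (p(x) p(y))). The function names `mutual_information` and `select_top_k` and the toy data are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def select_top_k(features, target, k):
    """Rank features by MI against the target and keep the k best."""
    scores = {name: mutual_information(col, target)
              for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data, NOT taken from the study.
target = [0, 0, 1, 1, 0, 1, 0, 1]
features = {
    # identical to the target: a purely linear relationship
    "copy_of_target": [0, 0, 1, 1, 0, 1, 0, 1],
    # value 1 -> class 0, values 0 and 2 -> class 1: zero linear
    # correlation with the target, yet fully informative
    "u_shaped": [1, 1, 0, 2, 1, 2, 1, 0],
    # agrees with the target on 6 of 8 rows
    "weak": [0, 1, 1, 1, 0, 0, 0, 1],
    # carries no information about the target
    "constant": [0, 0, 0, 0, 0, 0, 0, 0],
}

if __name__ == "__main__":
    kept, scores = select_top_k(features, target, k=2)
    for name, score in scores.items():
        print(f"{name}: {score:.4f} nats")
    print("kept:", kept)
```

The `u_shaped` column illustrates the abstract's point about non-linear relationships: its linear correlation with the target is zero, but its Mutual Information score matches that of the exact copy. In the workflow the abstract describes, the reduced feature set would then be passed to the classifier and evaluated under each split scenario and under K-Fold Cross-Validation.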
DOI: 10.32520/stmsi.v14i5.5345
Discipline: Computer Science
Open access link: https://doaj.org/article/87d2c2cb68fd4c3fbf6bbb809cd104df
Subject terms: classification; k-fold cross validation; lung cancer; mutual information; XGBoost