Feature Selection for Classification using Principal Component Analysis and Information Gain

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 174, p. 114765
Main Authors: Odhiambo Omuya, Erick; Onyango Okeyo, George; Waema Kimwele, Michael
Format: Journal Article
Language: English
Published: New York: Elsevier Ltd (Elsevier BV), 15.07.2021
Subjects:
ISSN: 0957-4174 (print); 1873-6793 (electronic)
Abstract
• Feature selection improves the performance of machine learning algorithms.
• Feature selection with multi-tier techniques is simpler and more stable.
• A feature selection model that is not specific to any data set is widely applicable.
Feature selection and classification have been widely applied in areas such as business, medicine and the media. High dimensionality in datasets is one of the main challenges in data classification, data mining and sentiment analysis. Irrelevant and redundant attributes also increase the complexity of classification algorithms and degrade their operation, so the algorithms record poor performance. Some existing work uses all attributes for classification, some of which are insignificant for the task, thereby leading to poor performance. This paper therefore develops a hybrid filter model for feature selection based on principal component analysis and information gain. The hybrid model is then applied to support classification using machine learning techniques such as Naïve Bayes. Experimental results demonstrate that the hybrid filter model reduces data dimensions, selects appropriate feature sets, and reduces training time, hence providing better classification performance as measured by accuracy, precision and recall.
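As an illustration of the kind of pipeline the abstract describes, the sketch below chains an information-gain filter, principal component analysis and a Naïve Bayes classifier, then reports accuracy, precision and recall. It is a minimal approximation, not the authors' published model: scikit-learn's mutual_info_classif stands in for information gain, and the toy breast-cancer dataset, the k=15 feature budget and the 5 retained components are illustrative assumptions.

# Minimal sketch of a hybrid filter (information gain + PCA) feeding Naive Bayes.
# Not the authors' exact pipeline; the dataset, k and n_components are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

pipeline = Pipeline([
    ("scale", StandardScaler()),                            # put features on a common scale
    ("info_gain", SelectKBest(mutual_info_classif, k=15)),  # keep the 15 features with the highest information gain
    ("pca", PCA(n_components=5)),                           # project the retained features onto 5 principal components
    ("nb", GaussianNB()),                                   # classify with Naive Bayes
])

pipeline.fit(X_train, y_train)
y_pred = pipeline.predict(X_test)
print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))

Reducing dimensionality before the classifier is what the abstract credits for the shorter training time: the information-gain step discards attributes that are irrelevant to the class label, while PCA removes the remaining redundancy among the selected features.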
ArticleNumber 114765
Author Onyango Okeyo, George
Odhiambo Omuya, Erick
Waema Kimwele, Michael
Author_xml – sequence: 1
  givenname: Erick
  surname: Odhiambo Omuya
  fullname: Odhiambo Omuya, Erick
  email: omuya.erick@mksu.ac.ke
  organization: School of Engineering and Technology, Machakos University, Kenya
– sequence: 2
  givenname: George
  surname: Onyango Okeyo
  fullname: Onyango Okeyo, George
  email: gokeyo@andrew.cmu.edu
  organization: Carnegie Mellon University Africa
– sequence: 3
  givenname: Michael
  surname: Waema Kimwele
  fullname: Waema Kimwele, Michael
  email: mkimwele@jkuat.ac.ke
  organization: School of Computing and Information Technology, Jomo Kenyatta University of Agriculture and Technology, Kenya
CitedBy_id crossref_primary_10_1007_s10586_022_03657_5
crossref_primary_10_3390_app14062614
crossref_primary_10_1109_ACCESS_2022_3169765
crossref_primary_10_3390_make5030061
crossref_primary_10_3390_info15070420
crossref_primary_10_3390_technologies13020042
crossref_primary_10_1007_s41870_023_01245_3
crossref_primary_10_1038_s41598_025_07628_9
crossref_primary_10_1111_exsy_13254
crossref_primary_10_1016_j_eswa_2023_122649
crossref_primary_10_1038_s41598_025_16415_5
crossref_primary_10_1016_j_psep_2021_05_026
crossref_primary_10_1007_s13198_025_02842_0
crossref_primary_10_1186_s40537_025_01205_7
crossref_primary_10_3390_a16010012
crossref_primary_10_1002_cpe_7903
crossref_primary_10_1016_j_procs_2023_10_366
crossref_primary_10_7717_peerj_cs_2880
crossref_primary_10_3390_fi16120458
crossref_primary_10_3390_e26090796
crossref_primary_10_1002_cpe_7299
crossref_primary_10_3390_electronics14152959
crossref_primary_10_3390_diagnostics13081440
crossref_primary_10_1080_15376494_2025_2480689
crossref_primary_10_3390_app13052950
crossref_primary_10_3390_math10244772
crossref_primary_10_1109_ACCESS_2022_3198597
crossref_primary_10_1016_j_asoc_2024_111477
crossref_primary_10_1051_bioconf_20248205020
crossref_primary_10_1007_s11042_024_19769_6
crossref_primary_10_3390_app14083337
crossref_primary_10_1016_j_solener_2023_111918
crossref_primary_10_1109_ACCESS_2023_3247866
crossref_primary_10_1038_s41598_025_16311_y
crossref_primary_10_1016_j_cose_2021_102352
crossref_primary_10_3390_s24165223
crossref_primary_10_1186_s40537_025_01116_7
crossref_primary_10_32604_jrm_2022_022300
crossref_primary_10_3390_diagnostics12123000
crossref_primary_10_1016_j_comnet_2024_110939
crossref_primary_10_1007_s10115_024_02114_6
crossref_primary_10_1155_2022_5593147
crossref_primary_10_1002_cem_3602
crossref_primary_10_1155_2022_5339926
crossref_primary_10_1016_j_procs_2024_01_172
crossref_primary_10_1109_TITS_2025_3556444
crossref_primary_10_3390_diagnostics15030248
crossref_primary_10_3389_fenrg_2024_1479478
crossref_primary_10_1016_j_bspc_2024_106949
crossref_primary_10_1080_00207543_2024_2423802
crossref_primary_10_3390_rs16173190
crossref_primary_10_3390_s21165302
crossref_primary_10_1007_s13198_024_02294_y
crossref_primary_10_1080_19393555_2025_2496327
crossref_primary_10_1007_s42044_024_00174_z
crossref_primary_10_3390_s22051836
crossref_primary_10_1007_s00170_024_14868_y
crossref_primary_10_20965_jaciii_2022_p0671
crossref_primary_10_1371_journal_pone_0290332
crossref_primary_10_7759_cureus_51036
crossref_primary_10_3390_data9020020
crossref_primary_10_1080_00051144_2023_2218164
crossref_primary_10_1186_s13638_023_02292_x
crossref_primary_10_1007_s11356_022_18819_6
crossref_primary_10_1038_s41598_023_34951_w
crossref_primary_10_3390_jpm13071071
crossref_primary_10_1007_s00354_023_00222_5
crossref_primary_10_3233_THC_219008
crossref_primary_10_3390_electronics13010205
crossref_primary_10_1080_15389588_2025_2530074
crossref_primary_10_3390_app15094823
crossref_primary_10_3390_math9202622
crossref_primary_10_1109_ACCESS_2025_3566430
crossref_primary_10_1016_j_eswa_2023_121024
crossref_primary_10_1016_j_teler_2025_100199
crossref_primary_10_1109_ACCESS_2025_3600570
crossref_primary_10_1002_qub2_46
crossref_primary_10_1109_ACCESS_2025_3538278
crossref_primary_10_3390_rs15041111
crossref_primary_10_1007_s10115_023_02010_5
crossref_primary_10_1007_s10614_024_10577_6
crossref_primary_10_1007_s41062_025_02203_7
crossref_primary_10_1007_s11042_024_18553_w
crossref_primary_10_1007_s41348_025_01100_6
crossref_primary_10_1016_j_jobe_2025_113722
crossref_primary_10_1007_s40747_025_01784_1
crossref_primary_10_3390_info13070314
crossref_primary_10_1038_s41598_025_15155_w
crossref_primary_10_1038_s41598_022_22814_9
crossref_primary_10_14500_aro_12034
crossref_primary_10_3390_e27080881
crossref_primary_10_1007_s42979_025_04035_9
crossref_primary_10_1016_j_pmcj_2025_102103
crossref_primary_10_32604_cmc_2022_028055
crossref_primary_10_3390_app14135711
crossref_primary_10_32628_IJSRST218535
crossref_primary_10_32604_cmc_2023_029163
crossref_primary_10_1016_j_eswa_2022_116794
crossref_primary_10_1002_aisy_202401150
crossref_primary_10_1007_s11518_022_5520_1
crossref_primary_10_1016_j_eswa_2023_119607
crossref_primary_10_1142_S0219649225500169
crossref_primary_10_1155_2022_9238968
crossref_primary_10_1016_j_procs_2022_09_384
crossref_primary_10_1038_s41598_023_48230_1
crossref_primary_10_1093_jcde_qwac028
crossref_primary_10_1515_geo_2022_0402
crossref_primary_10_3390_s23104792
crossref_primary_10_1007_s10661_024_13423_2
crossref_primary_10_3390_s24175712
crossref_primary_10_3233_JIFS_221720
crossref_primary_10_1016_j_engappai_2023_107114
crossref_primary_10_3103_S014641162306007X
crossref_primary_10_1145_3653025
crossref_primary_10_3390_biomimetics9110662
crossref_primary_10_3390_systems11090483
crossref_primary_10_1002_esp_5737
crossref_primary_10_3390_app15020778
crossref_primary_10_2478_ers_2024_0031
crossref_primary_10_3390_pr11010065
crossref_primary_10_1007_s11071_024_10073_4
crossref_primary_10_3390_w17182688
crossref_primary_10_1007_s42044_025_00229_9
crossref_primary_10_1002_cpe_6756
crossref_primary_10_23919_JSEE_2024_000055
crossref_primary_10_3390_electronics12112427
crossref_primary_10_3390_buildings12111907
crossref_primary_10_1108_IMDS_03_2024_0257
crossref_primary_10_1016_j_procs_2025_04_439
crossref_primary_10_1016_j_eswa_2021_114986
crossref_primary_10_1016_j_measen_2024_101037
crossref_primary_10_1177_14727978251321744
crossref_primary_10_1016_j_rser_2025_116299
crossref_primary_10_1007_s10489_022_04275_9
crossref_primary_10_1007_s00521_023_08941_y
crossref_primary_10_1016_j_ijhydene_2022_04_026
crossref_primary_10_1016_j_eswa_2022_117989
crossref_primary_10_1186_s40708_024_00225_y
crossref_primary_10_1007_s41060_025_00735_w
crossref_primary_10_3390_su15043043
crossref_primary_10_7717_peerj_cs_1041
crossref_primary_10_1088_2632_2153_ad861d
crossref_primary_10_3390_math13060996
Cites_doi 10.1016/j.compeleceng.2013.11.024
10.1186/s13634-016-0355-x
10.1016/j.eswa.2015.07.052
10.1016/j.patcog.2016.11.003
10.1016/j.jtbi.2018.10.047
10.1371/journal.pone.0166017
10.1142/S0219720016500293
10.1145/1835804.1835848
10.1016/j.eswa.2016.03.031
10.1155/2018/1407817
10.1007/s10462-019-09682-y
10.1145/1273496.1273641
10.1016/j.eswa.2019.01.083
10.1504/IJKESDP.2016.084603
10.1016/j.eswa.2016.03.020
ContentType Journal Article
Copyright 2021 Elsevier Ltd
Copyright Elsevier BV Jul 15, 2021
Copyright_xml – notice: 2021 Elsevier Ltd
– notice: Copyright Elsevier BV Jul 15, 2021
DBID AAYXX
CITATION
7SC
8FD
JQ2
L7M
L~C
L~D
DOI 10.1016/j.eswa.2021.114765
DatabaseName CrossRef
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Computer and Information Systems Abstracts
Technology Research Database
Computer and Information Systems Abstracts – Academic
Advanced Technologies Database with Aerospace
ProQuest Computer Science Collection
Computer and Information Systems Abstracts Professional
DatabaseTitleList Computer and Information Systems Abstracts

DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISSN 1873-6793
ExternalDocumentID 10_1016_j_eswa_2021_114765
S0957417421002062
ISICitedReferencesCount 224
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=000663146900011&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
ISSN 0957-4174
IngestDate Fri Jul 25 03:43:57 EDT 2025
Tue Nov 18 22:36:20 EST 2025
Sat Nov 29 07:08:26 EST 2025
Fri Feb 23 02:46:14 EST 2024
IsPeerReviewed true
IsScholarly true
Keywords Dimensionality reduction
Feature selection
Filter model
Information gain
Classification
Principal component analysis
Language English
LinkModel OpenURL
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
PQID 2539562172
PQPubID 2045477
ParticipantIDs proquest_journals_2539562172
crossref_primary_10_1016_j_eswa_2021_114765
crossref_citationtrail_10_1016_j_eswa_2021_114765
elsevier_sciencedirect_doi_10_1016_j_eswa_2021_114765
PublicationCentury 2000
PublicationDate 2021-07-15
PublicationDateYYYYMMDD 2021-07-15
PublicationDate_xml – month: 07
  year: 2021
  text: 2021-07-15
  day: 15
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle Expert systems with applications
PublicationYear 2021
Publisher Elsevier Ltd
Elsevier BV
Publisher_xml – name: Elsevier Ltd
– name: Elsevier BV
References Xu, Gu, Wang, Wang, Qin (b0145) 2019; 10
Zhao, Z., & Liu, H. (2007). Spectral feature selection for supervised and unsupervised learning. In Proceedings of the 24th international conference on Machine learning, pages 1151– 1157. ACM.
Kashef, Nezamabadi-pour, Nikpour (b0060) 2018; 8
Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182. https://www.jmlr.org/papers/volume3/guyon03a/guyon03a.pdf.
Zien, Kraemer, Sören, Gunnar (b0170) 2009
Syed, Hafiz, Saif (b0115) 2020; 11
Kamkar, Gupta, Phung, Venkatesh (b0065) 2015
Tang, Alelyani, Liu (b0125) 2014
Raghavendra, Indiramma (b0100) 2016; 5
Chen, G., Cao, M., Yu, J., & Guo (2018). Prediction and functional analysis of prokaryote lysine acetylation site by incorporating six types of features into Chou's general PseAAC. Journal of Theoretical Biology, 461, 92-101. https://doi.org/10.1016/j.jtbi.2018.10.047.
Xin, Hu, Wang, Gao (b0140) 2015
Ahmed, Iman, Min (b0005) 2013; 4
Zheng, W., Eilamstock, T., Wu, T., & Spagna, A. (2019). Multi-features based network revealing the structural abnormalities in autism spectrum disorder. IEEE Transactions on Affective Computing, 1(1). https://doi.org/10.1109/TAFFC.2890597.
Qiu, Wu, Ding, Xu, Feng (b0095) 2016
Chandrashekar, Sahin (b0020) 2014; 40
Sheikhpour, Sarram, Gharaghani, Chahooki (b0105) 2017; 64
Lavanya, Rani (b0070) 2011; 2
Solorio-Fernández, Carrasco-Ochoa, Martínez-Trinidad (b0110) 2020; 53
Powers (b0090) 2016; 2
Alhaj, T., Siraj, M., Zainal, A., & Elhaj, H. (2016). Feature Selection Using Information Gain for Improved Structural-Based Alert Correlation. PLoS ONE, 11(11). https://doi.org/10.1371/journal.pone.0166017.
Wen, Wong (b0135) 2016; 14
Heydari, Tavakoli, Salim (b0045) 2016; 58
Tan, Steinbach, Kumar (b0120) 2006
Zhao, Wang, Liu (b0160) 2010
Liu (b0075) 2015
Cai, D., Zhang, C., & He, X. (2010). Unsupervised feature selection for multi-cluster data. ACM, pp. 333-342.
Nguyen, Shirai, Velcin (b0085) 2015; 42
Indah, A., & Adiwijaya, A. (2018). Applied Computational Intelligence and Soft Computing. 8 (1407817), 5. Hindawi. https://doi.org/10.1155/2018/1407817.
Fernández-Gavilanes, Álvarez-López, Juncal-Martínez, Costa-Montenegro, González-Castaño (b0035) 2016; 58
Trstenjak, Donko (b0130) 2016; 10
Chin, Andri, Habibollah, Nuzly (b0030) 2016; 13
Nobre, Neves (b0080) 2019; 125
Zhang (b0150) 2004
Raghavendra (10.1016/j.eswa.2021.114765_b0100) 2016; 5
Ahmed (10.1016/j.eswa.2021.114765_b0005) 2013; 4
10.1016/j.eswa.2021.114765_b0015
Solorio-Fernández (10.1016/j.eswa.2021.114765_b0110) 2020; 53
Kashef (10.1016/j.eswa.2021.114765_b0060) 2018; 8
Kamkar (10.1016/j.eswa.2021.114765_b0065) 2015
10.1016/j.eswa.2021.114765_b0155
Zhang (10.1016/j.eswa.2021.114765_b0150) 2004
10.1016/j.eswa.2021.114765_b0040
Wen (10.1016/j.eswa.2021.114765_b0135) 2016; 14
Zhao (10.1016/j.eswa.2021.114765_b0160) 2010
Powers (10.1016/j.eswa.2021.114765_b0090) 2016; 2
Qiu (10.1016/j.eswa.2021.114765_b0095) 2016
Liu (10.1016/j.eswa.2021.114765_b0075) 2015
Heydari (10.1016/j.eswa.2021.114765_b0045) 2016; 58
Chin (10.1016/j.eswa.2021.114765_b0030) 2016; 13
Trstenjak (10.1016/j.eswa.2021.114765_b0130) 2016; 10
Fernández-Gavilanes (10.1016/j.eswa.2021.114765_b0035) 2016; 58
Sheikhpour (10.1016/j.eswa.2021.114765_b0105) 2017; 64
10.1016/j.eswa.2021.114765_b0025
10.1016/j.eswa.2021.114765_b0165
Nobre (10.1016/j.eswa.2021.114765_b0080) 2019; 125
10.1016/j.eswa.2021.114765_b0010
Tan (10.1016/j.eswa.2021.114765_b0120) 2006
Xu (10.1016/j.eswa.2021.114765_b0145) 2019; 10
10.1016/j.eswa.2021.114765_b0050
Xin (10.1016/j.eswa.2021.114765_b0140) 2015
Syed (10.1016/j.eswa.2021.114765_b0115) 2020; 11
Nguyen (10.1016/j.eswa.2021.114765_b0085) 2015; 42
Tang (10.1016/j.eswa.2021.114765_b0125) 2014
Zien (10.1016/j.eswa.2021.114765_b0170) 2009
Chandrashekar (10.1016/j.eswa.2021.114765_b0020) 2014; 40
Lavanya (10.1016/j.eswa.2021.114765_b0070) 2011; 2
References_xml – volume: 53
  start-page: 907
  year: 2020
  end-page: 948
  ident: b0110
  article-title: A review of unsupervised feature selection methods
  publication-title: Artificial Intelligence Review.
– volume: 13
  start-page: 971
  year: 2016
  end-page: 989
  ident: b0030
  article-title: Supervised, unsupervised, and semi supervised feature selection: a review on gene selection
  publication-title: IEEE/ACM TCBB.
– volume: 40
  start-page: 16
  year: 2014
  end-page: 28
  ident: b0020
  article-title: A survey on feature selection methods
  publication-title: Computers and Electrical Engineering.
– start-page: 1
  year: 2015
  end-page: 10
  ident: b0065
  article-title: Exploiting Feature Relationships Towards Stable Feature Selection
  publication-title: IEEE International Conference on Data Science and Advanced Analytics (DSAA)
– volume: 14
  start-page: 1650029
  year: 2016
  ident: b0135
  article-title: Evaluating feature-selection stability in next-generation proteomics
  publication-title: Journal of Bioinformatics and Computational Biology
– volume: 11
  start-page: 469
  year: 2020
  ident: b0115
  article-title: A Comparative Study of Feature Selection Approaches: 2016–2020
  publication-title: Journal of Scientific & Engineering Research
– volume: 10
  year: 2019
  ident: b0145
  article-title: Autoencoder Based Feature Selection Method for Classification of Anticancer Drug Response
  publication-title: Frontiers in Genetics: Computational Genomics
– volume: 42
  start-page: 9603
  year: 2015
  end-page: 9611
  ident: b0085
  article-title: Sentiment analysis on social media for stock movement prediction
  publication-title: Expert Systems with Applications.
– reference: Zheng, W., Eilamstock, T., Wu, T., & Spagna, A. (2019). Multi-features based network revealing the structural abnormalities in autism spectrum disorder. IEEE Transactions on Affective Computing, 1(1). https://doi.org/10.1109/TAFFC.2890597.
– reference: Zhao, Z., & Liu, H. (2007). Spectral feature selection for supervised and unsupervised learning. In Proceedings of the 24th international conference on Machine learning, pages 1151– 1157. ACM.
– volume: 4
  start-page: 33
  year: 2013
  end-page: 39
  ident: b0005
  article-title: Performance Comparison between Naïve Bayes, Decision Tree and K-Nearest Neighbor in Searching Alternative Design in an Energy Simulation Tool
  publication-title: International Journal of Advanced Computer Science and Applications
– start-page: 1910
  year: 2015
  end-page: 1916
  ident: b0140
  article-title: Feature Selection from Brain sMRI
  publication-title: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence
– volume: 58
  start-page: 83
  year: 2016
  end-page: 92
  ident: b0045
  article-title: Detection of fake opinions using time series
  publication-title: Expert Systems with Applications.
– volume: 10
  start-page: 1184
  year: 2016
  end-page: 1190
  ident: b0130
  article-title: Case-Based Reasoning: A Hybrid Classification Model Improved with an Expert's Knowledge for High-Dimensional Problems
  publication-title: International Journal of Computer, Electrical, Automation, Control and Information Engineering
– volume: 125
  start-page: 181
  year: 2019
  end-page: 194
  ident: b0080
  article-title: Combining Principal Component Analysis, Discrete Wavelet Transform and XGBoost to trade in the financial markets
  publication-title: Expert Systems with Applications.
– year: 2004
  ident: b0150
  article-title: The Optimality of Naïve Bayes
  publication-title: Semantic Scholar.
– year: 2006
  ident: b0120
  article-title: Introduction to Data Mining
– year: 2010
  ident: b0160
  article-title: Efficient spectral feature selection with minimum redundancy
  publication-title: (AAAI)
– volume: 5
  start-page: 262
  year: 2016
  end-page: 284
  ident: b0100
  article-title: Hybrid data mining model for the classification and prediction of medical datasets
  publication-title: International Journal of Knowledge Engineering and Soft Data Paradigms.
– volume: 58
  start-page: 57
  year: 2016
  end-page: 75
  ident: b0035
  article-title: Unsupervised method for sentiment analysis in online texts
  publication-title: Expert Systems with Applications.
– reference: Guyon, I., & Elisseeff, A. (2003). An introduction to variable and feature selection. Journal of Machine Learning Research, 3, 1157-1182. https://www.jmlr.org/papers/volume3/guyon03a/guyon03a.pdf.
– volume: 64
  start-page: 141
  year: 2017
  end-page: 158
  ident: b0105
  article-title: A Survey on semi-supervised feature selection methods
  publication-title: Pattern Recognition.
– volume: 8
  year: 2018
  ident: b0060
  article-title: Multilevel Feature Selection: A comprehensive review and guiding experiments
  publication-title: Wiley Period.
– year: 2015
  ident: b0075
  article-title: Sentiment analysis: Mining opinions, sentiments and emotions
– year: 2009
  ident: b0170
  publication-title: The Feature Importance Ranking Measure.
– volume: 2
  start-page: 756
  year: 2011
  end-page: 763
  ident: b0070
  article-title: Analysis of feature selection with classification – Breast Cancer Data Sets
  publication-title: Research gate publication
– start-page: 67
  year: 2016
  ident: b0095
  article-title: A survey of machine learning for big data processing
  publication-title: EURASIP Journal on Advances in Signal Processing
– reference: Cai, D., Zhang, C., & He, X. (2010). Unsupervised feature selection for multi-cluster data. ACM, pp. 333-342.
– reference: Chen, G., Cao, M., Yu, J., & Guo (2018). Prediction and functional analysis of prokaryote lysine acetylation site by incorporating six types of features into Chou's general PseAAC. Journal of Theoretical Biology, 461, 92-101. https://doi.org/10.1016/j.jtbi.2018.10.047.
– reference: Alhaj, T., Siraj, M., Zainal, A., & Elhaj, H. (2016). Feature Selection Using Information Gain for Improved Structural-Based Alert Correlation. PLoS ONE, 11(11). https://doi.org/10.1371/journal.pone.0166017.
– volume: 2
  start-page: 37
  year: 2016
  end-page: 63
  ident: b0090
  article-title: Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation
  publication-title: Journal of Machine Learning Technologies
– start-page: 37
  year: 2014
  end-page: 64
  ident: b0125
  article-title: Feature Selection for Classification: A review
  publication-title: Data Classification: Algorithms and Applications
– reference: Indah, A., & Adiwijaya, A. (2018). Applied Computational Intelligence and Soft Computing. 8 (1407817), 5. Hindawi. https://doi.org/10.1155/2018/1407817.
– volume: 40
  start-page: 16
  issue: 1
  year: 2014
  ident: 10.1016/j.eswa.2021.114765_b0020
  article-title: A survey on feature selection methods
  publication-title: Computers and Electrical Engineering.
  doi: 10.1016/j.compeleceng.2013.11.024
– start-page: 67
  year: 2016
  ident: 10.1016/j.eswa.2021.114765_b0095
  article-title: A survey of machine learning for big data processing
  publication-title: EURASIP Journal on Advances in Signal Processing
  doi: 10.1186/s13634-016-0355-x
– volume: 2
  start-page: 37
  issue: 1
  year: 2016
  ident: 10.1016/j.eswa.2021.114765_b0090
  article-title: Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation
  publication-title: Journal of Machine Learning Technologies
– volume: 11
  start-page: 469
  issue: 2
  year: 2020
  ident: 10.1016/j.eswa.2021.114765_b0115
  article-title: A Comparative Study of Feature Selection Approaches: 2016–2020
  publication-title: Journal of Scientific & Engineering Research
– volume: 8
  issue: 2
  year: 2018
  ident: 10.1016/j.eswa.2021.114765_b0060
  article-title: Multilevel Feature Selection: A comprehensive review and guiding experiments
  publication-title: Wiley Period.
– volume: 42
  start-page: 9603
  issue: 24
  year: 2015
  ident: 10.1016/j.eswa.2021.114765_b0085
  article-title: Sentiment analysis on social media for stock movement prediction
  publication-title: Expert Systems with Applications.
  doi: 10.1016/j.eswa.2015.07.052
– start-page: 1910
  year: 2015
  ident: 10.1016/j.eswa.2021.114765_b0140
  article-title: Feature Selection from Brain sMRI
  publication-title: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence
– volume: 64
  start-page: 141
  year: 2017
  ident: 10.1016/j.eswa.2021.114765_b0105
  article-title: A Survey on semi-supervised feature selection methods
  publication-title: Pattern Recognition.
  doi: 10.1016/j.patcog.2016.11.003
– ident: 10.1016/j.eswa.2021.114765_b0025
  doi: 10.1016/j.jtbi.2018.10.047
– ident: 10.1016/j.eswa.2021.114765_b0010
  doi: 10.1371/journal.pone.0166017
– volume: 14
  start-page: 1650029
  issue: 5
  year: 2016
  ident: 10.1016/j.eswa.2021.114765_b0135
  article-title: Evaluating feature-selection stability in next-generation proteomics
  publication-title: Journal of Bioinformatics and Computational Biology
  doi: 10.1142/S0219720016500293
– ident: 10.1016/j.eswa.2021.114765_b0015
  doi: 10.1145/1835804.1835848
– volume: 10
  issue: 233
  year: 2019
  ident: 10.1016/j.eswa.2021.114765_b0145
  article-title: Autoencoder Based Feature Selection Method for Classification of Anticancer Drug Response
  publication-title: Frontiers in Genetics: Computational Genomics
– volume: 13
  start-page: 971
  issue: 5
  year: 2016
  ident: 10.1016/j.eswa.2021.114765_b0030
  article-title: Supervised, unsupervised, and semi supervised feature selection: a review on gene selection
  publication-title: IEEE/ACM TCBB.
– start-page: 1
  year: 2015
  ident: 10.1016/j.eswa.2021.114765_b0065
  article-title: Exploiting Feature Relationships Towards Stable Feature Selection
– volume: 58
  start-page: 57
  year: 2016
  ident: 10.1016/j.eswa.2021.114765_b0035
  article-title: Unsupervised method for sentiment analysis in online texts
  publication-title: Expert Systems with Applications.
  doi: 10.1016/j.eswa.2016.03.031
– ident: 10.1016/j.eswa.2021.114765_b0050
  doi: 10.1155/2018/1407817
– year: 2009
  ident: 10.1016/j.eswa.2021.114765_b0170
  publication-title: The Feature Importance Ranking Measure.
– volume: 53
  start-page: 907
  issue: 2
  year: 2020
  ident: 10.1016/j.eswa.2021.114765_b0110
  article-title: A review of unsupervised feature selection methods
  publication-title: Artificial Intelligence Review.
  doi: 10.1007/s10462-019-09682-y
– year: 2006
  ident: 10.1016/j.eswa.2021.114765_b0120
– year: 2015
  ident: 10.1016/j.eswa.2021.114765_b0075
– year: 2010
  ident: 10.1016/j.eswa.2021.114765_b0160
  article-title: Efficient spectral feature selection with minimum redundancy
– ident: 10.1016/j.eswa.2021.114765_b0155
  doi: 10.1145/1273496.1273641
– volume: 4
  start-page: 33
  issue: 11
  year: 2013
  ident: 10.1016/j.eswa.2021.114765_b0005
  article-title: Performance Comparison between Naïve Bayes, Decision Tree and K-Nearest Neighbor in Searching Alternative Design in an Energy Simulation Tool
  publication-title: International Journal of Advanced Computer Science and Applications
– volume: 10
  start-page: 1184
  issue: 6
  year: 2016
  ident: 10.1016/j.eswa.2021.114765_b0130
  article-title: Case-Based Reasoning: A Hybrid Classification Model Improved with an Expert's Knowledge for High-Dimensional Problems
  publication-title: International Journal of Computer, Electrical, Automation, Control and Information Engineering
– volume: 2
  start-page: 756
  issue: 5
  year: 2011
  ident: 10.1016/j.eswa.2021.114765_b0070
  article-title: Analysis of feature selection with classification – Breast Cancer Data Sets
  publication-title: Research gate publication
– volume: 125
  start-page: 181
  issue: 1
  year: 2019
  ident: 10.1016/j.eswa.2021.114765_b0080
  article-title: Combining Principal Component Analysis, Discrete Wavelet Transform and XGBoost to trade in the financial markets
  publication-title: Expert Systems with Applications.
  doi: 10.1016/j.eswa.2019.01.083
– start-page: 37
  year: 2014
  ident: 10.1016/j.eswa.2021.114765_b0125
  article-title: Feature Selection for Classification: A review
– ident: 10.1016/j.eswa.2021.114765_b0165
– volume: 5
  start-page: 262
  issue: 3/4
  year: 2016
  ident: 10.1016/j.eswa.2021.114765_b0100
  article-title: Hybrid data mining model for the classification and prediction of medical datasets
  publication-title: International Journal of Knowledge Engineering and Soft Data Paradigms.
  doi: 10.1504/IJKESDP.2016.084603
– ident: 10.1016/j.eswa.2021.114765_b0040
– year: 2004
  ident: 10.1016/j.eswa.2021.114765_b0150
  article-title: The Optimality of Naïve Bayes
  publication-title: Semantic Scholar.
– volume: 58
  start-page: 83
  year: 2016
  ident: 10.1016/j.eswa.2021.114765_b0045
  article-title: Detection of fake opinions using time series
  publication-title: Expert Systems with Applications.
  doi: 10.1016/j.eswa.2016.03.020
SSID ssj0017007
Score 2.7047215
SourceID proquest
crossref
elsevier
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 114765
SubjectTerms Algorithms
Classification
Data mining
Dimensionality reduction
Feature selection
Filter model
Information gain
Machine learning
Principal component analysis
Principal components analysis
Title Feature Selection for Classification using Principal Component Analysis and Information Gain
URI https://dx.doi.org/10.1016/j.eswa.2021.114765
https://www.proquest.com/docview/2539562172
Volume 174
WOSCitedRecordID wos000663146900011&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
journalDatabaseRights – providerCode: PRVESC
  databaseName: Elsevier SD Freedom Collection Journals 2021
  customDbUrl:
  eissn: 1873-6793
  dateEnd: 99991231
  omitProxy: false
  ssIdentifier: ssj0017007
  issn: 0957-4174
  databaseCode: AIEXJ
  dateStart: 19950101
  isFulltext: true
  titleUrlDefault: https://www.sciencedirect.com
  providerName: Elsevier
linkProvider Elsevier
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Feature+Selection+for+Classification+using+Principal+Component+Analysis+and+Information+Gain&rft.jtitle=Expert+systems+with+applications&rft.au=Odhiambo+Omuya%2C+Erick&rft.au=Onyango+Okeyo%2C+George&rft.au=Waema+Kimwele%2C+Michael&rft.date=2021-07-15&rft.issn=0957-4174&rft.volume=174&rft.spage=114765&rft_id=info:doi/10.1016%2Fj.eswa.2021.114765&rft.externalDBID=n%2Fa&rft.externalDocID=10_1016_j_eswa_2021_114765