Research paper classification systems based on TF-IDF and LDA schemes

Published in: Human-centric computing and information sciences Volume 9; Issue 1; pp. 1 - 21
Main authors: Kim, Sang-Woon, Gil, Joon-Min
Format: Journal Article
Language: English
Published: Berlin/Heidelberg Springer Berlin Heidelberg 26.08.2019
Korea Information Processing Society, Computer Software Research Group
ISSN: 2192-1962
Abstract With the rapid advance of computer and information technologies, numerous research papers are published both online and offline, and as new research fields continually emerge, users have considerable trouble finding and categorizing the research papers that interest them. To overcome this difficulty, this paper proposes a research paper classification system that clusters research papers into meaningful classes in which papers are very likely to have similar subjects. The proposed system extracts representative keywords from the abstract of each paper and extracts topics using the Latent Dirichlet allocation (LDA) scheme. Then, the K-means clustering algorithm is applied to group all papers into sets with similar subjects, based on the term frequency-inverse document frequency (TF-IDF) values of each paper.
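The TF-IDF and K-means steps named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the toy tokenized abstracts, the function names, the particular TF-IDF weighting (relative term frequency times log(N/df)), and the hand-picked initial centroids are all assumptions made for illustration, and the LDA topic-extraction step is omitted entirely.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per tokenized document)."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        # Assumed weighting: tf = relative frequency, idf = log(N / df).
        vectors.append(
            [(counts[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, vectors

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, initial_centroids, max_iter=100):
    """Plain Lloyd's algorithm; returns one cluster index per vector."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical, already-tokenized abstracts (not data from the paper).
papers = [
    "cloud computing resource scheduling".split(),
    "cloud resource allocation scheduling".split(),
    "image classification neural network".split(),
    "neural network image segmentation".split(),
]
vocab, vectors = tfidf_vectors(papers)
# Deterministic start: seed the two clusters with papers 0 and 2.
labels = kmeans(vectors, initial_centroids=[vectors[0], vectors[2]])
print(labels)  # [0, 0, 1, 1]
```

With these toy inputs the two cloud-scheduling abstracts land in one cluster and the two image-analysis abstracts in the other. In practice the initial centroids would usually be chosen randomly or by a seeding heuristic, and the number of clusters would need to be selected separately.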
ArticleNumber 30
Author Gil, Joon-Min
Kim, Sang-Woon
Author_xml – sequence: 1
  givenname: Sang-Woon
  surname: Kim
  fullname: Kim, Sang-Woon
  organization: Department of Police Administration, Daegu Catholic University
– sequence: 2
  givenname: Joon-Min
  orcidid: 0000-0001-6774-8476
  surname: Gil
  fullname: Gil, Joon-Min
  email: jmgil@cu.ac.kr
  organization: School of Information Technology Eng., Daegu Catholic University
ContentType Journal Article
Copyright The Author(s) 2019
Human-centric Computing and Information Sciences is a copyright of Springer, (2019). All Rights Reserved. © 2019. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1186/s13673-019-0192-7
Discipline Computer Science
EISSN 2192-1962
EndPage 21
ISICitedReferencesCount 152
ISSN 2192-1962
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords LDA
K-means clustering
TF-IDF
Paper classification
Language English
ORCID 0000-0001-6774-8476
OpenAccessLink https://www.proquest.com/docview/2279647463?pq-origsite=%requestingapplication%
PageCount 21
PublicationDate 2019-08-26
PublicationPlace Berlin/Heidelberg
PublicationTitle Human-centric computing and information sciences
PublicationTitleAbbrev Hum. Cent. Comput. Inf. Sci
PublicationYear 2019
Publisher Springer Berlin Heidelberg
Korea Information Processing Society, Computer Software Research Group
StartPage 1
SubjectTerms Algorithms
Artificial Intelligence
Big data
Classification
Cloud Computing for Human-centric Computing
Cluster analysis
Clustering
Communications Engineering
Computer Science
Computer Systems Organization and Communication Networks
Dirichlet problem
Information Systems and Communication Service
Information Systems Applications (incl. Internet)
IoT
Networks
Scientific papers
User Interfaces and Human Computer Interaction
Vector quantization
Title Research paper classification systems based on TF-IDF and LDA schemes
URI https://link.springer.com/article/10.1186/s13673-019-0192-7
https://www.proquest.com/docview/2279647463
Volume 9