Spectral Clustering-Based Particle Swarm Optimization Algorithm for Document Clustering


Published in: Journal of information systems engineering & management, Vol. 10, No. 4s, pp. 134-146
Main author: T. Elavarasi
Format: Journal Article
Language: English
Published: 17 January 2025
ISSN: 2468-4376
Abstract: Document clustering is the process of automatically grouping documents into clusters such that the documents within one cluster are highly similar to each other and dissimilar to the documents in the remaining clusters. Because of its broad application in fields such as search engines, web mining, and information retrieval, it has been the subject of much research. It involves grouping documents that are similar to one another and measuring how similar they are. It also eases navigation by offering an effective document representation and visualization. This paper therefore performs document clustering using a nature-inspired optimization technique. First, the dataset is gathered manually from different sources. Next, data preparation extracts the text content from the published documents. The prepared data are pre-processed by removing punctuation and stop words and converting the text to lowercase. Keyword features are then extracted from the pre-processed data using the Term Frequency-Inverse Document Frequency (TF-IDF) approach. Finally, the extracted features are clustered with the spectral clustering algorithm, whose parameters are tuned by Particle Swarm Optimization (PSO), a nature-inspired optimization algorithm, with maximization of the silhouette score as the objective function. The proposed spectral clustering-PSO groups the documents into six classes: data mining, deep learning, image, machine learning, network, and sports. The proposed model is reported to outperform the compared techniques on several measures. In terms of silhouette score, the proposed spectral clustering-PSO is 51.92%, 70.81%, 45.93%, and 20.89% better than JA-GWO, tpLDA, HDMA, and Net2Vec, respectively. Similarly, in terms of Davies-Bouldin score, it is 89.69%, 58.48%, 32.67%, and 13.99% better than JA-GWO, tpLDA, HDMA, and Net2Vec, respectively.
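The tuning step named in the abstract, PSO searching for clustering parameters that maximize the silhouette score, can be illustrated with a minimal sketch. This is not the authors' implementation. Several things here are assumptions made for illustration: a one-dimensional threshold split stands in for spectral clustering, six toy numbers stand in for TF-IDF vectors, and the PSO coefficients (w, c1, c2), swarm size, and all function names are hypothetical choices not taken from the record.

```python
import random


def silhouette(points, labels):
    """Mean silhouette score for 1-D points, using absolute distance."""
    clusters = {}
    for x, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(x)
    total = 0.0
    for x, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes a score of 0
        # a: mean distance to the other members of the same cluster
        a = sum(abs(x - y) for y in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(abs(x - y) for y in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def pso_maximize(objective, low, high, n_particles=12, n_iters=30,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic PSO over a single bounded parameter; returns (best_x, best_value)."""
    rng = random.Random(seed)
    pos = [rng.uniform(low, high) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = pos[:]                             # personal best positions
    pbest_val = [objective(p) for p in pos]    # personal best scores
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]  # global best so far
    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # inertia + pull toward personal best + pull toward global best
            vel[i] = (w * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])
                      + c2 * r2 * (gbest - pos[i]))
            pos[i] = min(high, max(low, pos[i] + vel[i]))  # clamp to bounds
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val


# Toy data: two well-separated groups (a stand-in for document features).
POINTS = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]


def objective(threshold):
    """Silhouette of the two-way split induced by the threshold parameter."""
    labels = [1 if x > threshold else 0 for x in POINTS]
    if len(set(labels)) < 2:
        return -1.0  # a single cluster has no defined silhouette; penalize it
    return silhouette(POINTS, labels)


best_t, best_score = pso_maximize(objective, low=0.0, high=5.2)
print(f"best threshold = {best_t:.3f}, silhouette = {best_score:.3f}")
```

In a fuller pipeline the objective would instead run spectral clustering on the TF-IDF matrix with the candidate parameters and return the resulting silhouette score. The optimizer loop itself would stay the same, extended to one position and velocity component per tuned parameter.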
DOI: 10.52783/jisem.v10i4s.487