Optimizing forensic file classification: enhancing SFCS with βk hyperparameter tuning.
| Title: | Optimizing forensic file classification: enhancing SFCS with βk hyperparameter tuning. |
|---|---|
| Authors: | Joseph, D. Paul; Perumal, Viswanathan |
| Source: | PeerJ Computer Science; Mar2025, p1-27, 27p |
| Subject terms: | FORENSIC sciences, CLASSIFICATION, CONCEPTUAL models, INFORMATION storage & retrieval systems, TEXT mining, INFORMATION retrieval, MACHINE learning |
| Abstract: | In forensic topical modelling, the α parameter controls the distribution of topics in documents. However, low, high, or incorrect values of α lead to topic sparsity, model overfitting, and suboptimal topic distribution. To control the word distribution across topics, the β parameter is introduced. However, low, high, or inappropriate β values lead to sparse distribution, disjointed topics, and abundant highly probable words. The βj parameter, in conjunction with seed-guided words based on Term Frequency and Inverse Document Frequency, is introduced to address the issues. Nevertheless, the data often suffers from skewness or noise due to frequent co-occurrences of unrelated polysemic word pairs generated using Pointwise Mutual Information. By integrating α, β, and βj into file classification systems, classification models converge to local optima with O(n log n · \|V\|) time complexity. To combat these challenges, this research proposes the SDOT Forensic Classification System (SFCS) with a functional parameter βk that identifies seed words by evaluating semantic and contextual similarity of word vectors. As a result, the topic distribution (Θd) is compelled to model the curated seed words within the distribution, generating pertinent topics. Incorporating βk into SFCS allowed the proposed model to remove 278 k irrelevant files from the corpus and identify 5.6 k suspicious files by extracting 700 blacklisted keywords. Furthermore, this research implemented hyperparameter optimization and hyperplane maximization, resulting in a file classification accuracy of 94.6%, 94.4% precision and 96.8% recall within O(n log n) complexity. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of PeerJ Computer Science is the property of PeerJ Inc. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Complementary Index |
| DOI: | 10.7717/peerj-cs.2608 |
| ISSN: | 2376-5992 |
| Language: | English |
| Publication type: | Academic Journal |
| Accession number: | 184598751 |
| Permalink: | https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=184598751 |
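
The abstract's remark that low or high β values change how word probability spreads across a topic can be illustrated with a small sketch. It is not the authors' SFCS method: it only assumes the usual LDA-style setup in which each topic's word distribution is drawn from a symmetric Dirichlet prior with concentration β. The function name `mean_top_mass` and all default sizes are hypothetical choices made for this illustration.

```python
import numpy as np


def mean_top_mass(beta, vocab_size=1000, top_n=10, n_topics=200, seed=0):
    """Average probability mass held by the top_n words of a topic.

    Topics are sampled from a symmetric Dirichlet(beta) prior over a
    vocabulary of vocab_size words (an illustrative assumption, not the
    paper's model). A value near 1 means a few words dominate each topic.
    """
    rng = np.random.default_rng(seed)
    # Each row is one topic: a probability distribution over the vocabulary.
    topics = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    # Sort each row ascending and keep the top_n largest probabilities.
    top = np.sort(topics, axis=1)[:, -top_n:]
    return float(top.sum(axis=1).mean())


if __name__ == "__main__":
    for beta in (0.01, 0.1, 1.0, 10.0):
        print(f"beta={beta:<5} top-10 mass={mean_top_mass(beta):.3f}")
```

Smaller β concentrates each sampled topic on a handful of words, while larger β flattens it toward uniform. Topic-modelling libraries expose this prior under their own names, for example `eta` in gensim's `LdaModel` and `topic_word_prior` in scikit-learn's `LatentDirichletAllocation`.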