Optimizing forensic file classification: enhancing SFCS with βk hyperparameter tuning.

Bibliographic Details
Title: Optimizing forensic file classification: enhancing SFCS with βk hyperparameter tuning.
Authors: Joseph, D. Paul, Perumal, Viswanathan
Source: PeerJ Computer Science; Mar2025, p1-27, 27p
Subject Terms: FORENSIC sciences, CLASSIFICATION, CONCEPTUAL models, INFORMATION storage & retrieval systems, TEXT mining, INFORMATION retrieval, MACHINE learning
Abstract: In forensic topical modelling, the α parameter controls the distribution of topics in documents. However, low, high, or incorrect values of α lead to topic sparsity, model overfitting, and suboptimal topic distribution. To control the word distribution across topics, the β parameter is introduced. However, low, high, or inappropriate β values lead to sparse distribution, disjointed topics, and abundant highly probable words. The βj parameter, in conjunction with seed-guided words based on Term Frequency and Inverse Document Frequency, is introduced to address the issues. Nevertheless, the data often suffers from skewness or noise due to frequent co-occurrences of unrelated polysemic word pairs generated using Pointwise Mutual Information. By integrating α, β, and βj into file classification systems, classification models converge to local optima with O(n log n* |V|) time complexity. To combat these challenges, this research proposes the SDOT Forensic Classification System (SFCS) with a functional parameter βk that identifies seed words by evaluating semantic and contextual similarity of word vectors. As a result, the topic distribution (Θd) is compelled to model the curated seed words within the distribution, generating pertinent topics. Incorporating βk into SFCS allowed the proposed model to remove 278 k irrelevant files from the corpus and identify 5.6 k suspicious files by extracting 700 blacklisted keywords. Furthermore, this research implemented hyperparameter optimization and hyperplane maximization, resulting in a file classification accuracy of 94.6%, 94.4% precision and 96.8% recall within O(n log n) complexity. [ABSTRACT FROM AUTHOR]
Copyright of PeerJ Computer Science is the property of PeerJ Inc. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
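The abstract describes βk as selecting seed words by the semantic and contextual similarity of word vectors. The following is a minimal sketch of that general idea only, not the authors' SFCS implementation: it ranks candidate words by cosine similarity to an anchor keyword and keeps those above a cutoff. The function names, the toy three-dimensional vectors, and the 0.8 threshold are all assumptions made for illustration; the record gives no such details.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_seed_words(vectors, anchor, threshold=0.8):
    """Return words whose vector is at least `threshold`-similar to the anchor's.

    `vectors` maps word -> embedding; `anchor` is a known keyword of interest.
    Both the interface and the threshold are hypothetical.
    """
    anchor_vec = vectors[anchor]
    return sorted(
        word
        for word, vec in vectors.items()
        if word != anchor and cosine(anchor_vec, vec) >= threshold
    )


# Toy embeddings, invented for this example (real ones would come from a trained model).
toy_vectors = {
    "fraud": [0.9, 0.1, 0.0],
    "scam": [0.8, 0.2, 0.1],
    "invoice": [0.5, 0.5, 0.2],
    "holiday": [0.0, 0.1, 0.9],
}

seeds = select_seed_words(toy_vectors, "fraud", threshold=0.8)
```

In a seed-guided topic model, words selected this way would typically receive extra prior weight in the topic-word distribution; how SFCS does this is described in the full article, not in this record.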
Database: Complementary Index
FullText Text:
  Availability: 0
CustomLinks:
  – Url: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=pmc&term=2376-5992[TA]+AND+1[PG]+AND+2025[PDAT]
    Name: FREE - PubMed Central (ISSN based link)
    Category: fullText
    Text: Full Text
    Icon: https://imageserver.ebscohost.com/NetImages/iconPdf.gif
    MouseOverText: Check this PubMed for the article full text.
  – Url: https://resolver.ebscohost.com/openurl?sid=EBSCO:edb&genre=article&issn=23765992&ISBN=&volume=&issue=&date=20250301&spage=1&pages=1-27&title=PeerJ Computer Science&atitle=Optimizing%20forensic%20file%20classification%3A%20enhancing%20SFCS%20with%20%CE%B2k%20hyperparameter%20tuning.&aulast=Joseph%2C%20D.%20Paul&id=DOI:10.7717/peerj-cs.2608
    Name: Full Text Finder
    Category: fullText
    Text: Full Text Finder
    Icon: https://imageserver.ebscohost.com/branding/images/FTF.gif
    MouseOverText: Full Text Finder
  – Url: https://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=EBSCO&SrcAuth=EBSCO&DestApp=WOS&ServiceName=TransferToWoS&DestLinkType=GeneralSearchSummary&Func=Links&author=Joseph%20DP
    Name: ISI
    Category: fullText
    Text: Find this article in Web of Science
    Icon: https://imagesrvr.epnet.com/ls/20docs.gif
    MouseOverText: Find this article in Web of Science
Header DbId: edb
DbLabel: Complementary Index
An: 184598751
RelevancyScore: 1023
AccessLevel: 6
PubType: Academic Journal
PubTypeId: academicJournal
PreciseRelevancyScore: 1023.07043457031
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Optimizing forensic file classification: enhancing SFCS with βk hyperparameter tuning.
– Name: Author
  Label: Authors
  Group: Au
  Data: Joseph, D. Paul; Perumal, Viswanathan
– Name: TitleSource
  Label: Source
  Group: Src
  Data: PeerJ Computer Science; Mar2025, p1-27, 27p
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: FORENSIC sciences; CLASSIFICATION; CONCEPTUAL models; INFORMATION storage & retrieval systems; TEXT mining; INFORMATION retrieval; MACHINE learning
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=184598751
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.7717/peerj-cs.2608
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 27
        StartPage: 1
    Subjects:
      – SubjectFull: FORENSIC sciences
        Type: general
      – SubjectFull: CLASSIFICATION
        Type: general
      – SubjectFull: CONCEPTUAL models
        Type: general
      – SubjectFull: INFORMATION storage & retrieval systems
        Type: general
      – SubjectFull: TEXT mining
        Type: general
      – SubjectFull: INFORMATION retrieval
        Type: general
      – SubjectFull: MACHINE learning
        Type: general
    Titles:
      – TitleFull: Optimizing forensic file classification: enhancing SFCS with βk hyperparameter tuning.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Joseph, D. Paul
      – PersonEntity:
          Name:
            NameFull: Perumal, Viswanathan
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 03
              Text: Mar2025
              Type: published
              Y: 2025
          Identifiers:
            – Type: issn-print
              Value: 23765992
          Titles:
            – TitleFull: PeerJ Computer Science
              Type: main
ResultId 1