CQLLM: A Framework for Generating CodeQL Security Vulnerability Detection Code Based on Large Language Model.

Bibliographic details
Title: CQLLM: A Framework for Generating CodeQL Security Vulnerability Detection Code Based on Large Language Model.
Authors: Wang, Le; Chen, Chan; Zhu, Junyi; Zhan, Rufeng; Han, Weihong
Source: Applied Sciences (2076-3417); Jan2026, Vol. 16 Issue 1, p517, 17p
Subject terms: COMPUTER security vulnerabilities; LANGUAGE models; COMPUTER software correctness; SOFTWARE frameworks; PENETRATION testing (Computer security); CODE generators
Abstract: With the increasing complexity of software systems, the number of security vulnerabilities contained within software has risen accordingly. The shift-left security concept aims to detect and fix vulnerabilities during the software development cycle. Although CodeQL stands as the premier static code analysis tool currently available on the market, its high barrier to entry makes it difficult to meet the implementation requirements of shift-left security initiatives. Large language models (LLMs) offer potential assistance in QL code development, but the inherent complexity of code generation tasks often leads to persistent issues such as syntactic inaccuracies and references to non-existent modules, which constrain their practical applicability in this domain. To address these challenges, this paper proposes CQLLM (CodeQL-enhanced Large Language Model), a novel framework that leverages an LLM to automate the generation of CodeQL security vulnerability detection code. The framework is designed to enhance both the efficiency and the accuracy of automated QL code generation, thereby advancing static code analysis toward a more efficient and intelligent paradigm for vulnerability detection. First, retrieval-augmented generation (RAG) is employed to search the vector database for dependency libraries and code snippets that are highly similar to the user's input, thereby constraining the model's generation process and preventing the import of invalid modules. Then, the user input and the knowledge chunks retrieved by RAG are fed into a fine-tuned LLM to perform reasoning and generate QL code. By integrating external knowledge bases with the large model, the framework enhances the correctness and completeness of the generated code.
Experimental results show that CQLLM significantly improves the executability of the generated QL code, raising the execution success rate from 0.31% to 72.48% and outperforming the original model by a large margin. CQLLM also enhances the effectiveness of the generated results, achieving a CWE (Common Weakness Enumeration) coverage rate of 57.4% in vulnerability detection tasks, which demonstrates its practical applicability in real-world vulnerability detection. [ABSTRACT FROM AUTHOR]
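The two-stage flow described in the abstract (retrieve similar library imports and snippets, then pass them with the user input to the model) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the embedding model, the vector database, or the prompt format, so the bag-of-words similarity, the three knowledge chunks, and the names `KNOWLEDGE_BASE`, `embed`, `cosine`, `retrieve`, and `build_prompt` below are all assumptions made for illustration. The call to a fine-tuned LLM is omitted; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge chunks. A real system would index CodeQL libraries
# and existing queries in a vector database; these three are illustrative only.
KNOWLEDGE_BASE = [
    {"text": "Java taint tracking for SQL injection from user input to a query",
     "snippet": "import java\nimport semmle.code.java.dataflow.TaintTracking"},
    {"text": "Python taint tracking for command injection from user input to a shell call",
     "snippet": "import python\nimport semmle.python.dataflow.new.TaintTracking"},
    {"text": "JavaScript analysis of DOM based cross site scripting",
     "snippet": "import javascript"},
]


def embed(text):
    """Toy bag-of-words vector, standing in for a learned embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2):
    """Return the k chunks whose descriptions are most similar to the query."""
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE,
                    key=lambda chunk: cosine(q, embed(chunk["text"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(query, chunks):
    """Combine retrieved snippets and the user request into one prompt string."""
    context = "\n\n".join(chunk["snippet"] for chunk in chunks)
    return (
        "Use only the imports listed in the context.\n"
        f"Context:\n{context}\n\n"
        f"Task: {query}\n"
    )


query = "Detect SQL injection in Java code"
top = retrieve(query, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

Restricting the prompt to retrieved imports is one plausible way to realize the constraint the abstract describes, namely preventing the model from importing modules that do not exist.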
Copyright of Applied Sciences (2076-3417) is the property of MDPI and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Complementary Index
Header DbId: edb
DbLabel: Complementary Index
An: 190819838
PubType: Academic Journal
PubTypeId: academicJournal
PLink https://erproxy.cvtisr.sk/sfx/access?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=edb&AN=190819838
RecordInfo BibRecord:
  BibEntity:
    Identifiers:
      – Type: doi
        Value: 10.3390/app16010517
    Languages:
      – Code: eng
        Text: English
    PhysicalDescription:
      Pagination:
        PageCount: 17
        StartPage: 517
    Subjects:
      – SubjectFull: COMPUTER security vulnerabilities
        Type: general
      – SubjectFull: LANGUAGE models
        Type: general
      – SubjectFull: COMPUTER software correctness
        Type: general
      – SubjectFull: SOFTWARE frameworks
        Type: general
      – SubjectFull: PENETRATION testing (Computer security)
        Type: general
      – SubjectFull: CODE generators
        Type: general
    Titles:
      – TitleFull: CQLLM: A Framework for Generating CodeQL Security Vulnerability Detection Code Based on Large Language Model.
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Wang, Le
      – PersonEntity:
          Name:
            NameFull: Chen, Chan
      – PersonEntity:
          Name:
            NameFull: Zhu, Junyi
      – PersonEntity:
          Name:
            NameFull: Zhan, Rufeng
      – PersonEntity:
          Name:
            NameFull: Han, Weihong
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 01
              M: 01
              Text: Jan2026
              Type: published
              Y: 2026
          Identifiers:
            – Type: issn-print
              Value: 20763417
          Numbering:
            – Type: volume
              Value: 16
            – Type: issue
              Value: 1
          Titles:
            – TitleFull: Applied Sciences (2076-3417)
              Type: main