AIGCodeSet: A New Dataset for AI-Generated Code Detection

Saved in:
Bibliographic Details
Title: AIGCodeSet: Yapay Zeka Üretimli Kod Tespiti İçin Yeni Bir Veri Kümesi [AIGCodeSet: A New Dataset for AI-Generated Code Detection]
Authors: Demirok, Basak, Kutlu, Mucahid
Publisher Information: Institute of Electrical and Electronics Engineers Inc.
Publication Year: 2025
Subject Terms: Software Design, Student Assignments, Large Language Model, Language Model, Job Interviews, Intelligence Models, Ethical Issues, Critical Issues, Code Detection, Artificial Intelligence Generated Code Detection, Annotated Datasets, AI Generated Code Detection, Python, Ethical Technology, Computer Software, Codes (Symbols), Bayesian Networks, Artificial Intelligence, Large Language Models
Description: Isik University. While large language models provide significant convenience for software development, they can also lead to ethical issues in job interviews and student assignments. Therefore, determining whether a piece of code was written by a human or generated by an artificial intelligence (AI) model is a critical issue. In this study, we present AIGCodeSet, which consists of 2,828 AI-generated and 4,755 human-written Python code samples, with the AI-generated code produced using CodeLlama, Codestral, and Gemini. In addition, we share the results of our experiments with baseline detection methods; these experiments show that a Bayesian classifier outperforms the other models (an illustrative baseline sketch follows at the end of this record). © 2025 Elsevier B.V., All rights reserved.
Document Type: conference object
Language: Turkish
Relation: 33rd IEEE Conference on Signal Processing and Communications Applications, SIU 2025 -- Istanbul; Isik University Sile Campus -- 211450; Conference Item - International - Institutional Faculty Member; https://doi.org/10.1109/SIU66497.2025.11112334; https://hdl.handle.net/20.500.11851/12721; N/A
DOI: 10.1109/SIU66497.2025.11112334
Availability: https://hdl.handle.net/20.500.11851/12721
https://doi.org/10.1109/SIU66497.2025.11112334
Rights: none
Accession Number: edsbas.6BB6B97D
Database: BASE
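Illustrative baseline sketch: the record does not describe the paper's baseline detection methods beyond naming a Bayesian classifier as the best performer, so the snippet below is only a minimal sketch of the kind of naive Bayes text-classification baseline one might run over a dataset like AIGCodeSet. The file name aigcodeset.csv and the column names code and label are hypothetical placeholders, not the dataset's actual layout, and this is not the authors' pipeline.

# Minimal naive Bayes baseline sketch for AI-generated code detection.
# Assumes a hypothetical CSV with columns "code" (source text) and
# "label" (1 = AI-generated, 0 = human-written).
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report

df = pd.read_csv("aigcodeset.csv")  # hypothetical file name
X_train, X_test, y_train, y_test = train_test_split(
    df["code"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)

# Character n-grams are used because source code tokenizes less cleanly
# than natural language; word-level features would also be reasonable.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4), max_features=50000)
clf = MultinomialNB()
clf.fit(vectorizer.fit_transform(X_train), y_train)

preds = clf.predict(vectorizer.transform(X_test))
print(classification_report(y_test, preds, target_names=["human", "AI"]))

This sketch only illustrates the general technique; the paper's actual features, splits, and competing models are reported in the publication cited above.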