Using pseudo-AI submissions for detecting AI-generated code


Detailed bibliography
Published in: Frontiers in Computer Science (Lausanne), Volume 7
Main author: Bashir, Shariq
Format: Journal Article
Language: English
Published: Frontiers Media S.A., 2025-05-23
ISSN: 2624-9898
Online access: Get full text
Abstract

Introduction: Generative AI tools can produce programming code that looks very similar to human-written code, which creates challenges in programming education. Students may use these tools inappropriately for their programming assignments, and there are currently no reliable methods for detecting AI-generated code. Addressing this issue is important to protect academic integrity while still allowing the constructive use of AI tools. Previous studies have explored ways to detect AI-generated text, such as analyzing structural differences, embedding watermarks, examining specific features, or using fine-tuned language models. However, certain techniques, such as prompt engineering, can make AI-generated code harder to identify.

Methods: To tackle this problem, this article proposes a new approach for instructors to safeguard programming assignment integrity. The idea is for instructors to use generative AI tools themselves to create example AI-generated submissions (pseudo-AI submissions) for each task. These pseudo-AI submissions, shared along with the task instructions, act as reference solutions for students. With pseudo-AI submissions available, students know that submissions resembling these examples are easily identifiable and will likely be flagged for lack of originality. On the one hand, this transparency removes the perceived advantage of using generative AI tools to complete assignments, since their output would closely match the provided examples and be obvious to instructors. On the other hand, the presence of these pseudo-AI submissions reinforces the expectation that students produce unique, personalized work, motivating them to engage more deeply with the material and rely on their own problem-solving skills.

Results: A user study indicates that this method can detect AI-generated code with over 96% accuracy.

Discussion: The analysis of results shows that pseudo-AI submissions created using AI tools do not closely resemble student-written code, suggesting that the framework does not hinder students from writing their own unique solutions. Differences in areas such as expression assignments, use of language features, readability, efficiency, conciseness, and clean coding practices further distinguish pseudo-AI submissions from student work.
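The abstract does not spell out how a submission would be compared against the shared pseudo-AI references; as a rough illustration only (not the paper's method), a similarity check can be sketched with Python's standard-library difflib. The 0.8 threshold and the sample snippets below are hypothetical.

```python
import difflib

def similarity(code_a: str, code_b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the two strings are identical.
    return difflib.SequenceMatcher(None, code_a, code_b).ratio()

def flag_submission(student_code: str, pseudo_ai_refs: list[str],
                    threshold: float = 0.8) -> bool:
    # Flag when the submission closely matches any pseudo-AI reference.
    # The threshold is an illustrative choice, not taken from the paper.
    return any(similarity(student_code, ref) >= threshold
               for ref in pseudo_ai_refs)

# Hypothetical pseudo-AI reference shared along with the task instructions.
pseudo_ai_refs = ["def add(a, b):\n    return a + b\n"]

copied = "def add(a, b):\n    return a + b\n"  # verbatim AI output
own = ("def add(first, second):\n"
       "    result = first + second\n"
       "    return result\n")                  # student's own style

print(flag_submission(copied, pseudo_ai_refs))  # True: identical to reference
print(flag_submission(own, pseudo_ai_refs))
```

In practice a production system would use token- or AST-based similarity (as in tools like JPlag or Dolos, cited by the article) rather than raw string matching, but the flagging logic follows the same shape.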
DOI: 10.3389/fcomp.2025.1549761
Open access: https://doaj.org/article/5eca89389855411a842764250235718c
Subjects: detecting AI-generated code; generative AI tools; large language models (LLMs); programming code plagiarism detection; programming code similarity