Using pseudo-AI submissions for detecting AI-generated code


Detailed bibliography
Published in: Frontiers in Computer Science (Lausanne), Volume 7
Main author: Bashir, Shariq
Format: Journal Article
Language: English
Published: Frontiers Media S.A., 2025-05-23
ISSN: 2624-9898
Online access: Get full text
Abstract

Introduction: Generative AI tools can produce programming code that looks very similar to human-written code, which creates challenges in programming education. Students may use these tools inappropriately for their programming assignments, and there are currently no reliable methods for detecting AI-generated code. Addressing this issue is important to protect academic integrity while still allowing the constructive use of AI tools. Previous studies have explored ways to detect AI-generated text, such as analyzing structural differences, embedding watermarks, examining specific features, or using fine-tuned language models. However, certain techniques, such as prompt engineering, can make AI-generated code harder to identify.

Methods: To tackle this problem, this article proposes a new approach for instructors to safeguard programming assignment integrity. The idea is for instructors to use generative AI tools themselves to create example AI-generated submissions (pseudo-AI submissions) for each task. These pseudo-AI submissions, shared along with the task instructions, act as reference solutions for students. With pseudo-AI submissions available, students know that submissions resembling these examples are easily identifiable and will likely be flagged for lack of originality. On the one hand, this transparency removes the perceived advantage of using generative AI tools to complete assignments, since their output would closely match the provided examples and be obvious to instructors. On the other hand, the presence of these pseudo-AI submissions reinforces the expectation that students produce unique, personalized work, motivating them to engage more deeply with the material and rely on their own problem-solving skills.

Results: A user study indicates that this method can detect AI-generated code with over 96% accuracy.

Discussion: The analysis of results shows that pseudo-AI submissions created using AI tools do not closely resemble student-written code, suggesting that the framework does not hinder students from writing their own unique solutions. Differences in areas such as expression assignments, use of language features, readability, efficiency, conciseness, and clean coding practices further distinguish pseudo-AI submissions from student work.
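The abstract does not spell out how a submission would be compared against the shared pseudo-AI references; as a rough illustration only (not the paper's method), a similarity check can be sketched with Python's standard-library difflib. The 0.8 threshold and the sample snippets below are hypothetical.

```python
import difflib

def similarity(code_a: str, code_b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the two strings are identical.
    return difflib.SequenceMatcher(None, code_a, code_b).ratio()

def flag_submission(student_code: str, pseudo_ai_refs: list[str],
                    threshold: float = 0.8) -> bool:
    # Flag when the submission closely matches any pseudo-AI reference.
    # The threshold is an illustrative choice, not taken from the paper.
    return any(similarity(student_code, ref) >= threshold
               for ref in pseudo_ai_refs)

# Hypothetical pseudo-AI reference shared along with the task instructions.
pseudo_ai_refs = ["def add(a, b):\n    return a + b\n"]

copied = "def add(a, b):\n    return a + b\n"  # verbatim AI output
own = ("def add(first, second):\n"
       "    result = first + second\n"
       "    return result\n")                  # student's own style

print(flag_submission(copied, pseudo_ai_refs))  # True: identical to reference
print(flag_submission(own, pseudo_ai_refs))
```

In practice a production system would use token- or AST-based similarity (as in tools like JPlag or Dolos, cited by the article) rather than raw string matching, but the flagging logic follows the same shape.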
DOI: 10.3389/fcomp.2025.1549761
Open access: https://doaj.org/article/5eca89389855411a842764250235718c
Subjects: detecting AI-generated code; generative AI tools; large language models (LLMs); programming code plagiarism detection; programming code similarity