Using pseudo-AI submissions for detecting AI-generated code
Saved in:

| Published in: | Frontiers in computer science (Lausanne), Volume 7 |
|---|---|
| Main author: | Bashir, Shariq |
| Medium: | Journal Article |
| Language: | English |
| Published: | Frontiers Media S.A., 23 May 2025 |
| Subjects: | detecting AI-generated code; generative AI tools; large language models (LLMs); programming code plagiarism detection; programming code similarity |
| ISSN: | 2624-9898 |
| Online access: | Get full text |
| Abstract | **Introduction:** Generative AI tools can produce programming code that looks very similar to human-written code, which creates challenges in programming education. Students may use these tools inappropriately for their programming assignments, and there are currently no reliable methods to detect AI-generated code. Addressing this issue is important to protect academic integrity while allowing the constructive use of AI tools. Previous studies have explored ways to detect AI-generated text, such as analyzing structural differences, embedding watermarks, examining specific features, or using fine-tuned language models. However, certain techniques, like prompt engineering, can make AI-generated code harder to identify. **Methods:** To tackle this problem, this article suggests a new approach for instructors to handle programming assignment integrity: instructors use generative AI tools themselves to create example AI-generated submissions (pseudo-AI submissions) for each task. These pseudo-AI submissions, shared along with the task instructions, act as reference solutions for students. Students are made aware that submissions resembling these examples are easily identifiable and will likely be flagged for lack of originality. On the one hand, this transparency removes the perceived advantage of using generative AI tools to complete assignments, as their output would closely match the provided examples, making it obvious to instructors. On the other hand, the presence of these pseudo-AI submissions reinforces the expectation that students produce unique and personalized work, motivating them to engage more deeply with the material and rely on their own problem-solving skills. **Results:** A user study indicates that this method can detect AI-generated code with over 96% accuracy. **Discussion:** The analysis of results shows that pseudo-AI submissions created using AI tools do not closely resemble student-written code, suggesting that the framework does not hinder students from writing their own unique solutions. Differences in areas such as expression assignments, use of language features, readability, efficiency, conciseness, and clean coding practices further distinguish pseudo-AI submissions from student work. |
|---|---|
| Author | Bashir, Shariq |
| DOI | 10.3389/fcomp.2025.1549761 |
| Discipline | Computer Science |
| EISSN | 2624-9898 |
| ISSN | 2624-9898 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Language | English |
| OpenAccessLink | https://doaj.org/article/5eca89389855411a842764250235718c |
| PublicationDate | 2025-05-23 |
| PublicationTitle | Frontiers in computer science (Lausanne) |
| PublicationYear | 2025 |
| Publisher | Frontiers Media S.A |
| SubjectTerms | detecting AI-generated code; generative AI tools; large language models (LLMs); programming code plagiarism detection; programming code similarity |
| Title | Using pseudo-AI submissions for detecting AI-generated code |
| URI | https://doaj.org/article/5eca89389855411a842764250235718c |
| Volume | 7 |
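The flagging idea described in the abstract — comparing each student submission against the instructor's pseudo-AI reference solutions and marking close matches as lacking originality — can be sketched as follows. This is a minimal illustration, not the article's actual pipeline: it assumes a simple character-level similarity via Python's standard-library `difflib`, and the names `flag_submission` and `threshold` are hypothetical.

```python
import difflib


def similarity(code_a: str, code_b: str) -> float:
    """Similarity ratio in [0, 1] between two source-code strings."""
    return difflib.SequenceMatcher(None, code_a, code_b).ratio()


def flag_submission(student_code: str, pseudo_ai_codes: list[str],
                    threshold: float = 0.8) -> tuple[bool, float]:
    """Flag a submission if it closely resembles any instructor-generated
    pseudo-AI reference solution. Returns (flagged, best_similarity)."""
    scores = [similarity(student_code, ref) for ref in pseudo_ai_codes]
    best = max(scores, default=0.0)
    return best >= threshold, best
```

In practice the article reports that pseudo-AI and genuinely student-written solutions diverge on features such as expression style and language-feature use, so a real detector would likely use a more robust, token- or feature-based comparison than this character-level sketch.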