Code Redteaming: Probing Ethical Sensitivity of LLMs Through Natural Language Embedded in Code

Bibliographic details
Title: Code Redteaming: Probing Ethical Sensitivity of LLMs Through Natural Language Embedded in Code
Authors: Chanjun Park, Jeongho Yoon, Heuiseok Lim
Source: Mathematics, Vol 14, Iss 1, p 189 (2026)
Publisher information: MDPI AG
Publication year: 2026
Collection: Directory of Open Access Journals: DOAJ Articles
Keywords: adversarial evaluation, content safety, ethical language detection, code analysis, large language models, Mathematics, QA1-939
Description: Large language models are increasingly used in code generation and developer tools, yet their robustness to ethically problematic natural language embedded in source code is underexplored. In this work, we study content-safety vulnerabilities arising from ethically inappropriate language placed in non-functional code regions (e.g., comments or identifiers), rather than traditional functional security vulnerabilities such as exploitable program logic. In real-world and educational settings, programmers may include inappropriate expressions in identifiers, comments, or print statements that are operationally inert but ethically concerning. We present Code Redteaming, an adversarial evaluation framework that probes models’ sensitivity to such linguistic content. Our benchmark spans Python and C and applies sentence-level and token-level perturbations across natural-language-bearing surfaces, evaluating 18 models from 1B to 70B parameters. Experiments reveal inconsistent scaling trends and substantial variance across injection types and surfaces, highlighting blind spots in current safety filters. These findings motivate input-sensitive safety evaluations and stronger defenses for code-focused LLM applications.
Publication type: article in journal/newspaper
Language: English
Relation: https://www.mdpi.com/2227-7390/14/1/189; https://doaj.org/toc/2227-7390; https://doaj.org/article/3f5930e14782473da84170f104d29e88
DOI: 10.3390/math14010189
Availability: https://doi.org/10.3390/math14010189
https://doaj.org/article/3f5930e14782473da84170f104d29e88
Document code: edsbas.CB80868C
Database: BASE