Code Redteaming: Probing Ethical Sensitivity of LLMs Through Natural Language Embedded in Code
| Title: | Code Redteaming: Probing Ethical Sensitivity of LLMs Through Natural Language Embedded in Code |
|---|---|
| Authors: | Chanjun Park, Jeongho Yoon, Heuiseok Lim |
| Source: | Mathematics, Vol 14, Iss 1, p 189 (2026) |
| Publisher information: | MDPI AG |
| Publication year: | 2026 |
| Collection: | Directory of Open Access Journals: DOAJ Articles |
| Keywords: | adversarial evaluation, content safety, ethical language detection, code analysis, large language models, Mathematics, QA1-939 |
| Description: | Large language models are increasingly used in code generation and developer tools, yet their robustness to ethically problematic natural language embedded in source code is underexplored. In this work, we study content-safety vulnerabilities arising from ethically inappropriate language placed in non-functional code regions (e.g., comments or identifiers), rather than traditional functional security vulnerabilities such as exploitable program logic. In real-world and educational settings, programmers may include inappropriate expressions in identifiers, comments, or print statements that are operationally inert but ethically concerning. We present Code Redteaming, an adversarial evaluation framework that probes models’ sensitivity to such linguistic content. Our benchmark spans Python and C and applies sentence-level and token-level perturbations across natural-language-bearing surfaces, evaluating 18 models from 1B to 70B parameters. Experiments reveal inconsistent scaling trends and substantial variance across injection types and surfaces, highlighting blind spots in current safety filters. These findings motivate input-sensitive safety evaluations and stronger defenses for code-focused LLM applications. |
| Publication type: | article in journal/newspaper |
| Language: | English |
| Relation: | https://www.mdpi.com/2227-7390/14/1/189; https://doaj.org/toc/2227-7390; https://doaj.org/article/3f5930e14782473da84170f104d29e88 |
| DOI: | 10.3390/math14010189 |
| Availability: | https://doi.org/10.3390/math14010189; https://doaj.org/article/3f5930e14782473da84170f104d29e88 |
| Document code: | edsbas.CB80868C |
| Database: | BASE |