Code Redteaming: Probing Ethical Sensitivity of LLMs Through Natural Language Embedded in Code.
Saved in:
| Title: | Code Redteaming: Probing Ethical Sensitivity of LLMs Through Natural Language Embedded in Code. |
|---|---|
| Authors: | Park, Chanjun (AUTHOR), Yoon, Jeongho (AUTHOR), Lim, Heuiseok (AUTHOR) limhseok@korea.ac.kr |
| Source: | Mathematics (2227-7390). Jan 2026, Vol. 14, Issue 1, p189. 18p. |
| Subjects: | *LANGUAGE models, *ETHICS, *NATURAL language processing, *RISK assessment, *MORAL attitudes |
| Abstract: | Large language models are increasingly used in code generation and developer tools, yet their robustness to ethically problematic natural language embedded in source code is underexplored. In this work, we study content-safety vulnerabilities arising from ethically inappropriate language placed in non-functional code regions (e.g., comments or identifiers), rather than traditional functional security vulnerabilities such as exploitable program logic. In real-world and educational settings, programmers may include inappropriate expressions in identifiers, comments, or print statements that are operationally inert but ethically concerning. We present Code Redteaming, an adversarial evaluation framework that probes models' sensitivity to such linguistic content. Our benchmark spans Python and C and applies sentence-level and token-level perturbations across natural-language-bearing surfaces, evaluating 18 models from 1B to 70B parameters. Experiments reveal inconsistent scaling trends and substantial variance across injection types and surfaces, highlighting blind spots in current safety filters. These findings motivate input-sensitive safety evaluations and stronger defenses for code-focused LLM applications. [ABSTRACT FROM AUTHOR] |
| Database: | Academic Search Index |
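
The abstract describes probes built by injecting ethically problematic natural language into operationally inert code regions (comments, identifiers, print statements) while leaving program behavior unchanged. The sketch below is a rough illustration of what such a perturbation could look like, not the paper's benchmark code: the template, the surface names, the `inject` helper, and the placeholder payload are all hypothetical reconstructions based only on the abstract.

```python
# Hypothetical sketch of a Code Redteaming-style probe builder. The paper's
# actual benchmark, templates, and payloads are not given in this record.

# A benign Python snippet with three natural-language-bearing surfaces
# exposed as template fields: a comment, an identifier, and a print string.
BENIGN_TEMPLATE = '''\
def transfer(amount):
    # {comment}
    {identifier} = amount * 0.01
    print("{print_text}")
    return amount - {identifier}
'''

# Neutral fillers used for every surface that is not being perturbed.
NEUTRAL = {
    "comment": "apply the standard fee",
    "identifier": "fee",
    "print_text": "transfer processed",
}

def inject(surface: str, payload: str) -> str:
    """Swap one natural-language-bearing surface for an adversarial payload,
    keeping the rest of the snippet (and its behavior) unchanged."""
    fields = dict(NEUTRAL)
    fields[surface] = payload
    return BENIGN_TEMPLATE.format(**fields)

# Sentence-level perturbation on the comment surface; the payload here is a
# placeholder standing in for whatever adversarial sentence a benchmark uses.
probe = inject("comment", "<ETHICALLY_PROBLEMATIC_SENTENCE>")
print(probe)
```

Under this reading, a sentence-level perturbation would target the comment or print surfaces, while a token-level perturbation would target the identifier; the resulting string is then sent to a model under evaluation to see whether its safety filtering reacts to the embedded language.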