Detailed bibliography
| Title: |
The Unprecedented Surge in Generative AI: Empirical Analysis of Trusted and Malicious Large Language Models (LLMs). |
| Authors: |
Sood, Aditya K., Zeadally, Sherali |
| Source: |
IEEE Technology & Society Magazine; Sep 2025, Vol. 44, Issue 3, pp. 98-108 (11 pp.) |
| Subjects: |
GENERATIVE artificial intelligence, LANGUAGE models, EMPIRICAL research, ETHICS |
| Abstract: |
Trusted large language models (LLMs) inherit ethical guidelines to prevent generating harmful content, whereas malicious LLMs are engineered to enable the generation of unethical and toxic responses. Both trusted and malicious LLMs use guardrails in differential contexts per the requirements of the developers and attackers, respectively. We explore the multifaceted world of guardrails implementation in LLMs by conducting an empirical analysis to assess the effectiveness of guardrails using prompts. Our results revealed that guardrails deployed in the trusted LLMs could be bypassed using prompt manipulation techniques such as “pretend” and “persist” to generate harmful content. In addition, we also discovered that malicious LLMs still deploy weak guardrails to evade detection by generating human-like content. This empirical analysis provides insights into the design of the malicious and trusted LLMs. We also propose recommendations to defend against prompt manipulation and guardrails bypass while designing LLMs. [ABSTRACT FROM AUTHOR] |
| Database: |
Biomedical Index |