GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has brought many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal toke...
| Published in: | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] pp. 643-655 |
|---|---|
| Main authors: | Zhang, Zhibo; Bai, Wuxia; Li, Yuxi; Meng, Mark Huasong; Wang, Kailong; Shi, Ling; Li, Li; Wang, Jun; Wang, Haoyu |
| Format: | Conference paper |
| Language: | English |
| Published: | ACM, 27 October 2024 |
| Subjects: | |
| ISSN: | 2643-1572 |
| Online access: | Get full text |
| Abstract | Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has raised many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". Those tokens, once included in the input, may induce the model to produce incorrect, irrelevant, or even harmful results, drastically undermining the reliability and practicality of LLMs. In this work, we aim to enhance the understanding of glitch tokens and propose techniques for their detection and mitigation. We first reveal the characteristic features induced by glitch tokens on LLMs, which are evidenced by significant deviations in the distributions of attention patterns and dynamic information from intermediate model layers. Based on these insights, we develop GlitchProber, a tool for efficient glitch token detection and mitigation. GlitchProber utilizes small-scale sampling, principal component analysis for accelerated feature extraction, and a simple classifier for efficient vocabulary screening. Taking one step further, GlitchProber rectifies abnormal model intermediate layer values to mitigate the destructive effects of glitch tokens. Evaluated on five mainstream open-source LLMs, GlitchProber demonstrates higher efficiency, precision, and recall compared to existing approaches, with an average F1 score of 0.86 and an average repair rate of 50.06%. GlitchProber unveils a novel path to address the challenges posed by glitch tokens and inspires future research toward more robust and interpretable LLMs. Our code is available at https://github.com/LLM-Integrity-Guard/GlitchProber. CCS Concepts: Computing methodologies → Knowledge representation and reasoning. |
|---|---|
| Author | Bai, Wuxia; Meng, Mark Huasong; Li, Li; Li, Yuxi; Wang, Haoyu; Shi, Ling; Zhang, Zhibo; Wang, Kailong; Wang, Jun |
| Author_xml | – sequence: 1 givenname: Zhibo surname: Zhang fullname: Zhang, Zhibo email: zhangzhibom@hust.edu.cn organization: Huazhong University of Science and Technology,Wuhan,China – sequence: 2 givenname: Wuxia surname: Bai fullname: Bai, Wuxia email: wuxiabai@hust.edu.cn organization: Huazhong University of Science and Technology,Wuhan,China – sequence: 3 givenname: Yuxi surname: Li fullname: Li, Yuxi email: yuxili@hust.edu.cn organization: Huazhong University of Science and Technology,Wuhan,China – sequence: 4 givenname: Mark Huasong surname: Meng fullname: Meng, Mark Huasong email: huasong.meng@u.nus.edu organization: Technical University of Munich,Munich,Germany – sequence: 5 givenname: Kailong surname: Wang fullname: Wang, Kailong email: wangkl@hust.edu.cn organization: Huazhong University of Science and Technology,Wuhan,China – sequence: 6 givenname: Ling surname: Shi fullname: Shi, Ling email: ling.shi@ntu.edu.sg organization: Nanyang Technological University,Singapore,Singapore – sequence: 7 givenname: Li surname: Li fullname: Li, Li email: lilicoding@ieee.org organization: Beihang University,Beijing,China – sequence: 8 givenname: Jun surname: Wang fullname: Wang, Jun email: junwang.lu@gmail.com organization: Beihang University,Beijing,China – sequence: 9 givenname: Haoyu surname: Wang fullname: Wang, Haoyu email: haoyuwang@hust.edu.cn organization: Huazhong University of Science and Technology,Wuhan,China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DBID | 6IE 6IL CBEJK RIE RIL |
| DOI | 10.1145/3691620.3695060 |
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Xplore POP ALL; IEEE Xplore All Conference Proceedings; IEEE Xplore; IEEE Proceedings Order Plans (POP All) 1998-Present |
| DatabaseTitleList | |
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Computer Science |
| EISBN | 9798400712487 |
| EISSN | 2643-1572 |
| EndPage | 655 |
| ExternalDocumentID | 10765056 |
| Genre | orig-research |
| GroupedDBID | 6IE 6IF 6IH 6IK 6IL 6IM 6IN 6J9 AAJGR AAWTH ABLEC ACREN ADYOE ADZIZ AFYQB ALMA_UNASSIGNED_HOLDINGS AMTXH BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IPLJI M43 OCL RIE RIL |
| ID | FETCH-LOGICAL-a248t-be8ba3c2e95d9d459739a5cb477331e52d7cf4b9a9d1604d274531cc200f87f13 |
| IEDL.DBID | RIE |
| ISICitedReferencesCount | 1 |
| ISICitedReferencesURI | http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001353105400052&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| IngestDate | Wed Jan 15 06:20:43 EST 2025 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| LinkModel | DirectLink |
| MergedId | FETCHMERGED-LOGICAL-a248t-be8ba3c2e95d9d459739a5cb477331e52d7cf4b9a9d1604d274531cc200f87f13 |
| PageCount | 13 |
| ParticipantIDs | ieee_primary_10765056 |
| PublicationCentury | 2000 |
| PublicationDate | 2024-Oct.-27 |
| PublicationDateYYYYMMDD | 2024-10-27 |
| PublicationDate_xml | – month: 10 year: 2024 text: 2024-Oct.-27 day: 27 |
| PublicationDecade | 2020 |
| PublicationTitle | IEEE/ACM International Conference on Automated Software Engineering : [proceedings] |
| PublicationTitleAbbrev | ASE |
| PublicationYear | 2024 |
| Publisher | ACM |
| Publisher_xml | – name: ACM |
| SSID | ssib057256116 ssj0051577 |
| Score | 2.3120084 |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 643 |
| SubjectTerms | Feature extraction; Glitch token; Large language models; LLM analysis; LLM security; Maintenance engineering; Prevention and mitigation; Principal component analysis; Reliability; Software engineering; Support vector machines; Systematics; Vocabulary |
| Title | GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models |
| URI | https://ieeexplore.ieee.org/document/10765056 |
| WOSCitedRecordID | wos001353105400052&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D |
| hasFullText | 1 |
| inHoldings | 1 |
| isFullTextHit | |
| isPrint | |
| linkProvider | IEEE |