GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models


Saved in:
Detailed bibliography
Published in: IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 643 - 655
Main authors: Zhang, Zhibo; Bai, Wuxia; Li, Yuxi; Meng, Mark Huasong; Wang, Kailong; Shi, Ling; Li, Li; Wang, Jun; Wang, Haoyu
Format: Conference paper
Language: English
Published: ACM, Oct. 27, 2024
Subjects:
ISSN:2643-1572
Online access: Get full text
Abstract Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal mechanisms has raised many concerns about their trustworthiness and interpretability. Recent research has discovered a class of abnormal tokens in the model's vocabulary space and named them "glitch tokens". These tokens, once included in the input, may induce the model to produce incorrect, irrelevant, or even harmful results, drastically undermining the reliability and practicality of LLMs. In this work, we aim to enhance the understanding of glitch tokens and propose techniques for their detection and mitigation. We first reveal the characteristic features induced by glitch tokens on LLMs, which are evidenced by significant deviations in the distributions of attention patterns and dynamic information from intermediate model layers. Based on these insights, we develop GlitchProber, a tool for efficient glitch token detection and mitigation. GlitchProber utilizes small-scale sampling, principal component analysis for accelerated feature extraction, and a simple classifier for efficient vocabulary screening. Taking one step further, GlitchProber rectifies abnormal intermediate-layer values to mitigate the destructive effects of glitch tokens. Evaluated on five mainstream open-source LLMs, GlitchProber demonstrates higher efficiency, precision, and recall compared to existing approaches, with an average F1 score of 0.86 and an average repair rate of 50.06%. GlitchProber unveils a novel path to address the challenges posed by glitch tokens and inspires future research toward more robust and interpretable LLMs. Our code is available at https://github.com/LLM-Integrity-Guard/GlitchProber.
CCS Concepts: • Computing methodologies → Knowledge representation and reasoning.
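The detection pipeline the abstract describes (sample per-token features from intermediate layers, compress them with PCA, then screen the vocabulary with a simple classifier) can be sketched as below. This is a minimal illustration only, not the authors' implementation: the feature vectors are synthetic stand-ins for real LLM intermediate-layer activations, and the SVM classifier is an assumption suggested by the record's "Support vector machines" subject term.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for intermediate-layer features: normal tokens
# cluster near 0, while glitch tokens deviate significantly, mirroring
# the distributional deviations the abstract reports.
n_labeled, dim = 200, 512
normal = rng.normal(0.0, 1.0, size=(n_labeled, dim))
glitch = rng.normal(3.0, 1.0, size=(n_labeled, dim))
X = np.vstack([normal, glitch])
y = np.array([0] * n_labeled + [1] * n_labeled)  # 1 = glitch token

# PCA accelerates feature handling by projecting the high-dimensional
# activations onto a few components before the classifier sees them.
clf = make_pipeline(PCA(n_components=10), SVC())
clf.fit(X, y)

# Screen unlabeled "vocabulary" feature vectors for glitch candidates.
candidates = rng.normal(3.0, 1.0, size=(5, dim))
print(clf.predict(candidates))  # 1 = flagged as glitch candidate
```

With the labeled sample in hand, only the flagged candidates need the expensive full verification, which is where the efficiency gain over exhaustively probing the whole vocabulary comes from.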
Author Bai, Wuxia
Meng, Mark Huasong
Li, Li
Li, Yuxi
Wang, Haoyu
Shi, Ling
Zhang, Zhibo
Wang, Kailong
Wang, Jun
Author_xml – sequence: 1
  givenname: Zhibo
  surname: Zhang
  fullname: Zhang, Zhibo
  email: zhangzhibom@hust.edu.cn
  organization: Huazhong University of Science and Technology,Wuhan,China
– sequence: 2
  givenname: Wuxia
  surname: Bai
  fullname: Bai, Wuxia
  email: wuxiabai@hust.edu.cn
  organization: Huazhong University of Science and Technology,Wuhan,China
– sequence: 3
  givenname: Yuxi
  surname: Li
  fullname: Li, Yuxi
  email: yuxili@hust.edu.cn
  organization: Huazhong University of Science and Technology,Wuhan,China
– sequence: 4
  givenname: Mark Huasong
  surname: Meng
  fullname: Meng, Mark Huasong
  email: huasong.meng@u.nus.edu
  organization: Technical University of Munich,Munich,Germany
– sequence: 5
  givenname: Kailong
  surname: Wang
  fullname: Wang, Kailong
  email: wangkl@hust.edu.cn
  organization: Huazhong University of Science and Technology,Wuhan,China
– sequence: 6
  givenname: Ling
  surname: Shi
  fullname: Shi, Ling
  email: ling.shi@ntu.edu.sg
  organization: Nanyang Technological University,Singapore,Singapore
– sequence: 7
  givenname: Li
  surname: Li
  fullname: Li, Li
  email: lilicoding@ieee.org
  organization: Beihang University,Beijing,China
– sequence: 8
  givenname: Jun
  surname: Wang
  fullname: Wang, Jun
  email: junwang.lu@gmail.com
  organization: Beihang University,Beijing,China
– sequence: 9
  givenname: Haoyu
  surname: Wang
  fullname: Wang, Haoyu
  email: haoyuwang@hust.edu.cn
  organization: Huazhong University of Science and Technology,Wuhan,China
CODEN IEEPAD
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1145/3691620.3695060
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798400712487
EISSN 2643-1572
EndPage 655
ExternalDocumentID 10765056
Genre orig-research
GroupedDBID 6IE
6IF
6IH
6IK
6IL
6IM
6IN
6J9
AAJGR
AAWTH
ABLEC
ACREN
ADYOE
ADZIZ
AFYQB
ALMA_UNASSIGNED_HOLDINGS
AMTXH
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CBEJK
CHZPO
IEGSK
IPLJI
M43
OCL
RIE
RIL
IEDL.DBID RIE
ISICitedReferencesCount 1
IngestDate Wed Jan 15 06:20:43 EST 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 13
ParticipantIDs ieee_primary_10765056
PublicationCentury 2000
PublicationDate 2024-Oct.-27
PublicationDateYYYYMMDD 2024-10-27
PublicationDate_xml – month: 10
  year: 2024
  text: 2024-Oct.-27
  day: 27
PublicationDecade 2020
PublicationTitle IEEE/ACM International Conference on Automated Software Engineering : [proceedings]
PublicationTitleAbbrev ASE
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
SSID ssib057256116
ssj0051577
Snippet Large language models (LLMs) have achieved unprecedented success in the field of natural language processing. However, the black-box nature of their internal...
SourceID ieee
SourceType Publisher
StartPage 643
SubjectTerms Feature extraction
Glitch token
Large language models
LLM analysis
LLM security
Maintenance engineering
Prevention and mitigation
Principal component analysis
Reliability
Software engineering
Support vector machines
Systematics
Vocabulary
Title GlitchProber: Advancing Effective Detection and Mitigation of Glitch Tokens in Large Language Models
URI https://ieeexplore.ieee.org/document/10765056
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE