DeepLSH: Deep Locality-Sensitive Hash Learning for Fast and Efficient Near-Duplicate Crash Report Detection

Automatic crash bucketing is a crucial phase in the software development process for efficiently triaging bug reports. It generally consists of grouping similar reports through clustering techniques. However, with real-time streaming bug collection, systems are needed to quickly answer the question...

Bibliographic details
Published in: Proceedings / International Conference on Software Engineering, pp. 2445-2456
Main authors: Remil, Youcef; Bendimerad, Anes; Mathonat, Romain; Raissi, Chedy; Kaytoue, Mehdi
Format: Conference proceeding
Language: English
Published: ACM, 14 April 2024
ISSN:1558-1225
Online access: Full text
Abstract Automatic crash bucketing is a crucial phase in the software development process for efficiently triaging bug reports. It generally consists of grouping similar reports through clustering techniques. However, with real-time streaming bug collection, systems must quickly answer the question "What are the most similar bugs to a new one?", that is, efficiently find near-duplicates. It is thus natural to consider nearest-neighbor search to tackle this problem, and especially the well-known locality-sensitive hashing (LSH), which handles large datasets thanks to its sublinear performance and theoretical guarantees on similarity search accuracy. Surprisingly, LSH has not been considered in the crash bucketing literature. It is indeed not trivial to derive hash functions that satisfy the so-called locality-sensitive property for the most advanced crash bucketing metrics. Consequently, we study in this paper how to leverage LSH for this task. To be able to consider the most relevant metrics used in the literature, we introduce DeepLSH, a Siamese DNN architecture with an original loss function, that perfectly approximates the locality-sensitivity property even for Jaccard and Cosine metrics for which exact LSH solutions exist. We support this claim with a series of experiments on an original dataset, which we make available.
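The abstract notes that exact LSH solutions exist for the Jaccard metric. As a rough illustration of what such a scheme looks like, the following is a minimal sketch of classic MinHash with banding applied to stack traces. It is not the DeepLSH model described in the paper; the function names, the frame 2-gram representation, and the parameter values are all assumptions chosen for illustration.

```python
import hashlib


def shingles(frames, k=2):
    """Return the set of k-grams of consecutive stack frames (hypothetical representation)."""
    return {tuple(frames[i:i + k]) for i in range(len(frames) - k + 1)}


def _hash(seed, item):
    """Deterministic 64-bit hash of an item, salted with a seed."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=64):
    """MinHash signature: for each seeded hash, keep the minimum over the set.

    Two sets agree on a given position with probability equal to their
    Jaccard similarity. `items` must be non-empty.
    """
    return [min(_hash(seed, it) for it in items) for seed in range(num_hashes)]


def lsh_bucket_keys(signature, bands=16):
    """Split the signature into bands; each band yields one hash-table key."""
    rows = len(signature) // bands
    return [(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)]


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


# Usage: two illustrative (made-up) stack traces that share most frames.
trace_a = ["main", "parse", "read", "fail"]
trace_b = ["main", "parse", "read", "abort"]
set_a, set_b = shingles(trace_a), shingles(trace_b)
keys_a = set(lsh_bucket_keys(minhash_signature(set_a)))
keys_b = set(lsh_bucket_keys(minhash_signature(set_b)))
# The traces become candidate near-duplicates if they share at least one bucket key.
is_candidate = bool(keys_a & keys_b)
```

In this kind of scheme, a new crash report is only compared against reports that fall into at least one common bucket, which is what gives LSH its sublinear lookup behavior. More bands with fewer rows each make collisions more likely for moderately similar traces, at the cost of more false candidates.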
Author Raissi, Chedy
Mathonat, Romain
Bendimerad, Anes
Kaytoue, Mehdi
Remil, Youcef
Author_xml – sequence: 1
  givenname: Youcef
  surname: Remil
  fullname: Remil, Youcef
  email: yre@infologic.fr
  organization: INSA Lyon, Infologic R&D,Bourg-Lès-Valence,France,26500
– sequence: 2
  givenname: Anes
  surname: Bendimerad
  fullname: Bendimerad, Anes
  email: abe@infologic.fr
  organization: Infologic R&D,Bourg-Lès-Valence,France,26500
– sequence: 3
  givenname: Romain
  surname: Mathonat
  fullname: Mathonat, Romain
  email: rma@infologic.fr
  organization: Infologic R&D,Bourg-Lès-Valence,France,26500
– sequence: 4
  givenname: Chedy
  surname: Raissi
  fullname: Raissi, Chedy
  email: chedy.raissi@inria.fr
  organization: Riot Games,Singapore
– sequence: 5
  givenname: Mehdi
  surname: Kaytoue
  fullname: Kaytoue, Mehdi
  email: mka@infologic.fr
  organization: INSA Lyon, Infologic R&D,Bourg-Lès-Valence,France,26500
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1145/3597503.3639146
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library Online
IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798400702174
EISSN 1558-1225
EndPage 2456
ExternalDocumentID 10548326
Genre orig-research
IEDL.DBID RIE
IngestDate Wed Aug 27 01:53:13 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 12
ParticipantIDs ieee_primary_10548326
PublicationCentury 2000
PublicationDate 2024-April-14
PublicationDateYYYYMMDD 2024-04-14
PublicationDate_xml – month: 04
  year: 2024
  text: 2024-April-14
  day: 14
PublicationDecade 2020
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
SourceID ieee
SourceType Publisher
StartPage 2445
SubjectTerms Approximate nearest neighbors
Computer bugs
Crash deduplication
Hash functions
Locality-sensitive hashing
Measurement
Nearest neighbor methods
Neural networks
Search problems
Siamese neural networks
Software
Stack trace similarity
Title DeepLSH: Deep Locality-Sensitive Hash Learning for Fast and Efficient Near-Duplicate Crash Report Detection
URI https://ieeexplore.ieee.org/document/10548326
hasFullText 1
inHoldings 1