Detection of Duplicate Defect Reports Using Natural Language Processing

Bibliographic details
Published in: 29th International Conference on Software Engineering (ICSE'07), pp. 499-510
Main authors: Runeson, P., Alexandersson, M., Nyholm, O.
Format: Conference proceedings
Language: English
Published: IEEE, 01.01.2007
ISBN: 9780769528281, 0769528287
ISSN: 0270-5257
Abstract: Defect reports are generated from various testing and development activities in software engineering. Sometimes two reports are submitted that describe the same problem, leading to duplicate reports. These reports are mostly written in structured natural language, and as such, it is hard to compare two reports for similarity with formal methods. To identify duplicates, we investigate the use of natural language processing (NLP) techniques. A prototype tool is developed and evaluated in a case study analyzing defect reports at Sony Ericsson Mobile Communications. The evaluation shows that about two-thirds of the duplicates can potentially be found using the NLP techniques. Different variants of the techniques yield only minor differences in results, indicating a robust technology. User testing shows that the overall attitude towards the technique is positive and that it has growth potential.
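The abstract does not say which NLP techniques the prototype uses. As a rough illustration of the general idea of comparing two reports for textual similarity, the following is a minimal sketch that assumes a bag-of-words representation with cosine similarity. The stop-word list and the function names (`tokenize`, `cosine_similarity`, `rank_candidates`) are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "when", "to", "of", "and"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def cosine_similarity(report_a, report_b):
    """Cosine of the angle between the term-frequency vectors of two reports."""
    va, vb = Counter(tokenize(report_a)), Counter(tokenize(report_b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def rank_candidates(new_report, existing_reports, top_n=3):
    """Return the top_n existing reports most similar to new_report, best first."""
    scored = [(cosine_similarity(new_report, r), r) for r in existing_reports]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_n]
```

In a setting like the one the abstract describes, a ranked list of this kind would presumably be shown to the person submitting a report, who decides whether any candidate is a true duplicate.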
Author affiliation (Runeson, P.; Alexandersson, M.; Nyholm, O.): Software Eng. Res. Group, Lund Univ., Lund
DOI: 10.1109/ICSE.2007.32
Subject terms: Failure analysis; Mobile communication; Natural language processing; Natural languages; Prototypes; Relays; Robustness; Software engineering; Software testing; Vocabulary
URI: https://ieeexplore.ieee.org/document/4222611
URI: https://www.proquest.com/docview/31147131