Detection of Duplicate Defect Reports Using Natural Language Processing
Defect reports are generated from various testing and development activities in software engineering. Sometimes two reports are submitted that describe the same problem, leading to duplicate reports. These reports are mostly written in structured natural language, and as such, it is hard to compare...
| Published in: | 29th International Conference on Software Engineering (ICSE'07), pp. 499-510 |
|---|---|
| Main authors: | Runeson, P.; Alexandersson, M.; Nyholm, O. |
| Format: | Conference proceeding |
| Language: | English |
| Published: | IEEE, 01.01.2007 |
| Keywords: | |
| ISBN: | 9780769528281, 0769528287 |
| ISSN: | 0270-5257 |
| Online access: | Full text |
| Abstract | Defect reports are generated from various testing and development activities in software engineering. Sometimes two reports are submitted that describe the same problem, leading to duplicate reports. These reports are mostly written in structured natural language, and as such, it is hard to compare two reports for similarity with formal methods. In order to identify duplicates, we investigate using natural language processing (NLP) techniques to support the identification. A prototype tool is developed and evaluated in a case study analyzing defect reports at Sony Ericsson Mobile Communications. The evaluation shows that about 2/3 of the duplicates can possibly be found using the NLP techniques. Different variants of the techniques provide only minor result differences, indicating a robust technology. User testing shows that the overall attitude towards the technique is positive and that it has a growth potential. |
|---|---|
| Author | Nyholm, O.; Alexandersson, M.; Runeson, P. |
| Author_xml | 1. Runeson, P. (Software Eng. Res. Group, Lund Univ., Lund); 2. Alexandersson, M. (Software Eng. Res. Group, Lund Univ., Lund); 3. Nyholm, O. (Software Eng. Res. Group, Lund Univ., Lund) |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ICSE.2007.32 |
| Discipline | Computer Science |
| EndPage | 510 |
| ExternalDocumentID | 4222611 |
| Genre | orig-research Conference Paper |
| ISBN | 9780769528281 0769528287 |
| ISICitedReferencesCount | 311 |
| ISSN | 0270-5257 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 12 |
| PublicationCentury | 2000 |
| PublicationDate | 2007-01-01 |
| PublicationDateYYYYMMDD | 2007-01-01 |
| PublicationDate_xml | – month: 01 year: 2007 text: 2007-01-01 day: 01 |
| PublicationDecade | 2000 |
| PublicationTitle | 29th International Conference on Software Engineering (ICSE'07) |
| PublicationTitleAbbrev | ICSE |
| PublicationYear | 2007 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| StartPage | 499 |
| SubjectTerms | Failure analysis; Mobile communication; Natural language processing; Natural languages; Prototypes; Relays; Robustness; Software engineering; Software testing; Vocabulary |
| Title | Detection of Duplicate Defect Reports Using Natural Language Processing |
| URI | https://ieeexplore.ieee.org/document/4222611 https://www.proquest.com/docview/31147131 |

