Experience Report: Log Mining Using Natural Language Processing and Application to Anomaly Detection
| Published in: | Proceedings - International Symposium on Software Reliability Engineering pp. 351 - 360 |
|---|---|
| Main Authors: | Bertero, Christophe; Roy, Matthieu; Sauvanaud, Carla; Tredan, Gilles |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 01.10.2017 |
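| Subjects: | Anomaly detection; logfile; machine learning; Memory management; Natural language processing; NLP; Servers; Stress; Training; VNF; word2vec |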
| ISSN: | 2332-6549 |
| Abstract | Event logging is a key source of information on a system's state. Reading logs provides insight into the system's activity, helps assess whether its state is correct, and makes it possible to diagnose problems. However, reading does not scale: as the number of machines keeps rising and systems grow more complex, auditing systems' health from logfiles is becoming overwhelming for system administrators. This observation has led to many proposals for automating log processing. However, most of these proposals still require some human intervention, for instance tagging logs or parsing the source files that generate them. In this work, we target minimal human intervention for logfile processing and propose a new approach that treats logs as regular text (as opposed to related works that seek to make the most of the little structure imposed by log formatting). This approach makes it possible to leverage modern techniques from natural language processing. More specifically, we first apply a word embedding technique based on Google's word2vec algorithm: logfiles' words are mapped to a high-dimensional metric space, which we then exploit as a feature space using standard classifiers. The resulting pipeline is very generic, computationally efficient, and requires very little intervention. We validate our approach by seeking stress patterns on an experimental platform. Results show strong predictive performance (≈ 90% accuracy) using three out-of-the-box classifiers. |
|---|---|
| Author | Sauvanaud, Carla; Tredan, Gilles; Roy, Matthieu; Bertero, Christophe |
| Author_xml | 1. Bertero, Christophe (christophe.bertero@laas.fr), LAAS, Univ. de Toulouse, Toulouse, France; 2. Roy, Matthieu (matthieu.roy@laas.fr), LAAS, Univ. de Toulouse, Toulouse, France; 3. Sauvanaud, Carla (carla.sauvanaud@laas.fr), LAAS, Univ. de Toulouse, Toulouse, France; 4. Tredan, Gilles (gilles.tredan@laas.fr), LAAS, Univ. de Toulouse, Toulouse, France |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ISSRE.2017.43 |
| Discipline | Computer Science |
| EISBN | 9781538609415; 153860941X |
| EISSN | 2332-6549 |
| EndPage | 360 |
| ExternalDocumentID | 8109100 |
| Genre | orig-research |
| ISICitedReferencesCount | 96 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 10 |
| PublicationCentury | 2000 |
| PublicationDate | 2017-Oct. |
| PublicationDateYYYYMMDD | 2017-10-01 |
| PublicationDate_xml | – month: 10 year: 2017 text: 2017-Oct. |
| PublicationDecade | 2010 |
| PublicationTitle | Proceedings - International Symposium on Software Reliability Engineering |
| PublicationTitleAbbrev | ISSRE |
| PublicationYear | 2017 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 351 |
| SubjectTerms | Anomaly detection; logfile; machine learning; Memory management; Natural language processing; NLP; Servers; Stress; Training; VNF; word2vec |
| Title | Experience Report: Log Mining Using Natural Language Processing and Application to Anomaly Detection |
| URI | https://ieeexplore.ieee.org/document/8109100 |
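
The pipeline summarised in the abstract (log words mapped to a metric space, which is then used as a feature space for standard classifiers) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: a co-occurrence-count embedding stands in for word2vec, each log file is represented by the average of its word vectors, and a 1-nearest-neighbour rule stands in for the out-of-the-box classifiers. The log lines, the labels and all function names are hypothetical.

```python
import math


def tokenize(line):
    """Treat a log line as regular text: lowercase, split on whitespace."""
    return line.lower().split()


def build_embeddings(corpus_lines, window=2):
    """Map every word to a vector of co-occurrence counts.

    Assumption: this simple count-based embedding is a stand-in for
    word2vec; it only illustrates 'words mapped to a metric space'.
    """
    vocab = sorted({w for line in corpus_lines for w in tokenize(line)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for line in corpus_lines:
        tokens = tokenize(line)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][index[tokens[j]]] += 1.0
    return vectors


def embed_logfile(lines, vectors):
    """Represent a log file as the average of its known word vectors
    (an assumed aggregation; unknown words are skipped)."""
    dim = len(next(iter(vectors.values())))
    total, count = [0.0] * dim, 0
    for line in lines:
        for word in tokenize(line):
            if word in vectors:
                total = [t + x for t, x in zip(total, vectors[word])]
                count += 1
    return [t / count for t in total] if count else total


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def predict(lines, vectors, training):
    """1-nearest-neighbour in the embedding space (stand-in classifier)."""
    point = embed_logfile(lines, vectors)
    return min(training, key=lambda item: euclidean(point, item[0]))[1]


# Hypothetical toy log files with hypothetical labels.
labelled_logs = [
    (["request served ok", "cpu load nominal"], "normal"),
    (["request served ok", "heartbeat ok"], "normal"),
    (["cpu load high", "memory allocation failed"], "stress"),
    (["memory allocation failed", "request timeout error"], "stress"),
]
corpus = [line for lines, _ in labelled_logs for line in lines]
vectors = build_embeddings(corpus)
training = [(embed_logfile(lines, vectors), label) for lines, label in labelled_logs]

print(predict(["memory allocation failed", "cpu load high"], vectors, training))
```

In practice the stand-ins would be replaced by a trained word2vec model and by library classifiers; the structure (tokenise, embed words, aggregate per log file, classify) is the part this sketch is meant to show.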