Experience Report: Log Mining Using Natural Language Processing and Application to Anomaly Detection

Bibliographic Details
Published in: Proceedings - International Symposium on Software Reliability Engineering, pp. 351-360
Main Authors: Bertero, Christophe, Roy, Matthieu, Sauvanaud, Carla, Tredan, Gilles
Format: Conference Proceeding
Language: English
Published: IEEE 01.10.2017
ISSN:2332-6549
Abstract Event logging is a key source of information about a system's state. Reading logs provides insight into the system's activity, helps assess whether its state is correct, and supports problem diagnosis. However, reading does not scale: with the number of machines steadily rising and systems growing more complex, auditing system health from logfiles is becoming overwhelming for system administrators. This observation has led to many proposals for automating log processing. However, most of these proposals still require some human intervention, for instance tagging logs or parsing the source files that generate the logs. In this work, we target minimal human intervention for logfile processing and propose a new approach that treats logs as regular text (as opposed to related work that seeks to exploit as fully as possible the little structure imposed by log formatting). This approach makes it possible to leverage modern techniques from natural language processing. More specifically, we first apply a word embedding technique based on Google's word2vec algorithm: the words of the logfiles are mapped to a high-dimensional metric space, which we then use as a feature space for standard classifiers. The resulting pipeline is very generic, computationally efficient, and requires very little intervention. We validate our approach by seeking stress patterns on an experimental platform. Results show strong predictive performance (≈ 90% accuracy) using three out-of-the-box classifiers.
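The pipeline described in the abstract (words mapped to vectors, vectors used as features for a standard classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the log lines and labels are invented, a simple co-occurrence count embedding stands in for word2vec, a logfile is represented by the average of its word vectors, and a nearest-centroid rule stands in for the three unnamed classifiers.

```python
import math
from collections import defaultdict

# Invented toy logs and labels (hypothetical, not data from the paper).
TRAIN = [
    ("cpu load high worker throttled", "stress"),
    ("memory pressure high swap rising", "stress"),
    ("cpu load high memory pressure rising", "stress"),
    ("service started ok", "normal"),
    ("heartbeat ok connection established", "normal"),
    ("service heartbeat ok ready", "normal"),
]

def tokenize(text):
    # Treat the log as regular text: lowercase and split on whitespace.
    return text.lower().split()

def build_embeddings(docs, window=2):
    """Count-based stand-in for word2vec (an assumption): each word's vector
    holds its co-occurrence counts with every vocabulary word in a window."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for d in docs:
        toks = tokenize(d)
        for i, w in enumerate(toks):
            lo, hi = max(0, i - window), min(len(toks), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[w][index[toks[j]]] += 1.0
    return vectors

def embed_log(text, vectors):
    """Assumed aggregation: average the vectors of the words we know."""
    known = [vectors[w] for w in tokenize(text) if w in vectors]
    dim = len(next(iter(vectors.values())))
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def train_centroids(samples, vectors):
    """Nearest-centroid classifier: one mean feature vector per label."""
    grouped = defaultdict(list)
    for text, label in samples:
        grouped[label].append(embed_log(text, vectors))
    return {lab: [sum(c) / len(vs) for c in zip(*vs)]
            for lab, vs in grouped.items()}

def predict(text, vectors, centroids):
    # Pick the label whose centroid is most similar to the log's features.
    feat = embed_log(text, vectors)
    return max(centroids, key=lambda lab: cosine(feat, centroids[lab]))

vectors = build_embeddings([text for text, _ in TRAIN])
centroids = train_centroids(TRAIN, vectors)
```

Calling `predict("cpu load high swap", vectors, centroids)` classifies an unseen toy line with the trained centroids. In a realistic setting, the embedding step would presumably be replaced by an actual word2vec model trained on the logs and the classifier by an off-the-shelf implementation; this sketch only shows how the stages fit together.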
Author_xml – sequence: 1
  givenname: Christophe
  surname: Bertero
  fullname: Bertero, Christophe
  email: christophe.bertero@laas.fr
  organization: LAAS, Univ. de Toulouse, Toulouse, France
– sequence: 2
  givenname: Matthieu
  surname: Roy
  fullname: Roy, Matthieu
  email: matthieu.roy@laas.fr
  organization: LAAS, Univ. de Toulouse, Toulouse, France
– sequence: 3
  givenname: Carla
  surname: Sauvanaud
  fullname: Sauvanaud, Carla
  email: carla.sauvanaud@laas.fr
  organization: LAAS, Univ. de Toulouse, Toulouse, France
– sequence: 4
  givenname: Gilles
  surname: Tredan
  fullname: Tredan, Gilles
  email: gilles.tredan@laas.fr
  organization: LAAS, Univ. de Toulouse, Toulouse, France
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ISSRE.2017.43
Discipline Computer Science
EISBN 9781538609415
153860941X
EISSN 2332-6549
EndPage 360
ExternalDocumentID 8109100
Genre orig-research
ISICitedReferencesCount 96
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
PublicationCentury 2000
PublicationDate 2017-Oct.
PublicationDateYYYYMMDD 2017-10-01
PublicationDate_xml – month: 10
  year: 2017
  text: 2017-Oct.
PublicationDecade 2010
PublicationTitle Proceedings - International Symposium on Software Reliability Engineering
PublicationTitleAbbrev ISSRE
PublicationYear 2017
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 351
SubjectTerms Anomaly detection
logfile
machine learning
Memory management
Natural language processing
NLP
Servers
Stress
Training
VNF
word2vec
Title Experience Report: Log Mining Using Natural Language Processing and Application to Anomaly Detection
URI https://ieeexplore.ieee.org/document/8109100