Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics

Detailed bibliography
Published in: Proceedings - International Symposium on Software Reliability Engineering, pp. 355-366
Main authors: Zhu, Jieming; He, Shilin; He, Pinjia; Liu, Jinyang; Lyu, Michael R.
Medium: Conference paper
Language: English
Published: IEEE, 9 October 2023
ISSN: 2332-6549
Abstract Logs have been widely adopted in software system development and maintenance because of the rich runtime information they record. In recent years, the increasing size and complexity of software have led to rapid growth in the volume of logs. To handle these large volumes of logs efficiently and effectively, a line of research has focused on developing intelligent, automated log analysis techniques. However, only a few of these techniques have been successfully deployed in industry, owing to the lack of public log datasets and of open benchmarking on them. To fill this significant gap and facilitate more research on AI-driven log analytics, we have collected and released loghub, a large collection of system log datasets. In particular, loghub provides 19 real-world log datasets collected from a wide range of software systems, including distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. In this paper, we summarize the statistics of these datasets, introduce several practical usage scenarios for the loghub datasets, and present our benchmarking results on loghub to benefit researchers and practitioners in this field. At the time of writing, the loghub datasets have been downloaded roughly 90,000 times in total by hundreds of organizations from both industry and academia. The loghub datasets are available at https://github.com/logpai/loghub.
Author_xml – sequence: 1
  givenname: Jieming
  surname: Zhu
  fullname: Zhu, Jieming
  email: jiemingzhu@ieee.org
  organization: The Chinese University of Hong Kong (CUHK Shenzhen), School of Data Science, Shenzhen, China
– sequence: 2
  givenname: Shilin
  surname: He
  fullname: He, Shilin
  email: slhe@link.cuhk.edu.hk
  organization: The Chinese University of Hong Kong (CUHK Shenzhen), School of Data Science, Shenzhen, China
– sequence: 3
  givenname: Pinjia
  surname: He
  fullname: He, Pinjia
  email: hepinjia@cuhk.edu.cn
  organization: The Chinese University of Hong Kong (CUHK Shenzhen), School of Data Science, Shenzhen, China
– sequence: 4
  givenname: Jinyang
  surname: Liu
  fullname: Liu, Jinyang
  email: jyliu@cse.cuhk.edu.hk
  organization: The Chinese University of Hong Kong, Department of Computer Science and Engineering, China
– sequence: 5
  givenname: Michael R.
  surname: Lyu
  fullname: Lyu, Michael R.
  email: lyu@cse.cuhk.edu.hk
  organization: The Chinese University of Hong Kong, Department of Computer Science and Engineering, China
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ISSRE59848.2023.00071
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 9798350315943
EISSN 2332-6549
EndPage 366
ExternalDocumentID 10301257
Genre orig-research
ISICitedReferencesCount 72
ISICitedReferencesURI http://www.webofscience.com/api/gateway?GWVersion=2&SrcApp=Summon&SrcAuth=ProQuest&DestLinkType=CitingArticles&DestApp=WOS_CPL&KeyUT=001096886300032&url=https%3A%2F%2Fcvtisr.summon.serialssolutions.com%2F%23%21%2Fsearch%3Fho%3Df%26include.ft.matches%3Dt%26l%3Dnull%26q%3D
IngestDate Wed Aug 27 02:30:35 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
PageCount 12
PublicationCentury 2000
PublicationDate 2023-Oct.-9
PublicationDateYYYYMMDD 2023-10-09
PublicationDate_xml – month: 10
  year: 2023
  text: 2023-Oct.-9
  day: 09
PublicationDecade 2020
PublicationTitle Proceedings - International Symposium on Software Reliability Engineering
PublicationTitleAbbrev ISSRE
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 355
SubjectTerms anomaly detection
Benchmark testing
benchmarks
Industries
log analytics
Log datasets
log intelligence
Operating systems
Organizations
Runtime
Software systems
Writing
Title Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics
URI https://ieeexplore.ieee.org/document/10301257
WOSCitedRecordID wos001096886300032