Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics
| Published in: | Proceedings - International Symposium on Software Reliability Engineering, pp. 355-366 |
|---|---|
| Main authors: | Zhu, Jieming; He, Shilin; He, Pinjia; Liu, Jinyang; Lyu, Michael R. |
| Format: | Conference paper |
| Language: | English |
| Published: | IEEE, 9 October 2023 |
| Subjects: | |
| ISSN: | 2332-6549 |
| Abstract | Logs have been widely adopted in software system development and maintenance because of the rich runtime information they record. In recent years, growth in software size and complexity has led to a rapid increase in log volume. To handle these large volumes of logs efficiently and effectively, a line of research focuses on developing intelligent and automated log analysis techniques. However, only a few of these techniques have been successfully deployed in industry, owing to the lack of public log datasets and of open benchmarks built on them. To fill this gap and facilitate more research on AI-driven log analytics, we have collected and released loghub, a large collection of system log datasets. In particular, loghub provides 19 real-world log datasets collected from a wide range of software systems, including distributed systems, supercomputers, operating systems, mobile systems, server applications, and standalone software. In this paper, we summarize the statistics of these datasets, introduce some practical usage scenarios for them, and present our benchmarking results on loghub to benefit researchers and practitioners in this field. At the time of writing, the loghub datasets have been downloaded roughly 90,000 times in total by hundreds of organizations from both industry and academia. The loghub datasets are available at https://github.com/logpai/loghub. |
|---|---|
| Author | Liu, Jinyang; He, Shilin; Zhu, Jieming; Lyu, Michael R.; He, Pinjia |
| Author_xml | 1. Zhu, Jieming (jiemingzhu@ieee.org), The Chinese University of Hong Kong (CUHK Shenzhen), School of Data Science, Shenzhen, China; 2. He, Shilin (slhe@link.cuhk.edu.hk), The Chinese University of Hong Kong (CUHK Shenzhen), School of Data Science, Shenzhen, China; 3. He, Pinjia (hepinjia@cuhk.edu.cn), The Chinese University of Hong Kong (CUHK Shenzhen), School of Data Science, Shenzhen, China; 4. Liu, Jinyang (jyliu@cse.cuhk.edu.hk), The Chinese University of Hong Kong, Department of Computer Science and Engineering, China; 5. Lyu, Michael R. (lyu@cse.cuhk.edu.hk), The Chinese University of Hong Kong, Department of Computer Science and Engineering, China |
| CODEN | IEEPAD |
| ContentType | Conference Proceeding |
| DOI | 10.1109/ISSRE59848.2023.00071 |
| Discipline | Computer Science |
| EISBN | 9798350315943 |
| EISSN | 2332-6549 |
| EndPage | 366 |
| ExternalDocumentID | 10301257 |
| Genre | orig-research |
| ISICitedReferencesCount | 72 |
| IsPeerReviewed | false |
| IsScholarly | true |
| Language | English |
| PageCount | 12 |
| PublicationCentury | 2000 |
| PublicationDate | 2023-Oct.-9 |
| PublicationDateYYYYMMDD | 2023-10-09 |
| PublicationDate_xml | – month: 10 year: 2023 text: 2023-Oct.-9 day: 09 |
| PublicationDecade | 2020 |
| PublicationTitle | Proceedings - International Symposium on Software Reliability Engineering |
| PublicationTitleAbbrev | ISSRE |
| PublicationYear | 2023 |
| Publisher | IEEE |
| Publisher_xml | – name: IEEE |
| SourceID | ieee |
| SourceType | Publisher |
| StartPage | 355 |
| SubjectTerms | anomaly detection; Benchmark testing; benchmarks; Industries; log analytics; Log datasets; log intelligence; Operating systems; Organizations; Runtime; Software systems; Writing |
| Title | Loghub: A Large Collection of System Log Datasets for AI-driven Log Analytics |
| URI | https://ieeexplore.ieee.org/document/10301257 |