Adaptive Performance Anomaly Detection for Online Service Systems via Pattern Sketching

Bibliographic details
Published in: 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE), pp. 61-72
Authors: Chen, Zhuangbin; Liu, Jinyang; Su, Yuxin; Zhang, Hongyu; Ling, Xiao; Lyu, Michael R.
Format: Conference paper
Language: English
Published: ACM, 2022-05-01
ISSN: 1558-1225
Abstract: To ensure the performance of online service systems, their status is closely monitored with various software and system metrics. Performance anomalies are performance degradation issues (e.g., slow responses) of these service systems. When performing anomaly detection over the metrics, existing methods often lack interpretability, which engineers and analysts need in order to take remediation actions. Moreover, they cannot effectively accommodate ever-changing services in an online fashion. To address these limitations, in this paper, we propose ADSketch, an interpretable and adaptive performance anomaly detection approach based on pattern sketching. ADSketch achieves interpretability by identifying groups of anomalous metric patterns, each representing a particular type of performance issue. The underlying issue can then be recognized immediately if similar patterns emerge again. In addition, an adaptive learning algorithm is designed to embrace unprecedented patterns induced by service updates or user behavior changes. The proposed approach is evaluated with public data as well as industrial data collected from a representative online service system in Huawei Cloud. The experimental results show that ADSketch outperforms state-of-the-art approaches by a significant margin and demonstrate the effectiveness of the online algorithm in discovering new patterns. Furthermore, our approach has been successfully deployed in industrial practice.
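The abstract describes two ideas: recognizing a known performance issue when a similar metric pattern reappears, and adaptively admitting patterns never seen before. The minimal Python sketch below illustrates only that general idea; it is not ADSketch's algorithm. The class name PatternSketch, the z-normalized Euclidean distance, the fixed window length, the match threshold, and the "unseen-N" labels are all illustrative assumptions.

```python
import math


def znorm(window):
    """Z-normalize a window so that matching compares shape, not absolute level."""
    n = len(window)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    if std == 0:
        return [0.0] * n  # flat window: no shape information
    return [(x - mean) / std for x in window]


def distance(a, b):
    """Euclidean distance between two equal-length windows."""
    if len(a) != len(b):
        raise ValueError("windows must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PatternSketch:
    """Hypothetical store of labeled pattern groups (an assumption, not ADSketch's API).

    Each group is represented by a single z-normalized window.
    """

    def __init__(self, threshold):
        self.threshold = threshold  # assumed maximum distance that counts as a match
        self.groups = []  # list of (label, representative window)

    def add(self, label, window):
        self.groups.append((label, znorm(window)))

    def match(self, window):
        """Return (label, distance) of the closest group, or (None, distance) if none is close enough."""
        w = znorm(window)
        best_label, best_dist = None, math.inf
        for label, rep in self.groups:
            d = distance(w, rep)
            if d < best_dist:
                best_label, best_dist = label, d
        if best_dist <= self.threshold:
            return best_label, best_dist
        return None, best_dist

    def observe(self, window):
        """Label a window; an unmatched window becomes a new group for engineers to review."""
        label, _ = self.match(window)
        if label is None:
            label = f"unseen-{len(self.groups)}"
            self.add(label, window)
        return label


sketch = PatternSketch(threshold=1.0)
sketch.add("normal", [1, 2, 1, 2])
sketch.add("latency-spike", [0, 0, 0, 8])
print(sketch.observe([10, 10, 10, 50]))  # same shape as the stored spike: latency-spike
print(sketch.observe([0, 0, 5, 5]))      # a level shift matches nothing: unseen-2
```

Because each window is z-normalized, [10, 10, 10, 50] matches the stored spike despite its different scale. The sketch does not decide whether a newly added group is normal or anomalous; that decision is left to a human reviewer in this illustration.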
Author affiliations:
1. Zhuangbin Chen, The Chinese University of Hong Kong, Hong Kong, China
2. Jinyang Liu, The Chinese University of Hong Kong, Hong Kong, China
3. Yuxin Su, School of Software Engineering, Sun Yat-sen University, Zhuhai, China (suyx35@mail.sysu.edu.cn)
4. Hongyu Zhang, The University of Newcastle, NSW, Australia
5. Xiao Ling, Yongqiang Yang Huawei Cloud BU, Beijing, China
6. Michael R. Lyu, The Chinese University of Hong Kong, Hong Kong, China
CODEN: IEEPAD
Content type: Conference proceeding
DOI: 10.1145/3510003.3510085
Discipline: Computer Science
EISBN: 9781450392211, 1450392210
EISSN: 1558-1225
IEEE Xplore document number: 9794065
Genre: Original research
Funding: Australian Research Council (ARC), grants DP200102940 and DP220103044 (funder ID 10.13039/501100000923)
Web of Science citation count: 31
Page count: 12
Subject terms: Adaptation models; Adaptive learning; Cloud computing; Measurement; online learning; performance anomaly detection; Production; Software; Software algorithms; Time series analysis
URL: https://ieeexplore.ieee.org/document/9794065