Prism: Revealing Hidden Functional Clusters from Massive Instances in Cloud Systems

Bibliographic details
Published in: IEEE/ACM International Conference on Automated Software Engineering : [proceedings], pp. 268-280
Main authors: Liu, Jinyang; Jiang, Zhihan; Gu, Jiazhen; Huang, Junjie; Chen, Zhuangbin; Feng, Cong; Yang, Zengyin; Yang, Yongqiang; Lyu, Michael R.
Format: Conference paper
Language: English
Published: IEEE, 11 September 2023
ISSN: 2643-1572
Abstract: Ensuring the reliability of cloud systems is critical for both cloud vendors and customers. Cloud systems often rely on virtualization techniques to create instances of hardware resources, such as virtual machines. However, virtualization hinders the observability of cloud systems, making it challenging to diagnose platform-level issues. To improve system observability, we propose to infer functional clusters of instances, i.e., groups of instances having similar functionalities. We first conduct a pilot study on a large-scale cloud system, i.e., Huawei Cloud, demonstrating that instances having similar functionalities share similar communication and resource usage patterns. Motivated by these findings, we formulate the identification of functional clusters as a clustering problem and propose a non-intrusive solution called Prism. Prism adopts a coarse-to-fine clustering strategy. It first partitions instances into coarse-grained chunks based on communication patterns. Within each chunk, Prism further groups instances with similar resource usage patterns to produce fine-grained functional clusters. Such a design reduces noise in the data and allows Prism to process massive instances efficiently. We evaluate Prism on two datasets collected from the real-world production environment of Huawei Cloud. Our experiments show that Prism achieves a v-measure of ∼0.95, surpassing existing state-of-the-art solutions. Additionally, we illustrate the integration of Prism within monitoring systems for enhanced cloud reliability through two real-world use cases.
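The coarse-to-fine strategy described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not say which algorithms Prism uses, so connected components of a communication graph stand in for the coarse step, and a greedy Euclidean-distance threshold stands in for the fine step. All function names, the `threshold` parameter, and the input shapes are hypothetical. The `v_measure` helper follows the standard definition of the metric (harmonic mean of homogeneity and completeness).

```python
from collections import Counter, defaultdict
from math import dist, log


def coarse_chunks(instances, links):
    """Coarse step (assumed stand-in): connected components of the
    communication graph, computed with union-find."""
    parent = {i: i for i in instances}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    chunks = defaultdict(list)
    for i in instances:
        chunks[find(i)].append(i)
    return list(chunks.values())


def fine_clusters(chunk, usage, threshold):
    """Fine step (assumed stand-in): an instance joins the first cluster
    whose first member lies within `threshold` of its usage vector."""
    clusters = []
    for inst in chunk:
        for cluster in clusters:
            if dist(usage[inst], usage[cluster[0]]) <= threshold:
                cluster.append(inst)
                break
        else:
            clusters.append([inst])
    return clusters


def coarse_to_fine(instances, links, usage, threshold=0.2):
    """Run the fine step independently inside every coarse chunk."""
    result = []
    for chunk in coarse_chunks(instances, links):
        result.extend(fine_clusters(chunk, usage, threshold))
    return result


def v_measure(truth, predicted):
    """Harmonic mean of homogeneity and completeness."""
    n = len(truth)
    classes, clusters = Counter(truth), Counter(predicted)
    joint = Counter(zip(truth, predicted))
    h_c = -sum(c / n * log(c / n) for c in classes.values())
    h_k = -sum(c / n * log(c / n) for c in clusters.values())
    h_c_given_k = -sum(c / n * log(c / clusters[k]) for (_, k), c in joint.items())
    h_k_given_c = -sum(c / n * log(c / classes[cl]) for (cl, _), c in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

In this sketch, `instances` is a list of identifiers, `links` is a list of pairs that were observed communicating, and `usage` maps each identifier to a numeric tuple such as (CPU, memory) utilization. Running the fine step per chunk keeps every distance comparison local to a chunk, which mirrors the efficiency argument in the abstract.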
Authors and affiliations:
1. Liu, Jinyang (jyliu@cse.cuhk.edu.hk), The Chinese University of Hong Kong, Hong Kong SAR, China
2. Jiang, Zhihan (zhjiang22@cse.cuhk.edu.hk), The Chinese University of Hong Kong, Hong Kong SAR, China
3. Gu, Jiazhen (jzgu@cse.cuhk.edu.hk), The Chinese University of Hong Kong, Hong Kong SAR, China
4. Huang, Junjie (jjhuang23@cse.cuhk.edu.hk), The Chinese University of Hong Kong, Hong Kong SAR, China
5. Chen, Zhuangbin (chenzhb36@mail.sysu.edu.cn), School of Software Engineering, Sun Yat-sen University, Zhuhai, China
6. Feng, Cong (fengcong5@huawei.com), Huawei Cloud Computing Technology Co., Ltd, Computing and Networking Innovation Lab, China
7. Yang, Zengyin (yangzengyin@huawei.com), Huawei Cloud Computing Technology Co., Ltd, Computing and Networking Innovation Lab, China
8. Yang, Yongqiang (yangyongqiang@huawei.com), Huawei Cloud Computing Technology Co., Ltd, Computing and Networking Innovation Lab, China
9. Lyu, Michael R. (lyu@cse.cuhk.edu.hk), The Chinese University of Hong Kong, Hong Kong SAR, China
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ASE56229.2023.00077
Discipline Computer Science
EISBN 9798350329964
EISSN 2643-1572
EndPage 280
ExternalDocumentID 10298583
Genre orig-research
GrantInformation_xml – fundername: Shenzhen Science and Technology Innovation Commission
  grantid: JCYJ20200109113403826
  funderid: 10.13039/501100010877
ISICitedReferencesCount 6
IsPeerReviewed false
IsScholarly true
Language English
PageCount 13
ParticipantIDs ieee_primary_10298583
PublicationCentury 2000
PublicationDate 2023-Sept.-11
PublicationDateYYYYMMDD 2023-09-11
PublicationDate_xml – month: 09
  year: 2023
  text: 2023-Sept.-11
  day: 11
PublicationDecade 2020
PublicationTitle IEEE/ACM International Conference on Automated Software Engineering : [proceedings]
PublicationTitleAbbrev ASE
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 268
SubjectTerms Cloud computing; cloud observability; cloud systems; functional clusters; Hardware; instances; Observability; Production; Reliability engineering; Software reliability; Virtual machining
Title Prism: Revealing Hidden Functional Clusters from Massive Instances in Cloud Systems
URI https://ieeexplore.ieee.org/document/10298583