SmartOClock: Workload- and Risk-Aware Overclocking in the Cloud
Operating server components beyond their voltage and power design limit (i.e., overclocking) enables improving performance and lowering cost for cloud workloads. However, overclocking can significantly degrade component lifetime, increase power draw, and cause power capping events, eventually dimini...
Uloženo v:
| Vydáno v: | 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) s. 437 - 451 |
|---|---|
| Hlavní autoři: | , , , , , , , , , , , , , , |
| Médium: | Konferenční příspěvek |
| Jazyk: | angličtina |
| Vydáno: |
IEEE
29.06.2024
|
| Témata: | |
| On-line přístup: | Získat plný text |
| Tagy: |
Přidat tag
Žádné tagy, Buďte první, kdo vytvoří štítek k tomuto záznamu!
|
| Shrnutí: | Operating server components beyond their voltage and power design limit (i.e., overclocking) enables improving performance and lowering cost for cloud workloads. However, overclocking can significantly degrade component lifetime, increase power draw, and cause power capping events, eventually diminishing the performance benefits. In this paper, we characterize the impact of overclocking on cloud workloads by studying their profiles from production deployments. Based on the characterization insights, we propose SmartOClock, the first distributed overclocking management platform specifically designed for cloud environments. SmartOClock is a workload-aware scheme that relies on power predictions to heterogeneously distribute the power budgets across its servers based on their needs and then enforce budget compliance locally, per-server, in a decentralized manner. SmartOClock reduces the tail latency by 9%, application cost by 30% and total energy consumption by 10% for latencysensitive microservices on a 36-server deployment. Simulation analysis using production traces show that SmartOClock reduces the number of power capping events by up to 95% while increasing the overclocking success rate by up to 62%. We also describe lessons from building a first-of-its-kind overclockable cluster in Microsoft Azure for production experiments. |
|---|---|
| DOI: | 10.1109/ISCA59077.2024.00040 |