Tropical: Enhancing SLO Attainment in Disaggregated LLM Serving via SLO-Aware Multiplexing
| Published in: | 2025 62nd ACM/IEEE Design Automation Conference (DAC), pp. 1–7 |
|---|---|
| Main Authors: | , , , , , , , |
| Format: | Conference Proceeding |
| Language: | English |
| Published: | IEEE, 22.06.2025 |
| Summary: | To guarantee service quality in transformer-based large language model (LLM) serving, it is essential to meet the latency constraints of both the prefill phase (measured by Time-to-First-Token, TTFT) and the decode phase (measured by Time-per-Output-Token, TPOT). Non-disaggregated serving places prefill and decode on the same worker, while disaggregated serving places them on isolated workers. However, neither architecture excels in both TTFT and TPOT. A root-cause analysis led us to conclude that in disaggregated LLM serving, prefill execution interferes minimally with decode execution but incurs long queuing times. In contrast, non-disaggregated LLM serving effectively reduces queuing times but introduces significant interference between prefills and decodes. To combine the strengths of both non-disaggregated and disaggregated LLM serving, we have designed and implemented Tropical. Tropical introduces a service-level-objective (SLO)-aware multiplexing strategy that balances queuing time against interference, enabling the serving system to achieve high TTFT and TPOT SLO attainment simultaneously. Our evaluation on real-world datasets shows that Tropical outperforms both state-of-the-art non-disaggregated and disaggregated LLM serving systems, serving up to 2.09× more requests while maintaining 90% SLO attainment. Specifically, compared with the disaggregated LLM serving system, Tropical improves P90 TTFT by 9× at the cost of only a 15% degradation in P90 TPOT. Against the non-disaggregated LLM serving systems, Tropical delivers a 2.8× improvement in P90 TPOT while maintaining the same P90 TTFT. |
| DOI: | 10.1109/DAC63849.2025.11132617 |
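As a reading aid for the metrics named in the summary, the following Python sketch computes TTFT, TPOT, P90 latency, and SLO attainment from per-request timestamps. It assumes the commonly used definitions: TTFT is first-token time minus arrival time, so it includes queuing time. TPOT is the mean gap between output tokens after the first. Percentiles use the nearest-rank method. The class and function names (`RequestTrace`, `percentile`, `slo_attainment`, `max_rate_at_attainment`) are hypothetical illustrations, not part of Tropical, and the code does not model Tropical's multiplexing policy. `max_rate_at_attainment` encodes one plausible reading of "more requests within a 90% SLO attainment": the highest request rate at which attainment stays at or above 90%.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class RequestTrace:
    """Hypothetical per-request record; all times are in seconds."""
    arrival: float       # request enters the serving system
    first_token: float   # first output token is emitted (end of prefill)
    finish: float        # last output token is emitted (end of decode)
    output_tokens: int   # number of generated tokens

    @property
    def ttft(self) -> float:
        # Time-to-First-Token: covers queuing time plus prefill execution
        return self.first_token - self.arrival

    @property
    def tpot(self) -> float:
        # Time-per-Output-Token: mean gap between the tokens after the first
        if self.output_tokens <= 1:
            return 0.0
        return (self.finish - self.first_token) / (self.output_tokens - 1)


def percentile(values: list[float], p: int) -> float:
    """Nearest-rank percentile, e.g. p=90 for P90 (one of several definitions)."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def slo_attainment(traces: list[RequestTrace], ttft_slo: float, tpot_slo: float) -> float:
    """Fraction of requests that meet BOTH the TTFT SLO and the TPOT SLO."""
    if not traces:
        raise ValueError("no requests")
    met = sum(1 for t in traces if t.ttft <= ttft_slo and t.tpot <= tpot_slo)
    return met / len(traces)


def max_rate_at_attainment(results: dict[float, float], target: float = 0.9) -> float | None:
    """Highest request rate (req/s) whose measured attainment is still >= target.

    `results` maps a request rate to the SLO attainment measured at that rate
    (an assumed experiment format). Returns None if no rate meets the target.
    """
    return max((rate for rate, att in results.items() if att >= target), default=None)


if __name__ == "__main__":
    traces = [
        RequestTrace(arrival=0.0, first_token=0.25, finish=1.25, output_tokens=5),
        RequestTrace(arrival=0.5, first_token=2.5, finish=3.5, output_tokens=9),
        RequestTrace(arrival=1.0, first_token=1.5, finish=4.5, output_tokens=7),
    ]
    print("P90 TTFT:", percentile([t.ttft for t in traces], 90))
    print("P90 TPOT:", percentile([t.tpot for t in traces], 90))
    print("attainment:", slo_attainment(traces, ttft_slo=1.0, tpot_slo=0.25))
```

Because attainment requires both SLOs at once, a request that waits long in a prefill queue (high TTFT) counts as a miss just as a request slowed by prefill/decode interference (high TPOT) does. This is the trade-off the summary describes. Under the reading above, a figure such as 2.09× would correspond to the ratio of two systems' `max_rate_at_attainment` values measured on the same workload and SLOs.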