APA (7th ed.) Citation
Zhao, Y., Wu, D., & Wang, J. (2024, June 29). ALISA: Accelerating large language model inference via sparsity-aware KV caching. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 1005–1017. https://doi.org/10.1109/ISCA59077.2024.00077
Chicago Style (17th ed.) Citation
Zhao, Youpeng, Di Wu, and Jun Wang. "ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching." In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 1005–1017. IEEE, 2024. https://doi.org/10.1109/ISCA59077.2024.00077.
MLA (9th ed.) Citation
Zhao, Youpeng, et al. "ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching." 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 29 June 2024, pp. 1005–17, https://doi.org/10.1109/ISCA59077.2024.00077.