APA style (7th ed.)
Zhao, Y., Wu, D., & Wang, J. (2024, June 29). ALISA: Accelerating large language model inference via sparsity-aware KV caching. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 1005–1017. https://doi.org/10.1109/ISCA59077.2024.00077
Chicago style (17th ed.)
Zhao, Youpeng, Di Wu, and Jun Wang. "ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching." In 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 1005–1017. IEEE, 2024. https://doi.org/10.1109/ISCA59077.2024.00077.
MLA style (9th ed.)
Zhao, Youpeng, et al. "ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching." 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 29 June 2024, pp. 1005–1017, https://doi.org/10.1109/ISCA59077.2024.00077.