Title data
Agrawal, Nikita ; Mayer, Ruben:
Benchmarking KV-Cache Optimizations across Task Quality and System Performance for Long-Context Serving [Experiment, Analysis & Benchmark].
Bayreuth, 2026. - 13 pp.
DOI: https://doi.org/10.15495/EPub_UBT_00009365
Abstract
Large language model serving is increasingly limited by KV-cache growth under long-context workloads, yet existing KV-cache compression techniques are difficult to compare because they were evaluated on different models, tasks, budgets, and serving stacks. This paper presents a workload-aware benchmark of representative KV-cache optimization mechanisms spanning quantization, pruning, and merging, including KIVI, TurboQuant, SnapKV, and CaM, evaluated on LongBench-style multi-document QA, single-document QA,
few-shot learning, and summarization workloads using Llama-3.1-8B-Instruct and Mistral-7B-Instruct-v0.3. The benchmark measures task quality, mean output throughput, mean time-to-first-token, and realized compression ratio across context-length buckets. The results show that the compression ratio alone is a poor predictor of end-to-end performance. KIVI4 provides the most stable quality across models, SnapKV delivers the strongest long-context throughput, and CaM yields large gains on selected QA workloads but exhibits substantial workload sensitivity in both quality and realized compression ratio. These findings motivate workload-aware selection of KV-cache mechanisms rather than one-size-fits-all compression and provide deployment guidance for long-context serving systems.
Further data
| Publication type: | Preprint, postprint |
|---|---|
| Additional information: | Submitted to: Proceedings of the VLDB Endowment, ISSN 2150-8097 |
| Keywords: | KV-caching; Memory-bound computing; LLMs; Long-context inference |
| Institutions of the University: | Fakultäten > Fakultät für Mathematik, Physik und Informatik > Institut für Informatik > Lehrstuhl Angewandte Informatik X > Lehrstuhl Angewandte Informatik X - Univ.-Prof. Dr. Ruben Mayer |
| Result of work at the UBT: | Yes |
| DDC subjects: | 000 Computer science, information, general works > 004 Computer science |
| Date deposited: | 13 Jun 2026 21:00 |
| Last modified: | 13 Jun 2026 21:00 |
| URI: | https://eref.uni-bayreuth.de/id/eprint/98838 |
