Unverified commit f5645ace authored by Hamid Shojanazeri, committed by GitHub

typo fix

parent 34e43a55
@@ -26,7 +26,7 @@ python run_summarization.py \
##### **Results**
-Expected results on XSUM (Rouge-2 score, ther higher the better) from the above scripts on Llama-2/3 models. The sequence length of inputs are ~2k. Here we constrains the size of KV cache, allowing only n KVs to be write/read after the prefilling stage. n ranges from **64** to **full** where we maintain all the KV pairs. With 128 KVs, the performance can be matched as the full baseline (~2k KVs) while performance degradation is observed with 64 KVs. Also, maintaining a smaller KV cache reduces the I/O cost of KVs, thus we can achieve better throughput.
+Expected results on XSUM (Rouge-2 score, the higher the better) from the above scripts on Llama-2/3 models. The sequence length of the inputs is ~2k. Here we constrain the size of the KV cache, allowing only n KVs to be written/read after the prefilling stage. n ranges from **64** to **full**, where we maintain all the KV pairs. With 128 KVs, performance matches the full baseline (~2k KVs), while degradation is observed with 64 KVs. Maintaining a smaller KV cache also reduces the I/O cost of KVs, so we can achieve better throughput.
| KV Cache Size | 64 | 128 | 256 | 512 | 1024 | Full |
| ------------- | ------ | ------ | ------ | ------ | ------ | ------ |
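To make the "only n KVs after prefill" idea in the paragraph above concrete, here is a minimal sketch of a budgeted KV cache that simply keeps the most recent n key/value pairs. The `CappedKVCache` class, the tensor shapes, and the keep-most-recent eviction rule are illustrative assumptions for this sketch only; they are not the selection policy that run_summarization.py actually implements.

```python
import torch

class CappedKVCache:
    """Toy per-layer KV cache that retains at most `budget` key/value pairs
    after the prefill stage (keep-most-recent eviction; the real recipe may
    use a different selection policy)."""

    def __init__(self, budget: int | None):
        self.budget = budget   # e.g. 64, 128, ..., or None for "full"
        self.keys = None       # [batch, heads, seq, head_dim]
        self.values = None

    def update(self, k: torch.Tensor, v: torch.Tensor):
        # Append the new keys/values from prefill or the current decode step.
        if self.keys is None:
            self.keys, self.values = k, v
        else:
            self.keys = torch.cat([self.keys, k], dim=2)
            self.values = torch.cat([self.values, v], dim=2)
        # Evict the oldest entries once the cache exceeds its budget.
        if self.budget is not None and self.keys.size(2) > self.budget:
            self.keys = self.keys[:, :, -self.budget:, :]
            self.values = self.values[:, :, -self.budget:, :]
        return self.keys, self.values


# Example: a ~2k-token prefill followed by decoding with only 128 KVs retained.
cache = CappedKVCache(budget=128)
k = torch.randn(1, 8, 2048, 64)              # hypothetical shapes
v = torch.randn(1, 8, 2048, 64)
cache.update(k, v)                           # prefill
k_step = torch.randn(1, 8, 1, 64)
v_step = torch.randn(1, 8, 1, 64)
keys, values = cache.update(k_step, v_step)  # one decode step
print(keys.shape)                            # torch.Size([1, 8, 128, 64])
```

Reading from and writing to a cache capped at 128 entries instead of the full ~2k is what drives the I/O and throughput gains described above; the table compares the Rouge-2 impact of different budgets.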