typo fix

f5645ace · Hamid Shojanazeri · GitHub · 34e43a55 · f5645ace
Unverified Commit f5645ace authored 9 months ago by Hamid Shojanazeri Committed by GitHub 9 months ago
--- a/recipes/experimental/long-context/H2O/README.md
+++ b/recipes/experimental/long-context/H2O/README.md
@@ -26,7 +26,7 @@ python run_summarization.py \

 ##### **Results**

-Expected results on XSUM (Rouge-2 score, ther higher the better) from the above scripts on Llama-2/3 models. The sequence length of inputs are ~2k. Here we constrains the size of KV cache, allowing only n KVs to be write/read after the prefilling stage. n ranges from **64** to **full** where we maintain all the KV pairs. With 128 KVs, the performance can be matched as the full baseline (~2k KVs) while performance degradation is observed with 64 KVs. Also, maintaining a smaller KV cache reduces the I/O cost of KVs, thus we can achieve better throughput.
+Expected results on XSUM (Rouge-2 score, the higher the better) from the above scripts on Llama-2/3 models. The sequence length of inputs are ~2k. Here we constrains the size of KV cache, allowing only n KVs to be write/read after the prefilling stage. n ranges from **64** to **full** where we maintain all the KV pairs. With 128 KVs, the performance can be matched as the full baseline (~2k KVs) while performance degradation is observed with 64 KVs. Also, maintaining a smaller KV cache reduces the I/O cost of KVs, thus we can achieve better throughput.

 | KV Cache Size | 64     | 128    | 256    | 512    | 1024   | Full   |
 | ------------- | ------ | ------ | ------ | ------ | ------ | ------ |