diff --git a/recipes/benchmarks/fmbench/README.md b/recipes/benchmarks/fmbench/README.md
index fcb437b91520b7bf79e8fe2e1322987f7bf516b1..8c21bc33e1e5ca2c1fc5bbd0e140669909b346c6 100644
--- a/recipes/benchmarks/fmbench/README.md
+++ b/recipes/benchmarks/fmbench/README.md
@@ -14,7 +14,7 @@ Customers often wonder what is the best AWS service to run Llama models for _my
 The following figure gives an example of the price performance numbers that include inference latency, transactions per-minute and concurrency level for running the `Llama2-13b` model on different instance types available on SageMaker using prompts for Q&A task created from the [`LongBench`](https://huggingface.co/datasets/THUDM/LongBench) dataset, these prompts are between 3000 to 3840 tokens in length. **_Note that the numbers are hidden in this figure but you would be able to see them when you run `FMBench` yourself_**.
 
-
+
 
 The following table (also included in the report) provides information about the best available instance type for that experiment<sup>1</sup>.
diff --git a/recipes/benchmarks/fmbench/img/business_summary.png b/recipes/benchmarks/fmbench/img/business_summary.png
new file mode 100644
index 0000000000000000000000000000000000000000..a04b1ced6f73f6eff7af3e03d868891025576780
Binary files /dev/null and b/recipes/benchmarks/fmbench/img/business_summary.png differ
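
The README text in the hunk above mentions that the benchmark prompts were drawn from the LongBench dataset and restricted to between 3000 and 3840 tokens. As a minimal sketch of that kind of length filter (this is an illustration, not FMBench's actual code; the whitespace-based `count_tokens` helper is a stand-in assumption for a real model tokenizer):

```python
# Illustrative only: FMBench would use the model's tokenizer; whitespace
# splitting here is a rough stand-in for counting tokens.
def count_tokens(text: str) -> int:
    return len(text.split())


def filter_prompts(prompts, lo=3000, hi=3840):
    """Keep prompts whose approximate token count falls in [lo, hi]."""
    return [p for p in prompts if lo <= count_tokens(p) <= hi]


if __name__ == "__main__":
    prompts = ["short prompt", " ".join(["tok"] * 3500)]
    kept = filter_prompts(prompts)
    print(len(kept))  # only the ~3500-token prompt passes the filter
```

A range filter like this keeps latency comparisons fair across instance types, since inference latency scales with prompt length.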