Projects with this topic
- https://github.com/vllm-project/vllm
  A high-throughput and memory-efficient inference and serving engine for LLMs
- https://github.com/flashinfer-ai/flashinfer
  FlashInfer: Kernel Library for LLM Serving
- https://github.com/google/gemma.cpp
  Lightweight, standalone C++ inference engine for Google's Gemma models.
- https://github.com/InternLM/lmdeploy
  LMDeploy is a toolkit for compressing, deploying, and serving LLMs. Documentation: lmdeploy.readthedocs.io/en/latest/
- https://github.com/modelscope/dash-infer
  DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
- https://github.com/bytedance/ShadowKV
  ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
- https://github.com/coqui-ai/inference-engine
  Coqui Inference Engine