Projects with this topic
https://github.com/InternLM/lmdeploy LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
lmdeploy.readthedocs.io/en/latest/
https://github.com/vllm-project/vllm-ascend Community-maintained hardware plugin for vLLM on Ascend.
https://github.com/modelscope/dash-infer DashInfer is a native LLM inference engine aiming to deliver industry-leading performance across a range of hardware architectures, including x86 and ARMv9.
https://github.com/janhq/cortex.tensorrt-llm Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA's TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.
https://github.com/QwenLM/qwen.cpp C++ implementation of Qwen-LM.
https://github.com/LibreTranslate/RemoveDup Remove duplicates from parallel corpora.