Projects with this topic
- https://github.com/vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
- https://github.com/sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models.
- https://github.com/vllm-project/vllm-ascend: Community-maintained hardware plugin for vLLM on Ascend
- https://github.com/roboflow/inference: A fast, easy-to-use, production-ready inference server for computer vision supporting deployment of many popular model architectures and fine-tuned models.
- https://github.com/SYSTRAN/faster-whisper: Faster Whisper transcription with CTranslate2
- https://github.com/nbursa/inception-core-server: A modular, extensible Rust-based server providing short-term, long-term, and latent memory services, a chat endpoint with a BaseAgent + Sentience DSL, and integration with ChromaDB and LLM services.
- https://github.com/flashinfer-ai/debug-print: Debug print operator for CUDA graph debugging