Projects with this topic
- https://github.com/vllm-project/vllm
  A high-throughput and memory-efficient inference and serving engine for LLMs (see the offline-inference sketch after this list)
- https://github.com/vllm-project/llm-compressor
  Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
- https://github.com/vllm-project/vllm-ascend
  Community-maintained hardware plugin for vLLM on Ascend
- https://github.com/containers/ramalama
  RamaLama is an open-source developer tool that simplifies the local serving of AI models from any source and facilitates their use for inference in production.
- https://github.com/modelscope/ms-swift
  SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning): use PEFT or full-parameter training to fine-tune 400+ LLMs or 100+ MLLMs.
- https://github.com/vllm-project/aibrix
  Cost-efficient and pluggable infrastructure components for GenAI inference
- https://github.com/vllm-project/vllm-spyre
  Community-maintained hardware plugin for vLLM on Spyre
- https://github.com/vllm-project/production-stack
  Scale from a single vLLM instance to a distributed vLLM deployment without changing any application code.
- Neuromancer (https://git.tomfos.tr/tom)
  Self-hosted, GPU-optimised GenAI platform providing a drop-in OpenAI-compatible API (see the client sketch after this list)
- https://github.com/vllm-project/vllm_allocator_adaptor
  An adaptor that allows a Python-level allocator to be used as a PyTorch pluggable allocator
- https://github.com/meta-llama/llama-recipes
  Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods, covering single- and multi-node GPU setups.
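
For context on the vllm-project/vllm entry above, here is a minimal sketch of vLLM's offline-inference Python API. The model ID and prompt are placeholders chosen for illustration, not taken from the listing:

```python
# Minimal vLLM offline-inference sketch.
# Assumes `pip install vllm` and a supported GPU; the model ID is a placeholder.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face-compatible model ID
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() batches prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["The capital of France is"], params)
for out in outputs:
    print(out.outputs[0].text)
```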
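Several of the listed projects (vLLM's own API server, production-stack, Neuromancer) expose an OpenAI-compatible endpoint, so a standard OpenAI client can talk to them by overriding the base URL. The host, port, API key, and model name below are assumptions for illustration:

```python
# Sketch of calling an OpenAI-compatible endpoint, e.g. one started with
# `vllm serve <model>`; base_url, api_key, and model are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.chat.completions.create(
    model="facebook/opt-125m",  # must match the model the server loaded
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```

Because the endpoint is a drop-in replacement, swapping between these backends is a matter of changing `base_url`, with no application-code changes.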