Projects with this topic
- https://github.com/ggml-org/llama.cpp: LLM inference in C/C++
- https://github.com/vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs
- https://github.com/Mozilla-Ocho/llamafile: Distribute and run LLMs with a single file. (llamafile.ai)
- https://github.com/InternLM/lmdeploy: LMDeploy is a toolkit for compressing, deploying, and serving LLMs. (lmdeploy.readthedocs.io/en/latest/)
- https://github.com/meta-llama/PurpleLlama: Set of tools to assess and improve LLM security.
- https://github.com/sgl-project/sglang: SGLang is a fast serving framework for large language models and vision language models.
- https://github.com/andrewkchan/yalm: Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
- https://github.com/janhq/cortex.git (formerly https://github.com/janhq/nitro.git): Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM, ONNX). Powers Jan.
- https://github.com/FoundationVision/Groma: [ECCV 2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization