Projects with this topic
-
🔧 🔗 https://github.com/vllm-project/vllmA high-throughput and memory-efficient inference and serving engine for LLMs
Updated -
🔧 🔗 https://github.com/sgl-project/sglangSGLang is a fast serving framework for large language models and vision language models.
Updated -
LLM.c
LLM training in simple, raw C/CUDA LLMs in simple, pure C/CUDA with no need for 245MB of PyTorch or 107MB of cPython. Current focus is on pretraining, in particular reproducing the GPT-2 and GPT-3 miniseries, along with a parallel PyTorch ref
Updated -
🔧 🔗 https://github.com/andrewkchan/yalmYet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
Updated -
https://github.com/janhq/nitro.git now: https://github.com/janhq/cortex.git Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM, ONNX). Powers
👋 JanUpdated