Projects with this topic
🔧 🔗 https://github.com/vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
LLM.c
LLM training in simple, raw C/CUDA, with no need for 245MB of PyTorch or 107MB of cPython. Current focus is on pretraining, in particular reproducing the GPT-2 and GPT-3 miniseries, along with a parallel PyTorch reference implementation
https://github.com/janhq/nitro.git (now: https://github.com/janhq/cortex.git)
Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM, ONNX). Powers 👋 Jan