Projects with this topic
🔧 🔗 https://github.com/andrewkchan/yalm Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
Real-time inference for Stable Diffusion - 0.88s latency. Covers AITemplate, nvFuser, TensorRT, FlashAttention. Join our Discord community: https://discord.com/invite/TgHXuSJEk6