Projects with this topic
🔧 🔗 https://github.com/lucidrains/vector-quantize-pytorch Vector (and Scalar) Quantization, in PyTorch
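As an illustration of what this library provides, here is a minimal usage sketch assuming the `VectorQuantize` module and argument names from the project's README (defaults and names may change between versions):

```python
import torch
from vector_quantize_pytorch import VectorQuantize

# Quantizer with a 512-entry codebook over 256-dimensional vectors.
vq = VectorQuantize(
    dim=256,
    codebook_size=512,
    decay=0.8,              # EMA decay for codebook updates
    commitment_weight=1.0,  # weight of the commitment loss term
)

x = torch.randn(1, 1024, 256)            # (batch, sequence, dim)
quantized, indices, commit_loss = vq(x)  # quantized has the same shape as x; indices is (1, 1024)
```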
Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our Discord community: https://discord.gg/TgHXuSJEk6
🔧 🔗 https://github.com/vllm-project/llm-compressor Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
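A hedged sketch of a one-shot weight-quantization run with llm-compressor; the import paths, the `GPTQModifier` recipe, and the `oneshot` arguments below are assumptions based on the README quickstart and may differ between releases:

```python
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import oneshot  # newer releases may expose oneshot at the top level

# One-shot GPTQ quantization of a small chat model to 4-bit weights (W4A16),
# skipping the output head; the saved model is intended for loading in vLLM.
oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",   # placeholder model id
    dataset="open_platypus",                      # placeholder calibration dataset
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```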
🔧 🔗 https://github.com/SYSTRAN/faster-whisper Faster Whisper transcription with CTranslate2
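A minimal transcription sketch assuming the `WhisperModel` API shown in the faster-whisper README; the model size, device, and audio path are placeholders:

```python
from faster_whisper import WhisperModel

# Load a Whisper checkpoint through CTranslate2 on GPU with FP16 compute.
model = WhisperModel("large-v3", device="cuda", compute_type="float16")

# Transcription is lazy: segments are produced as the generator is consumed.
segments, info = model.transcribe("audio.mp3", beam_size=5)
print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```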
🔧 🔗 https://github.com/microsoft/BitBLAS BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
🔧 🔗 https://github.com/IST-DASLab/marlin FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.