Projects with this topic
- 🔗 https://github.com/microsoft/BitBLAS: BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
- 🔗 https://github.com/IST-DASLab/marlin: Marlin is an FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
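To illustrate what "FP16xINT4" means in these projects, here is a minimal NumPy sketch of weight-only INT4 quantization: weights are stored as 4-bit integers with a per-column FP16 scale and dequantized back to FP16 before a standard matmul. This is an illustration of the general technique only, not the actual fused GPU kernels BitBLAS or Marlin implement; all function names here are hypothetical.

```python
import numpy as np

def quantize_int4(w_fp16):
    # Symmetric per-output-channel quantization to the INT4 range [-8, 7].
    scale = np.abs(w_fp16).max(axis=0) / 7.0
    q = np.clip(np.round(w_fp16 / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def matmul_fp16_int4(x_fp16, q, scale):
    # Dequantize INT4 weights to FP16, then multiply.
    # Real kernels fuse dequantization into the matmul to save bandwidth.
    w = q.astype(np.float16) * scale
    return x_fp16 @ w

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32)).astype(np.float16)
x = rng.standard_normal((4, 64)).astype(np.float16)

q, s = quantize_int4(w)
y_ref = x @ w                      # full-precision reference
y_q = matmul_fp16_int4(x, q, s)    # quantized path, close to reference
```

The speedups these kernels report come from memory bandwidth: INT4 weights are 4x smaller than FP16, and LLM inference at small batch sizes is dominated by reading the weight matrix, which is why the gains taper off at larger batch sizes.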