quantization
Projects with this topic
- https://github.com/IST-DASLab/marlin: an FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens (see the quantization sketch after this list).
- xTuring: build, customize, and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs (see the usage sketch after this list). Discord community: https://discord.gg/TgHXuSJEk6
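
For context on the marlin entry, here is a minimal NumPy sketch of the general FP16xINT4 idea: weights are stored as 4-bit integers with per-group FP16 scales and dequantized at matmul time. This is an illustration of the data layout only, not Marlin's API; a fused kernel like Marlin performs the dequantization inside the GEMM instead of materializing FP16 weights, which is where the speedup comes from.

```python
# Illustrative sketch (not Marlin's API): group-wise symmetric INT4 weight
# quantization, then an unfused dequantize-and-matmul reference computation.
import numpy as np

def quantize_int4(w_fp16: np.ndarray, group_size: int = 128):
    """Quantize (in_features, out_features) weights to signed INT4 with per-group scales."""
    in_f, out_f = w_fp16.shape
    assert in_f % group_size == 0, "in_features must be divisible by group_size"
    w = w_fp16.astype(np.float32).reshape(in_f // group_size, group_size, out_f)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0        # symmetric range, max maps to +/-7
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)   # 4-bit signed values
    return q.reshape(in_f, out_f), scales.squeeze(1)           # scales: (num_groups, out_features)

def dequant_matmul(x_fp16: np.ndarray, q: np.ndarray, scales: np.ndarray, group_size: int = 128):
    """Reference (unfused) computation: x @ dequantize(q)."""
    in_f, out_f = q.shape
    w = q.reshape(in_f // group_size, group_size, out_f).astype(np.float32)
    w = (w * scales[:, None, :]).reshape(in_f, out_f)           # rescale each group
    return x_fp16.astype(np.float32) @ w
```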
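
For the xTuring entry, a rough usage sketch of a LoRA fine-tune, assuming the project's quickstart-style API (`BaseModel.create`, `InstructionDataset`, `finetune`); the model key `"llama_lora"` and the dataset path are placeholders and may differ from the project's current documentation.

```python
# Hypothetical xTuring fine-tuning sketch; model key and dataset path are assumptions.
from xturing.datasets import InstructionDataset
from xturing.models import BaseModel

# Load an instruction-style dataset from a local directory (placeholder path).
dataset = InstructionDataset("./alpaca_data")

# Create a LoRA-wrapped base model and fine-tune it on the dataset.
model = BaseModel.create("llama_lora")
model.finetune(dataset=dataset)

# Generate from the personalized model.
output = model.generate(texts=["Explain INT4 weight quantization in one sentence."])
print(output)
```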