This project is mirrored from https://github.com/huggingface/optimum-quanto.
v0.2.0 (96ab5d3e)

New:
- `requantize` helper by @calmitchell617 (see the sketch below),
- StableDiffusion example by @thliang01,
- improved linear backward path,
- AWQ int4 kernels.
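A minimal sketch of the `requantize` helper for reloading a previously serialized quantized model; the file names, the toy model and the exact call signature are assumptions, not taken from the release notes:

```python
import json

import torch
from torch import nn
from optimum.quanto import requantize

# Stand-in architecture; in practice, rebuild the same model class that was quantized.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8))

# Load the serialized quantized weights and the quantization map (file names are placeholders).
state_dict = torch.load("model_state_dict.pt")
with open("quantization_map.json") as f:
    qmap = json.load(f)

# Recreate the quantized modules and load their weights in one step.
requantize(model, state_dict, qmap, device=torch.device("cpu"))
```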
v0.0.13 (addd7122)

New:
- new `QConv2d` quantized module,
- official support for `float8` weights (see the sketch below).

Bug fixes:
- fix `QBitsTensor.to()` that was not moving the inner tensors,
- prevent shallow `QTensor` copies when loading weights.
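A minimal sketch of `float8` weight quantization, under which `quantize()` swaps `nn.Conv2d` layers for `QConv2d`; it assumes the current `optimum.quanto` import path, a PyTorch build with float8 dtypes, and a toy model:

```python
import torch
from torch import nn
from optimum.quanto import quantize, freeze, qfloat8

# Toy convolutional model; quantize() is expected to replace nn.Conv2d with QConv2d.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1),
)

quantize(model, weights=qfloat8)  # float8 weights
freeze(model)                     # materialize the quantized weights

print(type(model[0]).__name__)                 # expected: QConv2d
print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 16, 32, 32])
```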
0.0.10 (5ab7e6a9)

New features:
- calibration streamline option to remove spurious quantize/dequantize (see the sketch below),
- calibration debug mode.
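A minimal sketch of the calibration options; the keyword names `streamline` and `debug` are assumptions inferred from the release note, and the toy model is made up:

```python
import torch
from torch import nn
from optimum.quanto import quantize, freeze, Calibration, qint8

model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 8))
quantize(model, weights=qint8, activations=qint8)

# Run a few batches under calibration to record activation ranges.
# streamline=True removes spurious quantize/dequantize pairs; debug=True prints per-module info.
with Calibration(streamline=True, debug=True):
    for _ in range(8):
        model(torch.randn(4, 32))

freeze(model)
```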
0.0.9 (8acbefc1)

New features:
- `weights` and `activations` parameters for `quantize` (see the sketch below),
- `float8` activations.
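A minimal sketch of the `weights` and `activations` parameters, combining `qint8` weights with `qfloat8` activations; it assumes the current `optimum.quanto` import path, a PyTorch build with float8 support, a toy model, and a calibration pass like the one shown above:

```python
import torch
from torch import nn
from optimum.quanto import quantize, freeze, Calibration, qint8, qfloat8

model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 8))

# Pick the weight and activation dtypes independently.
quantize(model, weights=qint8, activations=qfloat8)

# Activation quantization needs a calibration pass before freezing.
with Calibration():
    model(torch.randn(16, 32))
freeze(model)
```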
0.0.8 (63041a49)

New features:
- weight-only quantization (see the sketch below),
- integer matmul acceleration on CUDA.

Bug fixes:
- actually use float16 weights,
- avoid float16 overflows,
- correct device placement,
- robust serialization.
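A minimal sketch of weight-only quantization with float16 activations; the toy model is an assumption, and the example simply falls back to CPU/float32 when no GPU is available:

```python
import torch
from torch import nn
from optimum.quanto import quantize, freeze, qint8

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).to(device, dtype)

# Weight-only: only `weights` is passed, activations stay in float16 (or float32 on CPU).
quantize(model, weights=qint8)
freeze(model)

# On CUDA, the int8-weight matmuls can use the accelerated integer kernels.
print(model(torch.randn(2, 64, device=device, dtype=dtype)).shape)
```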
0.0.6 (fe330f0d)

New features:
- support `opt` models (see the sketch below),
- support `gpt-neox` models,
- support `codegen` models.
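A minimal sketch of quantizing one of the newly supported decoder models (`facebook/opt-125m` is just an example checkpoint); it uses the current `optimum.quanto` import path rather than the standalone `quanto` package that shipped at the time:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.quanto import quantize, freeze, qint8

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# Quantize the decoder weights to int8, then freeze them.
quantize(model, weights=qint8)
freeze(model)

inputs = tokenizer("Quantization makes models", return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```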