This project is mirrored from https://github.com/huggingface/optimum-quanto. Pull mirroring updated Sep 19, 2024.
main (default) · 2075da7a · feat: e4m3fnuz added · Sep 17, 2024
add_e4m3fnuz · f37c58b6 · feat: e4m3fnuz added · Sep 17, 2024
disable_int_mm_cpu · 0a832208 · fix(library): disable int_mm for CPU · Sep 16, 2024
add_marlin_int4_kernel · 6e888fda · fix(marlin): avoid kernel crash on H100 · Sep 13, 2024
marlin_neural_magic_step_by_step · 06629972 · fix(marlin_: avoid kernel crash on H100 · Sep 05, 2024
marlin_neural_magic · 6751ecad · style: reapply NeuralMagic formatting · Sep 03, 2024
fix_bugs · 1fa6771b · fix(qmodule): register default scales with correct dtype · Sep 02, 2024
stop_enforcing_conventional_commits · 960f7341 · chore: one commit more than allowed · Sep 02, 2024
marlin-fp8-kernel-only · 160f4c05 · feat(library): add marlin float8/float8 mm kernel · Aug 27, 2024
feat-hub-support · b2d3a390 · chore: apply make style. · Aug 15, 2024
load_strict_false · 65ace79d · fix: support strict=False for quantized weights · Aug 14, 2024
marlin-fp8 · 07a56520 · fix: remove extension load on unsupported system · Jul 29, 2024
refactor_qtensor_dispatch · 858950e4 · refactor(qtensor): split torch.nn.functional.linear dispatch · Jul 26, 2024
release-v0.2.4 · 832f7f5c · release: 0.2.4 · Jul 26, 2024
fix_diffusers_model_import · 05312bf8 · chore: bump dev version · Jul 26, 2024
release-v0.2.3 · 4b71e426 · release: 0.2.3 · Jul 25, 2024
cleanup_readme · 2dfef8d5 · docs: cleanup README · Jul 25, 2024
fix_serialization_issues · b3de2a28 · fix(qbits): correct stride when packing · Jul 25, 2024
feat-diffusers-models · 7986c7cb · fix: force pos_embed initialization · Jul 24, 2024
reintroduce_qtensor_quantize · cea4ff6d · ci: temporarily use pt test channel · Jul 24, 2024