release: 0.0.8

New features:

- weight-only quantization,
- integer matmul acceleration on CUDA.
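
For readers new to the term, here is a minimal sketch of what weight-only quantization means in general: weights are stored as low-bit integers plus a float scale, while activations stay in floating point. It is a plain-Python illustration, not this library's API. The function names (`quantize_weights`, `dequantize_weights`, `linear`) and the symmetric per-tensor int8 scheme are assumptions chosen for the example.

```python
# Illustrative sketch only: symmetric per-tensor int8 weight quantization.
# All names here are hypothetical and not taken from the released library.

def quantize_weights(weights, bits=8):
    """Map float weights to signed integers plus a single float scale."""
    qmax = 2 ** (bits - 1) - 1                      # 127 for int8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    # Round to the nearest integer and clamp to the symmetric range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_weights(q, scale):
    """Recover approximate float weights from integers and the scale."""
    return [v * scale for v in q]

def linear(x, weights, bias=0.0):
    """Dot product of float activations with float weights."""
    return sum(a * w for a, w in zip(x, weights)) + bias

weights = [0.5, -1.0, 0.25, 0.75]
q, scale = quantize_weights(weights)
restored = dequantize_weights(q, scale)

x = [1.0, 2.0, 3.0, 4.0]      # activations are left unquantized
y_ref = linear(x, weights)    # result with the original float weights
y_q = linear(x, restored)     # result with the dequantized weights
```

Only the weights are converted, so the per-weight error is bounded by half the scale, and the output differs from the float result by a correspondingly small amount.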

Bug fixes:

- actually use float16 weights,
- avoid float16 overflows,
- correct device placement,
- robust serialization.
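
As background for the float16 overflow fix, the sketch below shows why overflows occur: the largest finite float16 value is 65504, so intermediate results beyond that cannot be represented. It uses only the Python standard library. The helper name `fits_in_float16` is hypothetical, and the example does not claim to show how the fix was implemented in this release.

```python
import struct

# Largest finite value representable in IEEE 754 half precision.
FLOAT16_MAX = 65504.0

def fits_in_float16(value):
    """Return True if `value` can be packed as a finite float16."""
    try:
        struct.pack("<e", value)   # "e" is the half-precision format code
        return True
    except OverflowError:
        return False

a = 300.0
b = 300.0
product = a * b                    # 90000.0, computed in double precision

inputs_ok = fits_in_float16(a) and fits_in_float16(b)
product_ok = fits_in_float16(product)
```

Both inputs fit in float16, but their product does not, which is why computations on float16 weights often need a wider accumulator or a rescaling step before casting back down.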