4bit

Projects with this topic


mirrored_repos / MachineLearning / IST-DASLabs / Marlin

🔧🔗 https://github.com/IST-DASLab/marlin: FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.

kernel quantization llm
+ 1 more

Updated Oct 12, 2024

