jan
Projects with this topic
- https://github.com/janhq/cortex.tensorrt-llm Cortex.Tensorrt-LLM is a C++ inference library that any server can load at runtime. It includes NVIDIA's TensorRT-LLM as a Git submodule for GPU-accelerated inference on NVIDIA GPUs.