    [Runtime] CUDA IPC Memory support and custom allreduce kernels (#16750)
    Ruihang Lai authored
    
    
    This PR introduces CUDA IPC memory support in the TVM runtime.
    IPC memory allows multiple distributed workers to access each
    other's GPU memory directly. This functionality is helpful for
    implementing customized communication primitives across
    distributed workers.
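
    For context, CUDA IPC works by exporting an opaque handle for a
    device allocation from the owning process and opening that handle
    in a peer process. A minimal sketch of the underlying CUDA runtime
    calls (illustrative only, not the PR's actual code; the helper
    names are made up and error handling is omitted):

    ```cpp
    #include <cuda_runtime.h>

    // Owning worker: allocate device memory and export a shareable handle.
    cudaIpcMemHandle_t ExportBuffer(void** d_buf, size_t nbytes) {
      cudaMalloc(d_buf, nbytes);
      cudaIpcMemHandle_t handle;
      cudaIpcGetMemHandle(&handle, *d_buf);
      // Send the opaque `handle` bytes to peer workers, e.g. over a
      // socket or the workers' control channel.
      return handle;
    }

    // Peer worker: map the remote allocation into the local address space
    // so kernels launched here can read/write it directly.
    void* ImportBuffer(cudaIpcMemHandle_t handle) {
      void* d_peer = nullptr;
      cudaIpcOpenMemHandle(&d_peer, handle, cudaIpcMemLazyEnablePeerAccess);
      return d_peer;  // Unmap later with cudaIpcCloseMemHandle(d_peer).
    }
    ```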
    
    In this PR, we bring the customized all-reduce implementation
    from TensorRT-LLM into 3rdparty. This all-reduce implementation
    makes use of CUDA IPC memory. We expose the all-reduce function
    as a global function under the namespace
    `tvm::runtime::disco::cuda_ipc`.
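
    A minimal sketch of how such a function can be exposed through
    TVM's global function registry (the registered name and the
    signature below are illustrative assumptions, not necessarily
    the ones added by this PR):

    ```cpp
    #include <tvm/runtime/ndarray.h>
    #include <tvm/runtime/registry.h>

    namespace tvm {
    namespace runtime {
    namespace disco {
    namespace cuda_ipc {

    // Hypothetical entry point: all-reduce `input` into `output` using the
    // TensorRT-LLM-derived kernels over CUDA IPC buffers.
    void CustomAllReduce(NDArray input, NDArray output) {
      // ... launch the custom all-reduce kernel ...
    }

    TVM_REGISTER_GLOBAL("runtime.disco.cuda_ipc.custom_allreduce")
        .set_body_typed(CustomAllReduce);

    }  // namespace cuda_ipc
    }  // namespace disco
    }  // namespace runtime
    }  // namespace tvm
    ```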
    
    A unit test exercising the customized all-reduce kernel across
    two workers is added.
    
    ---
    
    Co-authored-by: Hongyi Jin <hongyij@andrew.cmu.edu>