    [Runtime] CUDA IPC Memory support and custom allreduce kernels (#16750)
    Ruihang Lai authored
    
    
    This PR introduces CUDA IPC memory support in the TVM runtime.
    IPC memory allows multiple distributed workers to access each
    other's GPU memory directly. This functionality is helpful for
    implementing customized communication primitives across
    distributed workers.
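
    For context, CUDA IPC works by exporting an opaque handle for a
    device allocation from the owning process and opening that handle
    in a peer process. A minimal sketch of the underlying CUDA runtime
    calls (illustrative only, not the PR's actual code; the helper
    names are made up and error handling is omitted):

    ```cpp
    #include <cuda_runtime.h>

    // Owning worker: allocate device memory and export a shareable handle.
    cudaIpcMemHandle_t ExportBuffer(void** d_buf, size_t nbytes) {
      cudaMalloc(d_buf, nbytes);
      cudaIpcMemHandle_t handle;
      cudaIpcGetMemHandle(&handle, *d_buf);
      // Send the opaque `handle` bytes to peer workers, e.g. over a
      // socket or the workers' control channel.
      return handle;
    }

    // Peer worker: map the remote allocation into the local address space
    // so kernels launched here can read/write it directly.
    void* ImportBuffer(cudaIpcMemHandle_t handle) {
      void* d_peer = nullptr;
      cudaIpcOpenMemHandle(&d_peer, handle, cudaIpcMemLazyEnablePeerAccess);
      return d_peer;  // Unmap later with cudaIpcCloseMemHandle(d_peer).
    }
    ```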
    
    In this PR, we bring the customized all-reduce implementation
    from TensorRT-LLM into 3rdparty. This all-reduce implementation
    makes use of CUDA IPC memory. We expose the all-reduce function
    as a global function under the namespace
    `tvm::runtime::disco::cuda_ipc`.
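
    A minimal sketch of how such a function can be exposed through
    TVM's global function registry (the registered name and the
    signature below are illustrative assumptions, not necessarily
    the ones added by this PR):

    ```cpp
    #include <tvm/runtime/ndarray.h>
    #include <tvm/runtime/registry.h>

    namespace tvm {
    namespace runtime {
    namespace disco {
    namespace cuda_ipc {

    // Hypothetical entry point: all-reduce `input` into `output` using the
    // TensorRT-LLM-derived kernels over CUDA IPC buffers.
    void CustomAllReduce(NDArray input, NDArray output) {
      // ... launch the custom all-reduce kernel ...
    }

    TVM_REGISTER_GLOBAL("runtime.disco.cuda_ipc.custom_allreduce")
        .set_body_typed(CustomAllReduce);

    }  // namespace cuda_ipc
    }  // namespace disco
    }  // namespace runtime
    }  // namespace tvm
    ```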
    
    A unit test exercising the customized all-reduce kernel across
    two workers is added.
    
    ---
    
    Co-authored-by: Hongyi Jin <hongyij@andrew.cmu.edu>