Tworan authored
Fix segmentation fault for models exceeding 40B on AMD GPUs & optimize mul_mat_axpy operation (#217)

* fix segmentation fault when model exceeds 40B on ROCm platform
* optimize axpy kernel
* optimize op: mulmat_axpy_sparse
* fix bug when model exceeds 40B on AMD GPU
* optimize op: mulmat_axpy_sparse

---------

Co-authored-by: tworan <tworan.csu@gmail.com>