fix rms_norm op to adapt with ascend #1247

Open
ShaneWoof wants to merge 1 commit into InfiniTensor:main from ShaneWoof:fix/rms_norm_ascend
@ShaneWoof

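Adapts the rms_norm operator on Ascend to handle mixed half-precision data formats and 3D tensors.

Changes:
1. Modified Infinicore/src/infiniop/ops/rms_norm/ascend/rms_norm_aclnn.cc: when the dtype of w differs from the dtype of x and is not F32, w is cast to F32 before aclnnRmsNorm is called. This change also fixes the slice_shape values and the loop traversal for 3D tensors.
2. Added Infinicore/src/infiniop/ops/rms_norm/ascend/cast_kernel.cpp, which implements an AscendC Cast kernel (F16/BF16 → F32) used to convert w when x and w use different half-precision dtypes.
3. Modified Infinicore/src/infiniop/devices/ascend/CMakeLists.txt to add cast_kernel.cpp to the build.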
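
The rule in item 1 and the 3D traversal can be sketched on the host as follows. This is only an illustrative reference under stated assumptions, not the code in this PR: the `DType` enum, `needs_weight_cast`, and `rms_norm_3d` are hypothetical names, the tensor is assumed contiguous with shape `[batch, seq, dim]`, and all arithmetic is done in `float` rather than through aclnnRmsNorm.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dtype tags for illustration; not the InfiniCore enum.
enum class DType { F16, BF16, F32 };

// Rule from item 1: cast w to F32 only when its dtype differs from
// the dtype of x and w is not already F32.
bool needs_weight_cast(DType x, DType w) {
    return w != x && w != DType::F32;
}

// Host reference RMSNorm for a contiguous [batch, seq, dim] tensor.
// Every (batch, seq) row of length dim is normalized independently,
// which is why a 3D input needs a loop over both leading dimensions.
std::vector<float> rms_norm_3d(const std::vector<float>& x,
                               const std::vector<float>& w,
                               std::size_t batch, std::size_t seq,
                               std::size_t dim, float eps) {
    std::vector<float> y(x.size());
    for (std::size_t b = 0; b < batch; ++b) {
        for (std::size_t s = 0; s < seq; ++s) {
            const std::size_t off = (b * seq + s) * dim;  // start of this row
            float sum_sq = 0.0f;
            for (std::size_t i = 0; i < dim; ++i) {
                sum_sq += x[off + i] * x[off + i];
            }
            const float inv_rms =
                1.0f / std::sqrt(sum_sq / static_cast<float>(dim) + eps);
            for (std::size_t i = 0; i < dim; ++i) {
                y[off + i] = x[off + i] * inv_rms * w[i];
            }
        }
    }
    return y;
}
```

A reference like this can be useful for checking device output on small inputs, for example an F16 x with a BF16 w (where `needs_weight_cast` returns true) against the same data computed entirely in F32.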

Status: all infiniop operator tests pass; the infinicore operator interface tests run end to end, 103/108 Passed.
(1) infiniop
Test cases:
image
Test results:
image
(2) infinicore
Test cases:
image
Test results:
image

@ShaneWoof ShaneWoof requested a review from a team June 11, 2026 08:37