fix: fix CUDA elementwise dynamic block shared memory by chen2021673 · Pull Request #172 · InfiniTensor/InfiniTrain

chen2021673 · 2026-06-09T08:16:48Z

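Summary

Fixes a bug in the CUDA elementwise backward path: after the block size became dynamically selected, the shared memory size was still computed from the old template parameter BLOCK_SIZE.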

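Background

LaunchKernel now selects the block size dynamically from the tensor size via ChooseBlockSize(). However, some elementwise backward launch logic still used the template parameter BLOCK_SIZE to compute the dynamic shared memory size.

This lets the actual launch configuration and the shared memory allocation diverge: for large tensors the runtime block size may be 512, while shared memory is still sized for BLOCK_SIZE == 256. The block-reduction kernel used by bf16/half broadcast backward indexes shared memory by blockDim.x, so the allocation is too small and the kernel writes out of bounds.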
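The size mismatch described above can be illustrated with a little host-side arithmetic. This is only a sketch: it assumes the reduction uses one float of scratch space per thread, which the PR description does not state, and the helper names SharedBytesFor and ThreadsOutOfBounds are hypothetical.

```cpp
#include <cstddef>

// Assumption (not from the PR): the block reduction keeps one float of
// scratch space per thread in dynamic shared memory.
constexpr std::size_t SharedBytesFor(unsigned threads_per_block) {
    return static_cast<std::size_t>(threads_per_block) * sizeof(float);
}

// Number of threads whose scratch slot lies past the end of the allocation
// when the buffer is sized for one block size but launched with another.
constexpr unsigned ThreadsOutOfBounds(unsigned allocated_for, unsigned launched_with) {
    return launched_with > allocated_for ? launched_with - allocated_for : 0u;
}

constexpr unsigned kTemplateBlockSize = 256;  // stale BLOCK_SIZE template parameter
constexpr unsigned kRuntimeBlockSize = 512;   // block size chosen at runtime for a large tensor

// Bytes allocated vs. bytes the kernel actually indexes via blockDim.x.
static_assert(SharedBytesFor(kTemplateBlockSize) == 1024, "allocated for 256 threads");
static_assert(SharedBytesFor(kRuntimeBlockSize) == 2048, "needed for 512 threads");
static_assert(ThreadsOutOfBounds(kTemplateBlockSize, kRuntimeBlockSize) == 256,
              "half of the block writes out of bounds");
```

Under that assumption, threads 256 through 511 each write to a slot beyond the 1024 allocated bytes.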

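Fix

Remove the no-longer-reliable BLOCK_SIZE template parameter from the elementwise launch helpers and add ChooseBlockDims(), which builds the actual launch block from ChooseBlockSize().
The forward and backward launch paths now both use the block dims selected at runtime.
The dynamic shared memory size is now computed from block.x of the block actually passed to the launch lambda.
Add a large-block regression test for bf16 broadcast backward.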
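A rough host-side sketch of the approach in the fix, with the shared memory size derived from the same block that is launched. Only the names ChooseBlockSize and ChooseBlockDims come from the PR; the Dim3 and LaunchConfig structs, DynamicSharedBytes, MakeLaunchConfig, the 1 << 20 element threshold, and the one-float-per-thread scratch size are all assumptions made for illustration, not the repository's actual code.

```cpp
#include <cstddef>

// Stand-in for CUDA's dim3 so the sketch compiles without the CUDA toolkit.
struct Dim3 {
    unsigned x, y, z;
};

// Hypothetical heuristic: the real thresholds are not given in the PR.
inline unsigned ChooseBlockSize(std::size_t numel) {
    return numel >= (static_cast<std::size_t>(1) << 20) ? 512u : 256u;
}

// Build the launch block from the runtime-selected block size.
inline Dim3 ChooseBlockDims(std::size_t numel) {
    return Dim3{ChooseBlockSize(numel), 1u, 1u};
}

// Assumption: one float of reduction scratch per thread. The key point is
// that the size depends on block.x, not on a compile-time BLOCK_SIZE.
inline std::size_t DynamicSharedBytes(const Dim3 &block) {
    return static_cast<std::size_t>(block.x) * sizeof(float);
}

struct LaunchConfig {
    Dim3 grid;
    Dim3 block;
    std::size_t shared_bytes;
};

// Grid, block, and shared memory are all derived from one block value,
// so they cannot drift apart.
inline LaunchConfig MakeLaunchConfig(std::size_t numel) {
    const Dim3 block = ChooseBlockDims(numel);
    const Dim3 grid{static_cast<unsigned>((numel + block.x - 1) / block.x), 1u, 1u};
    return LaunchConfig{grid, block, DynamicSharedBytes(block)};
}
```

In a real launch these three values would be passed to the kernel as `kernel<<<grid, block, shared_bytes>>>(...)`; deriving them in one place is what keeps the allocation consistent with blockDim.x.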

Regression test

New test:

AutogradElementwiseBackwardTest.BFloat16MulBroadcastBackwardLargeBlock

Scenario covered:

a: [512, 8192], bf16
b: [8192], bf16 broadcast
grad: [512, 8192], bf16
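For reference, the gradients such a test checks can be written as a plain CPU loop. This is a sketch in float rather than bf16 and is not the repository's test code; MulBroadcastGrads and MulBroadcastBackwardRef are hypothetical names. For out = a * b with b broadcast across rows, grad_a is grad * b elementwise, and grad_b sums grad * a over the broadcast (row) dimension, which is the reduction that the block-reduction kernel performs on the GPU.

```cpp
#include <cstddef>
#include <vector>

struct MulBroadcastGrads {
    std::vector<float> grad_a;  // shape [rows, cols], row-major
    std::vector<float> grad_b;  // shape [cols]
};

// CPU reference for the backward of out = a * b, where a is [rows, cols]
// and b is [cols], broadcast over rows.
inline MulBroadcastGrads MulBroadcastBackwardRef(const std::vector<float> &a,
                                                 const std::vector<float> &b,
                                                 const std::vector<float> &grad,
                                                 std::size_t rows, std::size_t cols) {
    MulBroadcastGrads out{std::vector<float>(rows * cols, 0.0f),
                          std::vector<float>(cols, 0.0f)};
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            const std::size_t idx = i * cols + j;
            out.grad_a[idx] = grad[idx] * b[j];   // d(out)/d(a) = b
            out.grad_b[j] += grad[idx] * a[idx];  // reduce over the broadcast dim
        }
    }
    return out;
}
```

With the shapes listed above, rows is 512 and cols is 8192, so each grad_b entry accumulates 512 products.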

Test results on the .62 machine:

kilinchange · 2026-06-10T09:37:12Z

Also, please fix the commit message:
fix: fix CUDA elementwise dynamic block shared memory

kilinchange · 2026-06-11T03:18:49Z

Add a node24 test screenshot;
fix the formatting.

chen2021673 · 2026-06-11T06:47:55Z

Add a node24 test screenshot;

fix the formatting.

node24 test screenshot:

Fixed.

kilinchange reviewed Jun 10, 2026


Comment thread infini_train/src/kernels/cuda/elementwise.cu Outdated

chen2021673 force-pushed the fix_batch_size branch from 7a8cae6 to e5dd6de Compare June 11, 2026 02:32

kilinchange approved these changes Jun 11, 2026


fix: fix CUDA elementwise dynamic block shared memory

41dd2a8

chen2021673 force-pushed the fix_batch_size branch from e5dd6de to 41dd2a8 Compare June 11, 2026 06:26

kilinchange merged commit 456e2fd into master Jun 12, 2026
2 checks passed

kilinchange deleted the fix_batch_size branch June 12, 2026 01:35
