Fix qd8_f16_qb4w GEMM config getter always returning NULL on Arm by notaJiminLee · Pull Request #2 · nota-github/XNNPACK

notaJiminLee · 2026-06-19T03:50:09Z

Problem

On Arm64, every QD8/F16/QB4W fully-connected op fails at runtime creation with xnn_status_unsupported_hardware, even though the microkernels are present in the binary and the hardware supports FP16 (verified on Apple M4: FEAT_FP16=1, hardware config reports neon_fp16_arith / neon_dot / neon_i8mm).

Root cause

xnn_init_qd8_f16_qb4w_gemm_config() ends with:

return qd8_f16_qb4w_gemm_config.arch ? &qd8_f16_qb4w_gemm_config : NULL;

but init_qd8_f16_qb4w_gemm_config() never assigns .arch on any Arm branch (it sets only mr / nr / log2_kr / planes / dqgemm). So on Arm64 the getter always returns NULL.
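The failure mode can be reproduced in isolation. The following is a minimal sketch, not the XNNPACK source: the struct, the `demo_` names, and the tile values are all hypothetical stand-ins, and the one-time initialization used by the real code is omitted. It only illustrates why a guard on a field that is never written always yields NULL.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical, heavily reduced stand-in for a GEMM config struct.
struct demo_gemm_config {
  uint64_t arch;    // never written by demo_init() below
  uint8_t mr;
  uint8_t nr;
  uint8_t log2_kr;
  uint8_t planes;
};

// Static storage is zero-initialized, so .arch starts at 0.
static struct demo_gemm_config demo_config;

// Models the Arm branch as described above: tile parameters are set,
// .arch is left untouched. The numeric values are illustrative only.
static void demo_init(void) {
  demo_config.mr = 4;
  demo_config.nr = 16;
  demo_config.log2_kr = 2;
  demo_config.planes = 1;
}

// Getter with the .arch guard: since .arch is still 0 after init,
// the ternary always selects NULL.
static const struct demo_gemm_config* demo_get_guarded(void) {
  demo_init();
  return demo_config.arch ? &demo_config : NULL;
}
```

In this model the config is fully populated with usable tile parameters, yet `demo_get_guarded()` still returns NULL, which matches the symptom of kernels being present but reported as unsupported.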

This stays hidden until the convert -> fully_connected packed-LHS path sets XNN_FLAG_INLINE_LHS_PACKING; then setup_variant_and_gemm_config() receives a NULL gemm config and returns xnn_status_unsupported_hardware, aborting runtime creation.
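How a NULL config turns into a hardware error can be sketched as follows. This is an assumption-laden illustration: `demo_setup_fc`, `demo_status`, and `demo_fc_gemm_config` are hypothetical names, and the real setup_variant_and_gemm_config() does considerably more than a NULL check.

```c
#include <stddef.h>

// Hypothetical status codes modelled on the error named above.
enum demo_status {
  demo_status_success = 0,
  demo_status_unsupported_hardware = 1
};

// Hypothetical reduced config; only its presence or absence matters here.
struct demo_fc_gemm_config {
  unsigned char mr;
  unsigned char nr;
};

// A caller that treats "no config" as "hardware cannot run this op".
// It cannot distinguish a genuinely unsupported CPU from a getter
// that wrongly returned NULL.
static enum demo_status demo_setup_fc(
    const struct demo_fc_gemm_config* gemm_config) {
  if (gemm_config == NULL) {
    return demo_status_unsupported_hardware;
  }
  return demo_status_success;
}
```

Because the caller has no way to tell the two cases apart, the error message points at the hardware even though the actual fault lies in the getter.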

The sibling xnn_init_qd8_f16_qc4w_gemm_config() returns its config unconditionally (its init likewise does not set .arch). qd8_f16_qb4w is the only getter carrying this stray .arch guard.

Fix

Return the config unconditionally, matching the qd8_f16_qc4w sibling.
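The shape of the fix, under the same simplifying assumptions (hypothetical `fix_` names, illustrative values, one-time initialization omitted), is a getter that returns the address of the config regardless of .arch:

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical reduced config; .arch is still never written by init.
struct fix_gemm_config {
  uint64_t arch;
  uint8_t mr;
  uint8_t nr;
};

static struct fix_gemm_config fix_config;

// Same init behaviour as before: tile parameters only.
static void fix_init(void) {
  fix_config.mr = 4;
  fix_config.nr = 16;
}

// Unconditional getter: callers receive the populated config even
// though .arch remains 0.
static const struct fix_gemm_config* fix_get_config(void) {
  fix_init();
  return &fix_config;
}
```

Note that this sketch always returns a non-NULL pointer, so any "is this supported" decision has to come from elsewhere, for example from whether the init populated a kernel pointer; that part is outside what this sketch models.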

Verification

On Apple M4, a QD8/F16/QB4W linear .pte that previously aborted with unsupported_hardware now runs to completion (Model executed successfully). The qd8_f32_qb4w and qd8_f16_qc4w paths are unaffected.

xnn_init_qd8_f16_qb4w_gemm_config() gates its return on qd8_f16_qb4w_gemm_config.arch, but init_qd8_f16_qb4w_gemm_config() never assigns .arch on any Arm branch (only mr/nr/log2_kr/planes/dqgemm). So on Arm64 the getter always returns NULL, and every QD8/F16/QB4W fully-connected op fails at runtime creation with xnn_status_unsupported_hardware. This surfaces once the convert->FC packed-LHS path sets INLINE_LHS_PACKING and setup_variant_and_gemm_config gets a NULL gemm config.

Align with the qd8_f16_qc4w sibling getter, which returns the config unconditionally (its init likewise does not set .arch).

Verified on Apple M4: a QD8/F16/QB4W linear .pte that previously aborted with unsupported_hardware now runs (Model executed successfully).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

notaJiminLee wants to merge 1 commit into master from fix/NPP02-6114-qd8-f16-qb4w-gemm-config
