Fix qd8_f16_qb4w GEMM config getter always returning NULL on Arm #2

Open: notaJiminLee wants to merge 1 commit into master from fix/NPP02-6114-qd8-f16-qb4w-gemm-config


@notaJiminLee (Collaborator)

Problem

On Arm64, every QD8/F16/QB4W fully-connected op fails at runtime creation with xnn_status_unsupported_hardware, even though the microkernels are present in the binary and the hardware supports FP16 (verified on Apple M4: FEAT_FP16=1, hardware config reports neon_fp16_arith / neon_dot / neon_i8mm).

Root cause

xnn_init_qd8_f16_qb4w_gemm_config() ends with:

return qd8_f16_qb4w_gemm_config.arch ? &qd8_f16_qb4w_gemm_config : NULL;

but init_qd8_f16_qb4w_gemm_config() never assigns .arch on any Arm branch (it sets only mr / nr / log2_kr / planes / dqgemm). So on Arm64 the getter always returns NULL.

This stays hidden until the convert -> fully_connected packed-LHS path sets XNN_FLAG_INLINE_LHS_PACKING; then setup_variant_and_gemm_config() receives a NULL gemm config and returns xnn_status_unsupported_hardware, aborting runtime creation.
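The failure path described above can be modeled in isolation. This is a minimal sketch, not XNNPACK code: every `mock_` name, the struct layout, and the status values are hypothetical stand-ins, and the only behavior taken from the description is that a NULL GEMM config turns into an unsupported-hardware status that aborts runtime creation.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical status codes standing in for the library's status enum.
enum mock_status {
  mock_status_success = 0,
  mock_status_unsupported_hardware = 1,
};

// Hypothetical, trimmed-down GEMM config; the real struct has more fields.
struct mock_fc_gemm_config {
  int mr;
  int nr;
};

// Models the setup step: a NULL config is reported as unsupported hardware.
enum mock_status mock_setup_gemm_config(const struct mock_fc_gemm_config* config) {
  if (config == NULL) {
    return mock_status_unsupported_hardware;
  }
  return mock_status_success;
}

// Models runtime creation: any setup failure aborts creation with that status.
enum mock_status mock_create_runtime(const struct mock_fc_gemm_config* config) {
  enum mock_status status = mock_setup_gemm_config(config);
  if (status != mock_status_success) {
    return status;  // creation is aborted here when the getter returned NULL
  }
  // Past this point the config is known to be usable.
  assert(config != NULL);
  return mock_status_success;
}
```

In this model, `mock_create_runtime(NULL)` yields the unsupported-hardware status even though nothing about the hardware was checked, which is why the error message is misleading in the real failure.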

The sibling xnn_init_qd8_f16_qc4w_gemm_config() returns its config unconditionally (its init likewise does not set .arch). qd8_f16_qb4w is the only getter carrying this stray .arch guard.

Fix

Return the config unconditionally, matching the qd8_f16_qc4w sibling.
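The difference between the guarded and the unconditional return can be sketched with a stand-in struct. This is an illustration under stated assumptions, not the actual patch: the `mock_` names are hypothetical, the struct is reduced to a few fields, and the tile values are placeholders rather than the values the library uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical, reduced stand-in for the GEMM config struct.
struct mock_gemm_config {
  uint32_t arch;    // stays 0: the modeled Arm init path never writes it
  uint8_t mr;
  uint8_t nr;
  uint8_t log2_kr;
  uint8_t planes;
};

// Static storage, so every field starts out as zero.
static struct mock_gemm_config mock_qb4w_config;

// Models the Arm init branch: tile parameters are set, .arch is not.
// The numbers are placeholders chosen for the sketch.
static void mock_init_arm(struct mock_gemm_config* config) {
  assert(config != NULL);
  config->mr = 4;
  config->nr = 16;
  config->log2_kr = 3;
  config->planes = 2;
}

// Getter with the stray guard: returns NULL because .arch is still 0.
const struct mock_gemm_config* mock_get_guarded(void) {
  mock_init_arm(&mock_qb4w_config);
  return mock_qb4w_config.arch ? &mock_qb4w_config : NULL;
}

// Getter without the guard: returns the initialized config.
const struct mock_gemm_config* mock_get_unconditional(void) {
  mock_init_arm(&mock_qb4w_config);
  return &mock_qb4w_config;
}
```

Here `mock_get_guarded()` returns NULL although the config is fully populated, while `mock_get_unconditional()` hands back the same storage with its tile parameters intact.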

Verification

On Apple M4, a QD8/F16/QB4W linear .pte that previously aborted with unsupported_hardware now runs to completion (Model executed successfully). The qd8_f32_qb4w and qd8_f16_qc4w paths are unaffected.

xnn_init_qd8_f16_qb4w_gemm_config() gates its return on
qd8_f16_qb4w_gemm_config.arch, but init_qd8_f16_qb4w_gemm_config() never
assigns .arch on any Arm branch (only mr/nr/log2_kr/planes/dqgemm). So on
Arm64 the getter always returns NULL, and every QD8/F16/QB4W fully-connected
op fails at runtime creation with xnn_status_unsupported_hardware. This
surfaces once the convert->FC packed-LHS path sets INLINE_LHS_PACKING and
setup_variant_and_gemm_config gets a NULL gemm config.

Align with the qd8_f16_qc4w sibling getter, which returns the config
unconditionally (its init likewise does not set .arch).

Verified on Apple M4: a QD8/F16/QB4W linear .pte that previously aborted
with unsupported_hardware now runs (Model executed successfully).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>