Fix qd8_f16_qb4w GEMM config getter always returning NULL on Arm #2
Open
notaJiminLee wants to merge 1 commit into
xnn_init_qd8_f16_qb4w_gemm_config() gates its return on qd8_f16_qb4w_gemm_config.arch, but init_qd8_f16_qb4w_gemm_config() never assigns .arch on any Arm branch (only mr/nr/log2_kr/planes/dqgemm). So on Arm64 the getter always returns NULL, and every QD8/F16/QB4W fully-connected op fails at runtime creation with xnn_status_unsupported_hardware.

This surfaces once the convert->FC packed-LHS path sets INLINE_LHS_PACKING and setup_variant_and_gemm_config gets a NULL gemm config.

Align with the qd8_f16_qc4w sibling getter, which returns the config unconditionally (its init likewise does not set .arch).

Verified on Apple M4: a QD8/F16/QB4W linear .pte that previously aborted with unsupported_hardware now runs (Model executed successfully).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Problem
On Arm64, every QD8/F16/QB4W fully-connected op fails at runtime creation with xnn_status_unsupported_hardware, even though the microkernels are present in the binary and the hardware supports FP16 (verified on Apple M4: FEAT_FP16=1, hardware config reports neon_fp16_arith / neon_dot / neon_i8mm).
Root cause
xnn_init_qd8_f16_qb4w_gemm_config() ends with a return that is gated on qd8_f16_qb4w_gemm_config.arch, but init_qd8_f16_qb4w_gemm_config() never assigns .arch on any Arm branch (it sets only mr / nr / log2_kr / planes / dqgemm). So on Arm64 the getter always returns NULL.
This stays hidden until the convert -> fully_connected packed-LHS path sets XNN_FLAG_INLINE_LHS_PACKING; then setup_variant_and_gemm_config() receives a NULL gemm config and returns xnn_status_unsupported_hardware, aborting runtime creation.
The sibling xnn_init_qd8_f16_qc4w_gemm_config() returns its config unconditionally (its init likewise does not set .arch). qd8_f16_qb4w is the only getter carrying this stray .arch guard.
Fix
Return the config unconditionally, matching the qd8_f16_qc4w sibling.
Verification
On Apple M4, a QD8/F16/QB4W linear .pte that previously aborted with unsupported_hardware now runs to completion (Model executed successfully). The qd8_f32_qb4w and qd8_f16_qc4w paths are unaffected.
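Illustration
The root cause and fix described above can be sketched as a small standalone C model. This is not the XNNPACK source: every model_* name, the struct layout, and the placeholder tile values are hypothetical, and the real one-time initialization machinery is left out. It only shows why a getter gated on a field that init never assigns always yields NULL, and how returning the config unconditionally changes what the caller sees.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real XNNPACK types. */
enum model_status {
  model_status_success,
  model_status_unsupported_hardware
};

struct model_gemm_config {
  unsigned arch; /* stays 0 unless init assigns it */
  unsigned mr;   /* placeholder tile parameters */
  unsigned nr;
};

/* Static storage is zero-initialized, so .arch starts at 0. */
static struct model_gemm_config model_config;

/* Models the described Arm init: fills tile fields, never touches .arch. */
void model_init(void) {
  model_config.mr = 1; /* placeholder value, an assumption */
  model_config.nr = 8; /* placeholder value, an assumption */
}

/* Getter as described before the fix: the return is gated on .arch. */
const struct model_gemm_config* model_get_gated(void) {
  model_init();
  return model_config.arch != 0 ? &model_config : NULL;
}

/* Getter as described after the fix: the config is returned unconditionally. */
const struct model_gemm_config* model_get_unconditional(void) {
  model_init();
  return &model_config;
}

/* Models a caller that treats a NULL config as unsupported hardware. */
enum model_status model_create_op(const struct model_gemm_config* config) {
  return config == NULL ? model_status_unsupported_hardware
                        : model_status_success;
}

/* Self-check of the two behaviors; call it from any test harness. */
void model_self_check(void) {
  assert(model_create_op(model_get_gated()) ==
         model_status_unsupported_hardware);
  assert(model_create_op(model_get_unconditional()) == model_status_success);
}
```

In this model the gated getter fails regardless of what the hardware supports, because the guard tests a field that nothing writes. That matches the symptom reported above, where the failure appears even on a machine with the required features.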