[AMD] refactor: engine-neutral aiperf plotter + fill sglang panels#1774
Open
AMD-yanfeiwang wants to merge 1 commit into
Replace the vLLM-anchored flat alias table in generate_aiperf_plots.py with a semantic Metric enum plus a per-engine ENGINE_METRICS registry. Panels now reference engine-neutral metric keys, and the registry resolves them to the concrete series each engine exports (the engine is detected from the metric namespace). Adding a new backend becomes a single new table with no panel changes.

Also fill previously-blank panels for sglang using upstream (main-branch) metrics only:
- External/host KV usage from hicache_host_used/total_tokens (ratio adapter)
- KV offload transfer + cumulative from backuped_tokens / load_back_tokens, rendered in tokens (sglang exposes no per-token KV byte size upstream; offload panels are now unit-aware: bytes for vLLM, tokens for sglang)

The per-tier prefix-cache hit-rate split and prefill-source breakdown stay blank for sglang since they depend on fork-only counters. vLLM behavior is unchanged (string mappings + identical byte units).
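A minimal sketch of how a semantic enum plus a per-engine registry could fit together, under stated assumptions. The names Metric and ENGINE_METRICS and the two sglang offload mappings come from this PR; the enum member set, the vLLM series names, and the resolve helper are hypothetical placeholders, not the code in the diff.

```python
from enum import Enum, auto
from typing import Dict, Optional


class Metric(Enum):
    """Engine-neutral metric keys that panels reference."""
    KV_OFFLOAD_G2C = auto()   # GPU -> CPU offload
    KV_OFFLOAD_C2G = auto()   # CPU -> GPU load-back
    PREFILL_SOURCE = auto()   # needs fork-only counters on sglang


# One table per engine. A key missing from a table means the engine does not
# export that metric, so the panel stays blank.
ENGINE_METRICS: Dict[str, Dict[Metric, str]] = {
    "vllm": {
        # Hypothetical placeholder series names, for illustration only.
        Metric.KV_OFFLOAD_G2C: "vllm:example_offload_g2c_bytes",
        Metric.KV_OFFLOAD_C2G: "vllm:example_offload_c2g_bytes",
        Metric.PREFILL_SOURCE: "vllm:example_prefill_source",
    },
    "sglang": {
        Metric.KV_OFFLOAD_G2C: "sglang:backuped_tokens",
        Metric.KV_OFFLOAD_C2G: "sglang:load_back_tokens",
    },
}


def resolve(engine: str, metric: Metric) -> Optional[str]:
    """Return the concrete series name for this engine, or None if unmapped."""
    return ENGINE_METRICS.get(engine, {}).get(metric)
```

With this shape, supporting another backend would mean adding one more entry to ENGINE_METRICS; panel code that only calls resolve would not need to change.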
Collaborator
@AMD-yanfeiwang can you rebase this onto agentx-v0.4 branch please?
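Summary
- Replace the vLLM-anchored flat alias table in utils/generate_aiperf_plots.py with a semantic Metric enum + per-engine ENGINE_METRICS registry. Panels reference engine-neutral keys; the engine is auto-detected from the metric namespace and the registry resolves each key to the concrete series that engine exports. Adding a backend (e.g. TensorRT-LLM) is now a single new table with no panel changes.
- Metric resolution and ratio-adapter dispatch happen in aggregate_timeseries.
- Fill previously-blank sglang panels using upstream (main-branch) metrics only:
  - External/host KV usage: hicache_host_used_tokens / hicache_host_total_tokens (ratio adapter) → drives the External line in the KV Cache Utilization panel.
  - KV offload transfer + cumulative: backuped_tokens (GPU→CPU) and load_back_tokens (CPU→GPU). sglang exposes no per-token KV byte size upstream, so these render in tokens; offload panels are now unit-aware (bytes/MB/GB for vLLM, tokens for sglang).
- The per-tier prefix-cache hit-rate split and prefill-source breakdown stay blank for sglang because they depend on fork-only counters (cache_hit_tokens_l1/l2/l3, cache_miss_tokens) that are not yet merged upstream.
Notes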
Test plan
- Run generate_aiperf_plots.py on an sglang result dir; engine auto-detected as sglang, new panels (External KV usage + 3 KV offload panels) populate with real data, fork-dependent panels stay blank.
- Metric resolution: CPU_KV_CACHE_USAGE → ratio adapter, KV_OFFLOAD_G2C → sglang:backuped_tokens, KV_OFFLOAD_C2G → sglang:load_back_tokens.
Note
Low Risk
Offline plotting utility only; changes affect visualization labels/units for sglang and refactor metric resolution without touching runtime inference or auth.
Overview
Refactors generate_aiperf_plots.py so panels use a semantic Metric enum and per-engine ENGINE_METRICS registry instead of hard-coded vllm:* names. Engine detection infers vLLM vs sglang from metric namespace prefixes (cached on the export); aggregate_timeseries resolves metrics and can dispatch ratio adapters for composite series (e.g. HiCache host used/total → external KV usage %).

sglang gains upstream-only mappings: external KV line, queue/throughput/preemptions, aggregate prefix hit rate via PREFIX_CACHE_HIT_RATE when hit/query counters are missing, and HiCache offload panels using token units. KV offload panels are unit-aware (MB/s & GB for vLLM, tokens/s & M tokens for sglang). Panels that need fork-only sglang counters (GPU/External prefix split, prefill source stackplot) still render empty. Figure suptitle is generalized to "LLM Server Metrics During Benchmark"; vLLM string mappings are intended to preserve prior vLLM plot behavior.

Reviewed by Cursor Bugbot for commit 4f33930.
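The detection, ratio-adapter, and unit-aware ideas described above can be sketched roughly as follows. This is an illustrative sketch only: detect_engine, ratio_percent, and OFFLOAD_UNITS are hypothetical names, the "sglang:"-prefixed hicache series names are assumed from the namespace convention, and the real aggregate_timeseries in the PR may be structured quite differently.

```python
from typing import Iterable, List, Optional, Sequence, Tuple

KNOWN_ENGINES = ("vllm", "sglang")


def detect_engine(metric_names: Iterable[str]) -> Optional[str]:
    """Infer the engine from the namespace prefix (text before the first ':')."""
    for name in metric_names:
        prefix = name.split(":", 1)[0]
        if prefix in KNOWN_ENGINES:
            return prefix
    return None


def ratio_percent(used: Sequence[float], total: Sequence[float]) -> List[float]:
    """Ratio adapter: combine two series into a usage percentage per sample.

    Samples whose total is zero become NaN, which a plotting library would
    typically draw as a gap rather than a spike.
    """
    return [100.0 * u / t if t else float("nan") for u, t in zip(used, total)]


# (rate unit, cumulative unit) per engine, as described for the offload panels.
OFFLOAD_UNITS: dict[str, Tuple[str, str]] = {
    "vllm": ("MB/s", "GB"),
    "sglang": ("tokens/s", "M tokens"),
}

# Usage: assumed sglang series names feeding the External KV usage line.
names = ["sglang:hicache_host_used_tokens", "sglang:hicache_host_total_tokens"]
engine = detect_engine(names)
external_usage = ratio_percent([50.0, 100.0], [200.0, 200.0])
rate_unit, total_unit = OFFLOAD_UNITS[engine]
```

Keeping the ratio as a small adapter function means a composite series such as host used/total can be resolved through the same registry path as a plain series name.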