Staging to Main by cryptopoly · Pull Request #76 · cryptopoly/ChaosEngineAI

cryptopoly · 2026-06-17T12:13:48Z

No description provided.

Bumps [tauri-plugin-opener](https://github.com/tauri-apps/plugins-workspace) from 2.5.3 to 2.5.4.
- [Release notes](https://github.com/tauri-apps/plugins-workspace/releases)
- [Commits](tauri-apps/plugins-workspace@http-v2.5.3...http-v2.5.4)

---
updated-dependencies:
- dependency-name: tauri-plugin-opener
  dependency-version: 2.5.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Bumps [tauri-plugin-dialog](https://github.com/tauri-apps/plugins-workspace) from 2.7.0 to 2.7.1.
- [Release notes](https://github.com/tauri-apps/plugins-workspace/releases)
- [Commits](tauri-apps/plugins-workspace@log-v2.7.0...log-v2.7.1)

---
updated-dependencies:
- dependency-name: tauri-plugin-dialog
  dependency-version: 2.7.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Updates the requirements on [pymdown-extensions](https://github.com/facelessuser/pymdown-extensions) to permit the latest version.
- [Release notes](https://github.com/facelessuser/pymdown-extensions/releases)
- [Commits](facelessuser/pymdown-extensions@10.7...10.21.3)

---
updated-dependencies:
- dependency-name: pymdown-extensions
  dependency-version: 10.21.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Bumps [tauri](https://github.com/tauri-apps/tauri) from 2.11.0 to 2.11.2.
- [Release notes](https://github.com/tauri-apps/tauri/releases)
- [Commits](tauri-apps/tauri@tauri-v2.11.0...tauri-v2.11.2)

---
updated-dependencies:
- dependency-name: tauri
  dependency-version: 2.11.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Updates the requirements on [mkdocs](https://github.com/mkdocs/mkdocs) to permit the latest version.
- [Release notes](https://github.com/mkdocs/mkdocs/releases)
- [Commits](mkdocs/mkdocs@1.6.0...1.6.1)

---
updated-dependencies:
- dependency-name: mkdocs
  dependency-version: 1.6.1
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Bumps [tauri-build](https://github.com/tauri-apps/tauri) from 2.6.0 to 2.6.2.
- [Release notes](https://github.com/tauri-apps/tauri/releases)
- [Commits](tauri-apps/tauri@tauri-build-v2.6.0...tauri-build-v2.6.2)

---
updated-dependencies:
- dependency-name: tauri-build
  dependency-version: 2.6.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Bumps [tar](https://github.com/composefs/tar-rs) from 0.4.45 to 0.4.46.
- [Release notes](https://github.com/composefs/tar-rs/releases)
- [Commits](composefs/tar-rs@0.4.45...0.4.46)

---
updated-dependencies:
- dependency-name: tar
  dependency-version: 0.4.46
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.149 to 1.0.150.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](serde-rs/json@v1.0.149...v1.0.150)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-version: 1.0.150
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Bumps [rust-i18n](https://github.com/longbridge/rust-i18n) from 3.1.2 to 4.0.0.
- [Release notes](https://github.com/longbridge/rust-i18n/releases)
- [Commits](longbridge/rust-i18n@v3.1.2...v4.0.0)

---
updated-dependencies:
- dependency-name: rust-i18n
  dependency-version: 4.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

….22.0

Loose-floor bumps (same pattern as FU-058/062/063/069); no code change.

- turboquant-mlx-full 0.5.0 -> 0.6.2: expert-streaming reader tuning (read-coalescing, --prefetch-ahead, --pin-file). Apple Silicon.
- mlx-vlm 0.5.0 -> 0.6.0.
- vllm 0.21.0 -> 0.22.0 (both [vllm] and [triattention] extras): DeepSeek-V4 MTP spec-dec, Qwen3.5/3.6 GatedDeltaNet fixes, Gemma4 fixes, multi-tier KV offload. CUDA-only; not exercised locally.

dflash-mlx v0.1.8 is available but deferred (FU-057 API-rewrite migration).

Cold mlx_lm+mlx+mlx_vlm import baseline crept to ~17.5s solo and ~31s under a sustained E2E run (concurrent model loads + thermal throttle), re-issued per MLX cell. 20s then 30s ceilings each blew a different Phase 1 cell with 'mlx_worker probe timed out'. 45s clears the ~31s loaded peak with headroom, still bounded enough to surface a wedged worker. Follow-up noted: cache the probe so it isn't re-run per load.

…rent-watch killpg

Two fixes for the per-load MLX probe storm surfaced by the E2E sweep:

1. CAPABILITY_CACHE_TTL_SECONDS 10s -> 300s. load_model's refresh_capabilities() re-probed on every load because the 10s TTL was shorter than a single load+generate (40-70s), spawning a blocking 17-31s mlx_lm+mlx+mlx_vlm import each time (the creep behind FU-068's probe-timeout bumps). Native caps only change on install, and every install path force-refreshes, so a long TTL is safe.
2. _json_subprocess spawns the probe with start_new_session=True so it is no longer in the backend's process group. app._watch_parent_and_exit killpg(SIGTERM)s the group when the backend's parent dies; on a non-Tauri launch whose launch shell exits, that SIGTERM'd the probe mid-run ("probe exited with code -15"). The probe is a few-second transient, so escaping the parent-death cleanup leaks nothing.

Tests: test_inference + test_setup_routes + test_diagnostics_routes + test_route_contracts green; standalone probe rc=0.
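The two fixes above can be sketched in isolation. This is a minimal illustration, not the project's code: `CapabilityCache` and `run_probe` are hypothetical names, the 300s TTL and 45s timeout are the values quoted in the commit messages, and `start_new_session=True` only has an effect on POSIX.

```python
import subprocess
import sys
import time
from typing import Callable, Optional

CAPABILITY_CACHE_TTL_SECONDS = 300.0  # value quoted in the commit message
PROBE_TIMEOUT_SECONDS = 45.0          # ceiling quoted in the probe-timeout commit


class CapabilityCache:
    """Hypothetical cache: re-runs the probe only after the TTL lapses."""

    def __init__(self, probe: Callable[[], dict],
                 ttl: float = CAPABILITY_CACHE_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._probe = probe
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[dict] = None
        self._stamp = 0.0

    def get(self, force: bool = False) -> dict:
        now = self._clock()
        expired = self._value is None or (now - self._stamp) >= self._ttl
        if force or expired:
            # Install paths would pass force=True to refresh immediately.
            self._value = self._probe()
            self._stamp = now
        return self._value


def run_probe(code: str) -> subprocess.CompletedProcess:
    """Run a short Python probe in its own session.

    With start_new_session=True the child calls setsid(), so a killpg()
    aimed at the caller's process group does not reach it.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=PROBE_TIMEOUT_SECONDS,
        start_new_session=True,
    )
```

With a 300s TTL, a load+generate cycle of 40-70s no longer outlives the cached entry, so consecutive loads reuse the cached result instead of paying the import cost again.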

…plers

Tier 1+2 of the chat-LLM perf/quality review.

Performance (llama.cpp):
- --cache-reuse 256 + cache_prompt:true on the chat payload so a growing conversation reuses the slot KV and re-prefills only the new suffix instead of the whole history (turn-2+ TTFT drops sharply; was O(n^2)).
- Emit --flash-attn on when the user's fused_attention flag is set. It was plumbed into load_model + stored on LoadedModelInfo but never turned into a flag; threaded fused_attention into _build_command. Large Metal decode/KV-memory win; required for quantized KV cache types.

Quality (samplers):
- llama.cpp: add DRY (dry_multiplier/base/allowed_length), XTC (xtc_probability/threshold), top_n_sigma to _LLAMA_SAMPLER_KEYS (forward-only; old binaries ignore unknown fields).
- MLX: wire XTC into the sampler + add repeat_penalty via a new logits_processors builder. repeat_penalty was shown in the UI but silently dropped because mlx-lm applies it through logits_processors, not make_sampler.
- /v1 parity: forward min_p / repeat_penalty / mirostat(_tau/_eta) which the OpenAI-compat path dropped; added the request-model fields.

Tests: new sampler/logits/parity cases; touched-area suites green. The one failing test (dflash runtime-bundle) is pre-existing/unrelated (orphaned dflash-mlx pin, FU-057).
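As a rough sketch of the llama.cpp side, the forward-only sampler whitelist and the two launch flags might look like the following. The function names and the key set are illustrative assumptions (the real `_LLAMA_SAMPLER_KEYS` and `_build_command` are not shown in this PR page); only the flags and field names quoted in the commit message are used.

```python
from typing import Any

# Sampler fields named in the commit message. The project's real
# _LLAMA_SAMPLER_KEYS presumably holds more entries; this set is illustrative.
LLAMA_SAMPLER_KEYS = frozenset({
    "dry_multiplier", "dry_base", "dry_allowed_length",
    "xtc_probability", "xtc_threshold", "top_n_sigma",
})


def build_chat_payload(messages: list[dict[str, str]],
                       overrides: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: chat payload with prompt caching enabled."""
    payload: dict[str, Any] = {
        "messages": messages,
        "stream": True,
        "cache_prompt": True,  # reuse the slot KV for the shared prefix
    }
    for key, value in overrides.items():
        # Forward only whitelisted sampler fields that were actually set.
        if key in LLAMA_SAMPLER_KEYS and value is not None:
            payload[key] = value
    return payload


def build_server_args(model_path: str, fused_attention: bool) -> list[str]:
    """Hypothetical helper: launch arguments for llama-server."""
    args = ["llama-server", "--model", model_path, "--cache-reuse", "256"]
    if fused_attention:
        args += ["--flash-attn", "on"]
    return args
```

Filtering on a whitelist keeps unrelated request fields out of the payload, while unset (`None`) samplers are omitted so the server defaults apply.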

…oning

Tier 3 of the chat-LLM review.

- Stop re-feeding prior <think> reasoning into history every turn. The live chat path now passes preserve_reasoning=False; upstream Qwen3 / DeepSeek-R1 templates strip prior reasoning, and replaying it inflated the prompt each turn. _build_history_with_reasoning keeps the capability for callers that still want it (sessions side already passed False).
- Token-budgeted sliding window (optional token_budget arg). Keeps system messages + the newest turns that fit, drops the oldest, bounding prompt growth so a long chat can't silently truncate on llama.cpp or overflow context on MLX. Budget reserves room for system prompt + current prompt + max_tokens + template overhead, floors at 512 so the latest turn is always kept. Conservative ~3 chars/token estimate (no tokenizer at this layer) errs toward under-filling to avoid overflow.

Tests: +7 windowing/budget cases; generate-path + services suites green.
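The sliding window described above could be sketched roughly as follows. This is a simplified illustration under stated assumptions: `window_history` and `reserved_tokens` are hypothetical names, all reservations (system prompt, current prompt, max_tokens, template overhead) are folded into one number, and the newest non-system message is always kept even if it alone exceeds the budget.

```python
CHARS_PER_TOKEN = 3       # conservative estimate quoted in the commit message
MIN_HISTORY_BUDGET = 512  # floor quoted in the commit message


def estimate_tokens(text: str) -> int:
    # Ceiling division, so any non-empty string costs at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def window_history(messages: list[dict[str, str]],
                   token_budget: int,
                   reserved_tokens: int = 0) -> list[dict[str, str]]:
    """Keep every system message plus the newest messages that fit."""
    budget = max(token_budget - reserved_tokens, MIN_HISTORY_BUDGET)
    keep = [m["role"] == "system" for m in messages]
    used = 0
    kept_newest = False
    # Walk from newest to oldest and stop at the first message that overflows.
    for i in range(len(messages) - 1, -1, -1):
        if messages[i]["role"] == "system":
            continue
        cost = estimate_tokens(messages[i]["content"])
        if kept_newest and used + cost > budget:
            break
        keep[i] = True
        used += cost
        kept_newest = True
    return [m for m, k in zip(messages, keep) if k]
```

Dividing by 3 rather than a more typical 4 characters per token overestimates the token count, so the window fills slightly less than the real budget allows instead of overflowing it.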

The native MLX chat path rebuilt a fresh cache every turn and re-prefilled the whole conversation. This keeps one persistent mlx-lm prompt cache on the worker and reuses the longest matching token prefix across turns: trim the divergent tail, prefill only the new suffix, re-commit keyed by prompt+generated tokens. A single-slot port of mlx-lm server's LRUPromptCache.fetch_nearest_cache.

- New backend_service/mlx_worker_prompt_cache.py: acquire / commit / invalidate. Gated to the native strategy (compression caches keep their path); guarded by can_trim_prompt_cache (SSM/Mamba/rotating-full reset, mlx-lm #980); resets on model change / no common prefix / partial trim / any exception -> fresh full prefill (identical output, no speedup).
- WorkerState gains _persist_cache / _persist_tokens / _persist_cache_model_ref; invalidated on every load / unload / profile change. generate_standard + stream_generate collect generated token ids (GenerationResponse.token) so the persisted token list always equals the cache's positional contents (exact next-turn trim).

Live-validated (mlx-community/Qwen2.5-0.5B-Instruct-4bit, same session): turn 1 promptTokens=34, turn 2 promptTokens=16 (vs ~90 without reuse) with a coherent, context-aware turn-2 answer -- ~5.6x less prompt processing, no corruption.

Tests: +12 reuse-logic cases (fake cache, trim accounting, all reset paths); MLX worker suite green (the one fail, dflash runtime-bundle, is the pre-existing orphaned-pin issue).
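The prefix-matching decision can be illustrated without touching mlx-lm at all. The sketch below is an assumption-laden model of the acquire step: `ReusePlan` and `plan_reuse` are hypothetical names, the cache is represented only by the token ids it holds, and leaving at least one prompt token to prefill is an assumption made here so the model still produces logits for the next token.

```python
from dataclasses import dataclass


def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Number of leading token ids shared by both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


@dataclass
class ReusePlan:
    reuse: bool        # False: discard the cache and prefill everything
    trim: int          # cached positions to drop from the tail
    suffix: list[int]  # tokens that still need prefilling


def plan_reuse(cached: list[int], prompt: list[int], can_trim: bool) -> ReusePlan:
    shared = common_prefix_len(cached, prompt)
    # Assumption: always leave one prompt token to run through the model.
    shared = min(shared, len(prompt) - 1)
    if shared <= 0:
        return ReusePlan(False, 0, list(prompt))
    trim = len(cached) - shared
    if trim > 0 and not can_trim:
        # Caches that cannot be trimmed fall back to a full prefill.
        return ReusePlan(False, 0, list(prompt))
    return ReusePlan(True, trim, prompt[shared:])
```

For example, with cached tokens `[1, 2, 3, 4, 5]` and a new prompt `[1, 2, 3, 9, 10]`, the plan trims two cached positions and prefills only `[9, 10]`.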

Tier 2 added DRY/XTC/top-n-sigma at the engine layer but nothing populated them from a request, so they were unreachable. Complete the chain:

- GenerateRequest gains xtcProbability / xtcThreshold / dryMultiplier / dryBase / dryAllowedLength; _build_sampler_overrides maps them to the engine-side snake_case keys (llama-server forwards all via _LLAMA_SAMPLER_KEYS; mlx-lm applies XTC via make_sampler, ignores DRY).
- SamplerPanel gains xtc_probability / xtc_threshold / dry_multiplier rows; SamplerOverrides type + samplerOverrides storage/serialize carry the new fields.

XTC adds creative variety (both engines); DRY kills verbatim repetition loops better than repeat_penalty (llama.cpp). Both default off.

Tests: backend mapping + frontend projection/round-trip; tsc + vitest green.
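A minimal sketch of the request-to-engine mapping, assuming unset fields are represented as `None` (which matches "both default off" but is not confirmed by the source). `GenerateRequestSamplers` is a hypothetical stand-in for the sampler fields on GenerateRequest, and `build_sampler_overrides` only mirrors the role of `_build_sampler_overrides`.

```python
from dataclasses import dataclass
from typing import Optional, Union

Number = Union[int, float]


@dataclass
class GenerateRequestSamplers:
    """Hypothetical stand-in for the new sampler fields on GenerateRequest."""
    xtcProbability: Optional[float] = None
    xtcThreshold: Optional[float] = None
    dryMultiplier: Optional[float] = None
    dryBase: Optional[float] = None
    dryAllowedLength: Optional[int] = None


# camelCase request field -> engine-side snake_case key
REQUEST_TO_ENGINE_KEY = {
    "xtcProbability": "xtc_probability",
    "xtcThreshold": "xtc_threshold",
    "dryMultiplier": "dry_multiplier",
    "dryBase": "dry_base",
    "dryAllowedLength": "dry_allowed_length",
}


def build_sampler_overrides(request: GenerateRequestSamplers) -> dict[str, Number]:
    """Emit only the samplers the request actually set."""
    overrides: dict[str, Number] = {}
    for field_name, engine_key in REQUEST_TO_ENGINE_KEY.items():
        value = getattr(request, field_name)
        if value is not None:
            overrides[engine_key] = value
    return overrides
```

Omitting unset fields, rather than sending explicit zeros, leaves each engine free to apply its own defaults.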

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 5 to 7.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v5...v7)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-version: '7'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](actions/checkout@v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Bumps [actions/setup-python](https://github.com/actions/setup-python) from 5 to 6.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v5...v6)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

…ites)

The chat SSE stream emitted one json.dumps + frame per token. Batch visible token text in the standard (non-tool) stream path and flush on a 24-char / 50ms window, before any non-token event (reasoning / reasoningDone / panic / thermal / error), and at stream end. Disabled when per-token logprobs are requested (they must stay 1:1 aligned); the agent/tool path is unchanged. full_text accumulation + the per-token runaway / stall / loop guards are untouched, so persisted output + abort behaviour are identical; the frontend reassembles the same text from larger frames.

Live-validated (mlx-community/Qwen2.5-0.5B-Instruct-4bit): a ~40-token reply streamed as 9 token frames (avg ~20 chars) instead of ~40, text fully coherent + correctly ordered (phase -> tokens -> done). ~4-5x fewer frames.
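The flush rules above can be sketched as a small generator. The event shape (`{"type": "token", "text": ...}`) and the name `coalesce` are assumptions for illustration, not the project's wire format. Note that in this sketch the 50ms window is only checked when a new token arrives; there is no background timer.

```python
import time
from typing import Callable, Iterable, Iterator

FLUSH_CHARS = 24       # size threshold quoted in the commit message
FLUSH_SECONDS = 0.050  # time window quoted in the commit message


def coalesce(events: Iterable[dict],
             clock: Callable[[], float] = time.monotonic) -> Iterator[dict]:
    """Merge consecutive token events into larger frames."""
    pending = ""
    started = 0.0
    for event in events:
        if event.get("type") == "token":
            if not pending:
                started = clock()  # the window opens with the first buffered token
            pending += event["text"]
            if len(pending) >= FLUSH_CHARS or clock() - started >= FLUSH_SECONDS:
                yield {"type": "token", "text": pending}
                pending = ""
            continue
        # Any non-token event flushes buffered text first to preserve ordering.
        if pending:
            yield {"type": "token", "text": pending}
            pending = ""
        yield event
    if pending:
        # Flush whatever is left at stream end.
        yield {"type": "token", "text": pending}
```

Because text is only ever concatenated in arrival order, joining the `text` of the emitted token frames reproduces the original token stream exactly.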

…dribble

The token coalescer landed on the standard stream path but the agent/tool path still emitted one frame per token, and the agent loop fake-streamed the already-computed final answer 4 chars at a time. Now the agent token forwarding uses the same coalescer (flush before toolCallStart / toolCallResult), and the agent loop emits the final answer in 48-char chunks instead of 4 -- less fake latency + fewer yields. Tool-call detection + execution flow is unchanged.

Tests: test_agent + test_backend_service green; identical coalescer mechanics to the live-validated standard path.
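For the final-answer emission, the change amounts to a larger chunk size. A minimal sketch, with `chunk_text` as a hypothetical helper name and 48 as the size quoted above:

```python
from typing import Iterator


def chunk_text(text: str, size: int = 48) -> Iterator[str]:
    """Yield consecutive slices of at most `size` characters."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(text), size):
        yield text[start:start + size]
```

A 100-character answer then produces three yields (48, 48, and 4 characters) where 4-character chunks would have produced 25.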

perf(chat): KV reuse, flash-attn, modern samplers, history window, token coalescing

New-feature gate (CLAUDE.md) for the chat-LLM work:

- "modern samplers (DRY+XTC)": a chat generate carrying xtcProbability + dryMultiplier must be accepted and produce tokens (proves the request field -> _build_sampler_overrides -> engine plumbing). Hard gate.
- "MLX prompt-cache reuse": two same-session turns; passes when turn-2 reprocesses fewer prompt tokens (cache reused), skips when it doesn't engage (a model whose generated tokens don't round-trip at the answer boundary, or a reasoning model -> correct graceful full-reprocess). Soft + non-flaky; the reuse/trim logic is hard-tested in tests/test_mlx_prompt_cache.py.

pre-build-check.sh needs no change: it already runs the pytest/tsc/vitest that cover the new unit tests, and no deps / pins / cache-types changed.

…m + vllm

Release upstream polish.

Deps (loose floor bumps, no code change):
- mlx-vlm 0.6.0 -> 0.6.3
- vllm 0.22.0 -> 0.22.1 ([vllm] + [triattention] extras)

Discover catalog -- two frontier sparse-MoE families (text-only, verified HF repos + real on-disk sizes):
- DeepSeek V4: Flash (284B / ~13B active, 1M ctx, baked-in MTP head) + Pro (1.6T). mlx-community 4-bit Flash (154 GB) is the local-viable entry; official BF16 + 8-bit + Pro listed for awareness.
- GLM-5 / GLM-5.1: GlmMoeDsa MoE (256 experts / 8 active, ~200K ctx). unsloth GGUF (Q4_K_M ~515 GB) + mlx-community MXFP4 + zai-org BF16.

Both text-only (configs carry no vision_config) so capabilities omit vision -- no broken composer affordance.

Tests + gate:
- tests/test_catalog_text_families.py: parse + required-field + text-only + discover-payload checks.
- E2E phase 0 "new model families" check asserts both surface in the live /api/workspace catalog with their full variant set.

Validated: phase 0 PASS, 11 checks.

Tracked follow-ups (not in this change): MTPLX installer already auto-updates to v1.0.1 (re-test FU-079 empty-output vs its new /v1 streaming); dflash-mlx v0.1.9 migration stays deferred (FU-057); llama-cpp-turboquant branch drifted (FU-065 commit-pin needs a verified test-build).

- FU-065: turbo branch drifted 2cbfdc62 -> 73eb521d (reproducibility risk confirmed; pin still deferred pending a verified test-compile).
- FU-079: MTPLX hit v1.0.0/v1.0.1 (installer auto-updates from 0.3.5); v1.0.0 added real /v1 token streaming -> re-test the empty-output against v1.0.1.
- FU-067: dflash-mlx v0.1.9 now tagged; FU-057 migration stays deferred.

Gemma 4 (gemma-4 family):
- E2B: 2B multimodal, 128K ctx; official QAT Q4_0 GGUF (~1.5 GB) + BF16
- 31B: 31B multimodal, 256K ctx; MLX 8-bit, unsloth Q4_K_M GGUF, official QAT GGUF, BF16
- Both carry vision capability (Gemma4ForConditionalGeneration + vision_config confirmed)

MiniMax M2.7 (minimax-m2 family):
- 256 routed experts / 8 active, 200K ctx, ~240B total params / ~480 GB BF16
- mlx-community MXFP4 (~120 GB), unsloth GGUF Q4_K_M (~130 GB), official BF16

Qwen3.7 skipped: no official Qwen/Qwen3.7-* repo exists on HF as of 2026-06-12.

Tests: 7 catalog gate checks updated to cover all 4 frontier families (shape, vision vs text-only, context windows, discover payload presence).
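The vision vs text-only distinction in these catalog commits rests on whether a model's config carries a `vision_config` entry. A minimal sketch of that check, where the function name and the capability strings are hypothetical and not the catalog's actual schema:

```python
from typing import Any


def capabilities_from_config(config: dict[str, Any]) -> list[str]:
    """Derive a capability list from a Hugging Face style config dict.

    Assumption: a non-empty "vision_config" entry marks a multimodal model.
    """
    capabilities = ["text"]
    if config.get("vision_config"):
        capabilities.append("vision")
    return capabilities
```

Under this rule a Gemma 4 style config with a `vision_config` block yields `["text", "vision"]`, while a text-only config yields `["text"]`, so the UI has no reason to offer image input for it.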

…0.23.0

2026-06-15 upstream scan:
- turboquant-mlx-full 0.8.0: adds Mamba/hybrid arch support + GPT-OSS-120B optimizations. Same TurboQuantKVCache call surface, backward compatible. Floor: 0.6.2 → 0.8.0.
- vllm 0.23.0 released. Floor: 0.22.1 → 0.23.0 (both [vllm] and [triattention] extras).

No action needed:
- mlx-vlm 0.6.3: already at floor, unchanged.
- mlx-lm 0.31.3: installed version, loose >=0.22.0 floor sufficient.
- mlx 0.31.2: installed version, loose >=0.22.0 floor sufficient.
- diffusers 0.38.0: at floor, no new release.
- TriAttention: still at pinned c3744ee6 (v0.2.0), no upstream change.

Deferred (tracker notes updated):
- dflash-mlx: v0.1.10 now tagged; FU-057/067 migration still deferred.
- llama-server-turbo branch: HEAD drifted to 7985f6b9; FU-065 deferred.
- TurboQuant+: v0.3.2.3 latest, no PyPI wheel, FU-032 trigger not met.
- MTPLX: now v1.0.4; FU-079 re-test still pending.

feat(catalog): frontier model families + dep bumps (2026-06-15)

…ons/upload-artifact-7 chore(deps): bump actions/upload-artifact from 5 to 7

…ons/checkout-6 chore(deps): bump actions/checkout from 4 to 6

…ons/setup-python-6 chore(deps): bump actions/setup-python from 5 to 6

…ons-gte-10.21.3 chore(deps): update pymdown-extensions requirement from >=10.7 to >=10.21.3

chore(deps): update mkdocs requirement from >=1.6 to >=1.6.1

chore(deps): bump mkdocs-material from >=9.5 to >=9.7.6

…ri-plugin-opener-2.5.4 Bump tauri-plugin-opener from 2.5.3 to 2.5.4 in /src-tauri

…ri-plugin-dialog-2.7.1 Bump tauri-plugin-dialog from 2.7.0 to 2.7.1 in /src-tauri

…-0.4.46 chore(deps): bump tar from 0.4.45 to 0.4.46 in /src-tauri

…de_json-1.0.150 chore(deps): bump serde_json from 1.0.149 to 1.0.150 in /src-tauri

…ri-build-2.6.2 chore(deps): bump tauri-build from 2.6.0 to 2.6.2 in /src-tauri

…ri-2.11.2 chore(deps): bump tauri from 2.11.0 to 2.11.2 in /src-tauri

…t-i18n-4.0.0 chore(deps): bump rust-i18n from 3.1.2 to 4.0.0 in /src-tauri

…ort 4.1.0 API change)

…ED_LIMIT_INFORMATION (windows-sys 0.61.2)

dependabot Bot and others added 30 commits May 8, 2026 20:19

Merge pull request #73 from cryptopoly/feature/chatt-llm-improvements

766a610

perf(chat): KV reuse, flash-attn, modern samplers, history window, token coalescing

feat(catalog): frontier model families + dep bumps (2026-06-15)

851a6ad

feat(catalog): frontier model families + dep bumps (2026-06-15)

Merge pull request #70 from cryptopoly/dependabot/github_actions/acti…

9ced3bf

…ons/upload-artifact-7 chore(deps): bump actions/upload-artifact from 5 to 7

Merge pull request #71 from cryptopoly/dependabot/github_actions/acti…

5be6683

…ons/checkout-6 chore(deps): bump actions/checkout from 4 to 6

cryptopoly added 16 commits June 15, 2026 10:45

Merge pull request #72 from cryptopoly/dependabot/github_actions/acti…

2187911

…ons/setup-python-6 chore(deps): bump actions/setup-python from 5 to 6

Merge pull request #60 from cryptopoly/dependabot/pip/pymdown-extensi…

25ef79e

…ons-gte-10.21.3 chore(deps): update pymdown-extensions requirement from >=10.7 to >=10.21.3

Merge pull request #62 from cryptopoly/dependabot/pip/mkdocs-gte-1.6.1

3f2e0a6

chore(deps): update mkdocs requirement from >=1.6 to >=1.6.1

chore(deps): bump mkdocs-material from >=9.5 to >=9.7.6

91fff54

Merge pull request #75 from cryptopoly/fix/merge-dependabot-docs

a66218d

chore(deps): bump mkdocs-material from >=9.5 to >=9.7.6

Merge pull request #41 from cryptopoly/dependabot/cargo/src-tauri/tau…

dd6ccc8

…ri-plugin-opener-2.5.4 Bump tauri-plugin-opener from 2.5.3 to 2.5.4 in /src-tauri

Merge pull request #42 from cryptopoly/dependabot/cargo/src-tauri/tau…

41d1fca

…ri-plugin-dialog-2.7.1 Bump tauri-plugin-dialog from 2.7.0 to 2.7.1 in /src-tauri

Merge pull request #65 from cryptopoly/dependabot/cargo/src-tauri/tar…

2b1198a

…-0.4.46 chore(deps): bump tar from 0.4.45 to 0.4.46 in /src-tauri

Merge pull request #66 from cryptopoly/dependabot/cargo/src-tauri/ser…

7ade3e6

…de_json-1.0.150 chore(deps): bump serde_json from 1.0.149 to 1.0.150 in /src-tauri

Merge pull request #63 from cryptopoly/dependabot/cargo/src-tauri/tau…

f76349a

…ri-build-2.6.2 chore(deps): bump tauri-build from 2.6.0 to 2.6.2 in /src-tauri

Merge pull request #61 from cryptopoly/dependabot/cargo/src-tauri/tau…

855e210

…ri-2.11.2 chore(deps): bump tauri from 2.11.0 to 2.11.2 in /src-tauri

Merge pull request #67 from cryptopoly/dependabot/cargo/src-tauri/rus…

f20e22a

…t-i18n-4.0.0 chore(deps): bump rust-i18n from 3.1.2 to 4.0.0 in /src-tauri

chore: bump version to 0.9.4

d46e7d4

fix(deps): bump rust-i18n 4.0.0 -> 4.1.0 (fixes build break from supp…

34db2e6

…ort 4.1.0 API change)

fix(windows): add Win32_System_Threading feature for JOBOBJECT_EXTEND…

0b00531

…ED_LIMIT_INFORMATION (windows-sys 0.61.2)

Update Cargo.lock

5a57010

cryptopoly merged commit 2ce908a into main Jun 17, 2026

cryptopoly merged 46 commits into main from staging