Skip to content

Resolve AppKit and Agent Skills versions from compatibility manifest#5139

Open
pkosiec wants to merge 7 commits intomainfrom
pkosiec/appkit-version-pinning
Open

Resolve AppKit and Agent Skills versions from compatibility manifest#5139
pkosiec wants to merge 7 commits intomainfrom
pkosiec/appkit-version-pinning

Conversation

@pkosiec
Copy link
Copy Markdown
Member

@pkosiec pkosiec commented Apr 30, 2026

Summary

Introduces a CLI compatibility manifest (internal/build/cli-compat.json) that maps CLI versions to compatible AppKit template and Agent Skills versions. This enables template updates to reach users without CLI releases.

The manifest is resolved with a 3-tier fallback:

  1. Fresh local cache (< 1h)
  2. Remote fetch from GitHub (with retry)
  3. Stale local cache or embedded manifest fallback

Companion PRs:

Screenshot

image

@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 15:10 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 15:10 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 15:20 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 15:20 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 16:12 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 16:12 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 16:16 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 16:16 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 16:25 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 16:25 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 16:35 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is April 30, 2026 16:35 — with GitHub Actions Inactive
@pkosiec pkosiec marked this pull request as ready for review April 30, 2026 16:39
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 30, 2026

Approval status: pending

/cmd/apps/ - needs approval

Files: cmd/apps/init.go, cmd/apps/init_test.go, cmd/apps/manifest.go
Suggested: @MarioCadenas
Also eligible: @arsenyinfo, @keugenek, @calvarjorge, @fjakobs, @jamesbroadhead, @Shridhad, @atilafassina, @igrekun, @pffigueiredo, @ditadi

/experimental/aitools/ - needs approval

10 files changed
Suggested: @MarioCadenas
Also eligible: @arsenyinfo, @keugenek, @calvarjorge, @fjakobs, @jamesbroadhead, @Shridhad, @atilafassina, @igrekun, @pffigueiredo, @ditadi, @lennartkats-db

/internal/ - needs approval

Files: internal/build/README.md, internal/build/cli-compat.json, internal/build/clicompat.go
Suggested: @simonfaltum
Also eligible: @renaudhartert-db, @hectorcast-db, @parthban-db, @tanmay-db, @Divyansh-db, @tejaskochar-db, @mihaimitrea-db, @chrisst, @rauchy

General files (require maintainer)

4 files changed
Based on git history:

  • @pietern -- recent work in internal/build/, experimental/aitools/lib/installer/, cmd/apps/

Any maintainer (@andrewnester, @anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

Copy link
Copy Markdown
Contributor

@renaudhartert-db renaudhartert-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @pkosiec. Could we have a chat internally about what you're trying to achieve? I'd like to make sure that this is aligned with the overall direction we're planning to evolve that command toward.

@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 12:35 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 12:35 — with GitHub Actions Inactive
@pkosiec pkosiec changed the title feat: resolve AppKit template version from compatibility manifest feat: CLI compatibility manifest with 3-tier fallback May 5, 2026
@pkosiec pkosiec marked this pull request as draft May 5, 2026 12:43
@pkosiec pkosiec force-pushed the pkosiec/appkit-version-pinning branch from d7d005f to 3241ae2 Compare May 5, 2026 13:00
@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 13:00 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 13:00 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 13:17 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 13:17 — with GitHub Actions Inactive
@pkosiec pkosiec changed the title feat: CLI compatibility manifest with 3-tier fallback Resolve AppKit and Agent Skills versions from compatibility manifest May 5, 2026
@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 13:24 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 13:24 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 13:30 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 13:30 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 13:59 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 5, 2026 13:59 — with GitHub Actions Inactive
@pkosiec pkosiec marked this pull request as ready for review May 5, 2026 13:59
@pkosiec pkosiec temporarily deployed to test-trigger-is May 6, 2026 08:03 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 6, 2026 08:03 — with GitHub Actions Inactive
@pkosiec
Copy link
Copy Markdown
Member Author

pkosiec commented May 6, 2026

@renaudhartert-db @simonfaltum Please take a look 🙏 I applied all the suggestions after discussion with Simon (thanks for the feedback!). Now the manifest resides in the CLI repo.

@pkosiec pkosiec force-pushed the pkosiec/appkit-version-pinning branch from 4806f28 to 877ba5c Compare May 6, 2026 13:42
@pkosiec pkosiec temporarily deployed to test-trigger-is May 6, 2026 13:42 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 6, 2026 13:42 — with GitHub Actions Inactive
Copy link
Copy Markdown
Member

@simonfaltum simonfaltum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review summary

I ran a multi-agent review (Isaac + Cursor) on this PR. Both reviewers converged on REQUEST_CHANGES: the 4-tier fallback design is sound and tests are substantial, but the live remote manifest is treated as more trusted than its current validation justifies, and the embedded fallback is applied inconsistently across consumers.

Must-fix before merge:

  1. parseManifest doesn't validate appkit/skills values — empty/non-semver values fall through and git clone ends up using AppKit's default branch instead of a pinned tag.
  2. Embedded not-found fallback is missing from UpdateSkills, runManifestOnly, and defaultListSkills — inconsistent recovery across commands.
  3. IsNotFoundError uses fragile substring matching — replace with a sentinel error wrapped at the source.
  4. Resolve returns the lowest entry when CLI < min(keys) — needs a documented never-prune policy or fail-and-fallback to embedded.
  5. writeLocalManifest removes the cache before rename — failed rename leaves the user with no cache, defeating tier 3a.

Other findings posted inline below: 4xx retry suppression, missing User-Agent, debug-only cache write errors, naming/test cleanups, and questions about rollback story and offline first-run latency.

Some of these may overlap with the changes already requested by @renaudhartert-db; happy to defer where they do.

Comment thread libs/clicompat/clicompat.go
Comment thread experimental/aitools/lib/installer/update.go
Comment thread cmd/apps/manifest.go
Comment thread experimental/aitools/cmd/list.go
Comment on lines +143 to +149
func IsNotFoundError(err error) bool {
if err == nil {
return false
}
msg := strings.ToLower(err.Error())
return strings.Contains(msg, "not found") || strings.Contains(msg, "404")
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Important: IsNotFoundError is fragile substring matching across package boundaries.

The errors this inspects come from different packages (installer.GitHubManifestSource.FetchManifest, cmd/apps/init.go's awaitTemplate). Any string change in those callers silently breaks the fallback. It also matches false positives (e.g. "hostname not found", "exit code 404") and misses true positives like HTTP 410 Gone or git "reference does not exist" wording.

Suggested fix: define a sentinel var ErrNotFound = errors.New("compat target not found") in clicompat, and wrap returned errors at the source — installer/source.go already classifies HTTP 404 and could return fmt.Errorf("...: %w", clicompat.ErrNotFound). Then callers use errors.Is(err, clicompat.ErrNotFound). This decouples the fallback from error message wording.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced the strings.Contains(msg, "404") check with a typed HTTPStatusError (used by fetchRemote in clicompat and wrapped with ErrNotFound sentinel in installer/source.go for 404). Callers can now use errors.Is(err, clicompat.ErrNotFound).

Kept a narrow strings.Contains(msg, "not found") fallback for git clone errors — git.Clone produces untyped stderr text like "Remote branch X not found in upstream origin". Refactoring that to wrap a typed error is out of scope for this PR since it touches libs/git/ consumers beyond clicompat.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately we can't avoid that string matching as the git lib (that uses raw Git client) doesn't use typed errors; even if we refactored that package, we'd stil produce typed errors based on string matching, so I don't think it's worth to do it 🤔 Or, alternatively, the whole library. could be refactored to go-git? 🤔


// httpClient is the HTTP client used for manifest fetches. Package-level var
// so tests can replace it.
var httpClient = &http.Client{Timeout: fetchTimeout}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: httpClient is a mutable package var swapped by tests (clicompat_test.go). Fine today since tests don't t.Parallel(), but the moment anyone tries to parallelize this package it'll race.

Lower-effort fix: leave it but add a comment that this package is intentionally not parallel-test-safe. Better: inject the *http.Client into a small struct or pass via context.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment. Leaving the package-level var as-is since injecting via struct or context would be a larger refactor for minimal gain (tests are intentionally sequential).

Comment thread libs/clicompat/clicompat.go
Comment thread libs/clicompat/clicompat.go
Comment on lines +70 to +99
// 1. Local cached file (if fresh, < 1 hour old)
// 2. Remote fetch from GitHub (with retry)
// 3. Stale local file (if remote fails but a previously cached file exists)
// 4. Embedded manifest compiled into the binary
func FetchManifest(ctx context.Context) (Manifest, error) {
localPath := manifestLocalPath(ctx)

// Read local file once — reuse across tiers.
local, localErr := readLocalManifest(localPath)

// Tier 1: local file is fresh.
if localErr == nil && local.isFresh(cacheTTL) {
log.Debugf(ctx, "Using cached manifest from %s", localPath)
return local.manifest, nil
}

// Tier 2: fetch from remote (local file missing or stale).
m, fetchErr := fetchRemoteWithRetry(ctx)
if fetchErr == nil {
writeLocalManifest(ctx, localPath, m)
return m, nil
}

// Tier 3a: local file exists but stale — use it anyway.
if localErr == nil {
log.Debugf(ctx, "Using stale cached manifest (remote failed: %v)", fetchErr)
return local.manifest, nil
}

// Tier 3b: embedded manifest.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: rollback story for a bad cli-compat.json?

The cache TTL is 1 hour, so a bad manifest poisons all clients for up to an hour after a fix is merged (cache hit on stale-but-fresh cached bad manifest). Is this acceptable, or should there be a way for users to force-refresh? An env var (DATABRICKS_CLI_COMPAT_REFRESH=1) or --refresh-compat flag would give an escape hatch.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Three mitigations exist: (1) stale cache is preferred over embedded on remote failure, (2) embedded manifest is always available as last resort, (3) 1h TTL bounds exposure.

Also added DATABRICKS_CACHE_ENABLED=false support in this PR (consistent with libs/cache), which bypasses the local cache entirely and forces a fresh remote fetch. This gives users an immediate escape hatch if they hit a bad cached manifest.

Comment on lines +218 to +230
func ResolveAppKitVersion(ctx context.Context) (string, error) {
entry, err := resolveEntry(ctx, build.GetInfo().Version)
if err != nil {
return "", err
}
return entry.AppKit, nil
}

// ResolveAgentSkillsVersion resolves the Agent Skills version for the current CLI.
func ResolveAgentSkillsVersion(ctx context.Context) (string, error) {
entry, err := resolveEntry(ctx, build.GetInfo().Version)
if err != nil {
return "", err
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: apps init flag help text uses embedded version, but runCreate calls the network-fetching ResolveAppKitVersion.

These can disagree (help text says default is X, but the actual default is Y). Probably fine, but worth confirming this is intentional and maybe a one-liner in the README to set the expectation.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Dropped the specific version from the help text — it now says "default: auto-detected" instead of showing the embedded version, since the actual default comes from the network-fetching ResolveAppKitVersion which may resolve a newer version.

@pkosiec pkosiec force-pushed the pkosiec/appkit-version-pinning branch from 877ba5c to 3fd14a4 Compare May 8, 2026 11:35
@pkosiec pkosiec temporarily deployed to test-trigger-is May 8, 2026 11:36 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 8, 2026 11:36 — with GitHub Actions Inactive
pkosiec added 7 commits May 8, 2026 13:37
- Embed cli-compat.json in the CLI binary with build-time fetch
- Resolve AppKit and Agent Skills versions from the manifest
- Add 1h local cache and retry for runtime manifest fetches
- Show default AppKit version in --version flag help
- Print resolved skills version during aitools install

Co-authored-by: Isaac
- Guard printVersionLine against empty latestRef to prevent confusing
  "Update available: v" output when skills version resolution fails
- Sort versionedKeys in TestEmbeddedManifest_IsWellFormed to prevent
  flaky test from map iteration randomness
- Preserve embedded manifest parse error in FetchManifest error message
- Add test for GetSkillsRef fallback to embedded manifest

Co-authored-by: Isaac
- Rename libs/depversions/ to libs/clicompat/ (package + files)
- Rename internal/build/dep_versions.go to clicompat.go
- Rename EmbeddedManifestJSON to CLICompatManifestJSON
- Extract devVersionPrefix const for "0.0.0-dev"
- Wrap both errors with %w in FetchManifest fallback
- Add template-v tag validation to bump-cli-compat skill
- Remove confusing positional args from skill (named flags only)
- Fix README: "After each AppKit or Agent Skills release"

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
- Fix stale libs/depversions/ references in README (renamed to libs/clicompat/)
- Fix incorrect "next" key description (only used for dev builds, not newer-than-all)
- Fix import ordering (clicompat before cmdctx/cmdio alphabetically)
- Fix slices import grouping in clicompat_test.go (stdlib, not separate group)
- Fix error format: semicolon instead of period before hint text
- Fix FetchManifest godoc: "4-tier fallback" to match numbered list
- Fix writeLocalManifest comment: explain temp-file-then-rename pattern
- Fix README example values to match actual manifest
- Fix flaky test: pre-populate cache to avoid real network calls

Co-authored-by: Isaac
When the manifest resolves to a version that doesn't exist as a git tag
(404), retry with the version from the embedded manifest. Only triggers
on "not found" errors, not transient network failures.

Also:
- Rename EmbeddedDefaultAppKitVersion/EmbeddedResolve* to
  ResolveEmbeddedAppKitVersion/ResolveEmbeddedAgentSkillsVersion
- Remove duplicate log lines (keep only log.Warnf, drop cmdio.LogString)
- Drop ctx parameter from ResolveEmbedded* (not needed)

Signed-off-by: Pawel Kosiec <pawel.kosiec@databricks.com>
Address all review comments: add sentinel ErrNotFound and typed
HTTPStatusError, validate manifest entry values, centralize
not-found fallback for skills and AppKit consumers, skip retries
on 4xx, add User-Agent header, support DATABRICKS_CACHE_ENABLED,
and document pruning policy and trust model.

Co-authored-by: Isaac
@pkosiec pkosiec force-pushed the pkosiec/appkit-version-pinning branch from 3fd14a4 to 702d30e Compare May 8, 2026 11:37
@pkosiec pkosiec temporarily deployed to test-trigger-is May 8, 2026 11:38 — with GitHub Actions Inactive
@pkosiec pkosiec temporarily deployed to test-trigger-is May 8, 2026 11:38 — with GitHub Actions Inactive
@pkosiec pkosiec requested a review from simonfaltum May 8, 2026 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants