Extract preview/sync GitHub Actions#4897

Open
backspace wants to merge 7 commits into main from cs-11180-extract-shared-preview-realm-github-actions-to-monorepo

@backspace backspace commented May 19, 2026

I noticed that boxel-home PR previews are broken:

image

This is because the interface to _publish-realm changed:

Publishing https://realms-staging.stack.cards/boxel_homepage_realm/boxel-home-pr-57/ to https://boxel_homepage_realm.staging.boxel.dev/boxel-home-pr-57/
Failed to publish realm (HTTP 202):
{
  "data": {
    "type": "published_realm",
    "id": "23ea3f2a-9c7a-4028-ad3a-7be647ed476b",
    "attributes": {
      "sourceRealmURL": "https://realms-staging.stack.cards/boxel_homepage_realm/boxel-home-pr-57/",
      "publishedRealmURL": "https://boxel_homepage_realm.staging.boxel.dev/boxel-home-pr-57/",
      "lastPublishedAt": "1778870465062",
      "status": "pending"
    }
  }
}

HTTP 202 is actually expected now!
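
The contract change can be sketched roughly as follows. This is an illustration, not the actual boxel-cli code: `classifyPublishResponse`, `PublishBody`, and `PublishOutcome` are hypothetical names, and treating `200`/`201` as an immediately ready publish is an assumption. Only the `202` status and the `"status": "pending"` attribute come from the response above.

```typescript
// Hypothetical sketch: the names and the 200/201 handling are assumptions.
// Only the 202 + "pending" shape is taken from the response shown above.
type PublishBody = {
  data?: { attributes?: { status?: string; publishedRealmURL?: string } };
};

type PublishOutcome =
  | { kind: "ready"; url?: string }
  | { kind: "pending"; url?: string }
  | { kind: "failed"; reason: string };

function classifyPublishResponse(status: number, body: PublishBody): PublishOutcome {
  const attrs = body.data?.attributes;
  if (status === 202 && attrs?.status === "pending") {
    // Accepted: the caller is expected to poll until the realm is ready.
    return { kind: "pending", url: attrs.publishedRealmURL };
  }
  if (status === 200 || status === 201) {
    return { kind: "ready", url: attrs?.publishedRealmURL };
  }
  // Anything else, including a 202 without a pending status, is an error.
  return { kind: "failed", reason: `Failed to publish realm (HTTP ${status})` };
}
```

A caller that previously treated every non-200 status as a failure would, under this sketch, start polling on `pending` instead of aborting.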

I also noticed that boxel-catalog, boxel-home, and boxel-skills were each using their own near-duplicate bespoke workflows for similar tasks, variously built on cardstack/boxel-cli, the npm Boxel CLI, and the old workspace sync CLI.

This extracts the preview/sync workflows into the monorepo so they can be used from external repositories and tested in-monorepo in case of interface changes like the above. You can see them tested and passing in this job.

@backspace backspace force-pushed the cs-11180-extract-shared-preview-realm-github-actions-to-monorepo branch from 037a389 to 79657eb Compare May 20, 2026 00:00
@backspace backspace changed the base branch from cs-11161-extract-workspace-sync-action to main May 20, 2026 00:01

github-actions Bot commented May 20, 2026

Observability diff (vs staging)

diff --git a/tmp/remote-canon.Nq1dRP/dashboards/boxel-status/indexing.json b/tmp/committed-canon.XjD11i/dashboards/boxel-status/indexing.json
index a39cf75..25280b9 100644
--- a/tmp/remote-canon.Nq1dRP/dashboards/boxel-status/indexing.json
+++ b/tmp/committed-canon.XjD11i/dashboards/boxel-status/indexing.json
@@ -69,6 +69,10 @@
           "uid": "cef5v5sl9k7i8f"
         },
         "description": "System-wide operator action: queue a full reindex across every realm. The button disables itself while a `full-reindex` orchestration job is already pending or running. Per-realm reindex moved to the Realms dashboard. Click POSTs with `Authorization: Bearer ${grafana_secret}` (substituted from SSM at apply time, CS-10929).",
+        "fieldConfig": {
+          "defaults": {},
+          "overrides": []
+        },
         "gridPos": {
           "h": 8,
           "w": 24,

(Run: https://github.com/cardstack/boxel/actions/runs/26161560752)

github-actions Bot commented May 20, 2026

Grafana preview

Preview deployed for 1 dashboard in the staging Grafana.
Cross-dashboard drill-throughs still point at the canonical staging dashboards.

Dashboards:

Preview is torn down automatically when this PR is closed or merged.

(Run: https://github.com/cardstack/boxel/actions/runs/26161560825)

github-actions Bot commented May 20, 2026

Preview deployments

Host Test Results

    1 files      1 suites   1h 33m 6s ⏱️
2 712 tests 2 697 ✅ 15 💤 0 ❌
2 731 runs  2 716 ✅ 15 💤 0 ❌

Results for commit 880fdff.

Realm Server Test Results

    1 files  ±0      1 suites  ±0   10m 23s ⏱️ -18s
1 480 tests ±0  1 480 ✅ ±0  0 💤 ±0  0 ❌ ±0 
1 571 runs  ±0  1 571 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 880fdff. ± Comparison against earlier commit 57d3fe8.

@backspace backspace changed the title feat: shared preview-realm GitHub Actions (split off from #4851) Extract preview/sync GitHub Actions May 20, 2026
@backspace backspace changed the base branch from main to cs-11161-extract-workspace-sync-action May 20, 2026 00:27
@backspace backspace force-pushed the cs-11180-extract-shared-preview-realm-github-actions-to-monorepo branch 2 times, most recently from d7095f0 to 434ac24 Compare May 20, 2026 12:09
@backspace backspace changed the base branch from cs-11161-extract-workspace-sync-action to main May 20, 2026 12:09
backspace added a commit that referenced this pull request May 20, 2026
Node's fetch always reports `TypeError: fetch failed` as `error.message`;
the actual transport reason (ECONNRESET, TLS handshake error, undici
socket error, ENOTFOUND, GOAWAY, etc.) is stashed on `error.cause` and
was being silently dropped by the publish/unpublish error paths. That
left the action-demo workflow showing a bare "Error: fetch failed" with
no way to distinguish a real network issue from, say, a self-signed
cert problem against the published-realm subdomain.

Wrap the three swallowed sites:

- `publish.ts` `.action()` catch: log `err.cause` separately if present.
- `publish.ts` `waitForPublishedRealmReady`: capture cause into the
  `lastError` string so the readiness-timeout error reports the same
  thing the polling loop kept hitting.
- `unpublish.ts` `unpublishRealm`: embed cause into the `result.error`
  string the CLI surfaces.

This is the diagnostic the action-demo on #4897 needs to figure out
why publish hangs at the initial POST despite the server-side mount
completing successfully.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
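
As a rough illustration of the fix described in this commit message, the sketch below appends `error.cause` to the reported message. `describeError` is a hypothetical helper, not the code in `publish.ts` or `unpublish.ts`, and the `read ECONNRESET` cause is an example value chosen for the demonstration.

```typescript
// Hypothetical helper; not the actual publish.ts / unpublish.ts code.
function describeError(err: unknown): string {
  if (!(err instanceof Error)) {
    return String(err);
  }
  // Read `cause` defensively so this compiles without ES2022 lib typings.
  const cause = (err as { cause?: unknown }).cause;
  if (cause === undefined) {
    return err.message;
  }
  const detail = cause instanceof Error ? cause.message : String(cause);
  return `${err.message} (cause: ${detail})`;
}

// The shape described above: a generic message with the real reason on `cause`.
const fetchFailure: Error & { cause?: unknown } = new TypeError("fetch failed");
fetchFailure.cause = new Error("read ECONNRESET");

console.log(describeError(fetchFailure)); // fetch failed (cause: read ECONNRESET)
```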
backspace added a commit that referenced this pull request May 20, 2026
The worker's `fatalExit` handler already exists (uncaughtException /
unhandledRejection backstop with a finalize-reservation race) — but
it reports the error via `log.error(...)` immediately before
`process.exit(1)`. `worker-manager.ts` spawns the child with
`stdio: ['pipe', 'pipe', 'pipe', 'ipc']`, so the child's stderr is a
libuv-async pipe; the final stream chunk gets discarded when the
process disappears, and the captured server log shows the child as
having silently exited `code=1, signal=null` with no clue why.

worker.ts already uses `writeSync(2, ...)` for exactly this reason
on the STARTUP / SIGINT / SIGTERM / disconnect stamps (see the
comment above the STARTUP block at the top of the file). Apply the
same pattern to the three fatal-exit paths: the uncaughtException /
unhandledRejection handler, its inner finalize-failed fallback, and
the outer startup-error `.catch`. Route each through a new helper
that serializes the error with its full stack and walks `error.cause`
(where Node fetch / undici / TLS errors stash the real reason).

Discovered while debugging the action-demo on #4897 (CS-11180): every
`_publish-realm` of a fresh source realm enqueues a copy-index job
that throws *something* inside the worker; the worker exited
silently; pg-queue retried, hit the 2-reservation cap, abandoned the
job; the realm-server returned HTTP 500
`Job abandoned after 2 failed attempts (max=2)` to the publish
endpoint caller. Without this fix the underlying job-processing
error is unobservable.

The bundled `serialize-fatal-reason` helper is in its own module
because the FD-level write behavior can't be unit-tested in-process
(it requires a real child_process.spawn + libuv-piped stderr to
reproduce the bug being fixed) — but the serialization can. Tests
cover: stack preservation, cause-chain walking, non-Error values,
self-referential cause cycles (depth-capped), and Node fetch's
typical `TypeError: fetch failed` + ECONNRESET-on-cause shape.

Closes CS-11200.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
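
The serialization half of that helper might look something like the sketch below. It is an assumption-laden illustration, not the real `serialize-fatal-reason` module: the function signature, the `maxDepth` default, and the `Caused by:` and truncation wording are all hypothetical. It only demonstrates the behaviors the commit message lists: stack preservation, cause-chain walking, non-Error values, and a depth cap that stops self-referential cycles.

```typescript
// Hypothetical sketch; the real serialize-fatal-reason module may differ.
function serializeFatalReason(reason: unknown, maxDepth = 5): string {
  const parts: string[] = [];
  let current: unknown = reason;
  let depth = 0;
  while (true) {
    const prefix = depth === 0 ? "" : "Caused by: ";
    if (!(current instanceof Error)) {
      // Non-Error values (strings, numbers, null) end the chain.
      parts.push(prefix + String(current));
      break;
    }
    // Prefer the stack; fall back to name + message if it is missing.
    parts.push(prefix + (current.stack ?? `${current.name}: ${current.message}`));
    const cause = (current as { cause?: unknown }).cause;
    if (cause === undefined) {
      break;
    }
    if (depth === maxDepth) {
      // Depth cap: guards against cycles such as err.cause === err.
      parts.push("Caused by: [cause chain truncated]");
      break;
    }
    current = cause;
    depth++;
  }
  return parts.join("\n");
}
```

In the worker, the resulting string would then be handed to a synchronous write such as `writeSync(2, ...)`, which is the part the commit message says cannot be unit-tested in-process.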
@backspace backspace marked this pull request as draft May 20, 2026 19:46
@backspace backspace force-pushed the cs-11180-extract-shared-preview-realm-github-actions-to-monorepo branch from f8a1399 to 608717a Compare May 21, 2026 19:05
backspace and others added 2 commits May 21, 2026 14:08
Extract the publish-preview-realm / unpublish-preview-realm /
workspace-sync composite actions so `boxel-catalog`, `boxel-home`,
`boxel-skills` (and any future consumer) can stop maintaining
duplicated bespoke preview-realm workflows.

This branch is layered on top of cs-11161 (#4851) so the bundled
demo workflow can exercise `boxel realm publish` / `unpublish` /
`push` end-to-end against the CLI commits in this branch's
ancestry. Once #4851 lands and its branch is deleted, GitHub will
retarget this PR's base to main and the diff will stay clean against main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Used while iterating on the three composite actions; not part of the
shipped product. External consumers (boxel-catalog, boxel-home,
boxel-skills) exercise the actions in their own preview workflows.
@backspace backspace force-pushed the cs-11180-extract-shared-preview-realm-github-actions-to-monorepo branch from 608717a to cb1d9db Compare May 21, 2026 19:08
backspace and others added 2 commits May 21, 2026 14:33
Adds preview-realm-actions-integration.yml — runs the three composite
actions (publish, workspace-sync, unpublish) end-to-end against the
same local matrix + realm-server stack `boxel-cli-test` boots, so
contract drift between the actions, the boxel-cli commands they wrap,
and the realm-server handlers they POST to is caught the moment any
side changes.

Path-gated triggers (on `pull_request` and `push` to main, plus
`workflow_dispatch` for manual) so only PRs touching the integration
surface pay the runtime cost. The set covers each action.yml, this
workflow, the publish/unpublish/push CLI commands, the
handle-publish-realm / handle-unpublish-realm server handlers, and
the copy-index task that the publish handler enqueues.

Uses path-relative `uses: ./.github/actions/...` so the actions run
at the PR's own commit. External consumers (boxel-catalog, -home,
-skills) pin a SHA instead.

Also re-applies the in-repo `mise` short-circuit in each action: when
`github.action_repository == github.repository` (i.e., invoked from
inside cardstack/boxel itself), set BOXEL_SRC to $GITHUB_WORKSPACE
and skip the clone + mise/pnpm install steps because the calling
workflow's ./.github/actions/init already did them. Without this the
inner `jdx/mise-action` re-hashes a separate cache key whose lookup
stalls for ~30 minutes before the transfer starts. External consumers continue to go
through the full clone + install path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The in-repo short-circuit compared `github.action_repository` against
`github.repository`, but `github.action_repository` is only populated
for *external* `uses: org/repo/...@ref` references. For path-relative
`uses: ./.github/...` (which is exactly how
preview-realm-actions-integration.yml invokes these actions), the
value is empty, so the predicate `"" = "cardstack/boxel"` was false
and the action fell into the external-consumer branch and tried to
`git clone https://github.com/.git/`, failing with `remote: Not Found`.

Treat empty BOXEL_REPO as in-repo too. External consumers still hit
the populated-and-different branch and run the full clone + install.
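
The corrected predicate can be modeled as below. The real check lives in the actions' shell steps; `isInRepoInvocation` is a hypothetical name used only for illustration, and its parameters stand in for the `github.action_repository` and `github.repository` context values.

```typescript
// Hypothetical model of the shell predicate, for illustration only.
function isInRepoInvocation(actionRepository: string, repository: string): boolean {
  // Path-relative `uses: ./.github/...` leaves action_repository empty,
  // so an empty value is treated as running inside the repository itself.
  return actionRepository === "" || actionRepository === repository;
}

console.log(isInRepoInvocation("", "cardstack/boxel")); // true
console.log(isInRepoInvocation("cardstack/boxel", "cardstack/boxel-home")); // false
```

The first call corresponds to the integration workflow in this repository, the second to an external consumer that still needs the full clone + install.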
@backspace backspace marked this pull request as ready for review May 21, 2026 22:10
@backspace backspace requested a review from a team May 22, 2026 01:11
@habdelra habdelra requested a review from Copilot May 22, 2026 02:08
Copilot AI left a comment

Pull request overview

This PR moves the PR preview realm publish/sync/unpublish automation into reusable composite GitHub Actions inside the boxel monorepo, and adds an end-to-end integration workflow that exercises those actions against the local test stack to detect contract drift (e.g., the realm publish endpoint now returning HTTP 202 as expected).

Changes:

  • Adds three composite actions: publish-preview-realm, workspace-sync, and unpublish-preview-realm, implemented on top of in-tree @cardstack/boxel-cli.
  • Adds a path-gated integration workflow that boots the local stack and runs publish → sync → unpublish to validate the full surface area.
  • Updates the publish action behavior to accept/poll the newer “202 + pending” publish contract via boxel realm publish --timeout.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| .github/workflows/preview-realm-actions-integration.yml | Adds an E2E workflow that validates the three composite actions against the local realm-server + Matrix stack. |
| .github/actions/publish-preview-realm/action.yml | Introduces a composite action to create/push/publish a preview realm and wait for readiness via boxel-cli. |
| .github/actions/workspace-sync/action.yml | Introduces a composite action to push a local directory into an existing Boxel workspace via boxel realm push. |
| .github/actions/unpublish-preview-realm/action.yml | Introduces a composite action to unpublish a previously published preview realm (tolerating missing). |

Comment thread .github/actions/workspace-sync/action.yml Outdated
Comment thread .github/actions/publish-preview-realm/action.yml Outdated
Comment thread .github/actions/unpublish-preview-realm/action.yml Outdated
Comment thread .github/actions/workspace-sync/action.yml Outdated
Comment thread .github/actions/workspace-sync/action.yml Outdated
Comment thread .github/actions/unpublish-preview-realm/action.yml Outdated
Comment thread .github/actions/workspace-sync/action.yml Outdated
Comment thread .github/actions/publish-preview-realm/action.yml Outdated
Comment thread .github/actions/unpublish-preview-realm/action.yml Outdated
realm-server-url: https://localhost:4201/

- name: Print server logs
if: ${{ !cancelled() }}

that’s not true! failure in an earlier step won’t cause this to not run, because it’s looking explicitly for cancellation
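
To make the distinction concrete, here is a simplified model of the two conditions. It is a sketch with illustrative names (`RunState`, `success`, `notCancelled`), not GitHub's implementation: a step with no status check function behaves as if guarded by `success()`, while `!cancelled()` only looks at cancellation.

```typescript
// Simplified model of step status checks; type and field names are illustrative.
type RunState = { earlierStepFailed: boolean; cancelled: boolean };

// Implicit default when a step's `if:` uses no status check function.
function success(state: RunState): boolean {
  return !state.earlierStepFailed && !state.cancelled;
}

// Equivalent of `if: ${{ !cancelled() }}`.
function notCancelled(state: RunState): boolean {
  return !state.cancelled;
}

const afterFailure: RunState = { earlierStepFailed: true, cancelled: false };
console.log(success(afterFailure)); // false: a default step would be skipped
console.log(notCancelled(afterFailure)); // true: "Print server logs" still runs
```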

@backspace

in the end these are mostly wrapping Boxel CLI, I don’t think they have much value on their own, so I’m closing this, but I’ll still update the other repositories to use minimal CLI patterns

@backspace backspace closed this May 22, 2026
@backspace backspace reopened this May 22, 2026
backspace added 3 commits May 22, 2026 13:16
Consumer repos (boxel-home, boxel-skills, boxel-catalog) duplicate
~250 lines of nearly-identical sync logic across their own workflow
files. Composite actions could share that logic but force each
consumer to pin a SHA - so a change to the realm-server's
_publish-realm contract doesn't surface in consumer CI until they
manually bump.

Reusable workflows give the same de-duplication AND let consumers
track @main, so any contract change at the CLI<->server boundary
auto-propagates on their next run. They also remove the
clone-the-monorepo bootstrap the composite actions needed: each
workflow `npm install`s @cardstack/boxel-cli@latest, which the
caller can pin via the `boxel-cli-version` input.

  - .github/workflows/sync-workspace.yml - reusable workflow for the
    push-to-staging-on-main / dry-run-on-PR / push-to-production-on-
    release pattern. Supports both sticky-PR-comment and artifact
    reporting (the latter for boxel-catalog).
  - .github/workflows/preview-realm.yml - reusable workflow for the
    create+push+publish lifecycle on PR open/sync and the unpublish
    cleanup on PR close (currently only boxel-home; generic enough
    for any per-PR preview consumer).
  - .github/workflows/preview-realm-actions-integration.yml -
    rewritten to exercise the underlying `boxel realm
    create/push/publish/unpublish` commands directly against a local
    matrix + realm-server stack. The reusable workflows are thin
    shells around these CLI invocations, so contract drift between
    the CLI and the server's handlers surfaces here at PR time.
    Path-gated to fire on changes to the reusable workflows, the
    relevant CLI commands, or the server-side handlers.

Drops the three composite actions
(.github/actions/{publish,unpublish}-preview-realm, workspace-sync)
along with their ~500 lines of monorepo-bootstrap scaffolding.

`with:` blocks disallow the `secrets` context, so consumers whose
matrix username lives in `secrets.*` (boxel-skills) couldn't pass it
to the reusable workflow as an input. Declaring it as a secret lets
the caller source the value from either `vars.*` or `secrets.*` —
both contexts are allowed inside `secrets:` blocks.

Addresses Copilot review feedback on PR #4897. boxel-cli reads
BOXEL_PASSWORD from the environment when --password is not supplied
(packages/boxel-cli/src/commands/profile.ts:124), and explicitly
warns in build-program.ts that env-var auth is preferred over the
flag because flags leak via /proc/*/cmdline and ps output.

Drop --password from the five `boxel profile add` invocations and
rename the env var from MATRIX_PASSWORD to BOXEL_PASSWORD so the CLI
picks it up implicitly.
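
The lookup order this relies on can be sketched as follows. `resolvePassword` is a hypothetical function, not the code at `profile.ts:124`. It only illustrates the behavior stated above: an explicit `--password` value is used when supplied, otherwise `BOXEL_PASSWORD` is read from the environment.

```typescript
// Hypothetical sketch of the lookup order; not the actual profile.ts code.
function resolvePassword(
  flagValue: string | undefined,
  env: { [name: string]: string | undefined },
): string | undefined {
  // An explicit --password is used when supplied, but it is visible in `ps` output.
  if (flagValue !== undefined) {
    return flagValue;
  }
  // Otherwise fall back to the environment, which the workflows now rely on.
  return env["BOXEL_PASSWORD"];
}

console.log(resolvePassword(undefined, { BOXEL_PASSWORD: "s3cret" })); // s3cret
console.log(resolvePassword(undefined, { MATRIX_PASSWORD: "s3cret" })); // undefined
```

The second call shows why the rename matters: under this sketch, a value left in `MATRIX_PASSWORD` is never picked up once the flag is dropped.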