feat(indexer): vendor go-openaudio/pkg/etl as the block indexer#840

Open
raymondjacobson wants to merge 1 commit into main from api/vendor-etl


Summary

Replaces the in-tree `CoreIndexer` block-fetching loop (which only handled `CreateUser`) with the full ETL indexer from `github.com/OpenAudio/go-openaudio/pkg/etl@v1.3.0`.

ETL gives us the full 31-entity-type handler suite (users, tracks, playlists, follows, saves, reposts, comments, events, grants, developer apps, tip reactions, associated wallets, etc.) plus the scheduled-release publisher. The package is kept in sync with upstream via tagged releases — release-please on go-openaudio is configured for lockstep, so future bumps come as a one-line `go.mod` change.

Changes

| File | What |
| --- | --- |
| `go.mod` / `go.sum` | Add `github.com/OpenAudio/go-openaudio/pkg/etl v1.3.0`; bump parent module `github.com/OpenAudio/go-openaudio` from `v1.2.13` → `v1.3.0` |
| `indexer/indexer.go` | Rewritten. `CoreIndexer.Start` runs `etl.Indexer.Run()` alongside the existing `AggregatesCalculator` via errgroup. The previous block-fetching loop (`run`, `attemptProcessNextBlock`, `handleBlock`, `handleManageEntity`) is gone. |
| `indexer/index_user.go` + `index_user_test.go` | Deleted. The only operation the old handler implemented was `CreateUser`, which ETL now handles along with the other 30 entity types. |
| `indexer/constants.go` | Kept. The `Action_*` constants are still used by `api/api/v1_*.go` handlers when building outgoing ManageEntity write transactions (not part of the indexing path). |
| `api/health_check.go` | Switched the indexer-lag query from `indexing_checkpoints.last_checkpoint` (the old in-tree tracker) to `MAX(height) FROM core_indexed_blocks` (ETL's per-block tracker). Same semantics, different table. |

ETL configuration choices

  • `SkipMigrations: false` (default). Migrations are idempotent against api/'s schema — every ETL migration uses `CREATE TABLE IF NOT EXISTS` / `ADD COLUMN IF NOT EXISTS`. Verified by applying all 21 current ETL migrations on top of a fresh DB seeded with api/'s schema: zero errors, only NOTICE messages for already-existing relations. ETL tracks its own migration state in `etl_db_migrations` separate from api/'s `schema_version`, so no state-table collision.
  • `DisableMaterializedViewRefresh()` is set because the refresher targets `mv_dashboard_*` views that don't exist in api/'s schema.
  • `DisablePgNotifyListener()` is set because the listener publishes block/play events to a channel that api/ has no consumer for.
  • `ScheduledReleasePublisher` stays enabled: it does the same job as the `publish_scheduled_releases` Celery task in apps' Python code, and we want it running here.

Caveat

`etl.Indexer.Run()` uses its own internal `context.Background()` rather than honoring `api/`'s shutdown ctx, so graceful shutdown via ctx cancellation isn't supported by the upstream API today. Process termination (SIGTERM) still works via Go's normal exit path. DB connections are not drained gracefully; they are closed by the OS when the process exits, since Go does not guarantee that finalizers run at exit. Acceptable tradeoff to avoid forking ETL; can be patched upstream later if it matters.

Concurrency hazard during cutover

If the legacy Python discovery-provider is still running against the same DB when this deploys, two of ETL's tables will see racy writes from both writers and could end up double-counted:

  • `hourly_play_counts` (Python's index_hourly_play_counts)
  • `user_listening_history` (Python's index_user_listening_history)

These both use checkpoint-cursored additive upserts. Coordinate the cutover: stop the Python jobs before deploying api/ with this PR.
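The double-counting risk can be shown with a small simulation. This is a simplified model, not the actual Python or ETL code: the `play` and `writer` types are hypothetical, and it assumes each writer advances its own view of the checkpoint, so both process the same rows and each adds to the shared count.

```go
package main

// play is one listen event recorded at a block height (hypothetical type).
type play struct {
	height  int64
	trackID int
}

// writer models one indexing job with its own checkpoint cursor.
type writer struct{ checkpoint int64 }

// apply processes every play above the writer's checkpoint, adds it to the
// shared counts (the additive upsert, count = count + 1), and then advances
// the cursor to the highest height seen.
func (w *writer) apply(plays []play, counts map[int]int) {
	next := w.checkpoint
	for _, p := range plays {
		if p.height <= w.checkpoint {
			continue // already processed by this writer
		}
		counts[p.trackID]++
		if p.height > next {
			next = p.height
		}
	}
	w.checkpoint = next
}
```

With two writers applied to the same three plays, every count comes out doubled. A second pass by either writer adds nothing, which shows that the cursor protects a writer only against itself, not against the other one.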

Trending, scheduled-release publish, prune, delist statuses, and the other ETL/jobs flows are idempotent and safe to run alongside Python during the transition window.

Stacking with #834

PR #834 (parity jobs) adds `startParityJobs(ctx)` to `CoreIndexer.Start`. This PR rewrites that file. Whichever lands second needs a one-line rebase to slot `ci.startParityJobs(ctx)` back into the errgroup section. Both PRs are small enough that order doesn't matter much.

Verified

  • `go build ./...` clean
  • `go vet ./indexer/ ./api/` clean
  • ETL migrations apply cleanly on top of api/'s schema (proven in this thread)
  • External `go get github.com/OpenAudio/go-openaudio/pkg/etl@v1.3.0` works from a brand-new module (verified in /tmp test consumer)

🤖 Generated with Claude Code

Replaces the in-tree CoreIndexer block-fetching loop (which only handled
CreateUser) with the full ETL indexer from
github.com/OpenAudio/go-openaudio/pkg/etl@v1.3.0. ETL gives us the 31-entity-
type handler suite plus scheduled-release publishing, kept in sync with
upstream via tagged releases (lockstep with go-openaudio root per upstream
release-please config).

Changes:

- go.mod: add github.com/OpenAudio/go-openaudio/pkg/etl v1.3.0 and bump
  parent github.com/OpenAudio/go-openaudio to v1.3.0.
- indexer/indexer.go: rewritten. CoreIndexer.Start runs etl.Indexer.Run()
  alongside the existing AggregatesCalculator via errgroup. The previous
  block-fetching loop (run / attemptProcessNextBlock / handleBlock /
  handleManageEntity) is gone.
- indexer/index_user.go + index_user_test.go: deleted. The only operation
  the old handler implemented was CreateUser, which ETL now handles along
  with the other 30 entity types.
- indexer/constants.go: kept — the Action_* constants are still used by
  api/api/v1_*.go handlers when building outgoing ManageEntity write
  transactions (not part of the indexing path).
- api/health_check.go: switched the indexer-lag query from
  indexing_checkpoints.last_checkpoint (the old in-tree tracker) to
  MAX(height) FROM core_indexed_blocks (ETL's per-block tracker). Same
  semantics, different table.

ETL config: SkipMigrations is left false. Migrations are idempotent against
api/'s schema (verified by applying all 21 current ETL migrations on top of
a fresh DB seeded with api/'s schema: zero errors, only NOTICE messages for
already-existing relations). ETL tracks its own state in etl_db_migrations
separate from api/'s schema_version, so there's no collision.

Two ETL components are explicitly disabled when embedded here:

- MaterializedViewRefresh: refreshes mv_dashboard_* views that don't exist
  in api/'s schema.
- PgNotifyListener: publishes block/play events to a channel api/ has no
  consumer for.

ScheduledReleasePublisher stays enabled — it covers the
publish_scheduled_releases celery task gap.

Caveat: etl.Indexer.Run() uses its own internal context.Background()
rather than honoring api/'s shutdown ctx — graceful shutdown via ctx
cancellation isn't supported by the upstream API today. Process termination
(SIGTERM) still works via Go's normal exit path. Acceptable tradeoff to
avoid forking ETL.