feat(indexer): vendor go-openaudio/pkg/etl as the block indexer #840
Open
raymondjacobson wants to merge 1 commit into
Summary
Replaces the in-tree `CoreIndexer` block-fetching loop (which only handled `CreateUser`) with the full ETL indexer from `github.com/OpenAudio/go-openaudio/pkg/etl@v1.3.0`.
ETL gives us the full 31-entity-type handler suite (users, tracks, playlists, follows, saves, reposts, comments, events, grants, developer apps, tip reactions, associated wallets, etc.) plus the scheduled-release publisher. The package is kept in sync with upstream via tagged releases: release-please on go-openaudio is configured for lockstep, so future bumps come as a one-line `go.mod` change.
Changes
- `go.mod`: add `github.com/OpenAudio/go-openaudio/pkg/etl` v1.3.0 and bump the parent `github.com/OpenAudio/go-openaudio` to v1.3.0.
- `indexer/indexer.go`: rewritten. `CoreIndexer.Start` runs `etl.Indexer.Run()` alongside the existing `AggregatesCalculator` via errgroup. The previous block-fetching loop (`run` / `attemptProcessNextBlock` / `handleBlock` / `handleManageEntity`) is gone.
- `indexer/index_user.go` + `index_user_test.go`: deleted. The only operation the old handler implemented was `CreateUser`, which ETL now handles along with the other 30 entity types.
- `indexer/constants.go`: kept. The `Action_*` constants are still used by the `api/api/v1_*.go` handlers when building outgoing `ManageEntity` write transactions (not part of the indexing path).
- `api/health_check.go`: switched the indexer-lag query from `indexing_checkpoints.last_checkpoint` (the old in-tree tracker) to `MAX(height) FROM core_indexed_blocks` (ETL's per-block tracker). Same semantics, different table.
ETL configuration choices
- `SkipMigrations` is left `false`. Migrations are idempotent against `api/`'s schema (verified by applying all 21 current ETL migrations on top of a fresh DB seeded with `api/`'s schema: zero errors, only NOTICE messages for already-existing relations). ETL tracks its own state in `etl_db_migrations`, separate from `api/`'s `schema_version`, so there's no collision.
- Two ETL components are explicitly disabled when embedded here:
  - `MaterializedViewRefresh`: refreshes `mv_dashboard_*` views that don't exist in `api/`'s schema.
  - `PgNotifyListener`: publishes block/play events to a channel `api/` has no consumer for.
- `ScheduledReleasePublisher` stays enabled. It covers the `publish_scheduled_releases` celery task gap.
Caveat
`etl.Indexer.Run()` uses its own internal `context.Background()` rather than honoring `api/`'s shutdown ctx — graceful shutdown via ctx cancellation isn't supported by the upstream API today. Process termination (SIGTERM) still works via Go's normal exit path, and DB connections drain via pool finalizers on process exit. Acceptable tradeoff to avoid forking ETL; can be patched upstream later if it matters.
Concurrency hazard during cutover
If the legacy Python discovery-provider is still running against the same DB when this deploys, two of ETL's tables will see racy writes from both writers and could end up double-counted:
These both use checkpoint-cursored additive upserts. Coordinate the cutover: stop the Python jobs before deploying api/ with this PR.
Trending, scheduled-release publish, prune, delist statuses, and the other ETL/jobs flows are idempotent and safe to run alongside Python during the transition window.
Stacking with #834
PR #834 (parity jobs) adds `startParityJobs(ctx)` to `CoreIndexer.Start`. This PR rewrites that file. Whichever lands second needs a one-line rebase to slot `ci.startParityJobs(ctx)` back into the errgroup section. Both PRs are small enough that order doesn't matter much.
Verified
🤖 Generated with Claude Code