[elasticsearch] Add paimon-elasticsearch module with ES vector global index support #7777

Open
CrownChu wants to merge 2 commits into apache:master from CrownChu:paimon-es

CrownChu commented May 7, 2026

Summary

  • Add paimon-elasticsearch module implementing GlobalIndex SPI backed by Elasticsearch vector search
  • Support archive-based index packaging (tar.gz) for Paimon file system storage
  • Include Lucene directory adapter (ArchiveDirectory, ArchiveBackedIndexInput, ArchiveFlatVectorReader) for reading packed index segments
  • Add SLF4J bridge for ES internal logging
  • Add benchmarks (Sift1M, Lucene vs ES comparison) and integration tests
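
For reviewers unfamiliar with the archive-backed directory idea: the adapters listed above need positioned reads of individual segment files that live inside one packed blob. The following is only a minimal sketch of that idea, not the code in this PR. `PackedArchive` and its methods are hypothetical names, and an uncompressed in-memory byte array stands in for the tar.gz archive to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration: packs named files into one blob and reads slices back. */
public class PackedArchive {
    // Entry table: file name -> {offset, length} inside the packed blob.
    private final Map<String, long[]> entries;
    private final byte[] blob;

    private PackedArchive(byte[] blob, Map<String, long[]> entries) {
        this.blob = blob;
        this.entries = entries;
    }

    /** Concatenates the files in iteration order and records where each one starts. */
    public static PackedArchive pack(Map<String, byte[]> files) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Map<String, long[]> table = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            byte[] content = e.getValue();
            table.put(e.getKey(), new long[] {out.size(), content.length});
            out.write(content, 0, content.length);
        }
        return new PackedArchive(out.toByteArray(), table);
    }

    /** Length in bytes of the named file. */
    public long length(String name) {
        return entry(name)[1];
    }

    /** Reads {@code length} bytes starting at {@code position} within the named file. */
    public byte[] read(String name, long position, int length) {
        long[] e = entry(name);
        if (position < 0 || length < 0 || position + length > e[1]) {
            throw new IndexOutOfBoundsException("read past end of " + name);
        }
        int start = (int) (e[0] + position);
        return Arrays.copyOfRange(blob, start, start + length);
    }

    private long[] entry(String name) {
        long[] e = entries.get(name);
        if (e == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return e;
    }
}
```

The offset table is what lets a reader serve a slice of one file without touching the others. How the actual module locates entries inside the tar.gz is not described in this summary.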

Key Components

| Component | Description |
| --- | --- |
| ESVectorGlobalIndexWriter | Builds DiskBBQ vector index and packs into archive |
| ESVectorGlobalIndexReader | Reads archived index, supports multi-stage retrieval (coarse + rescore) |
| ESVectorGlobalIndexer | SPI entry point implementing GlobalIndexer |
| ESIndexArchiveUtils | Pack/unpack Lucene index segments to/from tar.gz |
| ESVectorIndexOptions | Configuration options (dimension, metric, ef, numCandidates, etc.) |
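
To illustrate what "multi-stage retrieval (coarse + rescore)" means in general, here is a minimal sketch under stated assumptions. It is not the reader's implementation and it is not DiskBBQ: the coarse stage swaps in plain sign-bit quantization with Hamming distance, and the rescore stage uses exact squared L2 distance. `TwoStageSearch` and its methods are hypothetical; only the name `numCandidates` is borrowed from the options listed above, and its use here as the shortlist size is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration of coarse retrieval followed by exact rescoring. */
public class TwoStageSearch {

    /** Quantizes each dimension to one bit (1 if positive). Assumes at most 64 dimensions. */
    public static long signBits(float[] v) {
        long bits = 0L;
        for (int i = 0; i < v.length; i++) {
            if (v[i] > 0f) {
                bits |= 1L << i;
            }
        }
        return bits;
    }

    /** Exact squared Euclidean distance, used only on the shortlist. */
    public static float squaredL2(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the ids of the k nearest vectors among the numCandidates coarse hits. */
    public static int[] search(float[][] vectors, float[] query, int numCandidates, int k) {
        long q = signBits(query);
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < vectors.length; i++) {
            ids.add(i);
        }
        // Stage 1 (coarse): rank all vectors by Hamming distance between sign bits.
        ids.sort(Comparator.comparingInt(i -> Long.bitCount(signBits(vectors[i]) ^ q)));
        List<Integer> shortlist =
                new ArrayList<>(ids.subList(0, Math.min(numCandidates, ids.size())));
        // Stage 2 (rescore): re-rank only the shortlist with exact distances.
        shortlist.sort(Comparator.comparingDouble(i -> squaredL2(vectors[i], query)));
        int n = Math.min(k, shortlist.size());
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = shortlist.get(i);
        }
        return result;
    }
}
```

In this sketch a larger `numCandidates` trades more exact distance computations for a lower chance that the coarse stage drops a true neighbor.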

Test Plan

  • Unit tests: ESVectorGlobalIndexTest covers write/read/search round-trip
  • Benchmarks: ESVectorBenchmark, Sift1MBenchmark, LuceneVsESBenchmark

root and others added 2 commits May 7, 2026 11:32
Add ESVectorGlobalIndexWriter/Reader/Indexer with DiskBBQ codec support,
archive-based index storage, and Lucene integration for vector search.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>