feat(text-embedding): add text embedding function with Ollama#32
feat(text-embedding): add text embedding function with Ollama#32theothersideofgod wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new text-embedding function to the functions suite, backed by an Ollama deployment for local K8s dev, with dry-run support and accompanying unit/e2e tests.
Changes:
- Introduces
functions/text-embeddinghandler + unit tests using an@agentic-kit/ollamaJest mock. - Adds local-simple K8s resources for Ollama (Deployment/Service) and a model pre-pull Job, plus skaffold profile/port-forwarding.
- Wires the new function into the job service registry/types and updates dependencies/Jest config.
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/e2e/tests/text-embedding.e2e.test.ts | Adds K8s e2e test that enqueues text-embedding jobs and waits for completion. |
| tests/mocks/@agentic-kit/ollama.ts | Provides Jest mock for the Ollama client used by unit tests. |
| skaffold.yaml | Adds a text-embedding profile, ollama port-forwarding, and updates local ports to avoid collisions. |
| pnpm-lock.yaml | Locks new dependency @agentic-kit/ollama (and transitive deps) for the generated package. |
| k8s/overlays/local-simple/ollama.yaml | Adds local-simple Ollama Deployment/Service. |
| k8s/overlays/local-simple/ollama-pull-job.yaml | Adds a Job to pre-pull the embedding model into Ollama. |
| k8s/overlays/local-simple/kustomization.yaml | Includes Ollama resources in the local-simple overlay. |
| k8s/overlays/local-simple/config.yaml | Adds embedding-related env vars and TEXT_EMBEDDING_DRY_RUN. |
| job/service/src/types.ts | Extends supported FunctionName union to include text-embedding. |
| job/service/src/index.ts | Registers text-embedding module + default port in the function registry. |
| jest.config.ts | Adds moduleNameMapper entry for @agentic-kit/ollama mock. |
| functions/text-embedding/handler.ts | Implements embedding handler with DRY_RUN short-circuit and Ollama client call. |
| functions/text-embedding/handler.json | Declares the new function (name/port/type/deps) for generation tooling. |
| functions/text-embedding/tests/handler.test.ts | Adds unit tests covering validation, model selection, and DRY_RUN behavior. |
Files not reviewed (1)
- pnpm-lock.yaml: Language not supported
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| afterAll(async () => { | ||
| if (pg) await deleteTestJobs(pg, TEST_PREFIX); | ||
| await closeConnections(); | ||
| }); | ||
|
|
||
| it('should process a text-embedding job from the queue', async () => { | ||
| const job = await addJob(pg, databaseId, 'text-embedding', { | ||
| text: `${TEST_PREFIX} test text for embedding`, | ||
| }); |
There was a problem hiding this comment.
deleteTestJobs(pg, TEST_PREFIX) won't remove the jobs created by this test because the inserted jobs use task_identifier 'text-embedding', while deleteTestJobs deletes by task_identifier LIKE '${prefix}%'. This leaves failed/queued rows behind across runs. Consider tracking created job IDs in the test and deleting by ID in afterAll (or extending deleteTestJobs to support deleting by a list of IDs).
|
|
||
| const { text, model } = params; | ||
|
|
||
| if (!text || typeof text !== 'string') { |
There was a problem hiding this comment.
The text validation only rejects empty string, but allows whitespace-only values (e.g. ' '), since it doesn't trim. Other handlers treat whitespace-only strings as missing (e.g. isNonEmptyString(...).trim().length > 0). Consider using text.trim().length > 0 in the validation so the function doesn't generate embeddings for blank input.
| if (!text || typeof text !== 'string') { | |
| if (typeof text !== 'string' || text.trim().length === 0) { |
| spec: | ||
| containers: | ||
| - name: ollama | ||
| image: ollama/ollama:latest |
There was a problem hiding this comment.
Using the ollama/ollama:latest image tag makes the local dev environment non-reproducible (a new upstream release can silently change behavior). Consider pinning to a specific version tag (or digest) and updating it intentionally when needed.
| image: ollama/ollama:latest | |
| image: ollama/ollama:0.3.14 |
| - | | ||
| echo "Pulling nomic-embed-text model..." | ||
| curl -X POST http://ollama:11434/api/pull \ | ||
| -H "Content-Type: application/json" \ | ||
| -d '{"name": "nomic-embed-text:latest"}' \ | ||
| --max-time 600 | ||
| echo "Model pull complete" |
There was a problem hiding this comment.
The pull job script doesn't fail on HTTP errors: curl will exit 0 on non-2xx responses unless --fail/--fail-with-body is used, so the Job can report success even when the model pull fails. Consider adding set -e(o) pipefail and using curl --fail-with-body (and optionally checking/printing the response) so failures surface and the job retries properly.
b8ce5df to
bb76cce
Compare
bb76cce to
2dbeda7
Compare
|
Heads-up: repository history was rewritten on 2026-05-12 to scrub leaked secrets (Postgres/pgAdmin default passwords, an AWS access key ID, generated This PR shows "DIRTY" / merge-conflict status because git fetch --all --prune
git checkout <this-branch>
git reset --hard origin/<this-branch> # your local branch tip moved too — pick up the rewritten version
git rebase origin/main # rebase onto rewritten main
# resolve conflicts (usually trivial — mostly secret-placeholder + the 4 deleted interweb-*.yaml files)
git push --force-with-leaseOr merge instead of rebase if the branch is shared with others: git merge origin/main
git pushNotes:
|
Extracted from lucas/text-embedding branch and adapted to new dynamic registry architecture. Changes: - Add functions/text-embedding with handler + tests - Add Ollama k8s deployment and model pull job - Update k8s config with TEXT_EMBEDDING_DRY_RUN and OLLAMA env vars - Fix test to use inline jest.mock() for @agentic-kit/ollama Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
2dbeda7 to
25ca320
Compare
|
Review the following changes in direct dependencies. Learn more about Socket for GitHub.
|
Built on top of text-embedding (PR #32). Changes: - Add functions/rag-embedding with chunking + embedding pipeline - Add packages/text-chunker for text chunking utilities - Add RAG_EMBEDDING_DRY_RUN env var to k8s config - Add e2e test for rag-embedding Supports fixed, sentence, paragraph, and semantic chunking strategies. Uses Ollama nomic-embed-text for embeddings and stores results via PostGraphile mutations. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
text-embeddingfunction that generates embeddings using OllamaDetails
Function
@agentic-kit/ollamato call Ollama APInomic-embed-text:latestEMBEDDING_MODELenv varCI/Testing
TEXT_EMBEDDING_DRY_RUN=truereturns mock embedding (same pattern as email functions)@agentic-kit/ollamaK8s
ollama.yaml: Ollama server deployment + serviceollama-pull-job.yaml: Pre-pulls nomic-embed-text model on startuplocalhost:11434Test plan
pnpm test:unit)pnpm build)🤖 Generated with Claude Code