Skip to content

feat(text-embedding): add text embedding function with Ollama#32

Open
theothersideofgod wants to merge 2 commits into
mainfrom
lucas/text-embedding
Open

feat(text-embedding): add text embedding function with Ollama#32
theothersideofgod wants to merge 2 commits into
mainfrom
lucas/text-embedding

Conversation

@theothersideofgod
Copy link
Copy Markdown
Contributor

Summary

  • Add text-embedding function that generates embeddings using Ollama
  • Add Ollama K8s infrastructure (deployment, service, model pull job)
  • Add DRY_RUN mode for CI testing (no real Ollama needed)
  • Add unit tests and e2e tests

Details

Function

  • Uses @agentic-kit/ollama to call Ollama API
  • Default model: nomic-embed-text:latest
  • Supports custom model via param or EMBEDDING_MODEL env var
  • Returns 768-dimensional embedding vector

CI/Testing

  • TEXT_EMBEDDING_DRY_RUN=true returns mock embedding (same pattern as email functions)
  • Unit tests use Jest mock for @agentic-kit/ollama
  • E2E tests verify job queue → function → completion flow

K8s

  • ollama.yaml: Ollama server deployment + service
  • ollama-pull-job.yaml: Pre-pulls nomic-embed-text model on startup
  • Port-forward: localhost:11434

Test plan

  • Unit tests pass (pnpm test:unit)
  • Build passes (pnpm build)
  • E2E test passes locally with real Ollama
  • CI e2e test passes (DRY_RUN mode)

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 24, 2026 03:35
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new text-embedding function to the functions suite, backed by an Ollama deployment for local K8s dev, with dry-run support and accompanying unit/e2e tests.

Changes:

  • Introduces functions/text-embedding handler + unit tests using an @agentic-kit/ollama Jest mock.
  • Adds local-simple K8s resources for Ollama (Deployment/Service) and a model pre-pull Job, plus skaffold profile/port-forwarding.
  • Wires the new function into the job service registry/types and updates dependencies/Jest config.

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
tests/e2e/tests/text-embedding.e2e.test.ts Adds K8s e2e test that enqueues text-embedding jobs and waits for completion.
tests/mocks/@agentic-kit/ollama.ts Provides Jest mock for the Ollama client used by unit tests.
skaffold.yaml Adds a text-embedding profile, ollama port-forwarding, and updates local ports to avoid collisions.
pnpm-lock.yaml Locks new dependency @agentic-kit/ollama (and transitive deps) for the generated package.
k8s/overlays/local-simple/ollama.yaml Adds local-simple Ollama Deployment/Service.
k8s/overlays/local-simple/ollama-pull-job.yaml Adds a Job to pre-pull the embedding model into Ollama.
k8s/overlays/local-simple/kustomization.yaml Includes Ollama resources in the local-simple overlay.
k8s/overlays/local-simple/config.yaml Adds embedding-related env vars and TEXT_EMBEDDING_DRY_RUN.
job/service/src/types.ts Extends supported FunctionName union to include text-embedding.
job/service/src/index.ts Registers text-embedding module + default port in the function registry.
jest.config.ts Adds moduleNameMapper entry for @agentic-kit/ollama mock.
functions/text-embedding/handler.ts Implements embedding handler with DRY_RUN short-circuit and Ollama client call.
functions/text-embedding/handler.json Declares the new function (name/port/type/deps) for generation tooling.
functions/text-embedding/tests/handler.test.ts Adds unit tests covering validation, model selection, and DRY_RUN behavior.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +34 to +42
afterAll(async () => {
if (pg) await deleteTestJobs(pg, TEST_PREFIX);
await closeConnections();
});

it('should process a text-embedding job from the queue', async () => {
const job = await addJob(pg, databaseId, 'text-embedding', {
text: `${TEST_PREFIX} test text for embedding`,
});
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleteTestJobs(pg, TEST_PREFIX) won't remove the jobs created by this test because the inserted jobs use task_identifier 'text-embedding', while deleteTestJobs deletes by task_identifier LIKE '${prefix}%'. This leaves failed/queued rows behind across runs. Consider tracking created job IDs in the test and deleting by ID in afterAll (or extending deleteTestJobs to support deleting by a list of IDs).

Copilot uses AI. Check for mistakes.

const { text, model } = params;

if (!text || typeof text !== 'string') {
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The text validation only rejects empty string, but allows whitespace-only values (e.g. ' '), since it doesn't trim. Other handlers treat whitespace-only strings as missing (e.g. isNonEmptyString(...).trim().length > 0). Consider using text.trim().length > 0 in the validation so the function doesn't generate embeddings for blank input.

Suggested change
if (!text || typeof text !== 'string') {
if (typeof text !== 'string' || text.trim().length === 0) {

Copilot uses AI. Check for mistakes.
spec:
containers:
- name: ollama
image: ollama/ollama:latest
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the ollama/ollama:latest image tag makes the local dev environment non-reproducible (a new upstream release can silently change behavior). Consider pinning to a specific version tag (or digest) and updating it intentionally when needed.

Suggested change
image: ollama/ollama:latest
image: ollama/ollama:0.3.14

Copilot uses AI. Check for mistakes.
Comment on lines +20 to +26
- |
echo "Pulling nomic-embed-text model..."
curl -X POST http://ollama:11434/api/pull \
-H "Content-Type: application/json" \
-d '{"name": "nomic-embed-text:latest"}' \
--max-time 600
echo "Model pull complete"
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pull job script doesn't fail on HTTP errors: curl will exit 0 on non-2xx responses unless --fail/--fail-with-body is used, so the Job can report success even when the model pull fails. Consider adding set -e(o) pipefail and using curl --fail-with-body (and optionally checking/printing the response) so failures surface and the job retries properly.

Copilot uses AI. Check for mistakes.
@Anmol1696
Copy link
Copy Markdown
Contributor

Heads-up: repository history was rewritten on 2026-05-12 to scrub leaked secrets (Postgres/pgAdmin default passwords, an AWS access key ID, generated k8s/manifests/interweb-*.yaml files) from every commit. Every branch on origin was force-pushed with new commit SHAs.

This PR shows "DIRTY" / merge-conflict status because main has a cleanup commit and this branch is based on pre-rewrite main. To clear it:

git fetch --all --prune
git checkout <this-branch>
git reset --hard origin/<this-branch>    # your local branch tip moved too — pick up the rewritten version
git rebase origin/main                    # rebase onto rewritten main
# resolve conflicts (usually trivial — mostly secret-placeholder + the 4 deleted interweb-*.yaml files)
git push --force-with-lease

Or merge instead of rebase if the branch is shared with others:

git merge origin/main
git push

Notes:

  • The secrets that leaked were all either dead (rotated AWS key) or were defaults that have since been rotated/replaced; no active credential is exposed.
  • Old commit SHAs are still accessible by direct URL on GitHub for ~90 days (can be expedited via GitHub Support if needed).
  • See k8s/SECRET-EXPOSURE-AUDIT.md on main for the full incident audit.

Extracted from lucas/text-embedding branch and adapted to new
dynamic registry architecture. Changes:

- Add functions/text-embedding with handler + tests
- Add Ollama k8s deployment and model pull job
- Update k8s config with TEXT_EMBEDDING_DRY_RUN and OLLAMA env vars
- Fix test to use inline jest.mock() for @agentic-kit/ollama

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@socket-security
Copy link
Copy Markdown

socket-security Bot commented May 16, 2026

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addednpm/​@​agentic-kit/​ollama@​1.2.07410010089100
Addednpm/​twilio@​5.13.17610010095100

View full report

theothersideofgod added a commit that referenced this pull request May 16, 2026
Built on top of text-embedding (PR #32). Changes:

- Add functions/rag-embedding with chunking + embedding pipeline
- Add packages/text-chunker for text chunking utilities
- Add RAG_EMBEDDING_DRY_RUN env var to k8s config
- Add e2e test for rag-embedding

Supports fixed, sentence, paragraph, and semantic chunking strategies.
Uses Ollama nomic-embed-text for embeddings and stores results via
PostGraphile mutations.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants