Enable embeddings for sandboxed AI workloads #1771

@shiju-nv

Description

Problem Statement

Workloads running inside the sandbox can't generate embeddings, the numeric representations of text behind semantic search, retrieval-augmented generation (RAG), and similarity matching. Chat, completion, and model-listing calls already work, but embedding requests are rejected before they ever reach a provider. As a result, any feature that "searches by meaning" rather than by exact keywords silently fails for anything running in the sandbox.

Proposed Design

Treat embeddings as a first-class request type in the local inference proxy, the same way chat and completions already are: recognize an embeddings request, route it to the configured AI provider with the right credentials, and return the result. Because an embeddings result is one complete response (not a streamed feed of tokens), buffer it and serve it as a single complete body so the client never receives a response that was cut short partway through.
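The flow above (recognize, route with credentials, return in one piece) could be sketched roughly as follows. This is only an illustration, not the proxy's actual code: the OpenAI-style paths, the `Bearer` header, and the names `RequestKind`, `ROUTES`, `classify`, `handle_embeddings`, and `send_upstream` are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, Iterable, Optional


class RequestKind(Enum):
    CHAT = "chat"
    COMPLETION = "completion"
    MODEL_LIST = "model_list"
    EMBEDDINGS = "embeddings"  # the request type this issue proposes adding


# Hypothetical route table. The paths follow the OpenAI-style convention,
# which is an assumption; the issue does not name the proxy's real paths.
ROUTES: Dict[tuple, RequestKind] = {
    ("POST", "/v1/chat/completions"): RequestKind.CHAT,
    ("POST", "/v1/completions"): RequestKind.COMPLETION,
    ("GET", "/v1/models"): RequestKind.MODEL_LIST,
    ("POST", "/v1/embeddings"): RequestKind.EMBEDDINGS,
}


def classify(method: str, path: str) -> Optional[RequestKind]:
    """Return the request kind, or None if the proxy should reject it."""
    return ROUTES.get((method.upper(), path.rstrip("/")))


@dataclass
class ProxyResponse:
    status: int
    headers: Dict[str, str]
    body: bytes


def handle_embeddings(
    body: bytes,
    api_key: str,
    send_upstream: Callable[[bytes, Dict[str, str]], Iterable[bytes]],
) -> ProxyResponse:
    """Forward an embeddings request and return the result as one body.

    `send_upstream` is a hypothetical transport hook that yields the
    provider's response in chunks; Bearer auth is also an assumption.
    """
    upstream_headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    # Buffer every chunk before replying, so the client either gets the
    # whole payload with an exact Content-Length or an error, never a
    # partially forwarded body.
    payload = b"".join(send_upstream(body, upstream_headers))
    return ProxyResponse(
        status=200,
        headers={
            "Content-Type": "application/json",
            "Content-Length": str(len(payload)),
        },
        body=payload,
    )
```

A real handler would also need to propagate upstream error statuses and enforce a size limit on the buffered body; both are left out here to keep the sketch short.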

Alternatives Considered

  • Reuse the existing streaming path for embeddings: rejected because that path is built for incremental token streams and can corrupt a single all-at-once response if it is truncated.
  • Leave embeddings unsupported in the sandbox and require an external endpoint: rejected because it defeats the purpose of sandboxed AI workloads and splits configuration across two places.
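To illustrate why truncation matters more for an all-at-once response than for a token stream, here is a small sketch. It assumes the embeddings result is a single JSON document, which the issue does not state explicitly; `parse_complete_body` and the sample payload are made up for the example.

```python
import json


def parse_complete_body(body: bytes, declared_length: int) -> dict:
    """Parse a response only if the full declared body arrived.

    `declared_length` stands in for a Content-Length value; the helper
    itself is hypothetical and only illustrates the check.
    """
    if len(body) != declared_length:
        raise ValueError(
            f"truncated body: got {len(body)} of {declared_length} bytes"
        )
    return json.loads(body)


# Illustrative payload shape, not taken from any specific provider.
full = json.dumps({"data": [{"embedding": [0.1, 0.2, 0.3]}]}).encode()
truncated = full[: len(full) // 2]

# A complete body parses normally.
result = parse_complete_body(full, len(full))

# A body cut short mid-response is not valid JSON at all, so the client
# cannot recover even a partial result from it.
try:
    json.loads(truncated)
    truncated_is_parseable = True
except json.JSONDecodeError:
    truncated_is_parseable = False
```

A streamed chat reply that stops early still leaves the client with usable text, whereas half of a JSON document is unusable, which is the asymmetry behind rejecting the streaming path.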

Agent Investigation

No response

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request

Metadata

    Labels

    gator:validated: Gator validated this issue as ready for work
    state:triage-needed: Opened without agent diagnostics and needs triage
