Problem Statement
GET /v1/models (model discovery) returns a single JSON model list, but the sandbox inference proxy routes it through the Server-Sent Events streaming path. When that path truncates a response because of the streaming size cap or the idle timeout, it appends an SSE error frame to the body. This corrupts a payload that the client parses as a single JSON object.
Proposed Design
Make response framing a property of the inference protocol. Add a ResponseFraming field to InferenceApiPattern, set once per pattern in default_patterns. model_discovery and openai_embeddings are Buffered; the SSE protocols (chat completions, completions, responses, Anthropic messages) stay Streaming.
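A minimal sketch of the shape this could take, not the actual implementation. ResponseFraming, InferenceApiPattern, default_patterns, model_discovery and openai_embeddings come from the proposal; everything else is an assumption made for illustration: the reduced struct fields, the helper names pattern, framing_for and finish_truncated, every path other than /v1/models, the protocol names for the SSE endpoints, the layout of the SSE error frame, and the choice to surface a buffered truncation as an error.

```rust
/// How the proxy delivers a response body to the client.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResponseFraming {
    /// Forward chunks as they arrive (SSE protocols).
    Streaming,
    /// Collect the whole body first (single JSON document).
    Buffered,
}

/// Reduced, hypothetical stand-in for the real InferenceApiPattern.
#[derive(Debug, Clone)]
pub struct InferenceApiPattern {
    pub protocol: &'static str,
    pub method: &'static str,
    pub path: &'static str,
    pub framing: ResponseFraming,
}

/// Hypothetical constructor helper to keep the table below compact.
fn pattern(
    protocol: &'static str,
    method: &'static str,
    path: &'static str,
    framing: ResponseFraming,
) -> InferenceApiPattern {
    InferenceApiPattern { protocol, method, path, framing }
}

/// Framing is set once per pattern, here and nowhere else.
/// Paths and protocol names other than GET /v1/models, model_discovery and
/// openai_embeddings are assumed for illustration.
pub fn default_patterns() -> Vec<InferenceApiPattern> {
    use ResponseFraming::{Buffered, Streaming};
    vec![
        pattern("model_discovery", "GET", "/v1/models", Buffered),
        pattern("openai_embeddings", "POST", "/v1/embeddings", Buffered),
        pattern("openai_chat_completions", "POST", "/v1/chat/completions", Streaming),
        pattern("openai_completions", "POST", "/v1/completions", Streaming),
        pattern("openai_responses", "POST", "/v1/responses", Streaming),
        pattern("anthropic_messages", "POST", "/v1/messages", Streaming),
    ]
}

/// Look up the framing for a request by exact method and path match.
pub fn framing_for(
    patterns: &[InferenceApiPattern],
    method: &str,
    path: &str,
) -> Option<ResponseFraming> {
    patterns
        .iter()
        .find(|p| p.method == method && p.path == path)
        .map(|p| p.framing)
}

/// Decide what to do with a body that hit the size cap or idle timeout.
/// Streaming: append an SSE error frame (frame layout is assumed, and
/// `reason` is not JSON-escaped in this sketch).
/// Buffered: return an error so no partial JSON reaches the client
/// (this behavior is an assumption, not stated in the proposal).
pub fn finish_truncated(
    framing: ResponseFraming,
    partial: &[u8],
    reason: &str,
) -> Result<Vec<u8>, String> {
    match framing {
        ResponseFraming::Streaming => {
            let mut body = partial.to_vec();
            let frame = format!("event: error\ndata: {{\"error\":\"{}\"}}\n\n", reason);
            body.extend_from_slice(frame.as_bytes());
            Ok(body)
        }
        ResponseFraming::Buffered => Err(format!("response truncated: {}", reason)),
    }
}
```

With framing carried on the pattern, the truncation handler only needs the matched pattern to know whether an SSE error frame is safe to append, so a model list is never followed by non-JSON bytes.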
Alternatives Considered
Choose framing per request by inspecting the request's stream flag. Deferred: this would also let non-streaming chat and completion responses be served buffered, but it is a larger change.
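For comparison, a rough sketch of what the deferred per-request variant might look like. The function name effective_framing is hypothetical, the stream flag is assumed to be already extracted from the request body, and an absent flag is assumed to mean non-streaming.

```rust
/// How the proxy delivers a response body to the client.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ResponseFraming {
    Streaming,
    Buffered,
}

/// Hypothetical per-request override: a pattern that allows streaming is
/// only streamed when the request explicitly asked for it. A pattern that
/// is Buffered stays Buffered regardless of the flag.
pub fn effective_framing(
    pattern_framing: ResponseFraming,
    stream_flag: Option<bool>, // assumed: parsed from the request's `stream` field
) -> ResponseFraming {
    match (pattern_framing, stream_flag) {
        (ResponseFraming::Streaming, Some(true)) => ResponseFraming::Streaming,
        _ => ResponseFraming::Buffered,
    }
}
```

The extra cost is visible even in this sketch: the proxy has to read and parse the request body before it can pick a framing, which is part of why this is the larger change.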
Agent Investigation
No response
Checklist