Support Kubernetes-native Service/Endpoint exposure for sandbox agents

### Problem Statement

Problem

The `openshell` CLI provides options to expose sandbox agent endpoints, but these are designed for interactive, short-lived sessions. For long-running agents deployed as always-on services (e.g., LangChain agents serving OpenAI-compatible APIs), CLI-based port exposure is unreliable: connections drop, ports are not discoverable via DNS, and there is no integration with Kubernetes Service objects.

Current Behavior

- `openshell sandbox expose` creates a temporary port forward
- No Kubernetes Service or Endpoints object is created
- Clients cannot discover the agent via standard `<svc>.<ns>.svc.cluster.local` DNS
- If the gateway pod restarts, the exposure is lost
- No integration with Istio VirtualService for external ingress

### Proposed Design

Desired Behavior
When a sandbox is created with a declared port (e.g., `port: 8000`), OpenShell should:

1. Create a Kubernetes Service and Endpoints (or EndpointSlice) pointing to the sandbox pod's IP and declared port.
2. Keep the Service stable across pod restarts: when the gateway recreates the pod, the Service stays and only its endpoints need to be updated.
3. Optionally support `type: ClusterIP` for internal access, and integrate with Ingress/Istio for external access.
4. Read these settings from a `service` stanza, which the Sandbox CRD could accept:
```yaml
apiVersion: agents.x-k8s.io/v1alpha1
kind: Sandbox
metadata:
  name: my-agent
spec:
  podTemplate:
    spec:
      containers:
      - name: agent
        ports:
        - containerPort: 8000
  service:
    enabled: true
    port: 8000
    type: ClusterIP
```
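
To make the first two points concrete, here is a rough sketch of the objects OpenShell might generate for the `my-agent` sandbox above. This is an illustration under stated assumptions, not a proposed final shape: the namespace, the port name `http`, the EndpointSlice name, the `managed-by` value, and the pod IP are all placeholders. The Service has no selector, so the controller owns the EndpointSlice and would rewrite its address whenever the gateway recreates the pod.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-agent
  namespace: default            # assumed namespace
spec:
  type: ClusterIP
  # No selector: endpoints are managed explicitly via the EndpointSlice below.
  ports:
  - name: http                  # assumed port name; must match the EndpointSlice port name
    protocol: TCP
    port: 8000
    targetPort: 8000
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-agent-1              # hypothetical name
  namespace: default
  labels:
    # Standard label that associates this slice with the Service above.
    kubernetes.io/service-name: my-agent
    # Hypothetical value identifying the managing controller.
    endpointslice.kubernetes.io/managed-by: openshell
addressType: IPv4
ports:
- name: http
  protocol: TCP
  port: 8000
endpoints:
- addresses:
  - "10.0.0.12"                 # placeholder for the sandbox pod's current IP
  conditions:
    ready: true
```

With objects like these in place, in-cluster clients could reach the agent at `my-agent.default.svc.cluster.local:8000`, and the name would keep resolving across pod restarts as long as the slice's address is kept current.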


### Alternatives Considered

Use Case
Enterprise deployment of AI agents as long-running microservices that must be reachable by other services in the cluster, load balancers, and API gateways, rather than as interactive CLI sessions.

Workaround
Our controller patches labels onto the sandbox pod (`agentplatform.hpe.com/agent: <name>`) and creates a selector-based ClusterIP Service in the same namespace. This works for in-namespace routing, but the Service is not managed by OpenShell itself: our external controller must discover the sandbox pod, patch labels, and reconcile the Service independently. If the gateway recreates the pod (e.g., after eviction), the new pod has no labels until our controller reconciles again, so the Service has no matching endpoints and traffic is briefly interrupted.
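
For reference, the Service our workaround creates looks roughly like the sketch below. The label key is the one described above; the Service name `my-agent` and port `8000` are assumptions borrowed from the example in the proposed design.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-agent                # assumed name, same namespace as the sandbox pod
spec:
  type: ClusterIP
  selector:
    # Matches only pods that our external controller has already labeled.
    agentplatform.hpe.com/agent: my-agent
  ports:
  - protocol: TCP
    port: 8000                  # assumed agent port
    targetPort: 8000
```

Because the selector depends on a label that is patched on after pod creation, a recreated pod is invisible to this Service until the label is applied again.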

### Agent Investigation

_No response_

### Checklist

- [x] I've reviewed existing issues and the architecture docs
- [x] This is a design proposal, not a "please build this" request
