Merged
30 changes: 30 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,36 @@ All notable changes to A3S Box will be documented in this file.

## [Unreleased]

## [2.6.0] — 2026-06-26

### Added

- **`containerd-shim-a3s-box-v2` — Kubernetes RuntimeClass integration.** A new
containerd runtime-v2 shim (standalone `containerd-shim/` crate) that lets a vanilla
Kubernetes cluster route `runtimeClassName: a3s-box` pods to the a3s-box MicroVM
runtime via a containerd runtime handler, without replacing the node CRI. It maps the
containerd Task API onto the `a3s-box` CLI (pod sandbox → placeholder; workload →
detached MicroVM; `kubectl exec` → `a3s-box exec`). Deploy manifests under
`deploy/shim/` (RuntimeClass, additive containerd config, example pod). Validated on a
real `/dev/kvm` Kubernetes node: a `runtimeClassName: a3s-box` pod reaches Running on
a real libkrun MicroVM. Still experimental — `kubectl exec`/log streaming depend on the
guest exec control channel and are not yet fully validated; single-container,
TSI-networked pods are the supported shape.
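
The mapping described in this entry can be pictured as a small translation table from Task API calls to CLI invocations. The following is only a sketch under stated assumptions: the `TaskCall` enum, the `to_cli` function, the `--detach` flag name, and the argument order of `a3s-box exec` are all hypothetical and are not taken from the shim's source. Only the `run` and `exec` subcommand names come from this changelog.

```rust
/// Hypothetical, simplified view of the Task API calls the shim receives.
enum TaskCall<'a> {
    /// The pod sandbox ("pause") container.
    CreateSandbox,
    /// A workload container that should become a MicroVM.
    StartWorkload { image: &'a str },
    /// A `kubectl exec` forwarded by containerd.
    Exec { box_id: &'a str, argv: &'a [&'a str] },
}

/// Assumed flag name: the real CLI's detach option is not shown in the changelog.
const DETACH_FLAG: &str = "--detach";

/// Translate a call into the argv of an `a3s-box` invocation.
/// Returns `None` when no CLI call is needed (the sandbox is only a placeholder).
fn to_cli(call: &TaskCall) -> Option<Vec<String>> {
    match call {
        TaskCall::CreateSandbox => None,
        TaskCall::StartWorkload { image } => Some(vec![
            "a3s-box".to_string(),
            "run".to_string(),
            DETACH_FLAG.to_string(),
            image.to_string(),
        ]),
        TaskCall::Exec { box_id, argv } => {
            // Assumed shape: `a3s-box exec <box> <command...>`.
            let mut cli = vec![
                "a3s-box".to_string(),
                "exec".to_string(),
                box_id.to_string(),
            ];
            cli.extend(argv.iter().map(|arg| arg.to_string()));
            Some(cli)
        }
    }
}

fn main() {
    let calls = [
        TaskCall::CreateSandbox,
        TaskCall::StartWorkload { image: "busybox:latest" },
        TaskCall::Exec { box_id: "box-1", argv: &["sh", "-c", "true"] },
    ];
    for call in &calls {
        println!("{:?}", to_cli(call));
    }
}
```

Returning `None` for the sandbox keeps the "placeholder" case explicit instead of emitting a no-op command.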

### Fixed

- **VMM shim now survives teardown of its launcher's session.** `VmController` puts the
libkrun shim in its own session (`setsid` via `pre_exec`) so a process-group/cgroup
kill of a foreground launcher (e.g. a containerd-shim `a3s-box run`) no longer reaps
the shim and removes the box's `exec.sock`, which previously caused `a3s-box exec` to
fail with "exec socket missing".
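
For readers unfamiliar with the technique, here is a minimal sketch of detaching a child into its own session with `pre_exec` and `setsid(2)`. It is not the `VmController` code: the helper names `spawn_in_new_session` and `session_of` are made up for illustration, `sleep` stands in for the libkrun shim, and the libc functions are declared inline (the `libc` crate exposes the same ones) so the example needs no dependencies. The `unsafe extern` form assumes Rust 1.82 or newer.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command};

// Declared inline to keep the sketch dependency-free; `libc::setsid` and
// `libc::getsid` are the equivalent bindings.
unsafe extern "C" {
    fn setsid() -> i32;
    fn getsid(pid: i32) -> i32;
}

/// Spawn `cmd` so that the child becomes the leader of a new session.
fn spawn_in_new_session(cmd: &mut Command) -> io::Result<Child> {
    let enter_new_session = || -> io::Result<()> {
        // Runs in the forked child, before exec.
        if unsafe { setsid() } == -1 {
            return Err(io::Error::last_os_error());
        }
        Ok(())
    };
    // SAFETY: the hook only calls setsid(2), which is async-signal-safe.
    unsafe {
        cmd.pre_exec(enter_new_session);
    }
    cmd.spawn()
}

/// Session id of `pid` (0 means the calling process), via getsid(2).
fn session_of(pid: i32) -> io::Result<i32> {
    let sid = unsafe { getsid(pid) };
    if sid == -1 {
        Err(io::Error::last_os_error())
    } else {
        Ok(sid)
    }
}

fn main() -> io::Result<()> {
    // `sleep` is a stand-in for the long-lived shim process.
    let mut child = spawn_in_new_session(Command::new("sleep").arg("5"))?;
    let pid = child.id() as i32;
    println!(
        "child pid {pid}: child session {}, launcher session {}",
        session_of(pid)?,
        session_of(0)?
    );
    child.kill()?;
    child.wait()?;
    Ok(())
}
```

Because the child leads its own session and process group, a signal sent to the launcher's process group no longer reaches it. Whether this also protects against a cgroup-wide kill depends on how the launcher's cgroup is managed, which this sketch does not address.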

### Changed

- **`a3s-libkrun-sys` build downloads are resilient.** The libkrunfw fetch now retries and
aborts stalled transfers (`curl --retry --speed-limit/--speed-time`) instead of a bare,
unbounded `curl` that could hang forever on a flaky network.
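
As an illustration of what a bounded fetch might look like from a build script, the sketch below assembles a `curl` command with the flags named in this entry. The function name `resilient_curl`, the URL, the destination path, and every numeric limit are assumptions chosen for the example; the actual values used by `a3s-libkrun-sys` are not stated here.

```rust
use std::process::Command;

/// Build a `curl` invocation that retries and gives up on stalled transfers.
/// All numeric limits below are illustrative, not the crate's real settings.
fn resilient_curl(url: &str, dest: &str) -> Command {
    let mut cmd = Command::new("curl");
    cmd.args([
        "--fail",          // treat HTTP errors as failures
        "--location",      // follow redirects
        "--retry", "5",    // retry transient failures up to 5 times
        "--retry-delay", "2",
        "--speed-limit", "1024", // abort if slower than 1024 bytes/s ...
        "--speed-time", "30",    // ... for 30 consecutive seconds
        "--output", dest,
        url,
    ]);
    cmd
}

fn main() {
    // Placeholder URL and path; nothing is downloaded, the command is only printed.
    let cmd = resilient_curl(
        "https://example.invalid/libkrunfw.tgz",
        "/tmp/libkrunfw.tgz",
    );
    println!("{:?}", cmd);
    // A build script would call `cmd.status()` and fail the build on a non-zero exit.
}
```

Pairing `--speed-limit` with `--speed-time` puts a bound on each attempt, so a stalled connection ends in an error instead of blocking the build indefinitely.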

## [2.5.2] — 2026-06-22

### Changed
75 changes: 74 additions & 1 deletion README.md
@@ -44,7 +44,7 @@ As of **v2.4.0**, three adversarial audits — production-operability (24 findin
| Networking | Default TSI networking, TCP `host:guest` publishing, user-defined bridge networks, network inspect/connect/disconnect/rm, and `/etc/hosts` peer discovery are implemented with documented platform boundaries. |
| Compose | A useful local subset is implemented: image, command, entrypoint, env, env_file, ports, volumes, depends_on, networks, DNS, tmpfs, workdir, hostname, extra_hosts, labels, healthcheck, restart, CPU/memory, capabilities, and privileged mode. |
| TEE | AMD SEV-SNP-oriented attestation, RA-TLS, sealing, and secret injection flows exist, plus simulation mode for development. Hardware-backed operation depends on SEV-SNP-capable hosts and libkrun support. TDX is not a productized path. |
| Kubernetes CRI | Reachable by `crictl`/kubelet over its Unix socket. Verified on a `/dev/kvm` host: pod + container lifecycle (`RunPodSandbox` → `CreateContainer` → `StartContainer` → `Stop`/`Remove`), `exec` over Kubernetes SPDY/3.1 `remotecommand` (TTY and non-TTY, stdin/stdout/stderr, exit codes), and container log capture to `log_path`. Not yet conformant: `attach` and the stricter `critest` specs (log format, Linux SecurityContext, seccomp/AppArmor, namespaces, mount propagation). Linux-only; not the core completion target. |
| Kubernetes CRI | Reachable by `crictl`/kubelet over its Unix socket. Verified on a `/dev/kvm` host: pod + container lifecycle (`RunPodSandbox` → `CreateContainer` → `StartContainer` → `Stop`/`Remove`), `exec` over Kubernetes SPDY/3.1 `remotecommand` (TTY and non-TTY, stdin/stdout/stderr, exit codes), and container log capture to `log_path`. Not yet conformant: `attach` and the stricter `critest` specs (log format, Linux SecurityContext, seccomp/AppArmor, namespaces, mount propagation). Linux-only; not the core completion target. **RuntimeClass:** a one-command per-node installer (`deploy/scripts/install-runtimeclass.sh`) registers the `io.containerd.a3s-box.v2` runtime, and `runtimeClassName: a3s-box` is validated end-to-end (pod start + `kubectl exec`) across a 5-node cluster — see [Deploy as a Kubernetes RuntimeClass](#deploy-as-a-kubernetes-runtimeclass). |
| Windows | Native WHPX backend through libkrun. The Windows package runs directly on Windows with Windows Hypervisor Platform enabled; it does not require WSL. Windows CRI is intentionally out of scope. |

## What A3S Box is
@@ -347,6 +347,79 @@ helm install a3s-box deploy/helm/a3s-box/ -n a3s-box-system --create-namespace

Windows CRI is intentionally unsupported.

## Deploy as a Kubernetes RuntimeClass

Run selected pods as a3s-box MicroVMs by setting `runtimeClassName: a3s-box`. Each
pod's containers become libkrun MicroVMs under containerd, and `kubectl exec` works
against them. This is opt-in per node — a node must have the runtime installed **and**
carry the label `a3s-box.io/runtime=true` before a3s-box pods schedule there.

**1. Create the RuntimeClass (once per cluster):**

```bash
kubectl apply -f - <<'EOF'
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: a3s-box
handler: a3s-box
scheduling:
  nodeSelector:
    a3s-box.io/runtime: "true" # only labeled nodes run a3s-box pods
EOF
```

**2. Provision each node** — run the installer as root on every node that should
host a3s-box workloads. It installs the a3s-box CLI + helpers, libkrun, and the
containerd runtime-v2 shim (`containerd-shim-a3s-box-v2`), registers the
`io.containerd.a3s-box.v2` runtime via an `/etc/containerd/conf.d` drop-in, restarts
containerd, and warms the one-time per-node boot cache. Requires containerd ≥ 2.0
and `/dev/kvm`.

```bash
# one-liner (downloads the pinned release artifacts from GitHub):
curl -fsSL https://raw.githubusercontent.com/AI45Lab/Box/main/deploy/scripts/install-runtimeclass.sh | sudo bash

# or from a checkout:
sudo deploy/scripts/install-runtimeclass.sh # default version
sudo deploy/scripts/install-runtimeclass.sh --version v2.6.0 # pin a version
```

Then label the node from a machine with `kubectl`:

```bash
kubectl label node <node-name> a3s-box.io/runtime=true
```

Notes:
- **Control-plane nodes** carry a `NoSchedule` taint and are normally excluded —
leave them unlabeled unless you intentionally run workloads there.
- The installer warms up with `busybox:latest` so the *first* pod boots fast (the
first box on a fresh node builds a one-time cache that can exceed the shim's boot
window). Use `--warmup-image <ref>` to point at a mirror, or `--no-warmup` to skip.
- **Air-gapped:** pre-stage the release tarball
(`a3s-box-<ver>-linux-<arch>.tar.gz`) and `containerd-shim-a3s-box-v2-linux-<arch>`
in a directory and pass `--from-dir <dir>` (no network needed).

**3. Run a pod:**

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: hello-a3s-box
spec:
  runtimeClassName: a3s-box
  containers:
    - name: app
      image: busybox:latest
      command: ["sleep", "3600"]
EOF

kubectl exec hello-a3s-box -- sh -c 'echo "hello from $(hostname)"; uname -m'
```

## Architecture

```text
28 changes: 28 additions & 0 deletions containerd-shim/Cargo.toml
@@ -0,0 +1,28 @@
# Standalone crate: NOT part of the box workspace (own [workspace] table below).
# It links no a3s-box crate — it drives the installed `a3s-box` CLI — so it builds
# fast and in isolation, and never perturbs the runtime's Cargo.lock.
[package]
name = "a3s-box-containerd-shim"
version = "0.1.0"
edition = "2021"
license = "MIT"
description = "containerd runtime shim v2 that routes RuntimeClass=a3s-box pods to the a3s-box MicroVM runtime"

[[bin]]
name = "containerd-shim-a3s-box-v2"
path = "src/main.rs"

[dependencies]
# ttrpc + protobuf come transitively via containerd-shim-protos; reference them
# through its re-exports (containerd_shim_protos::{ttrpc, protobuf}) so versions
# always match the generated Task service.
containerd-shim = { version = "0.11", features = ["async"] }
containerd-shim-protos = { version = "0.11", features = ["async"] }
async-trait = "0.1"
tokio = { version = "1", features = ["rt-multi-thread", "macros", "process", "sync", "io-util", "fs", "time"] }
log = "0.4"
libc = "0.2"
serde = { version = "1", features = ["derive"] }
serde_json = "1"

[workspace]