deploys-app · acoshift · Jun 18, 2026
diff --git a/content/deployments/overview.md b/content/deployments/overview.md
@@ -68,6 +68,33 @@ configured replica range. Metrics flow into the dashboard and usage to billing.
 If readiness never passes, the rollout fails and the previous revision keeps
 serving — you don't get a broken deployment because of a bad image.
 
+## Automatic error cleanup
+
+A deployment that *should* keep pods running but has **no ready pod for 15
+minutes** is automatically marked **error** and its workload is torn down. This
+catches a deployment that applied cleanly but then can't stay up — a
+crash-looping image, an image that never pulls, or a readiness probe that never
+passes — so a dead deployment doesn't sit consuming a slot indefinitely.
+
+Cleanup only acts on a deployment that is *supposed* to have a running pod, so it
+leaves these alone:
+
+- **Scheduled jobs (CronJob)** — they have no standing pods between runs.
+- **Paused** deployments, in-flight rollouts, and freshly deployed revisions
+  (which get a grace period to pull the image and start up).
+
+To recover, fix the image or configuration and deploy again; the deployment is
+recreated from its spec. While it's torn down, its URL stops serving, so a
+redeploy is what brings it back.
+
+{{< callout type="note" >}}
+A failed *rollout* is different: if a new revision can't become ready, the
+**previous revision keeps serving** and nothing is torn down. Cleanup only fires
+when there is no ready pod at all for the full 15 minutes, and it backs off
+during a cluster-wide incident, so a bad node pool or a registry outage doesn't
+mass-error your deployments.
+{{< /callout >}}
+
 ## How to drive it
 
 Anything you can do from this page, you can do from the [CLI](/automation/cli/)

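The cleanup rule added in the hunk above can be read as a single predicate. The sketch below is only an illustration of the documented behaviour, not the platform's implementation: every field name (`kind`, `paused`, `rollout_in_flight`, `in_startup_grace`, `no_ready_pod_since`) and the `cluster_incident` flag are hypothetical. The docs do not say how long the start-up grace period is or how a cluster-wide incident is detected, so both are taken as inputs here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# The 15-minute limit comes from the docs text; everything else is assumed.
NO_READY_POD_LIMIT = timedelta(minutes=15)


@dataclass
class Deployment:
    """Hypothetical view of a deployment; field names are illustrative only."""
    kind: str                              # "service" or "cronjob" (assumed values)
    paused: bool
    rollout_in_flight: bool
    in_startup_grace: bool                 # freshly deployed, still allowed to start up
    no_ready_pod_since: Optional[datetime] # None means a pod is ready right now


def should_mark_error(d: Deployment, now: datetime, cluster_incident: bool) -> bool:
    """Return True when the documented cleanup rule would fire."""
    if cluster_incident:
        return False  # cleanup backs off during a cluster-wide incident
    if d.kind == "cronjob":
        return False  # scheduled jobs have no standing pods between runs
    if d.paused or d.rollout_in_flight or d.in_startup_grace:
        return False  # not currently expected to have a ready pod
    if d.no_ready_pod_since is None:
        return False  # at least one pod is ready
    return now - d.no_ready_pod_since >= NO_READY_POD_LIMIT
```

A deployment with no ready pod for 20 minutes would be marked **error** under this reading, while the same deployment is left alone if it is paused or if a cluster-wide incident is in progress.
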
diff --git a/content/networking/domains.md b/content/networking/domains.md
@@ -64,6 +64,29 @@ Wildcard domains require **DNS-01** verification, so you'll also need to add a
 verify with HTTP-01 and don't need that record.
 {{< /callout >}}
 
+## When a certificate can't be issued
+
+A verified domain normally gets its TLS certificate within a minute or two. Once
+in a while issuance keeps failing — Let's Encrypt is rate-limiting the account, a
+`CAA` record blocks Let's Encrypt, or (for wildcards) the `_acme-challenge` CNAME
+isn't in place — and the certificate stays **issuing** without ever completing.
+
+If a certificate stays unissued for **more than 24 hours**, the platform reclaims
+it: the stale request is removed and the domain flips to **error**. This stops a
+permanently failing request from burning Let's Encrypt quota and surfaces the
+problem instead of leaving the domain silently without HTTPS.
+
+To recover, fix the underlying cause: clear the `CAA` restriction, add the
+`_acme-challenge` CNAME the console shows, or wait out a Let's Encrypt
+rate limit. The platform re-requests the certificate automatically, and the
+domain returns to **active** once the certificate issues.
+
+{{< callout type="note" >}}
+After a reclaim the platform keeps retrying about once a day, so a domain that
+becomes issuable later (a rate limit clears, a missing record is added) recovers
+on its own — you don't need to re-create it.
+{{< /callout >}}
+
 ## Routing traffic
 
 Creating a domain alone doesn't send any traffic to a deployment — you still
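
The certificate lifecycle described in the "When a certificate can't be issued" hunk can be sketched as a small state check. This is a hedged illustration only: the function name, the `"wait"`/`"reclaim"`/`"retry"`/`"none"` return values, and the parameters are hypothetical, and the retry interval is an approximation of "about once a day" rather than a documented constant.

```python
from datetime import datetime, timedelta

RECLAIM_AFTER = timedelta(hours=24)  # from the docs: unissued for more than 24 hours
RETRY_INTERVAL = timedelta(days=1)   # assumption: "about once a day"


def next_certificate_step(status: str, pending_since: datetime,
                          last_attempt_at: datetime, now: datetime) -> str:
    """Decide what happens to a domain's certificate request.

    status is the domain state from the docs: "issuing", "error", or "active".
    """
    if status == "issuing":
        if now - pending_since > RECLAIM_AFTER:
            # Stale request is removed and the domain flips to "error".
            return "reclaim"
        return "wait"
    if status == "error":
        if now - last_attempt_at >= RETRY_INTERVAL:
            # The platform re-requests the certificate on its own.
            return "retry"
        return "wait"
    return "none"  # "active" domains already have a certificate
```

Under these assumptions, a request that has been pending for 30 hours is reclaimed, and a reclaimed domain is retried once roughly a day has passed since the last attempt.
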