
Operations

This area collects operational runbooks that cut across bounded contexts — deployment shapes and procedures an operator owns at the level of the whole platform rather than a single surface. The context-scoped, task-oriented runbooks live under the how-to guides; the pages here describe the platform-wide topology those tasks plug into.

Runbooks

  • Operating plexsphere across multiple regions — pin tenants to regional management clusters, schedule and migrate placements with the region-assignment resolver, publish per-region ingress hostnames, and lay down the region DNS records.
  • Disaster recovery — back up and restore the control plane after a disaster: the coverage matrix, recovery objectives, the ordered restore checklist, seal-key custody, and regional failover as promotion.
  • Failure modes and degradation — how a replica degrades when a single dependency (SpiceDB, Postgres, NATS JetStream, OpenBao, the Signing Service) is briefly unavailable: the boot-gate-then-degrade readiness model and per-dependency detection, degraded behaviour, and recovery checks with the real probe names, metrics, problem codes, and tuning knobs.
  • Capacity and scale targets — the six per-Domain scale dimensions and their Phase-1 targets, how the collector samples used/target ratios, the 80% crossing audit contract, the capacity_exceeded refusal, the sample-interval knob, and the make load-test harness that validates one dimension at a time.
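
To make the capacity vocabulary above concrete, here is a minimal sketch of a used/target ratio, an 80% crossing check, and a capacity_exceeded refusal. It is illustrative only and is not the collector's code: the `Sample` type, the `crossed_warn` and `admit` functions, the dimension name, and the exact comparison rules (crossing means moving from below 80% to at or above it; refusal means a request would push usage past the target) are all assumptions. The capacity runbook defines the real contract.

```python
from dataclasses import dataclass

WARN_RATIO = 0.80  # the 80% crossing threshold; the exact semantics are assumed


@dataclass(frozen=True)
class Sample:
    """One hypothetical reading of a single scale dimension."""
    dimension: str  # illustrative name, not a real plexsphere dimension
    used: int
    target: int

    @property
    def ratio(self) -> float:
        if self.target <= 0:
            raise ValueError("target must be positive")
        return self.used / self.target


def crossed_warn(prev: Sample, curr: Sample) -> bool:
    """True only on the sample that moves from below 80% to at or above it."""
    return prev.ratio < WARN_RATIO <= curr.ratio


def admit(curr: Sample, requested: int = 1) -> str:
    """Refuse when the request would exceed the target (assumed rule)."""
    if curr.used + requested > curr.target:
        return "capacity_exceeded"
    return "ok"


if __name__ == "__main__":
    before = Sample("example-dimension", used=79, target=100)
    after = Sample("example-dimension", used=80, target=100)
    print(crossed_warn(before, after))
    print(admit(Sample("example-dimension", used=100, target=100)))
```

Checking the crossing against the previous sample, rather than just testing whether the ratio is at or above 80%, means a dimension that stays above the threshold is reported once rather than on every sample interval.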

When to read this section

Read this section when you are standing up or evolving the deployment topology of a plexsphere installation as a whole. For a task scoped to one surface, start with the how-to guides; for the exact contract of a surface, see the reference.