
Management Fleet

Authoritative bounded-context reference for the Management Fleet (internal/provisioning/managementfleet/). It owns the durable record of which Kubernetes clusters host the provisioning substrate (Crossplane v2 + the External Secrets Operator) and which cluster owns each Project, and reconciles a dedicated RBAC-scoped namespace per assigned Project.

The context has no HTTP surface: callers reach it through the in-process Service facade, and the reconcile loop drives the per-Project namespace off live cluster state. The closed port set keeps the domain and application layers free of pgx and controller-runtime (ports.go). This milestone registers clusters, places Projects, reconciles namespaces, and verifies a cluster carries an Available substrate before assignment — installing the substrate stays with the deploy/ Helm chart.

This reference is a single page: the surface is narrow (two aggregates, one pure state machine, five ports, seven sentinels) and the pieces travel in lockstep. The Recovery runbook at the foot is the operator-facing companion.

Cross-references

  • ../../contributing/layout.md — the bounded-context map row that locates internal/provisioning/managementfleet inside the codebase and enumerates the depguard rules that confine pgx to repo/, controller-runtime to reconcile/, and bar cross-context imports.
  • ./credentials.md and ./credential-pool.md — the sibling provisioning contexts. The management fleet mirrors their module shape (a domain package, a repo/ adapter, a port set, an events subpackage) but owns inventory and reconcile rather than secret material.
  • ../identity/tenancy.md — the Domain → Project → Resource → Node aggregate model. A Project is owned by the tenancy context; the management fleet observes it only as the key of a ProjectClusterAssignment and as the seed of the per-Project namespace name.
  • ../../../internal/provisioning/managementfleet/doc.go — the package-level pin of the ubiquitous language and the reconcile-model DECISION block (why a boot-probe-plus-ticker closure rather than a watch-driven controller-runtime Manager).
  • ../../../internal/platform/db/migrations/0023_management_fleet.sql — the persistence schema for plexsphere.management_clusters and plexsphere.project_cluster_assignments.
  • ../../../cmd/plexsphere/management_fleet_factory_prod.go — the production composition root: the env loader, the in-cluster client, the reconcile sweep, and the boot-probe / steady-state-ticker pair.
  • ../../../tests/e2e/provisioning/management-fleet/chainsaw-test.yaml — the Chainsaw e2e suite that stands up a kind cluster with digest-pinned Crossplane v2 and the External Secrets Operator, runs the reconcile, and asserts the per-Project Namespace, Role, RoleBinding, ServiceAccount, ResourceQuota, and a passing verify gate.

Ubiquitous language

The terms below travel together across the Go code, the SQL migration, the domain-event payloads, the structured-log attributes, and the recovery runbook. Names are preserved verbatim so a reader chasing a string from a log line finds it in the source without translation.

| Term | Definition | Code anchor |
|---|---|---|
| ManagementCluster | The aggregate root modelling one registered Kubernetes cluster in the fleet. Carries (id, name, slug, region, kubeconfigSecretRef, status, createdAt, updatedAt). The application mints a UUIDv7 id at registration; the slug is the unique human-facing key the assignment model shards on. The fields are unexported and reached through accessors so the creation invariants hold only through the two constructors. | cluster.go |
| ClusterID | The UUIDv7 identity of a ManagementCluster, a named wrapper over uuid.UUID. The String() projection is the canonical hyphenated lowercase form. The zero value is rejected by every creation invariant. | cluster.go |
| ProjectClusterAssignment | The aggregate root keyed by Project id, recording which management cluster owns a Project, the region the placement landed in, the per-Project namespace name, and that namespace's lifecycle phase. The (projectID, managementClusterID) pairing is immutable once the Project has provisioned resources. | assignment.go |
| ProjectID | The UUID identity of a Project as the fleet observes it: a named wrapper over uuid.UUID, distinct at compile time from ClusterID. It is the assignment key and the seed of the namespace name. | namespace.go |
| per-Project namespace | The Kubernetes Namespace carved out for one Project on its management cluster. Its name is plexsphere-project-<uuid>: the fixed prefix followed by the canonical hyphenated lowercase Project UUID. The name is a deterministic function of the Project id (a 55-character RFC 1123 label) so the persisted column and the live cluster name can never drift. | namespace.go |
| NamespacePhase | The closed-set lifecycle of the per-Project namespace: Pending, Provisioning, Ready, Degraded, Terminating, Deleted. The phase is advanced exclusively by the pure transition function over observed cluster facts, never from process memory. | phase.go |
| ObservedCluster | The snapshot of per-Project facts the reconcile loop reads live from a management cluster on a single tick: whether the Namespace, Role, RoleBinding, ServiceAccount, and ResourceQuota exist, and whether the verify gate passed. Every field is a fact the cluster reports, not a desired state. | phase.go |
| Action | The reconcile instruction the transition function emits for one tick: Noop, Converge, or Delete. | phase.go |
| Service | The in-process application-service facade. Methods: RegisterCluster, AssignProjectToCluster, LookupAssignment, SelectClusterForRegion, MarkTerminating, Unassign. It orchestrates the Repository and owns the transaction boundary; the aggregates enforce their own creation invariants. | service.go |
| Reconciler | The controller-runtime adapter that reconciles exactly one ProjectClusterAssignment per call: observe cluster facts, decide via the pure transition, apply the action, persist the phase, emit the lifecycle event. | reconcile/reconcile.go |
| verify gate | The read-only readiness check that confirms a management cluster serves the apiextensions.crossplane.io, pkg.crossplane.io, and external-secrets.io API groups and that the Crossplane and External Secrets Operator controller Deployments report Available. | reconcile/verify.go |
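The namespace-name derivation described for the per-Project namespace can be sketched as follows. This is a minimal illustration, not the code in namespace.go: ProjectID is modelled here as a string stand-in (the real type wraps uuid.UUID), and the lowercasing step is an assumption made so the sketch yields the canonical form without a UUID library.

```go
package main

import "strings"

// namespacePrefix is the fixed prefix named in the table above.
const namespacePrefix = "plexsphere-project-"

// ProjectID is a hypothetical string stand-in for the real named wrapper
// over uuid.UUID, used so this sketch needs only the standard library.
type ProjectID string

// NamespaceName derives the per-Project namespace name from the Project id.
// The same id always yields the same name, so a persisted name can be
// re-derived and compared rather than trusted.
func NamespaceName(id ProjectID) string {
	return namespacePrefix + strings.ToLower(string(id))
}
```

The prefix is 19 characters and a hyphenated UUID is 36, which gives the 55-character label the table mentions.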

Aggregates

The context owns two aggregate roots. They are modified one per transaction; the application service coordinates the cross-aggregate read (the resource count) inside the same RunInTx as the assignment write so the immutability invariant holds against one consistent snapshot.

ManagementCluster — invariants

NewManagementCluster and HydrateManagementCluster are the only paths that produce a valid aggregate. Both enforce the same creation invariants, so a row that drifts from them in storage is rejected at the hydration boundary rather than surfacing later in the reconcile loop.

| Invariant | Layer | Failure mode |
|---|---|---|
| id is non-zero. | Aggregate constructor. | ErrInvalidInput: a zero ClusterID would collide with the "not yet assigned" sentinel. |
| slug is non-empty. | Aggregate constructor; SQL UNIQUE constraint management_clusters_slug_unique. | ErrInvalidInput on an empty slug; a duplicate slug surfaces from the repository adapter. |
| createdAt and updatedAt are non-zero, with createdAt ≤ updatedAt. | Aggregate constructor. | ErrInvalidInput: an unordered or zero timestamp pair is a corrupt row. |

ProjectClusterAssignment — invariants

| Invariant | Layer | Failure mode |
|---|---|---|
| projectID and managementClusterID are both non-zero; timestamps are non-zero and ordered. | Aggregate constructor. | ErrInvalidInput. |
| namespaceName equals NamespaceName(projectID). | Aggregate constructor: the field is derived, never caller-supplied, and re-derived on hydration. | A persisted name that drifts from the derivation cannot reach the reconcile loop: hydration recomputes it. |
| project_id is the PRIMARY KEY, so a Project is assigned to at most one cluster at a time. | SQL PRIMARY KEY. | A second CreateAssignment for the same Project surfaces a constraint violation from the repository adapter. |
| The (projectID, managementClusterID) pairing is immutable once the Project owns at least one resource. | Application service: CountResourcesForProject inside the placement RunInTx. | ErrAssignmentImmutable: re-pointing would orphan the Composite Resources, ProviderConfigs, and Secrets in the old namespace. |
| namespace_phase is one of the six closed-set values. | SQL CHECK constraint project_cluster_assignments_phase_check; application validation. | The CHECK is defence-in-depth: an out-of-set phase string cannot reach a persisted row even on an application bug. |
| Neither a Project nor a management cluster can be deleted while a live assignment references it. | SQL FOREIGN KEY … ON DELETE RESTRICT on both project_id and management_cluster_id. | The delete fails closed at the database: a CASCADE would silently orphan a live namespace with no inventory record that it needs tearing down. |

Namespace-phase state machine

The per-Project namespace walks a six-phase lifecycle. The phase is advanced only by the pure, total transition function Next (phase.go), which reads no clock, no process memory, and no I/O: it derives the Action and the next NamespacePhase solely from the live ObservedCluster snapshot and the assignment's current phase. The same inputs always yield the same output, so the machine is fully unit-testable without a cluster.

The machine has two arms, selected by teardown intent:

  Converge arm — current phase ∈ {Pending, Provisioning, Ready, Degraded}

    ┌──────────┐   resources missing OR verify failing
    │ Pending  │ ──────────────────────────────────────┐
    └────┬─────┘                                        │
         │                                              ▼
         │                                   ┌────────────────────┐
         │            all resources present  │   Provisioning     │
         │            AND verify passed       │  (Converge)        │
         ▼                                    └─────────┬──────────┘
  ┌──────────────┐ ◀───────────────────────────────────┘
  │    Ready     │  all resources present AND verify passed
  │  (Converge,  │
  │ then Noop)   │ ──┐ a required resource went missing
  └──────┬───────┘   │  OR verify started failing
         │           ▼
         │   ┌────────────────────┐
         │   │     Degraded       │  keeps converging to repair
         │   │    (Converge)      │
         │   └────────────────────┘

         │  Service.MarkTerminating  (teardown requested)

  Teardown arm — current phase ∈ {Terminating, Deleted}; sticky

  ┌────────────────────┐  resources still remain
  │   Terminating      │ ──────────────────────────┐
  │    (Delete)        │ ◀──────────────────────────┘
  └─────────┬──────────┘
            │  no per-Project resources remain on the cluster

  ┌────────────────────┐
  │     Deleted        │  terminal — (Noop)
  └────────────────────┘

Key properties the transition guarantees:

  • Teardown is sticky. Once the phase is Terminating or Deleted the namespace never converges again; it can only drain to Deleted.
  • Degraded is distinct from Provisioning. A namespace that reached Ready and then lost a resource is marked Degraded, not Provisioning, so an operator can tell a first install apart from a regression. Both keep emitting Converge.
  • Ready is event-worthy once. A namespace arriving at Ready from a lower phase emits Converge so the caller records the crossing; on the next tick it settles to Noop.
  • The function is total. Every (ObservedCluster, NamespacePhase) pair maps to a defined result; an unrecognised phase string is treated as Pending, the safe converge-from-scratch entry point.
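The properties above can be read off a small pure function. The following is a sketch under stated assumptions, not the contents of phase.go: the ObservedCluster field names other than VerifyPassed, the return order, and the choice of Noop as the action on the tick that records Deleted are all guesses made for illustration.

```go
package main

type NamespacePhase string

const (
	Pending      NamespacePhase = "Pending"
	Provisioning NamespacePhase = "Provisioning"
	Ready        NamespacePhase = "Ready"
	Degraded     NamespacePhase = "Degraded"
	Terminating  NamespacePhase = "Terminating"
	Deleted      NamespacePhase = "Deleted"
)

type Action string

const (
	Noop     Action = "Noop"
	Converge Action = "Converge"
	Delete   Action = "Delete"
)

// ObservedCluster holds facts read from the cluster on one tick.
type ObservedCluster struct {
	Namespace, Role, RoleBinding, ServiceAccount, ResourceQuota bool
	VerifyPassed                                                bool
}

func (o ObservedCluster) allPresent() bool {
	return o.Namespace && o.Role && o.RoleBinding && o.ServiceAccount && o.ResourceQuota
}

func (o ObservedCluster) anyPresent() bool {
	return o.Namespace || o.Role || o.RoleBinding || o.ServiceAccount || o.ResourceQuota
}

// Next is pure and total: it reads only its arguments.
func Next(o ObservedCluster, current NamespacePhase) (Action, NamespacePhase) {
	// Teardown arm: sticky, can only drain towards Deleted.
	switch current {
	case Deleted:
		return Noop, Deleted
	case Terminating:
		if o.anyPresent() {
			return Delete, Terminating
		}
		return Noop, Deleted // assumption: nothing is left to delete
	}
	// Converge arm: any other phase, including an unrecognised string,
	// which therefore behaves like Pending.
	healthy := o.allPresent() && o.VerifyPassed
	switch {
	case healthy && current == Ready:
		return Noop, Ready
	case healthy:
		return Converge, Ready // the crossing into Ready is reported once
	case current == Ready || current == Degraded:
		return Converge, Degraded // a regression, not a first install
	default:
		return Converge, Provisioning
	}
}
```

Because the function takes no clock and performs no I/O, a table-driven unit test over (ObservedCluster, NamespacePhase) pairs covers it without a cluster.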

What the reconcile loop does with the result

Reconciler.Reconcile (reconcile/reconcile.go) runs one assignment per call in four steps:

  1. Observe: Get the five per-Project resources from the cluster and run the verify gate, building an ObservedCluster. A NotFound marks a resource absent; any other API error is an infrastructure failure that propagates. An ErrClusterUnhealthy from the verify gate collapses into VerifyPassed = false, so the namespace keeps converging, while a non-unhealthy gate error propagates.
  2. Decide: call the pure Next transition.
  3. Apply: Converge creates or repairs the Namespace, then the RBAC objects, then the ResourceQuota, in that order; Delete tears the namespace down; Noop does nothing. Every step is idempotent.
  4. Persist + emit: when the phase changed, write it through the Repository and, on a crossing into Ready or Deleted, publish the matching lifecycle event.

Re-running Reconcile against an already-converged or already-torn-down cluster is success.

Region-assignment resolver

While Reconcile converges one assignment's namespace against live cluster facts, a RegionAssignmentResolver decides where a Project is placed in the first place and whether an existing placement still matches its tenant's region. It runs on each management-fleet sweep and has two halves (scheduling and migration), both governed by one exact-match contract: a placement is correct iff ManagementCluster.Region() == Domain.Region(). The match is exact string equality; there is no nearest-region fallback and no normalisation beyond the kebab-case validation ParseRegion already applied at construction.

A guiding rule frames both halves: a per-Project condition — no cluster for the region, a blocked migration — is a skip-and-WARN, never a sweep failure. Only an infrastructure error (a repository fault, an apiserver outage) fails a sweep. So a single misplaceable Project never stalls the placement of every other Project in the fleet.

Scheduling — Schedule()

Schedule() places each schedulable Project. A Project is schedulable when it owns at least one resource and has no assignment yet.

For each schedulable Project, the resolver looks for a registered management cluster whose region exactly matches the Project's Domain region:

  • A cluster matches. The resolver creates a Pending assignment on it and emits a ProjectClusterAssigned event.
  • No cluster matches. The resolver logs a WARN and skips the Project — no assignment row is created. The Project stays unplaced until a cluster for its region is registered; the next sweep retries. (An unpinned Domain — empty region — is placed under the same free-placement path that predates region pinning.)
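The exact-match lookup for a region-pinned Domain can be sketched in a few lines. The Cluster struct, the function name, and the first-match tie-break are assumptions made for illustration; the unpinned (empty region) path is deliberately left out because it follows the separate free-placement rule.

```go
package main

// Cluster is a hypothetical projection of a registered management cluster.
type Cluster struct {
	Slug   string
	Region string
}

// selectCluster returns the first cluster whose region equals the Domain
// region byte for byte. No case folding, prefix matching, or nearest-region
// fallback is applied. An empty domainRegion is not handled here.
func selectCluster(clusters []Cluster, domainRegion string) (Cluster, bool) {
	if domainRegion == "" {
		return Cluster{}, false
	}
	for _, c := range clusters {
		if c.Region == domainRegion {
			return c, true
		}
	}
	return Cluster{}, false
}
```

A false result corresponds to the skip-and-WARN outcome: the caller logs, creates no assignment row, and moves on to the next Project.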

Migration — MigrateOutOfRegion()

Re-pinning a Domain's region (a PATCH, since the region is mutable unlike the immutable slug) re-targets every Project beneath it. The Project region is derived from the Domain, not stored on the Project, so one Domain re-pin can leave many assignments pointing at a cluster whose region no longer matches. MigrateOutOfRegion() reconciles each such assignment one step per sweep, and the path forks on whether the Project owns resources.

Zero-resource Project — migrated automatically. The resolver drives the move itself, advancing one step per sweep:

  1. It advances the current assignment's namespace phase into the teardown arm — from Pending, Provisioning, Ready, or Degraded to Terminating.
  2. While the namespace is Terminating, it waits; the reconcile loop drains the per-Project namespace, RBAC, and quota.
  3. Once the old assignment reaches Deleted, it re-points the Project to a fresh Pending assignment on a correct-region cluster, emitting ProjectClusterAssigned.

Because the state machine advances by a single step each sweep, a migration spans several sweeps; this is expected.

Resource-owning Project: blocked. When the Project owns resources, the (projectID, managementClusterID) pairing is immutable: the resolver does not move it. It logs a WARN, leaves the assignment untouched, and the sweep still returns success. Re-pointing a Project with live resources would orphan its Composite Resources, ProviderConfigs, and Secrets in the old region's namespace, which is the same fail-closed invariant ErrAssignmentImmutable guards in the application service. To migrate such a Project deliberately, drain its resources on the old cluster first (drive the namespace through Terminating → Deleted, then Unassign), then let the next sweep place it in the new region.
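The one-step-per-sweep migration can be pictured as a pure decision over the current phase and the resource count. The step names and the function below are hypothetical and exist only to illustrate the fork described above.

```go
package main

type migrationStep string

const (
	stepBlocked         migrationStep = "blocked"          // WARN and leave untouched
	stepMarkTerminating migrationStep = "mark-terminating" // enter the teardown arm
	stepWait            migrationStep = "wait"             // reconcile loop is draining
	stepRepoint         migrationStep = "repoint"          // place on a correct-region cluster
)

// nextMigrationStep decides what a single sweep does for one out-of-region
// assignment. Phases are passed as their string names.
func nextMigrationStep(phase string, resourceCount int) migrationStep {
	if resourceCount > 0 {
		return stepBlocked
	}
	switch phase {
	case "Terminating":
		return stepWait
	case "Deleted":
		return stepRepoint
	default: // Pending, Provisioning, Ready, Degraded
		return stepMarkTerminating
	}
}
```

Calling it once per sweep is what makes a migration span several sweeps: mark, wait while the namespace drains, then re-point.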

Operators inspect the resulting placement — the owning cluster, the region the assignment landed in, and the namespace phase — through GET /v1/projects/{project_id}/management-cluster-assignment (see the Management Fleet HTTP API).

Per-Project resources

For every assignment the converge step provisions five objects on the management cluster. All five carry the Kubernetes recommended labels app.kubernetes.io/managed-by=plexsphere, app.kubernetes.io/part-of=management-fleet, app.kubernetes.io/component=<role>, and app.kubernetes.io/instance=<namespace name>, so one app.kubernetes.io/instance selector scopes to a single Project's full resource set (reconcile/labels.go).

When the assignment is region-pinned, the same five objects additionally carry the well-known Kubernetes topology label topology.kubernetes.io/region=<region>, set to the assignment's region. This label is distinct from the app.kubernetes.io/* recommended set: it locates a Project's resources in a placement region rather than describing the application. It is omitted for a region-less (unpinned) Domain, so a label selector on topology.kubernetes.io/region matches only region-pinned Projects.
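The label set described in the two paragraphs above can be assembled as shown below. The function name and signature are assumptions; the actual builder lives in reconcile/labels.go.

```go
package main

// projectLabels builds the label set for one per-Project object.
// component is the object's role, namespaceName is the per-Project
// namespace name, and region is empty for an unpinned Domain.
func projectLabels(component, namespaceName, region string) map[string]string {
	labels := map[string]string{
		"app.kubernetes.io/managed-by": "plexsphere",
		"app.kubernetes.io/part-of":    "management-fleet",
		"app.kubernetes.io/component":  component,
		"app.kubernetes.io/instance":   namespaceName,
	}
	// The topology label is added only for region-pinned assignments, so a
	// selector on it never matches an unpinned Project.
	if region != "" {
		labels["topology.kubernetes.io/region"] = region
	}
	return labels
}
```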

| Resource | Name | Purpose |
|---|---|---|
| Namespace | plexsphere-project-<uuid> | The RBAC and quota boundary for the Project's resources. Carries a kubernetes.io/description annotation. |
| Role | plexsphere-project | A namespace-scoped, least-privilege Role. Its single rule grants get, list, watch on configmaps, secrets, events, serviceaccounts in the core API group: no create/update/delete, no cluster-admin, and never the * wildcard in apiGroups, resources, or verbs. |
| RoleBinding | plexsphere-project | Binds the Role to the per-Project ServiceAccount. |
| ServiceAccount | plexsphere-project | The dedicated identity for the Project's in-namespace provisioning workload. |
| ResourceQuota | plexsphere-project-quota | Caps the Project's footprint on the shared cluster (object counts for ConfigMaps, Secrets, Pods, and Services, plus CPU/memory requests and limits) so one Project cannot starve another. |

The RBAC grant is deliberately read-only at this milestone: no requirement asks the in-namespace workload to write, and an explicit, narrow grant keeps the bootstrap RBAC at the minimum the reconcile loop can audit. A future task that needs write verbs widens projectRoleRules deliberately and records the justification in the DECISION block on reconcile/rbac.go.

Ports

The context reaches every collaborator through one of five ports declared in framework-free terms — context, time, and the package's own aggregates — so the domain layer stays free of pgx, controller-runtime, and k8s.io (ports.go). The composition root wires concrete adapters; tests inject in-memory fakes.

| Port | Methods | Adapter | Test seam |
|---|---|---|---|
| Repository | CreateCluster, GetCluster, ListClusters, CreateAssignment, GetAssignment, ListAssignmentsForCluster, UpdateAssignmentPhase, DeleteAssignment, CountResourcesForProject, RunInTx | Postgres in repo/managementfleet_pg.go, wrapping the sqlc-generated queries. Constraint-name dispatch maps SQLSTATE collisions and FK violations onto the sentinels. depguard confines pgx to this subpackage. | In-memory fakes in the unit tests; the repository adapter has its own pgtype-conversion and classification tests, and the integration suite drives the real adapter against a testcontainers Postgres. |
| ClusterClientFactory | HandleFor | Mints an opaque ClusterHandle for a registered cluster. At this milestone the binary runs inside the single management cluster, so the production wiring uses one in-cluster controller-runtime client rather than a per-cluster fan-out. | Fakes return a stub handle. |
| FleetHealthChecker | Verify | The Verifier in reconcile/verify.go, the read-only verify gate. depguard confines controller-runtime to the reconcile/ subpackage. | A fake controller-runtime client exercises both the healthy and unhealthy arms; the envtest and Chainsaw suites drive it against a real apiserver. |
| AuditSink | Record | A composition-root shim translating the local AuditEntry value object onto internal/audit.Entry, keeping the module free of an internal/audit import the no-cross-context-imports rule denies. | An in-memory recording sink. |
| Clock | Now | A wall-clock implementation at the composition root. | A frozen clock pins a deterministic now in unit tests. |

The reconcile loop additionally declares a narrow EventSink port local to the reconcile/ package (reconcile/reconcile.go): the lifecycle events it emits are an integration concern, not an aggregate write, so per the interface-segregation principle the adapter declares the smallest dependency it needs rather than widening Repository with an outbox surface.

Lifecycle events

The context defines four typed domain events (events/events.go). The EventType discriminator string is stable and becomes part of the wire contract once an event is emitted; the set is closed and pinned by a package-local drift gate.

| Event type (discriminator) | Trigger | Payload |
|---|---|---|
| managementfleet.ManagementClusterRegistered | A cluster is first materialised in the fleet inventory. | Cluster identity, name, slug, region. |
| managementfleet.ProjectClusterAssigned | The region-assignment resolver places a Project onto the cluster that owns it (a fresh scheduling placement, or a re-point after a region migration). | Project identity, owning cluster identity, region, namespace name. |
| managementfleet.ProjectNamespaceReady | The per-Project namespace crosses into the Ready phase. | Project identity, namespace name. |
| managementfleet.ProjectNamespaceTerminated | The per-Project namespace crosses into the Deleted phase. | Project identity, namespace name. |

ProjectClusterAssigned was declared with the inventory model but stayed reserved, without an emitter, until the region-assignment resolver (see Region-assignment resolver) began firing it. The resolver is the event's emitter: it publishes ProjectClusterAssigned when it places a Project on the cluster matching its Domain region, and again when it re-points a migrated Project after the old assignment reaches Deleted. The low-level AssignProjectToCluster persist method deliberately does not emit the event inline. It is also called during internal region-migration bookkeeping, so emitting there would fire on writes that are not a fresh region-scheduling placement; the resolver owns the region-placement decision and is therefore the correct emitter.

The production composition root emits the reconcile events through a structured-slog sink — migration 0023 created only the two inventory tables, so there is no management-fleet outbox table to append to. Minting one is a schema change a dedicated story owns; until then the slog line is the operator-facing breadcrumb. The adapter rationale lives in the DECISION block on newManagementFleetEventSink in management_fleet_factory_prod.go.

Error sentinels

Every operation funnels through one of seven package-local sentinels. Callers branch on these via errors.Is; wrapping with fmt.Errorf("%w", …) is fine as long as the sentinel's identity stays intact. The set is closed: adding an eighth without updating the closed-set drift gate trips the build (errors.go).

| Sentinel | Layer | Trigger | Remediation |
|---|---|---|---|
| ErrManagementClusterNotFound | Repository / Service | GetCluster or a fleet lookup for a ClusterID with no inventory row. | Re-check the fleet inventory; register the cluster if it is genuinely absent. |
| ErrAssignmentNotFound | Repository / Service | GetAssignment for a Project that has not been placed. | Place the Project with AssignProjectToCluster, or accept the Project has no assignment. |
| ErrAssignmentImmutable | Service (resource-count gate inside the placement RunInTx) | A re-assignment to a different cluster while the Project already owns ≥ 1 resource. | Re-pointing would orphan live Composite Resources, ProviderConfigs, and Secrets. Tear the Project's resources down on the old cluster first, or keep the existing placement. |
| ErrClusterUnhealthy | FleetHealthChecker verify gate | A management cluster is missing a Crossplane / External Secrets Operator API group, or a substrate controller Deployment is not Available. | Install or repair the substrate via the deploy/ Helm chart; no Project may be assigned until the gate passes. |
| ErrNoClusterForRegion | Service (SelectClusterForRegion) | Placement is asked for a region with no matching registered cluster, or for an empty region against a fleet that does not have exactly one cluster. | Register a cluster for the region, or pass an explicit region when the fleet has more than one cluster. |
| ErrAssignmentTerminating | Service (Unassign) | Unassign is attempted while the namespace phase has not yet reached Deleted. | Let the reconcile loop drain the namespace to Deleted, then retry Unassign. |
| ErrInvalidInput | Aggregate constructors / port boundaries | A zero id, a zero or unordered timestamp, or an empty slug observed before any persistence call. | Programmer error at a boundary; surfaces in tests, not in steady-state production. |

Two further sentinels exist outside the closed domain set because they name wiring bugs, not observable domain failures, and a misconfigured composition root must fail fast at boot rather than on the first operation:

  • ErrServiceRepositoryRequired: NewService was handed a nil Repository (service.go).
  • ErrReconcilerClientRequired / ErrReconcilerRepositoryRequired / ErrReconcilerEventsRequired: NewReconciler was handed a nil collaborator (reconcile/reconcile.go).

The verify gate

The verify gate (reconcile/verify.go) is a read-only readiness check. This milestone ships the gate only; installing the substrate stays with the deploy/ Helm chart. The gate performs two checks and returns on the first failure:

  1. API groups present. Every provisioning API group is served by the cluster, resolved by mapping a representative GroupKind through the cluster RESTMapper. The gate probes one representative kind per group rather than the full CRD inventory — a partial CRD install is not a failure mode the upstream Helm charts produce:

    | API group | Representative kind | Substrate |
    |---|---|---|
    | apiextensions.crossplane.io | CompositeResourceDefinition | Crossplane composition |
    | pkg.crossplane.io | ProviderConfig | Crossplane packages |
    | external-secrets.io | ExternalSecret | External Secrets Operator |
  2. Controllers Available. Every substrate controller Deployment exists and reports the Available status condition True. The gate pins the canonical upstream install coordinates: the crossplane Deployment in the crossplane-system Namespace (one controller serves both Crossplane groups), and the external-secrets Deployment in the external-secrets Namespace.

On any failure the gate returns an error wrapping ErrClusterUnhealthy so callers branch with errors.Is. A nil return means the cluster is eligible to host Projects. Inside the reconcile loop an ErrClusterUnhealthy is not fatal — it folds into VerifyPassed = false and the namespace keeps converging — whereas a non-unhealthy infrastructure error (an apiserver outage, a RESTMapper fault) propagates and fails the tick.

Persistence

Migration 0023_management_fleet.sql introduces two tables in the plexsphere schema:

  • plexsphere.management_clusters — one row per registered cluster. id is the application-minted UUID PRIMARY KEY; slug is held UNIQUE by management_clusters_slug_unique; region and kubeconfig_secret_ref are nullable (a cluster may be not-region-pinned, and the kubeconfig material lives behind a secret store, never in this table).
  • plexsphere.project_cluster_assignments — one row per Project. project_id is the PRIMARY KEY (a Project is assigned to at most one cluster) and references plexsphere.projects(id); management_cluster_id references plexsphere.management_clusters(id). Both FKs use ON DELETE RESTRICT. namespace_phase is held to the six-value closed set by project_cluster_assignments_phase_check. The project_cluster_assignments_cluster_idx index backs the ListAssignmentsForCluster range scan.

Because neither table holds secret bytes or hash-chained forensic rows, the migration's Down block performs a real DROP in reverse-FK order — it cannot resurrect compliance-sensitive plaintext on a subsequent Up.

Operational model

The management fleet is opt-in at the composition root. The single load-bearing knob is PLEXSPHERE_DSN: when it is empty the binary boots without a management-fleet reconcile probe (the early-boot posture for deployments that have not yet plumbed Postgres). The in-cluster Kubernetes access is ambient — the ServiceAccount the Pod runs under — and needs no env var.

| Env var | Effect | Default |
|---|---|---|
| PLEXSPHERE_DSN | The Postgres connection string. Empty disables management-fleet wiring entirely. | "" (inert). |
| PLEXSPHERE_MANAGEMENT_FLEET_RECONCILE_INTERVAL | The steady-state period between fleet reconcile sweeps. Parsed with time.ParseDuration; a non-positive value is rejected at boot. | 30s. |

The reconcile runs on two cadences, driven by management_fleet_factory_prod.go and registered through internal/platform/bootstrap/managementfleet_reconcile.go:

  1. Boot sweep — synchronous, before the listener binds. RegisterManagementFleetReconcileProbe runs the sweep once; a failure here refuses startup, because a fleet that cannot be reconciled into its expected shape must not serve traffic.
  2. /readyz probe + steady-state ticker — after the boot sweep the same closure is registered as a /readyz probe under the stable name management-fleet-reconcile, and a goroutine re-runs the sweep every reconcile interval. A failure on a later probe tick flips /readyz to HTTP 503 so Kubernetes (and operators) catch drift after the binary has already come up. The ticker exits cleanly on context cancellation.

The sweep lists every registered cluster, lists each cluster's assignments, and reconciles each one, returning the first error it hits. It is idempotent because Reconcile is total and idempotent.

Recovery runbook

This section is the operator-facing companion to the reference above. Each entry follows the same shape — Symptom, Diagnostic, Remediation — and is scoped to a single failure mode. The reconcile loop is idempotent, so unless an entry says otherwise the safe baseline action is to let the next sweep retry and watch /readyz and the management-fleet-reconcile probe recover.

1. Management cluster unreachable

Symptom. /readyz reports the management-fleet-reconcile probe failing (HTTP 503), or the boot sweep refused startup. The structured log carries management fleet reconcile tick failed with an error that wraps a controller-runtime client transport failure (connection refused, TLS handshake timeout, context deadline exceeded) rather than ErrClusterUnhealthy.

Diagnostic.

  • Confirm the apiserver of the management cluster the binary runs inside is reachable: kubectl get --raw=/healthz from the Pod, or inspect the in-cluster apiserver Service and endpoints.
  • Confirm the Pod's ServiceAccount token is mounted and not expired — a rotated or revoked token surfaces as a 401/403 from every Get.
  • Distinguish this from a substrate failure (entry 2): an unreachable cluster fails on the first resource Get; an unhealthy substrate reaches the verify gate and surfaces ErrClusterUnhealthy.

Remediation. Restore apiserver reachability or re-issue the ServiceAccount credentials. No fleet data is lost — the inventory and assignment rows are durable in Postgres. Once the apiserver is reachable the next sweep reconciles every assignment and /readyz returns to HTTP 200 with no operator action on the fleet records.

2. Crossplane or External Secrets Operator missing or unhealthy

Symptom. Per-Project namespaces never leave Provisioning (or sit in Degraded). The verify gate returns an error wrapping ErrClusterUnhealthy; the wrapped message names the specific substrate — a missing API group (apiextensions.crossplane.io, pkg.crossplane.io, external-secrets.io) or a controller Deployment that is not Available.

Diagnostic.

  • kubectl get crd | grep -E 'crossplane|external-secrets' confirms whether the substrate CRDs are installed.
  • kubectl -n crossplane-system get deploy crossplane and kubectl -n external-secrets get deploy external-secrets confirm the two controller Deployments exist and report Available.
  • Read the wrapped error: the gate stops on the first failure, so fix that substrate and re-check rather than assuming the rest are healthy.

Remediation. Install or repair the substrate via the deploy/ Helm chart — this context verifies the substrate, it does not install it. The verify failure is non-fatal to a converging namespace: the reconcile keeps converging the per-Project resources and the namespace crosses into Ready automatically once the gate passes. No assignment needs re-creating.
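
A fail-on-first-problem gate of the kind described here might look roughly like the following. It is a sketch under stated assumptions: the `substrate` snapshot type, the `verify` function, and the check order are hypothetical; only the API group names, the two Deployments, and the `ErrClusterUnhealthy` sentinel come from this entry.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrClusterUnhealthy is a local stand-in for the context's sentinel.
var ErrClusterUnhealthy = errors.New("cluster unhealthy")

// substrate is a hypothetical snapshot of what a cluster offers.
type substrate struct {
	apiGroups   map[string]bool // API group name -> served by the apiserver
	deployments map[string]bool // "namespace/name" -> reports Available
}

// verify returns on the first failing check, so one error names one substrate.
func verify(s substrate) error {
	for _, g := range []string{"apiextensions.crossplane.io", "pkg.crossplane.io", "external-secrets.io"} {
		if !s.apiGroups[g] {
			return fmt.Errorf("%w: API group %s is missing", ErrClusterUnhealthy, g)
		}
	}
	for _, d := range []string{"crossplane-system/crossplane", "external-secrets/external-secrets"} {
		if !s.deployments[d] {
			return fmt.Errorf("%w: Deployment %s is not Available", ErrClusterUnhealthy, d)
		}
	}
	return nil
}

// unhealthy reports whether err belongs to the ErrClusterUnhealthy class.
func unhealthy(err error) bool { return errors.Is(err, ErrClusterUnhealthy) }

func main() {
	s := substrate{
		apiGroups:   map[string]bool{"apiextensions.crossplane.io": true, "external-secrets.io": true},
		deployments: map[string]bool{"crossplane-system/crossplane": true},
	}
	// Two things are wrong, but only the first is reported.
	fmt.Println(verify(s))
}
```

Because the gate returns early, a fixed substrate can reveal a second failure on the next sweep, which is why the diagnostic says to re-check after each repair.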

3. Namespace stuck in Terminating

Symptom. After Service.MarkTerminating, a per-Project namespace stays in the Terminating phase across many sweeps and never reaches Deleted. Service.Unassign fails with ErrAssignmentTerminating.

Diagnostic.

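  • kubectl get ns plexsphere-project-<uuid>: a namespace stuck Terminating at the Kubernetes level almost always holds a resource whose finalizer has not been removed. Run kubectl get all,role,rolebinding,serviceaccount,resourcequota -n plexsphere-project-<uuid> and inspect .metadata.finalizers on anything that remains. kubectl get all covers only resources in the all category, so list any custom resources (such as Crossplane managed resources) separately.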
  • The transition is sticky: once Terminating, the namespace can only drain to Deleted — it will not converge back. The phase advances to Deleted only when the reconcile observes all five per-Project resources absent.
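
The sticky drain can be pictured as a function with only two outcomes. This is an illustrative sketch rather than the context's state machine: the `Phase` type, `perProjectKinds`, and `drain` are hypothetical names, and the five kinds are the ones this runbook lists.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for the namespace phase.
type Phase string

const (
	Terminating Phase = "Terminating"
	Deleted     Phase = "Deleted"
)

// perProjectKinds are the five per-Project resources named in this runbook.
var perProjectKinds = []string{"Namespace", "Role", "RoleBinding", "ServiceAccount", "ResourceQuota"}

// drain returns the next phase for a Terminating namespace, plus the kinds
// still observed on the cluster. It never returns a converging phase.
func drain(present map[string]bool) (Phase, []string) {
	var remaining []string
	for _, kind := range perProjectKinds {
		if present[kind] {
			remaining = append(remaining, kind)
		}
	}
	if len(remaining) == 0 {
		return Deleted, nil
	}
	return Terminating, remaining
}

func main() {
	// A finalizer is holding the Namespace open; everything else is gone.
	phase, remaining := drain(map[string]bool{"Namespace": true})
	fmt.Println(phase, remaining) // prints: Terminating [Namespace]
}
```

The returned list of remaining kinds is the natural place to start looking for the blocking finalizer.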

Remediation. Clear the blocking finalizer on the offending object (typically a Crossplane managed resource awaiting external-API deletion). Once every per-Project resource is gone, the next sweep records Deleted and Service.Unassign succeeds. Do not delete the assignment row directly to "unstick" it — that orphans whatever is still live on the cluster and loses the teardown record.

4. Namespace deleted out of band

Symptom. An operator (or another controller) deleted a per-Project Namespace, Role, RoleBinding, ServiceAccount, or ResourceQuota directly on the cluster. The assignment row still shows Ready.

Diagnostic.

  • The reconcile derives its action exclusively from live cluster facts every tick, never from cached desired state. On the next sweep the ObservedCluster snapshot reports the missing resource.
  • A namespace that was Ready and lost a resource transitions to Degraded (not Provisioning), so Degraded in the log or the namespace_phase column is the fingerprint of an out-of-band deletion of a still-assigned Project.
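
Why Degraded is the fingerprint can be shown with a small sketch of the phase decision. The names (`Phase`, `observe`, `perProjectKinds`) are hypothetical, and two assumptions are made that the text does not state: the verify gate is passing, and a Degraded namespace stays Degraded until it is whole again.

```go
package main

import "fmt"

// Phase is a hypothetical stand-in for the namespace phase.
type Phase string

const (
	Provisioning Phase = "Provisioning"
	Ready        Phase = "Ready"
	Degraded     Phase = "Degraded"
)

// perProjectKinds are the five per-Project resources named in this runbook.
var perProjectKinds = []string{"Namespace", "Role", "RoleBinding", "ServiceAccount", "ResourceQuota"}

// observe derives the phase from live facts only: the previous phase and the
// set of kinds seen on the cluster this tick. It also returns what to recreate.
func observe(prev Phase, present map[string]bool) (Phase, []string) {
	var missing []string
	for _, kind := range perProjectKinds {
		if !present[kind] {
			missing = append(missing, kind)
		}
	}
	switch {
	case len(missing) == 0:
		return Ready, nil // assumes the verify gate passes
	case prev == Ready || prev == Degraded:
		return Degraded, missing // it was whole once, then lost something
	default:
		return Provisioning, missing // it has never been whole
	}
}

func main() {
	// Someone deleted the RoleBinding of a namespace that was Ready.
	live := map[string]bool{"Namespace": true, "Role": true, "ServiceAccount": true, "ResourceQuota": true}
	phase, missing := observe(Ready, live)
	fmt.Println(phase, missing) // prints: Degraded [RoleBinding]
}
```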

Remediation. None required — this is the case the reconcile loop exists to handle. The next sweep emits Converge, recreates the missing resources idempotently, and the namespace returns to Ready, emitting a fresh ProjectNamespaceReady event. If the namespace does not self-heal, fall through to entry 2 (the verify gate may be failing) or entry 1 (the cluster may be unreachable).

5. Assignment-immutability conflict

Symptom. Service.AssignProjectToCluster fails with ErrAssignmentImmutable. The wrapped message names the Project, the resource count it owns, and the cluster it is currently pinned to.

Diagnostic.

  • The immutability gate is the Project's resource count, read inside the same transaction as the placement write. A re-assignment to a different cluster is rejected when the count is > 0; a re-assignment to the same cluster is idempotent and returns the existing assignment unchanged; a re-assignment to a different cluster with a count of 0 is permitted (it is a misplacement correction before any resource exists).
  • errors.Is(err, managementfleet.ErrAssignmentImmutable) confirms the failure class; Service.LookupAssignment shows the current pinning.
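
The three re-assignment cases reduce to a short decision table. The sketch below is illustrative: `decide`, the `decision` values, and the locally declared sentinel are hypothetical stand-ins, and the real check runs inside the placement transaction rather than on plain arguments.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAssignmentImmutable is a local stand-in for the context's sentinel.
var ErrAssignmentImmutable = errors.New("assignment immutable")

type decision string

const (
	place        decision = "place"         // no prior assignment
	keepExisting decision = "keep-existing" // same cluster: idempotent
	move         decision = "move"          // different cluster, nothing owned yet
)

// decide applies the immutability gate. current is the cluster the Project
// is pinned to ("" if none); resourceCount is what the Project owns.
func decide(current, target string, resourceCount int) (decision, error) {
	switch {
	case current == "":
		return place, nil
	case current == target:
		return keepExisting, nil
	case resourceCount > 0:
		return "", fmt.Errorf("%w: project owns %d resources and is pinned to %s",
			ErrAssignmentImmutable, resourceCount, current)
	default:
		return move, nil
	}
}

// immutable reports whether err belongs to the ErrAssignmentImmutable class.
func immutable(err error) bool { return errors.Is(err, ErrAssignmentImmutable) }

func main() {
	_, err := decide("cluster-a", "cluster-b", 7)
	fmt.Println(immutable(err), err)
}
```

Note the ordering: the same-cluster case is checked before the count, which is what makes a repeated assignment idempotent even for a Project with live resources.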

Remediation. This sentinel is a fail-closed guard, not a bug: re-pointing a Project with live resources would orphan its Composite Resources, ProviderConfigs, and materialised Secrets in the old namespace. To move a Project to a different cluster, drive its namespace through MarkTerminating → reconcile-to-Deleted → Unassign on the old cluster (which tears the resources down), then call AssignProjectToCluster for the new one. If the rejection was unexpected, the resource count is the source of truth: verify what the Project actually owns before forcing anything.

6. Rebuilding a cluster from the durable assignment records

Symptom. A management cluster was lost or rebuilt — every per-Project namespace and its RBAC and quota objects are gone — but the plexsphere.management_clusters and plexsphere.project_cluster_assignments rows in Postgres are intact.

Diagnostic.

  • The Postgres rows are the durable record of the fleet; the cluster-side objects are derived, reconcilable state. After a cluster rebuild the inventory still knows every cluster and every Project assignment, including each namespace name and phase.
  • Confirm the rebuilt cluster carries the substrate before relying on recovery: a rebuilt cluster with no Crossplane / External Secrets Operator install fails the verify gate (entry 2).

Remediation. Install the substrate on the rebuilt cluster, then let the reconcile sweep do the work. Each sweep lists every assignment for the cluster and reconciles it: every per-Project Namespace, Role, RoleBinding, ServiceAccount, and ResourceQuota is recreated idempotently from the derived namespace name, and each namespace crosses back into Ready. No assignment rows need re-creating and no Project needs re-placing — the inventory survived, so the fleet rebuilds itself from it. If a Project's old namespace had been mid-teardown, its row's namespace_phase still reflects that and the teardown arm resumes correctly.
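
The rebuild works because everything on the cluster can be derived from a row. A rough sketch of one sweep's plan follows; the `assignment` type, `namespaceName`, and `sweep` are hypothetical, and the name format is assumed from the plexsphere-project-<uuid> pattern used in entry 3.

```go
package main

import "fmt"

// assignment is a hypothetical projection of one durable assignment row.
type assignment struct {
	ProjectID string
	Phase     string // value of namespace_phase
}

// perProjectKinds are the five per-Project resources named in this runbook.
var perProjectKinds = []string{"Namespace", "Role", "RoleBinding", "ServiceAccount", "ResourceQuota"}

// namespaceName derives the namespace from the Project ID (assumed format).
func namespaceName(projectID string) string {
	return "plexsphere-project-" + projectID
}

// sweep lists what one pass would do for every row: recreate the five
// resources for a live assignment, or resume teardown for a Terminating one.
func sweep(rows []assignment) []string {
	var actions []string
	for _, r := range rows {
		ns := namespaceName(r.ProjectID)
		if r.Phase == "Terminating" {
			actions = append(actions, "teardown "+ns)
			continue
		}
		for _, kind := range perProjectKinds {
			actions = append(actions, "ensure "+kind+" "+ns)
		}
	}
	return actions
}

func main() {
	rows := []assignment{
		{ProjectID: "11111111-1111-1111-1111-111111111111", Phase: "Ready"},
		{ProjectID: "22222222-2222-2222-2222-222222222222", Phase: "Terminating"},
	}
	for _, a := range sweep(rows) {
		fmt.Println(a)
	}
}
```

Nothing in the plan reads cluster-side state that was lost, which is why no assignment has to be re-created by hand.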