Cloud-init and DaemonSet bootstrap provisioning
This is the bootstrap-injection half of the Provisioning Broker: how a bootstrap token the reconcile loop mints reaches the substrate it just declared, and how the plexd agent on that substrate consumes the token to self-enrol into the mesh. broker.md owns the ProvisionedResource aggregate, the eight-phase status state machine, and the render-and-apply pipeline; this page is the render-side companion that documents the three injection strategies the pipeline threads a token through — the two new self-enrolment channels (a VM via cloud-init user-data, a Kubernetes cluster via a plexd DaemonSet) and the unchanged provider-secret leaf — and the apply-time invariants that keep the injected material alive across re-apply ticks.
The broker never connects inward to the substrate it provisions. It threads the token into the rendered Crossplane v2 Composite Resource, the blueprint Composition copies the field onto the provider's input, and the booting substrate pulls its own enrolment material and calls back out to the control plane's POST /v1/register. The control plane is the single inbound endpoint; the token is the only secret that crosses the boundary, and it crosses single-use and short-lived.
The renderers are pure functions in internal/provisioning/broker/render/: no cluster client, no logging, and the token plaintext never reaches a log line, an error string, or an event payload.
Cross-references
- ./broker.md: the bounded-context reference for the ProvisionedResource aggregate, the status state machine, the create-pipeline ports and error sentinels, and the operator recovery runbook. This page expands the one paragraph broker.md keeps on the injection sites.
- ./blueprints.md: the Blueprint Catalog sub-context that owns the closed InjectionStrategy enum (cloud-init-user-data, helm-values, provider-secret) the broker consumes. The strategy is the blueprint's declaration of where the token belongs; the broker honours it.
- ./adoption.md: the operator how-to that brings a pre-existing Kubernetes cluster under management with the External-Secrets-Operator pattern. The DaemonSet renderer's ESO mode emits the same bundle shape that how-to walks an operator through by hand (see The DaemonSet bundle).
- ../identity/bootstrap-tokens.md and ../identity/registration.md: the identity context that mints the token the broker injects and serves the POST /v1/register endpoint the booting substrate calls back to. The token's single-use semantics and consumed_at flip are owned there.
- ../../how-to/enrolment/register-a-node.md: the underlying registration contract every self-enrolment channel terminates at.
- ../../../internal/provisioning/broker/render/cloudinit.go, ../../../internal/provisioning/broker/render/daemonset.go, and ../../../internal/provisioning/broker/render/injection.go: the three pure renderers CloudInitUserData, DaemonSetBundle, and InjectBootstrap.
- ../../../internal/provisioning/broker/reconcile/apply.go: the reconcile apply step that calls InjectBootstrap on a mint tick and liveBootstrapFields on a re-apply tick.
- ../../../internal/provisioning/broker/reconcile/config.go: the reconcile Config that carries the three knobs and the NewReconciler validation that rejects a mis-wired URL or image at boot.
- ../../../cmd/plexsphere/provisioning_broker_factory_prod.go: the production composition root that reads the three environment variables and threads them onto the reconcile Config.
- ../../../internal/provisioning/blueprints/catalog/vm-generic-cloudinit/xrd.yaml: a reshaped seed XRD that exposes spec.userData and an open spec.parameters object for the Composition to patch onto the provider.
The two self-enrolment channels
A blueprint's InjectionStrategy names how the booting substrate receives its token. Two of the three strategies drive a self-enrolment channel where plexsphere ships no agent and opens no inbound connection; the third is the unchanged opaque-leaf strategy a provider-specific Composition consumes on its own.
| Channel | Strategy | Substrate | How the token arrives |
|---|---|---|---|
| Cloud-init VM | cloud-init-user-data | A cloud VM (Hetzner, OpenStack, AWS EC2). | The broker renders a full #cloud-config document into spec.userData; the Composition patches it onto the provider's user-data input; the cloud serves it through its instance metadata service; cloud-init runs it on first boot. |
| Kubernetes DaemonSet | helm-values (broker-rendered, plexd Helm chart) / applied bundle | A Kubernetes cluster — either one the broker provisions or one an operator already runs. | The broker threads the token plus the control-plane URL and plexd image into spec.parameters.helmValues, which the Composition feeds a plexd Helm release; or an operator applies the multi-document DaemonSet bundle DaemonSetBundle renders directly. |
| Provider secret | provider-secret | Any substrate whose Composition reads an opaque token leaf. | The broker threads the bare token into spec.parameters.providerSecret.bootstrapToken; the Composition consumes it however the provider needs. Unchanged from the earlier broker milestone. |
The rest of this page documents the cloud-init document contract, the DaemonSet bundle, the per-strategy field map the injection renderer writes, and the apply-time rules that keep all of it alive.
The user-data document contract
CloudInitUserData (render/cloudinit.go) renders the first-boot enrolment document a Provisioned VM consumes. It takes three required inputs — the control-plane base URL, the plexd binary download URL, and the minted token plaintext — and returns the full payload: the literal #cloud-config header line followed by a YAML body.
The rendered document does exactly three things on first boot:
- Drops the token and the environment file. A write_files entry writes the token plaintext to /etc/plexd/bootstrap-token at mode 0600, and a second entry writes PLEXD_API_URL=<api-base-url> to /etc/plexd/plexd.env at mode 0644. The token file is owner-read only; the env file carries no secret, only the public control-plane URL, so it is world-readable.
- Installs plexd. A runcmd step fetches the binary from the plexd download URL with curl -fsSL … -o /usr/local/bin/plexd and makes it executable with chmod +x.
- Registers. The final runcmd step runs plexd register --bootstrap-token-file=/etc/plexd/bootstrap-token --api-url=<api-base-url>.
Two contract details are load-bearing:
- The register step reads the token from the dropped file, never from the command line. The token already lives at /etc/plexd/bootstrap-token at mode 0600; passing --bootstrap-token=<plaintext> on the argv would copy the bearer into the process argument list and the cloud-init execution log on the VM for no gain. The --bootstrap-token-file form keeps the plaintext to exactly one on-disk location.
- The #cloud-config header is prepended verbatim, not marshalled. cloud-init requires the literal first line #cloud-config to treat the payload as a cloud-config document. A YAML comment would not survive a marshal round-trip, so the header is never part of the typed struct; it is written ahead of the marshalled body as raw bytes.
CloudInitUserData fails closed wrapping broker.ErrInvalidInput when any input is empty or whitespace, or when either URL is not an absolute http(s) URL with a host. Every validation error names the offending field — never the value — so a misrender is diagnosable without the token plaintext ever reaching the error surface.
The metadata-service delivery model
The cloud-init document is not pushed to the VM by plexsphere. It rides the cloud's instance metadata service:
- The broker threads the rendered document into the Composite Resource's spec.userData.
- The blueprint Composition patches spec.userData onto the provider's user-data input (FromCompositeFieldPath: spec.userData → spec.forProvider.userData).
- The Crossplane provider passes that value to the cloud's instance API as the instance user-data.
- The cloud stores it and serves it through its per-instance metadata endpoint (for example the link-local 169.254.169.254 service).
- cloud-init on the booting VM reads its user-data from that endpoint on first boot and executes the write_files + runcmd program.
This is why the token's secrecy posture matters more than transport encryption would suggest. The user-data document — token and all — is readable by anything on the instance that can reach the metadata service, and it persists in the cloud's user-data store for the instance's lifetime. plexsphere does not control that store, so the token is the one secret it deliberately makes single-use and short-lived: the redemption window is bounded by the token TTL, and the first successful POST /v1/register consumes it, so a token recovered from instance metadata after first boot is already inert. See Token secrecy and TTL posture.
The DaemonSet bundle
DaemonSetBundle (render/daemonset.go) renders the multi-document YAML bundle a Kubernetes target applies to run a self-enrolling plexd DaemonSet. The bundle carries three documents:
- A plexsphere-system Namespace.
- A token-delivery object: an embedded Secret or an External-Secrets-Operator ExternalSecret (see below).
- The plexd DaemonSet itself.
The DaemonSet mounts the delivered Secret named plexd-bootstrap-token at /etc/plexd with defaultMode 0440, projecting the token key onto the file bootstrap-token — so the on-disk path is /etc/plexd/bootstrap-token, the exact path the register --bootstrap-token-file=… argv reads, identical to the cloud-init channel. Every pod runs hardened: non-root UID 65532, fsGroup 65532, readOnlyRootFilesystem, all Linux capabilities dropped, no privilege escalation, the RuntimeDefault seccomp profile, and CPU/memory requests and limits. The token volume is read-only on a read-only root filesystem. The fsGroup re-groups the projected Secret to the non-root runtime so the 0440 (owner + group read) token file is readable through its group bit; an owner-only 0400 would leave the Secret owned root:root and deny the 65532 UID — the agent would crash-loop on open /etc/plexd/bootstrap-token: permission denied.
DaemonSetBundle fails closed wrapping broker.ErrInvalidInput when the api URL is not absolute http(s), when the plexd image is untagged or floats on :latest (a node agent must pin a reproducible image), or when the token-delivery fields are inconsistent with the selected mode.
Two token-delivery modes
The TokenDelivery field is a closed enum; its zero value is invalid so a caller that forgets to set it fails closed.
| Mode | Token-delivery document | Carries plaintext? | When |
|---|---|---|---|
| secret (TokenDeliverySecret) | A Kubernetes Secret whose stringData.token is the minted plaintext, shipped in the bundle. | Yes. | plexsphere provisions the cluster, mints the token, and ships it embedded. Requires a non-empty plaintext; an empty one is rejected. |
| eso (TokenDeliveryESO) | An ExternalSecret (external-secrets.io/v1) that materialises the plexd-bootstrap-token Secret from a ClusterSecretStore at refreshInterval: 0s. | No. | An operator adopts a cluster they already run and pushes the token into their own secret store. The bundle must carry no plaintext; a non-empty one, or an empty store name or remote key, is rejected. |
The ExternalSecret is rendered as an unstructured map literal rather than a typed external-secrets Go struct: render/ already builds Crossplane objects unstructured to avoid coupling the broker to a provider module's release cadence, and the ESO resource is a small, stable shape. Its refreshInterval: 0s fetches the token once on create and never re-reconciles it — the bootstrap token is single-use, so a rotating refresh would be wasted work that risks re-materialising a value the upstream store has since rotated away.
The ESO mode and the adoption pattern
The eso mode is the programmatic emitter of the bundle shape adoption.md documents for a human operator. Both converge on one contract:
- the Secret is named plexd-bootstrap-token in the plexsphere-system namespace;
- the token lives under the token data key;
- the ExternalSecret reads it from a ClusterSecretStore with refreshInterval: 0s;
- the DaemonSet mounts the Secret at /etc/plexd/bootstrap-token.
A broker-rendered ESO bundle and a hand-authored adoption ExternalSecret are therefore interchangeable: the broker renders what the adoption how-to walks an operator through by hand, and plexsphere never reads from the customer's secret store in either case.
The injected-XR field map
InjectBootstrap (render/injection.go) threads a BootstrapMaterial into the rendered Composite Resource at the single site the blueprint's InjectionStrategy names. A given XR only ever carries one strategy's fields.
| Strategy | XR field(s) written | Shape | Material fields consumed |
|---|---|---|---|
| cloud-init-user-data | spec.userData | A full #cloud-config document (rendered by CloudInitUserData). | TokenPlaintext, APIBaseURL, PlexdDownloadURL |
| helm-values | spec.parameters.helmValues.bootstrapToken, ….apiUrl, ….plexdImage | Three scalar leaves. | TokenPlaintext, APIBaseURL, PlexdImage |
| provider-secret | spec.parameters.providerSecret.bootstrapToken | One scalar leaf. | TokenPlaintext |
Two design points distinguish the cloud-init strategy from the earlier broker behaviour:
- Cloud-init writes a whole document, not a bare token. The earlier broker dropped the bare token at spec.parameters.cloudInit.bootstrapToken and left the Composition to assemble the user-data. The reshaped Cloud-Init VM XRD now exposes spec.userData as the single field the Composition patches onto the provider, and a first-boot VM needs the whole enrolment program (token file, plexd install, register invocation), not just the token. Rendering the complete document in the broker keeps the Composition a verbatim FromCompositeFieldPath copy and makes the broker the single owner of the enrolment contract.
- Injection is fail-closed. A nil XR, a strategy outside the closed taxonomy, or material missing a field the selected strategy consumes yields an error wrapping broker.ErrInvalidInput rather than a half-threaded Composite Resource. As with every renderer, the token plaintext is threaded only into the XR object and never into the returned error, a log line, or any other surface.
The three knobs the cloud-init and helm-values strategies consume — APIBaseURL, PlexdDownloadURL, PlexdImage — arrive from the composition root; see The three composition-root knobs.
The force-apply preservation rule
The broker reconcile loop re-applies the Composite Resource every tick under a stable Server-Side Apply field manager (plexsphere-provisioning-broker) with force-ownership. This is the normal steady state — an apply against an already-present object is an idempotent no-op write — but it interacts with bootstrap injection in a way that needs a deliberate guard.
A bootstrap token is minted exactly once per ProvisionedResource, gated by the aggregate's HasBootstrapToken. That splits the apply path into two kinds of tick:
- Mint tick: the token was issued this tick, so the plaintext is in hand. renderObjects calls InjectBootstrap, and the rendered XR carries the strategy's bootstrap field(s).
- Re-apply tick: HasBootstrapToken is already true, so the issuer is not called and there is no plaintext. The freshly-rendered XR omits the bootstrap field(s) entirely.
A force Server-Side Apply prunes any field the owning manager no longer declares. On a re-apply tick the rendered XR omits the bootstrap fields, so a naïve force-apply would delete the previously-injected spec.userData (or helm/provider leaf) the booting node depends on.
liveBootstrapFields (reconcile/apply.go) closes that gap. On a re-apply tick it reads the broker-owned bootstrap fields back from the live XR and re-threads them onto the freshly-rendered object before the force-apply — exactly the idiom parameterValues uses to read spec.parameters back. The set of owned paths is closed:
```text
spec.userData
spec.parameters.helmValues.bootstrapToken
spec.parameters.helmValues.apiUrl
spec.parameters.helmValues.plexdImage
spec.parameters.providerSecret.bootstrapToken
```

The read is strategy-agnostic: it copies whichever of the five paths exist on the live XR. A given XR only ever carries one strategy's fields, so reading all five copies exactly the populated one and leaves the rest absent, with no empty-string writes and no re-deriving the strategy. On the first apply the XR does not exist yet (apierrors.IsNotFound), so liveBootstrapFields is a no-op: the mint path injects the fields on the tick the token is issued, and this guard only protects the re-apply ticks that follow. Re-supplying the same live values is an idempotent no-op write.
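The read-back can be sketched over plain maps. This is a minimal illustration under stated assumptions: carryOver, getPath, and setPath are hypothetical helpers, a nil live map models the not-found first apply, and the region parameter is a made-up example value. The real step reads the live XR through a cluster client.

```go
package main

import "fmt"

// ownedPaths is the closed set of broker-owned bootstrap paths.
var ownedPaths = [][]string{
	{"spec", "userData"},
	{"spec", "parameters", "helmValues", "bootstrapToken"},
	{"spec", "parameters", "helmValues", "apiUrl"},
	{"spec", "parameters", "helmValues", "plexdImage"},
	{"spec", "parameters", "providerSecret", "bootstrapToken"},
}

// getPath returns the value at a nested path and whether it exists.
func getPath(obj map[string]any, path []string) (any, bool) {
	var cur any = obj
	for _, key := range path {
		m, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		if cur, ok = m[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// setPath writes value at a nested path, creating intermediate maps.
func setPath(obj map[string]any, path []string, value any) {
	for _, key := range path[:len(path)-1] {
		next, ok := obj[key].(map[string]any)
		if !ok {
			next = map[string]any{}
			obj[key] = next
		}
		obj = next
	}
	obj[path[len(path)-1]] = value
}

// carryOver copies whichever owned paths exist on the live object onto the
// freshly rendered one. A nil live object models the first apply.
func carryOver(live, rendered map[string]any) {
	if live == nil {
		return
	}
	for _, path := range ownedPaths {
		if v, ok := getPath(live, path); ok {
			setPath(rendered, path, v)
		}
	}
}

func main() {
	live := map[string]any{"spec": map[string]any{
		"userData":   "#cloud-config\n",
		"parameters": map[string]any{"region": "eu-central"},
	}}
	rendered := map[string]any{"spec": map[string]any{
		"parameters": map[string]any{"region": "eu-central"},
	}}
	carryOver(live, rendered)
	_, kept := getPath(rendered, []string{"spec", "userData"})
	_, helm := getPath(rendered, []string{"spec", "parameters", "helmValues"})
	fmt.Println(kept, helm) // prints "true false"
}
```

Absent paths are skipped rather than written as empty strings, so the other strategies' subtrees are never created on the rendered object.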
The same read-back-the-live-XR posture is why required blueprint parameters carry the limitation documented in broker.md — the broker persists neither the parameter values nor the rendered bootstrap document on the aggregate, treating the live XR as their durable home once applied.
The three composition-root knobs
The cloud-init and helm-values strategies need three values the broker domain cannot invent: where the booting node registers, where it fetches plexd, and which plexd image a Helm release deploys. They are required configuration, read once at boot.
| Environment variable | Reconcile Config field | Threaded into | Validation |
|---|---|---|---|
| PLEXSPHERE_PROVISIONING_BROKER_ENROL_BASE_URL | EnrolBaseURL | BootstrapMaterial.APIBaseURL: the cloud-init plexd.env + register --api-url, the helm-values apiUrl leaf, and the DaemonSet api URL. | Absolute http(s) URL with a host. |
| PLEXSPHERE_PROVISIONING_BROKER_PLEXD_DOWNLOAD_URL | PlexdDownloadURL | BootstrapMaterial.PlexdDownloadURL: the cloud-init runcmd curl that fetches the plexd binary. | Absolute http(s) URL with a host. |
| PLEXSPHERE_PROVISIONING_BROKER_PLEXD_IMAGE | PlexdImage | BootstrapMaterial.PlexdImage: the helm-values plexdImage leaf (and the DaemonSet image the e2e suite pins). | Tagged or digest-pinned; the floating :latest tag and a tag-less reference are rejected. |
All three are required when PLEXSPHERE_DSN is set — that is, whenever the provisioning broker is wired at all. The composition root reads each with strings.TrimSpace and returns a dedicated missing-knob sentinel when the broker is active but a value is blank. The NewReconciler constructor re-validates them (reconcile/config.go): ErrReconcilerEnrolBaseURLRequired, ErrReconcilerPlexdDownloadURLRequired, and ErrReconcilerPlexdImageRequired fail a mis-wired boot before the first reconcile tick, so a tag-less image or a malformed URL never surfaces deep inside an apply against live substrate. The dev overlay pins the plexd image to the repository's tracked plexd version so the local stack and the published binary never drift.
Token secrecy and TTL posture
The bootstrap token is the only secret that crosses into the substrate, and every layer of this feature treats it as one-time, short-lived bearer material:
- Never logged, never in an error, never persisted in cleartext. Every renderer performs no logging and names only the offending field or strategy in a validation error, never the token value. The broker persists only the token id on the ProvisionedResource aggregate; the plaintext lives in-process for exactly one apply and is handed to InjectBootstrap and nowhere else.
- Minted at most once. The HasBootstrapToken gate ensures the issuer is called once per ProvisionedResource. The token id is persisted before the apply, so a crash between the issue and the apply still leaves HasBootstrapToken true on the next tick, and the issuer is never called twice for one resource.
- Single-use. The first successful POST /v1/register consumes the token and flips its consumed_at. A second consumer (a DaemonSet replica that loses the race, a re-run of a VM's first boot) receives 403 token_consumed. The single-use semantics are owned by the identity context (see ../identity/bootstrap-tokens.md).
- Short-lived. Every minted token carries the broker's configured TTL (PLEXSPHERE_PROVISIONING_BROKER_TOKEN_TTL, validated at boot against the issuer's [MinTTL, MaxTTL] window; see the operational model in broker.md). A token un-redeemed past its TTL cannot be consumed, and the broker does not re-issue; the at-most-once gate keeps the original id on the aggregate.
- Least-privilege on disk. The cloud-init channel drops the token at mode 0600; the DaemonSet channel projects it at mode 0440 (owner + group read, paired with the pod's fsGroup 65532 so the non-root runtime can read it), read-only, on a non-root, read-only-root-filesystem pod.
Self-enrolment troubleshooting runbook
This section is the operator-facing companion to the contract above. Each entry follows the same shape (Symptom, Diagnostic, Remediation) and is scoped to a single self-enrolment failure mode. These entries pick up where the broker.md recovery runbook's entry 5 ("Bootstrap token issued but the Node never registers") leaves off: that entry diagnoses the broker side; these diagnose the substrate side.
1. Token expired before first boot completed
Symptom. A ProvisionedResource sits in Enrolling across many sweeps and never crosses into Ready, even though Crossplane reports the substrate up. The VM booted, or the DaemonSet rolled out, but no Node registered.
Diagnostic.
- The token carries the broker's configured TTL, stamped at mint time. If the substrate took longer to boot, install plexd, and reach the control plane than the TTL allowed, the token expired un-redeemed and POST /v1/register now rejects it as expired.
- On a VM, read the cloud-init execution log (for example /var/log/cloud-init-output.log): a register step that ran after the TTL elapsed reports a rejected token. On a DaemonSet, read kubectl -n plexsphere-system logs daemonset/plexd.
- Confirm the gap: compare the token's issue time against its TTL and the substrate's first-boot timestamp. Slow image pulls, a cold cloud region, or a slow plexd download URL are the usual causes.
Remediation. The broker does not re-issue a token for an existing ProvisionedResource — the at-most-once gate keeps the original id. The substrate must be re-provisioned with a fresh token: declare a new ProvisionedResource. To widen the redemption window for substrate that is slow to come up, raise PLEXSPHERE_PROVISIONING_BROKER_TOKEN_TTL before re-declaring, and confirm the plexd download URL is reachable and fast from the substrate's network.
2. DaemonSet replica race — 403 token_consumed
Symptom. A plexd DaemonSet rolled out across a multi-node cluster. One node registered; the rest crash-loop. Their logs show 403 token_consumed on POST /v1/register.
Diagnostic.
- The bootstrap token is single-use. A DaemonSet schedules one plexd pod per node, and every pod mounts the same plexd-bootstrap-token Secret. The first pod to call register wins and consumes the token; every other pod presents the now-consumed token and is correctly refused with 403 token_consumed.
- This is expected behaviour for a single-token DaemonSet across more than one node, not a fault: a single-use credential cannot enrol N nodes.
Remediation. Decide the intended enrolment model. For a Kubernetes cluster that should register as one Adopted Resource, a single registration is correct — let the winning pod enrol and remove the DaemonSet (or scope it to one node) so the others stop crash-looping. For a fleet where each node should enrol independently, issue a distinct bootstrap token per node and deliver one per node rather than a single shared Secret. See ./adoption.md for the adoption enrolment model.
3. ExternalSecret never materialises — pod stuck pending-secret
Symptom. An eso-mode DaemonSet bundle was applied, but the plexd pods never start. They sit Pending (or ContainerCreating) and the plexd-bootstrap-token Secret does not exist in plexsphere-system.
Diagnostic.
- The DaemonSet mounts the plexd-bootstrap-token Secret as a volume, so a pod cannot start until that Secret exists. In eso mode the Secret is materialised by the External Secrets Operator from the ExternalSecret, not shipped in the bundle, so an ExternalSecret that never reconciles leaves the pod waiting on a missing volume.
- Read the ExternalSecret status: kubectl -n plexsphere-system get externalsecret plexd-bootstrap-token -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'. A SecretSyncedError points at the store; a missing status points at ESO not running.
- The usual causes: the External Secrets Operator is not installed in the cluster; the referenced ClusterSecretStore does not exist or is not Ready; the operator's credentials cannot read the remote key; or the operator never pushed the token into the store under that key.
Remediation. Restore the ESO path: install or repair the External Secrets Operator, confirm the ClusterSecretStore named in the bundle exists and is Ready, and confirm the token plaintext was pushed into the store under the remote key the ExternalSecret reads. Once the Secret materialises, the DaemonSet pods leave Pending and the winning pod registers. Because refreshInterval is 0s, ESO fetches the value once on create and does not refresh it on a schedule, so fix the store before the token TTL elapses. If the token expired while the ExternalSecret was stuck, fall through to entry 1 and re-provision with a fresh token.